Big data is too big for most organizations. It’s not because there is too much raw material to deal with, nor is it because of a lack of applicable tools. It doesn’t even have anything to do with the inclusion of unstructured data (which doesn’t really exist but that’s a topic for blog posts that I’ve already written). No, big data’s too big because of what’s possible.
The possibilities to slice & dice are virtually endless. Numerical data is bombarding organizations from within and without. Text-based data (I’m calling it data because until it’s put in context it’s not really information) is being generated almost at the speed of thought, in quantities heretofore unimagined. Every transaction, every search request, every Tweet, and every Like generates an entry in some repository that organizations may or may not be aware of, have control of, or have access to.
That all sorta sucks, but …
But what really sucks is that organizations jump onto the Big Data bandwagon with not an iota of a clue as to what they want to do with it. Boys and girls, Uncle Chris is gonna try and wise you up.
Like any other project that involves expending time and money, you need to know what you want to achieve when you’re done. For example: many years ago I worked on a service management reporting project for a big organization providing managed network services to an even bigger organization. It came down to this: approximately 17,000,000 rows/day of raw data were collected; $pooploads/mo of revenue depended on meeting SLA targets; and unmet SLAs resulted in pooploads-alot being lost. The specific metrics and their data sources were identified prior to spending a dime on tools. My point is, you need to figure out what the business requirements are. That was true in 2002 and it is true today.
Much has stayed the same, and some has changed over the last 10+ years. We’ve even got some new stuff we can play around with thanks to social media, text analytics, sentiment analysis, etc. But knowing at least a few of the questions we’re trying to answer, before actually doing something, is still a valid and necessary first step.
It’s really cool that we can now ask “How does [demographic of choice] feel about our support organization?” in addition to asking how many units of blue-widget-A we sold last quarter in the mountain time zone north of the 49th parallel. But before we ask the question, we need to know that it’s a question we should be asking, and we need to know what we’re gonna do if we don’t like the answer (we should also have a social media strategy in place). We also need to know niggling little details like where the data is, whether or not we can access it, and whether or not it’s reliable (whatever that means). Oh, we should also have some sort of governance in place to deal with all that personal and payment data we’re collecting, storing, massaging, analyzing, and interpreting to generate more profits than ever before.
I’d like to end today’s sermon with another little story …
Back in 2004 I was a project manager at a municipality. One of my periodic tasks was to compile the results detailing uptake of certain web-enabled municipal services related to planning and development. Each month I would get the results from the various sub-departments, enter them into my fancy-schmancy reporting tool, compare the numbers against the projections, and then present them at a monthly meeting. We used a standard red-black-green thingy and it was all so easy. Easy, that is, until the dude in charge asked me if they were supposed to do anything about the red (bad) numbers. My question to him, and the take-away from this anecdote, was “If you’re not going to address the issues highlighted, why are you spending time and money on this?”
Big data is full of big possibilities. However, before you jump in, make sure you have a plan. Understand what it is you’re trying to achieve. Have a plan for how you’re going to react to negative results as well as positive ones. Know that you won’t figure it all out on your first attempt, but that’s okay, because a cool thing about analytics is that the more you play, the more you learn, and the more possibilities you discover.
The bottom line is that if you don’t know what you’re trying to accomplish or what questions to ask, it makes no difference if you’ve got a couple gigs of data or multiple petabytes of data.