There’s No Such Thing As BIG Data in Business

I’m getting sick of all this talk about “BIG Data”, which is nothing but Bullshit In Guise. The marketing mania has gotten more ludicrous than Ludacris, so I’m not sure where to begin at this point, but let’s start with the basics.

  1. Data has always been big
    We’ve always had more data than we can process in real time, or even in the time window in which we want it processed. ALWAYS. Since we’ve always been able to store more data in physical/external storage than we can hold in memory, we’ve always had more data than we can process in “real” time. Quod Erat Demonstrandum.
  2. In Business, Big has always been meaningless.
    We’re not doing protein folding, climate modelling, nuclear simulations, supercollider data interpolation, cosmological computations, or even trying to beat Deep Blue at chess. We don’t need Deep Thought, HAL, or even The Architect to solve an average business problem, which can probably be solved on the iPad 3 most of you are now carrying around.
    The reality is that even though you may have 100 million transactions in your ERP, you don’t need to analyze them all at once. Analyzing all of your spend at once is akin to comparing your DVD player to your kitchen sink to an apple. Does that make any sense? (It doesn’t. And if it does to you, please seek professional help.) Real insight comes from analyzing heterogeneous but related data, possibly through federation, and not from throwing everything into a number cruncher to see what comes out. You wouldn’t ask for the average temperature on earth over the course of a year, would you? (And that’s exactly what you’re asking for when you ask for the average transaction size across your business, which will include everything from a $4.99 ream of paper to refill the printer to a $200,000,000 acquisition of a new steel plant. The first sketch after this list makes the point in code.)
  3. If you’re smart about your technology acquisition,
    you can already analyze and drill down into tens of millions of transactions in real time on an average quad-core laptop with 8GB of memory. You will have to get a product coded in C/C++ to take full advantage of the 128 billion instructions per second you can get out of an Intel Core i7-2600K; something programmed in an interpreted language like Ruby (the language underneath Ruby on Rails) will go through so many layers of translation that you’ll be lucky to get 128 million instructions per second dedicated to your computations. The reality is that we now have far more speed than most platforms can take advantage of because, in our quest to not only build cross-platform apps but to teach people to code as quickly as possible, we’ve lost the art of lower-level coding and of optimizing an analytics engine to be as fast as possible. (The second sketch after this list shows the kind of tight loop I mean.)
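
To make the averaging point from item 2 concrete, here’s a toy sketch in C++ (the language I’m telling you to demand anyway). The numbers are entirely hypothetical (99,999 reams of $4.99 printer paper and one $200,000,000 steel plant), but the lesson is general: the blended mean describes neither purchase.

```cpp
// Toy illustration with hypothetical data: why a blended average across
// heterogeneous spend categories tells you nothing about either category.
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    // 99,999 routine purchases: $4.99 reams of printer paper...
    std::vector<double> spend(99999, 4.99);
    // ...plus one $200,000,000 steel-plant acquisition.
    spend.push_back(200000000.0);

    const double total = std::accumulate(spend.begin(), spend.end(), 0.0);
    const double mean  = total / static_cast<double>(spend.size());

    // Prints roughly $2,004.99: about 400x a ream of paper and about
    // 1/100,000th of the acquisition. It describes neither population.
    std::printf("Blended average transaction: $%.2f\n", mean);
    return 0;
}
```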
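
And to put a number behind item 3, here’s a minimal sketch of the kind of tight, low-level aggregation loop I mean, under assumed conditions: synthetic data standing in for an ERP extract, laid out as flat in-memory columns, with 64 hypothetical spend categories. Compiled with optimizations (e.g. g++ -O2), the grouping pass over 50 million rows should finish in a small fraction of a second on the class of laptop described above, because the data (roughly 250MB here) fits comfortably in 8GB of RAM and is consumed in one streaming pass.

```cpp
// Minimal sketch (assumptions: synthetic data, flat columnar layout,
// 64 hypothetical categories): one streaming pass over 50 million
// transactions computing per-category totals, i.e. a basic drill-down.
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const std::size_t kRows = 50000000;  // tens of millions of transactions
    const int kCategories = 64;          // hypothetical spend categories

    std::vector<float>        amount(kRows);    // ~200MB
    std::vector<std::uint8_t> category(kRows);  // ~50MB

    // Fill with synthetic data (a stand-in for a real ERP extract).
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> amt(1.0f, 5000.0f);
    std::uniform_int_distribution<int> cat(0, kCategories - 1);
    for (std::size_t i = 0; i < kRows; ++i) {
        amount[i] = amt(rng);
        category[i] = static_cast<std::uint8_t>(cat(rng));
    }

    // The aggregation itself: one tight loop, no interpreter layers.
    const auto start = std::chrono::steady_clock::now();
    double totals[kCategories] = {};
    for (std::size_t i = 0; i < kRows; ++i)
        totals[category[i]] += amount[i];
    const auto stop = std::chrono::steady_clock::now();

    const double ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    std::printf("Aggregated %zu rows in %.1f ms (category 0: $%.2f)\n",
                kRows, ms, totals[0]);
    return 0;
}
```

The same layout gives you drill-down essentially for free: filter on the category column first and you get per-category detail without touching the rest of the data.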

There are “big” data problems, but they don’t exist in business. They exist in some of the areas I mentioned earlier, and that’s why we’re working on exaflop supercomputers, but not in business. Furthermore, your data problems pale in comparison to the data problems coming down the pipe with DOME, a collaboration between IBM and ASTRON to build a next-generation radio telescope, known as the Square Kilometre Array (SKA), that will collect more data in a day than currently exists across the entire internet, even factoring in the massive amounts of adult entertainment and multimedia content hosted on public file-sharing servers. You don’t have big data problems, and if some analyst, consulting, or technology firm tries to tell you that you do, they are talking out of their donkey.

And just so you know, if you have data problems, the cloud, which is NOT a magic mirror (but still full of sweet fluffy dreams for those who choose to join Coleridge in Xanadu), is NOT going to solve your problem. Pushing your data out to random servers on the internet with no I/O, bandwidth, or computational guarantees is only going to exacerbate your situation. (The Cloud is cheap because most of the servers have less power than the iPad 3, and most of the low-cost providers give you no performance guarantees. If you want performance, power, and pipe – be prepared to pay three to thirty times as much [relative to a server’s useful life] as just buying a high-end server and sticking it in the janitor’s closet where you just happen to have a fiber feed. [A surprising number of small and mid-size non-technical businesses have their network feeds going into what should be the janitor’s closet because they think it’s a fit place for a server. Why, I don’t know. But they do!])