Daily Archives: April 28, 2014

Big Data = Big Mistake

FT.com recently published a great article on Big Data that asked “Are We Making a Big Mistake?”, which contains the best description SI has seen yet for Big Data: Big Mistake!

Why? Because, even though there are times we might want correlation to be causation (because then we could put an end to IE once and for all), it is not, never was, and never will be. Never, Ever, EVER!* And, as pointed out in the article, just because a correlation algorithm works great for predicting trends, such as the spread of influenza, three years in a row, this doesn’t mean it’s going to work well the fourth year. Randomly identified statistical patterns in data are just that: randomly identified statistical patterns in data.
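To see how easily a "pattern" turns up by chance, here is a minimal Python sketch. Everything in it is made up for illustration: the series are pure random noise, and the names (`best_chance_match`, `n_seen`, `n_later`) are hypothetical. It searches many random candidates for the one that best correlates with a random target over a short window, then checks that same candidate on later data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def random_series(length, rng):
    """Pure noise: no series has anything to do with any other."""
    return [rng.gauss(0, 1) for _ in range(length)]


def best_chance_match(n_candidates=1000, n_seen=10, n_later=200, seed=1):
    """Pick the candidate that best matches the target on the first
    n_seen points, then report how it does on the next n_later points."""
    rng = random.Random(seed)
    target = random_series(n_seen + n_later, rng)
    best_r, best_later_r = 0.0, 0.0
    for _ in range(n_candidates):
        candidate = random_series(n_seen + n_later, rng)
        r_seen = pearson(target[:n_seen], candidate[:n_seen])
        if abs(r_seen) > abs(best_r):
            best_r = r_seen
            best_later_r = pearson(target[n_seen:], candidate[n_seen:])
    return best_r, best_later_r


best_r, later_r = best_chance_match()
print(f"best correlation found in the short window: {best_r:+.2f}")
print(f"same candidate on later data:               {later_r:+.2f}")
```

With enough candidates and a short enough window, a strong correlation is practically guaranteed to show up, and there is no reason to expect it to survive contact with new data.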

The Google example in the article is a prime example of how big data can fail in a massive, embarrassing way. In Nature 457, published 19 February 2009, Google published a paper entitled “Detecting influenza epidemics using search engine query data” that detailed how they were able to track the spread of influenza across the US more quickly than the Centers for Disease Control and Prevention (CDC). Using a big data algorithm that detected a correlation between what people searched for and whether they had flu symptoms, Google was apparently able to track the spread of influenza with only a day’s delay, compared to the week or more it took the CDC to assemble a picture based on reports from doctors. This theory-free approach worked for four years, and then failed spectacularly in 2013 when it drastically over-estimated peak flu levels, as chronicled in the article “When Google Got Flu Wrong” over on Nature.com.

To put the issue of correlation vs causation into terms everyone can understand, if correlation were causation, Microsoft would be on trial as an accomplice to felony murder in every state in the United States, since the declining usage of Internet Explorer directly correlates with the declining murder rate in the US.

In other words, if correlation were causation, then using Internet Explorer would provoke violent tendencies that lead to murder, and its continued existence would be criminal.**
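Any two things that both decline over the same years will correlate strongly, whether or not they have anything to do with each other. Here is a minimal Python sketch of that effect. The numbers are synthetic and are NOT real browser or crime statistics; `browser_share` and `crime_rate` are just hypothetical names for two independently generated downward trends.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def declining_series(start, drop_per_year, noise, years, rng):
    """A straight-line decline plus independent random noise."""
    return [start - drop_per_year * t + rng.gauss(0, noise) for t in range(years)]


def year_over_year_changes(series):
    """Differencing strips out the shared downward trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(42)
# Synthetic, made-up data: two unrelated quantities that both happen to fall.
browser_share = declining_series(100.0, 5.0, 2.0, 20, rng)
crime_rate = declining_series(10.0, 0.4, 0.2, 20, rng)

r_levels = pearson(browser_share, crime_rate)
r_changes = pearson(year_over_year_changes(browser_share),
                    year_over_year_changes(crime_rate))
print(f"correlation of the raw levels:        {r_levels:+.2f}")
print(f"correlation of year-over-year changes: {r_changes:+.2f}")
```

The raw levels correlate almost perfectly because both follow the calendar. Once the trend is removed by looking at year-over-year changes, only the unrelated noise is left to compare.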

This is the problem with big data today. Everyone is using it to try to detect potentially useful correlations, instead of trying to support or disprove useful, actionable theories. Why? Because, as the FT.com article states, figuring out what causes what is hard, and some would even claim it to be impossible.

Correlation might work in the short term, as it did for Google, which was able to predict the spread of influenza for a few years, but it always fails in the long term. And if you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. Just like a stock market trading algorithm, it might work for a year, a month, a week, a day, or a minute. You just don’t know.
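The breakdown is easy to reproduce in a toy setting. The Python sketch below is not Google's algorithm and uses no real flu or search data; it is a hypothetical illustration with synthetic numbers. A straight line is fitted to predict illness from search volume while searches track illness. Then something unrelated to illness (assumed here to be a constant bump in searches, say from news coverage) shifts the relationship, and the same fitted line is applied anyway.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(n_weeks=50, extra_searches=15.0, seed=0):
    """Return (training error, later error, mean overshoot in the later period)."""
    rng = random.Random(seed)
    # Early period: searches are driven by illness alone (synthetic data).
    early_flu = [rng.uniform(1, 10) for _ in range(n_weeks)]
    early_searches = [3 * f + rng.gauss(0, 0.5) for f in early_flu]
    # Later period: same illness levels, but searches get a boost that
    # has nothing to do with illness (a hypothetical cause, e.g. news coverage).
    later_flu = [rng.uniform(1, 10) for _ in range(n_weeks)]
    later_searches = [3 * f + extra_searches + rng.gauss(0, 0.5) for f in later_flu]

    slope, intercept = fit_line(early_searches, early_flu)

    def errors(searches, flu):
        return [slope * s + intercept - f for s, f in zip(searches, flu)]

    early_err = errors(early_searches, early_flu)
    later_err = errors(later_searches, later_flu)
    early_mae = sum(abs(e) for e in early_err) / n_weeks
    later_mae = sum(abs(e) for e in later_err) / n_weeks
    overshoot = sum(later_err) / n_weeks
    return early_mae, later_mae, overshoot


early_mae, later_mae, overshoot = simulate()
print(f"mean error while the correlation holds: {early_mae:.2f}")
print(f"mean error after the relationship shifts: {later_mae:.2f}")
print(f"average over-estimate afterwards: {overshoot:+.2f}")
```

The fitted line has no notion of why searches and illness moved together, so it cannot tell when the reason stops applying. It simply keeps converting searches into predicted illness and over-estimates.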

That’s why relying on correlation-based big-data algorithms is a big mistake. While they will give you interesting patterns to examine, relying on them will lead you down a dark and winding road that leads to the edge of a deep canyon (that you aren’t going to see until you fall in). Unless you can come up with a reasonable theory and support it with the data, it’s just an interesting pattern, and you should continue on your merry way until you find an interesting pattern you can actually explain, unless you too want to end up with egg on your face.

That’s why Sourcing Innovation Still Prefers Big Brains to Big Data, and likely always will. We might be slaves to the corporations in the continuum, but that doesn’t mean we have to be slaves to stupidity.

* Everyone should know by now that correlation is not causation given that Pinky and the Brain gave you all a great Lesson in Statistics six years ago (when they were still in the employ of a certain Burlington sourcing provider …)

** Its distribution was criminal for a while, when Microsoft tried to create a browser monopoly by embedding it in the operating system in a way that led Windows users to believe there was no other choice (monopolies are illegal in many countries), but, I’m sorry to say, the continued existence of IE is not criminal, just sad and frustrating.