the doctor doesn’t like the phrase “big data” or the “big data” craze. First of all, as he has said time and time again, we’ve always had more data than we could process on a single machine or cluster and more data than we could process in the time we want to process it in. Secondly, and most importantly, just like the cloud is filled with hail, big data is filled with big disasters waiting to happen.
As the author of the article on the big dangers of ‘big data’ astutely points out, there are limits to the analytic power of big data and quantification that circumscribe big data’s capacity to drive progress. Why? First of all, as the author also points out, bad use of data can be worse than no data at all. As an example, he cites a 2014 New York Times piece on Yahoo and its Chief Executive, which demonstrated the unintended consequences of trying to increase employee drive and weed out the chaff by way of scorecard-based quarterly performance reviews that limited how many people on a team could get top ratings. Instead of promoting talent and driving talented people together, it split them up because, if you were surrounded by underperformers, you were sure to get the top score – but if you were surrounded by equals, you weren’t.
This is just one example of the unintended consequences of trying to be too data driven. Another example is using average call time in a customer support centre, rather than the number of calls needed to close a ticket, as a measure of call centre agent performance. If an agent is measured on how long she spends on the phone on average, she is going to try to take shortcuts to solve a customer’s problem instead of getting to the root cause. For example, if your Windows PC keeps locking up every few days and a reboot fixes it, you will be told to proactively reboot every 24 hours just to get you off the phone. But that doesn’t necessarily fix the problem or guarantee that you will not have another lock-up (if the lock-up is caused by a certain combination of programs, opened at the same time, that refuse to share a peripheral device, for example). As a result, you will end up calling back. Or, if she can’t solve your problem, you will be switched to another agent who “knows the system better”. That’s poor customer support, and all because you’re keeping track of the time of every call and computing averages by rep and department.
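To make the difference between the two measures concrete, here is a minimal sketch in Python. The call log, the agent names, and the `agent_metrics` helper are all invented for illustration, and the sketch assumes each ticket is handled by a single agent. It computes both the average call time and the number of calls per closed ticket for each agent.

```python
from collections import defaultdict

# Hypothetical toy call log: (agent, ticket_id, call_minutes).
# "quick" rushes customers off the phone, so tickets need repeat calls;
# "thorough" takes longer per call but closes each ticket in one call.
calls = [
    ("quick", "T1", 4), ("quick", "T1", 5), ("quick", "T1", 3),
    ("quick", "T2", 4), ("quick", "T2", 4),
    ("thorough", "T3", 11), ("thorough", "T4", 9),
]

def agent_metrics(call_log):
    """Return per-agent average call time and calls/minutes per ticket."""
    minutes = defaultdict(list)   # agent -> list of call durations
    tickets = defaultdict(set)    # agent -> set of distinct tickets handled
    for agent, ticket, mins in call_log:
        minutes[agent].append(mins)
        tickets[agent].add(ticket)
    return {
        agent: {
            "avg_call_minutes": sum(m) / len(m),
            "calls_per_ticket": len(m) / len(tickets[agent]),
            "minutes_per_ticket": sum(m) / len(tickets[agent]),
        }
        for agent, m in minutes.items()
    }

metrics = agent_metrics(calls)
for agent, values in metrics.items():
    print(agent, values)
```

In this toy data the "quick" agent looks far better on average call time, yet both agents spend the same total time per ticket, and the "quick" agent's customers have to call back. Which measure you reward determines which behaviour you get.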
Big data will let us compute more accurate economic forecasts, demand trends, process averages, and so on, but, as the author keenly points out, many important questions are simply not amenable to quantitative analysis, and never will be. Where your child should go to college, how to punish criminals, and whether to fund the Human Genome Project are just a few examples. Even more relevant are product design queries. Say 34% of users want feature A, 58% want feature B, and 72% want feature C. How many want features A and B, or A and C, or B and C, or all three features? And how many will be put off if the product also contains a feature they don’t want, is too confusing due to too many frivolous features, or doesn’t have the all-important feature D that you didn’t ask about, but now have to have because your competitor does?
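The feature question can be made precise with a little arithmetic. The individual percentages only bound the overlaps; they do not determine them. The sketch below, in Python, computes the standard Fréchet bounds on how many users could want two or all three features, using the 34%, 58%, and 72% figures above. The function names are hypothetical, and integer percentages are used to avoid floating-point noise.

```python
from itertools import combinations

def pairwise_overlap_bounds(pct_x, pct_y):
    """Bounds (in percent) on users wanting both features, from marginals alone."""
    lower = max(0, pct_x + pct_y - 100)  # overlap forced by the pigeonhole principle
    upper = min(pct_x, pct_y)            # overlap cannot exceed the smaller group
    return lower, upper

def triple_overlap_bounds(pct_a, pct_b, pct_c):
    """Bounds (in percent) on users wanting all three features."""
    lower = max(0, pct_a + pct_b + pct_c - 200)
    upper = min(pct_a, pct_b, pct_c)
    return lower, upper

# Survey shares from the text, as integer percentages.
shares = {"A": 34, "B": 58, "C": 72}

for x, y in combinations(shares, 2):
    low, high = pairwise_overlap_bounds(shares[x], shares[y])
    print(f"{x} and {y}: between {low}% and {high}%")

low, high = triple_overlap_bounds(*shares.values())
print(f"A, B and C: between {low}% and {high}%")
```

With these figures, anywhere from 0% to 34% of users could want both A and B, and likewise for all three features. The survey data alone cannot narrow that range; only asking about the combinations can.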
And, even more important, McKinsey, which in 2011 claimed that we are on the cusp of a tremendous wave of innovation, productivity and growth … all driven by big data, recently had to admit that there is no empirical evidence of a link between data intensity … and productivity in specific sectors. In other words, despite all of the effort put into big data projects over the last few years, none has conclusively yielded results beyond those that would have been achieved without big data.
And, most importantly, as someone who has studied chaotic dynamical systems theory, the doctor can firmly attest to the fact that the author is completely correct when he says that understanding the complexity of social systems means understanding that conclusive answers to causal questions in social systems will always remain elusive. We may be able to tease out strong correlations, but correlation is not causation. (And if you forget this, you had better go back and take another read through Pinky and the Brain’s lesson on statistics.)