Decisions Should Be Data-Derived – But They Should Not Be Big Data Driven!

In our recent post, where we noted that it’s nice to see CNN run a piece that says big data is big trouble, we argued that big data is big danger because more data does not automatically translate into better decisions. Better data translates into better decisions, and that better data often comes in the form of a small, focussed data set. For example, if you are trying to determine the right set of features to include in the next version of a product, the best data points are those that represent the desires of the current customers most likely to buy it. This is especially true if the most profitable market segment is enterprise business customers who buy thousands of licenses or units. If you only have a few dozen of these customers, those few dozen data points are more relevant than the thousands of data points you’d get from a mass-market survey, which would likely include hundreds of responses from customers who are only vaguely interested in your product (and who would likely never buy it).

Data does matter. But only the right data matters. That’s why only companies in the top third of their industry in the use of data-driven decision making are 5% more productive and 6% more profitable than their competitors (as per “an introduction to data-driven decisions”). If it were just a matter of having lots of data, then all companies would be more productive and half would be noticeably more profitable than their peers.

So how do you know if the data is good? Ask the right questions. In the HBR piece, the author lists six key questions that should be asked before acting on any data:

  1. What is the data source?
  2. How well does the data sample represent the population?
  3. Does the data distribution include outliers? Do they affect the results?
  4. What assumptions are behind the analysis? Are there conditions that would render the assumptions and model invalid?
  5. What were the reasons behind selecting the data and approach?
  6. How likely is it that independent variables are actually causing changes in the dependent variable?

And the answers that are received should be relevant to the problem at hand. For example, if we go back to our software / hand-held device example, the answers received should be along the lines of:

  1. Business Customer Surveys
  2. Over 70% of the organization’s largest accounts are represented
  3. Some small customers are included as well, but they are less than 10% of respondents and do not affect the results
  4. The assumptions are that the largest accounts provide the most relevant data. Currently, major account satisfaction is good and the data can be relied on, so there are no current conditions that would invalidate the assumptions.
  5. Large corporate customers represent over 60% of the company’s profit, so focussing on their needs first was the rationale.
  6. The surveys were designed to minimize the impact of independent variables, so the likelihood is low.
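As a rough sketch, checks like #2 and #3 can even be automated before anyone acts on the survey. The snippet below is purely illustrative: the account names, tiers, and thresholds are made up, not taken from any real survey.

```python
# Purely illustrative data: account names, tiers, and thresholds are made up.
responses = [
    {"account": "AcmeCorp",  "tier": "major"},
    {"account": "BetaLLC",   "tier": "major"},
    {"account": "SmallShop", "tier": "small"},
]
major_accounts = {"AcmeCorp", "BetaLLC", "GammaInc"}  # the org's largest accounts

# Question 2: how well does the sample represent the major-account population?
surveyed_majors = {r["account"] for r in responses if r["tier"] == "major"}
coverage = len(surveyed_majors & major_accounts) / len(major_accounts)

# Question 3: what share of respondents are outliers (small customers)?
small_share = sum(r["tier"] == "small" for r in responses) / len(responses)

print(f"major-account coverage: {coverage:.0%}")  # target: over 70%
print(f"small-customer share: {small_share:.0%}")  # target: under 10%
```

If the coverage check fails or the outlier share is too high, the answers to questions 2 and 3 above can’t be given in good conscience, and the data shouldn’t be acted on yet.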

In this situation, you know the data is good, the approach is good, and the assumptions are relatively sound, so you can likely count on the results. And, more importantly, the organization should act on them, because it’s likely that the strong correlations in the data support the causal hypothesis (if you add the indicated features, then the current customer base will buy the next version) and the benefits outweigh the risk (as a sufficient sales volume will cover the R&D costs).

And, just like the HBR article says, you don’t even have to like math to make the right decision. (Although there’s no reason not to like math.)

It’s Nice To See CNN Run a Piece that Says Big Data is Big Trouble

the doctor doesn’t like the phrase “big data” or the “big data” craze. First of all, as he has said time and time again, we’ve always had more data than we could process on a single machine or cluster and more data than we could process in the time we want to process it in. Secondly, and most importantly, just like the cloud is filled with hail, big data is filled with big disasters waiting to happen.

As the author of the article on the big dangers of ‘big data’ astutely points out, there are limits to the analytic power of big data and quantification that circumscribe big data’s capacity to drive progress. Why? First of all, as the author also points out, bad use of data can be worse than no data at all. As an example, he cites a 2014 New York Times piece on Yahoo and its Chief Executive, which demonstrated the unintended consequences of trying to increase employee drive and weed out the chaff by way of scorecard-based quarterly performance reviews that limited how many people on a team could get top ratings. Instead of promoting talent and drawing talented people together, the system split them up, because if you were surrounded by underperformers you were sure to get the top score – but if you were surrounded by equals, you weren’t.

This is just one example of the unintended consequences of trying to be too data driven. Another example is using average call time in a customer support centre versus number of calls to close a ticket as a measure of call centre agent performance. If an agent is measured on how long she spends on the phone on average, she is going to try to take shortcuts to solve a customer’s problem instead of getting to the root cause. For example, if your Windows PC keeps locking up every few days and a re-boot fixes it, you will be told to proactively reboot every 24 hours just to get you off the phone. But that doesn’t necessarily fix the problem or guarantee that you will not have another lock-up (if the lock-up is a certain combination of programs opened at the same time that refuse to share a peripheral device, for example). As a result, the customer will end up calling back. Or, if she can’t solve your problem, you will be switched to another agent who “knows the system better”. That’s poor customer support, and all because you’re keeping track of the average time of every call and computing averages by rep and department.
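To make the measurement problem concrete, here is a small, entirely made-up ticket log showing how an agent can look great on average call time while actually being the worst at resolving tickets:

```python
# Made-up ticket log: (agent, ticket, minutes). The "shortcut" agent keeps
# calls short but the customer has to call back twice; the "root cause"
# agent takes longer once and the problem is actually fixed.
calls = [
    ("shortcut_agent", "T1", 4),
    ("shortcut_agent", "T1", 5),
    ("shortcut_agent", "T1", 4),
    ("root_cause_agent", "T2", 15),
]

def avg_call_time(agent):
    times = [m for a, _, m in calls if a == agent]
    return sum(times) / len(times)

def calls_per_ticket(agent):
    tickets = [t for a, t, _ in calls if a == agent]
    return len(tickets) / len(set(tickets))

for agent in ("shortcut_agent", "root_cause_agent"):
    print(agent, round(avg_call_time(agent), 1), calls_per_ticket(agent))
```

Measured on average call time alone, the shortcut agent wins handily; measured on calls needed to close a ticket, the picture reverses, and the customer’s total time on the phone is nearly identical anyway.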

Big data will let us compute more accurate economic forecasts, demand trends, process averages, and so on, but, as the author keenly points out, many important questions are simply not amenable to quantitative analysis, and never will be. Where your child should go to college, how to punish criminals, and whether to fund the Human Genome Project are just a few examples. Even more relevant are product design queries. 34% of users want feature A, 58% want feature B, and 72% want feature C, but how many want features A and B, or A and C, or B and C, or all three? And how many will be put off if the product also contains a feature they don’t want, is too confusing due to too many frivolous features, or lacks the all-important feature D that you didn’t ask about, but now have to have because your competitor does?
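The reason the marginals alone can’t answer the combination questions is that the same marginal percentages are compatible with many different joint distributions; you have to go back to the raw, per-respondent data. A small illustrative sketch (the responses are made up):

```python
# Made-up per-respondent data: each entry is the set of features one
# respondent wants. Marginal percentages alone cannot recover the
# combination shares below -- only the raw responses can.
responses = [
    {"A", "C"}, {"B", "C"}, {"A", "B", "C"}, {"C"}, {"B"},
]
n = len(responses)

combos = [{"A"}, {"B"}, {"C"},
          {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
shares = {frozenset(c): sum(c <= r for r in responses) / n for c in combos}

for c in combos:
    print(sorted(c), f"{shares[frozenset(c)]:.0%}")
```

Here C polls at 80% and A at 40%, yet only one respondent in five wants all three features together, which is exactly the kind of answer a pile of single-feature statistics will never give you.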

And, even more important, McKinsey, which in 2011 claimed that we are on “the cusp of a tremendous wave of innovation, productivity and growth … all driven by big data”, recently had to admit that there is “no empirical evidence of a link between data intensity … and productivity in specific sectors”. In other words, despite all of the effort put into big data projects over the last few years, none have yielded results conclusively beyond what would have been achieved without big data.

And, most importantly, as someone who has studied chaotic dynamical systems theory, the doctor can firmly attest to the fact that the author is completely correct when he says understanding the complexity of social systems means understanding that conclusive answers to causal questions in social systems will always remain elusive. We may be able to tease out strong correlations, but correlation is not causation. (And if you forget this, you better go back and take another read through Pinky and the Brain’s lesson on statistics.)
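A quick, illustrative demonstration of why correlation is not causation: two quantities that independently trend upward over time (seeded, made-up data below) will correlate almost perfectly even though neither causes the other.

```python
import random

# Two independent, made-up series that both happen to trend upward.
random.seed(42)
years = range(20)
ice_cream_sales = [100 + 5 * t + random.gauss(0, 3) for t in years]
software_bugs = [50 + 4 * t + random.gauss(0, 3) for t in years]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(f"correlation: {pearson(ice_cream_sales, software_bugs):.2f}")
```

The shared trend produces a correlation close to 1.0, but no one would argue that ice cream causes software bugs; a lurking third variable (here, time itself) drives both.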

Societal Damnation 41: Fraud and Corruption

Fraud and Corruption are everywhere and wreaking havoc on your organization and your supply chain. The Kroll Global Fraud Report released in late 2013 found that 70% of companies were affected by fraud in the prior 12 months, an increase of 15% over the previous twelve months. In other words, at the time, 7 in 10 companies were hit by fraud in the previous year. But it gets worse. The Economist at the same time also found that fraud was on the rise and predicted that it would continue to rise. If the rate of increase remained steady, then roughly 4 in 5 businesses were hit with fraud last year and 9 in 10 businesses will be hit this year. Yowzers!
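For the curious, the back-of-the-envelope extrapolation works like this (assuming, purely for illustration, that the 15% year-over-year increase holds steady):

```python
# Extrapolate the Kroll figure forward, assuming the 15% annual increase
# holds steady (an illustrative assumption, capped at 100%).
rate = 0.70  # share of companies hit by fraud in the report year
for year in range(1, 3):
    rate = min(rate * 1.15, 1.0)
    print(f"year {year}: about {rate:.1%} of companies hit by fraud")
```

That works out to roughly 4 in 5 after one year and just over 9 in 10 after two.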

Moreover, Procurement Fraud can be particularly costly and damaging, in both the public and private sectors. For example, a recent article over on Supply Management on how “Councils [were] told to do more to tackle Procurement Fraud” found that there were 107,000 cases of Procurement fraud detected by local authorities in 2012-2013 that combined accounted for £178 million! And this is just a drop in the bucket when compared to the total amount lost by the UK public sector to fraudulent purchasing on an annual basis, an amount that was estimated at £2,300 million in 2012! Zoinks!

It’s harder to find good numbers for the US, but a 2011 report by Computer Evidence Specialists found that fraud cost the US $1.32 Trillion in 2010, of which $733 Billion was corporate (with 68% committed by corporations and 32% committed by employees). This number might sound surprising, but when you consider that between 2000 and 2006 a small South Carolina parts supplier collected about $20.5 Million from the Pentagon in fraudulent shipping charges, including $998,798 for sending two 19-cent washers to an Army base in Texas, it puts things in a different light. (Source: M4Carbine.net archives.) Hamana, hamana!

If your organization is not on full alert 24/7, it is going to get hit with fraud from somewhere in the organization or the supply chain. It’s just a matter of time before an attempt is made. This fraud can take many forms, which can include, but are not limited to:

  • invoices from non-existent suppliers
    usually submitted by an employee for services (not received) or goods of questionable origin to try to defraud the company of money (or by a random third party hoping a small invoice slips through unnoticed)
  • invoices from suppliers for off-contract goods and services
    usually for smaller dollar amounts for services “to be received” or for goods priced above standard list price for “emergency provision and delivery”, where a supplier is trying to eke out more revenue or an employee is colluding to get a kickback
  • bait-and-switch
    where the supplier promises you the newest high-end laptop with the top-of-the-line processor and memory chips, but you actually get last year’s model, which has depreciated 30% in value (because, not being an IT shop, the supplier thinks you won’t know the difference), or charges you for Grade 5 bolts when in fact they are only Grade 2 bolts (which you intend to use in commercial buses used to transport passengers, giving you a legal liability as well as a case of fraud)
  • inflated T&E claims
    where meetings across town are 50 miles instead of 10, all meals are $1 below the per diem limits, significant “entertainment” charges (especially on the first and last day where the employee or manager was actually entertaining friends and relatives), etc. (or, and this happened, the same receipt is accidentally submitted on consecutive expense reports)
  • inflated performance claims
    where a buyer “negotiates” a year-end rebate in exchange for guaranteed volume at unnecessarily higher prices next year so that he can exceed his savings target and get a bigger bonus
  • “lost” / “damaged” stock
    that is “walked” off the truck by an employee during a pre-lot entry inspection or, if the merchandise is un-returnable / too costly to return, declared damaged and purchased at pennies on the dollar by an employee who will resell the undamaged products on his own

In other words, fraud can happen anywhere, and at any time, and if a Procurement organization is not vigilant, it will happen to them. Fortunately, steps can be taken to reduce the chances of most of these frauds. Having a policy that invoices will only be accepted from approved suppliers, and that all invoices from approved suppliers for non-contracted goods and services and/or for goods and services at non-contracted rates will be flagged for manual review, will prevent most external fraud from slipping through the system. (Collusion can still bypass the best of controls, but, unless the system is hacked, you know exactly who perpetrated the fraud in this instance.) Having T&E limits that cannot be exceeded without budget manager approval, automatic zip-code based mileage checks, and fixed per-diems (while more costly) can weed out a lot of T&E fraud. Careful inspections and a two-step receiving process can minimize the chances of a bait-and-switch and of good stock being written off. And waiting a quarter to verify the numbers then and now before issuing a bonus will discourage many employees from trying to inflate their savings (or sales) claims.
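Two of these controls are straightforward to automate. The sketch below is a minimal illustration (supplier names, invoice IDs, and receipt numbers are all made up): flag invoices from unapproved suppliers, and flag the same receipt number appearing on more than one expense report.

```python
# Illustrative automation of two controls: approved-supplier invoice
# screening and duplicate-receipt detection. All data is made up.

approved_suppliers = {"Acme Supply", "Global Parts"}

invoices = [
    {"id": "INV-1", "supplier": "Acme Supply"},
    {"id": "INV-2", "supplier": "Totally Real Co"},  # not on the approved list
]
flagged_invoices = [i["id"] for i in invoices
                    if i["supplier"] not in approved_suppliers]

expense_lines = [
    {"report": "ER-1", "receipt": "RCPT-77", "amount": 42.50},
    {"report": "ER-2", "receipt": "RCPT-77", "amount": 42.50},  # same receipt again
]
seen, duplicate_receipts = {}, []
for line in expense_lines:
    receipt = line["receipt"]
    if receipt in seen and seen[receipt] != line["report"]:
        duplicate_receipts.append(receipt)  # same receipt on a second report
    seen.setdefault(receipt, line["report"])

print("flagged invoices:", flagged_invoices)
print("duplicate receipts:", duplicate_receipts)
```

Neither check stops collusion, but each turns a class of fraud that relies on nobody looking into one that gets looked at every time.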

However, no system is perfect and a lot of process transformation, and diligence, will be required to minimize the risk of fraud and corruption and limit its impact if it does happen. For Procurement, it’s another damned if you do (as the effort takes time and resources away from good category management that is often the largest source of value generation) and damned if you don’t (as the losses from a single fraud could wipe out most of the captured savings).