Correlations are Good, Causations are Better, but Models are Best!

the maverick recently penned a great post, Spurious Correlations, 150% Cost Savings and How to Justify Your Next Procurement Project, that expounded upon the classic problem of second-rate analysts using correlation to infer causation (analysts who evidently didn’t read the classic post where the Brain gives Pinky a lesson in Statistics). If you listen to these analysts, you’ll find yourself in the same situation Roy Anderson was in when he was CPO of MetLife. Namely, the situation where “if I added up all the promised savings from vendors and Aberdeen Reports, I’d be saving over 100% and suppliers would be paying me money!” Not very likely, is it? But it’s definitely the conclusion you’d draw if you stacked enough Aberdeen reports end to end!
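To see why adding up promised savings leads to nonsense, here is a minimal sketch of the arithmetic. The claimed percentages are hypothetical, made up for illustration, and the compounding rule is an assumption: each claimed saving is taken to apply only to the spend left over after the previous one. Under that assumption the total can approach, but never reach, 100%.

```python
import math


def naive_total(claims):
    """Add the claimed savings rates together, the way stacked reports do."""
    return sum(claims)


def compounded_total(claims):
    """Apply each claimed rate to the spend remaining after the previous one."""
    remaining_fraction = math.prod(1.0 - claim for claim in claims)
    return 1.0 - remaining_fraction


# Hypothetical savings claims from five vendors or reports (not real data).
claims = [0.30, 0.25, 0.20, 0.20, 0.15]

print(f"added up:   {naive_total(claims):.0%}")
print(f"compounded: {compounded_total(claims):.0%}")
```

Even the compounded figure is only as good as the individual claims that feed it, so it is a sanity check and not a forecast.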

After all, with enough data, you can find all sorts of near-perfect correlations between (almost) completely unrelated data sets. For example, Pierre points out that if you go to tylervigen.com and check out the spurious correlations on the site, you’ll find that there’s a near-perfect correlation between

  • the number of people who drown after falling out of a fishing boat and the marriage rate in Kentucky (r=0.95),
  • the total number of computer science doctorates awarded in the US and the total revenue generated by arcades (r=0.99), and
  • the annual number of automotive suicides and the number of Japanese passenger cars sold in the US (r=0.94).

And while you might believe that computer science doctoral candidates spend all their free time in arcades, do you believe that buyers of Japanese passenger cars buy them to commit suicide or that somehow the marriage rate in Kentucky has something to do with the number of people who drown after falling out of a fishing boat? (I hope not!)
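The "with enough data" point is easy to reproduce. The sketch below does not use the tylervigen.com data; it assumes a few hundred made-up, completely independent random-walk series of ten "annual" values each, then searches every pair for the strongest correlation. All names and parameters are illustrative.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def random_walk(rng, length):
    """One independent series: a running sum of Gaussian steps."""
    level, series = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        series.append(level)
    return series


def strongest_spurious_pair(num_series=200, length=10, seed=0):
    """Return the most correlated pair of unrelated series and its r."""
    rng = random.Random(seed)
    series = [random_walk(rng, length) for _ in range(num_series)]
    best = max(
        itertools.combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_spurious_pair()
print(f"series {pair[0]} and {pair[1]} are unrelated by construction, yet r = {r:.2f}")
```

With roughly twenty thousand pairs to choose from and only ten points per series, the best match usually looks as convincing as the examples above, even though nothing connects the two series.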

As Pierre notes, it’s important to have a good ROI model that really looks at the “R” and the “I” realistically: a model that uses ranges and allows you, as an analyst, to play around with the assumptions and see how the results change as they are modified. Procurement might want to be aggressive, but management might want to be conservative. Procurement might assume an abundance of supply, but Engineering, knowing that only a couple of suppliers are qualified to produce the refined raw materials needed, might want to assume a lack of supply because these refined raw materials are becoming increasingly sought after. And so on.
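As a rough sketch of what such a range-based model might look like, the snippet below computes ROI for a conservative and an aggressive scenario. The field names, the simple formula (return = spend × savings rate × realization rate), and every figure are assumptions chosen for illustration, not numbers from Pierre's post; a real model would break out far more cost components.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One set of assumptions about the "R" and the "I" (all hypothetical)."""
    name: str
    addressable_spend: float  # annual spend the project can influence
    savings_rate: float       # identified savings as a fraction of spend
    realization_rate: float   # fraction of identified savings actually captured
    investment: float         # the "I": software, services, and staff time


def roi(scenario):
    """ROI = (return - investment) / investment for a single scenario."""
    expected_return = (
        scenario.addressable_spend
        * scenario.savings_rate
        * scenario.realization_rate
    )
    return (expected_return - scenario.investment) / scenario.investment


def roi_range(scenarios):
    """Map each scenario name to its ROI so the spread is visible at a glance."""
    return {scenario.name: roi(scenario) for scenario in scenarios}


# Made-up assumptions: management's view versus Procurement's view.
scenarios = [
    Scenario("conservative", 10_000_000, 0.03, 0.5, 200_000),
    Scenario("aggressive", 10_000_000, 0.10, 0.9, 150_000),
]

for name, value in roi_range(scenarios).items():
    print(f"{name}: ROI = {value:.0%}")
```

Keeping each assumption as a named input makes the disagreement between departments explicit: changing one number and rerunning shows exactly how much of the projected return depends on it.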

Without a detailed model that captures all the cost components and the assumptions they are based on, Procurement can’t be realistic in its projections or its project requests. Analyst reports with benchmark data are a great starting point to identify where to look for savings, but the savings still have to be validated before a proposal is put forward.