Where the Brain gives Pinky a Lesson in Statistics

Pinky and the Brain
They’re Pinky and the Brain
Yes, Pinky and the Brain
One is a genius, the other is insane
They’re advertising guys
Their mind is on the prize
They’re dinky
They’re Pinky and the Brain, Brain, Brain, Brain,
Brain, Brain, Brain, Brain, Brain

Before each night is done
Their plan will be unfurled
By the dawning of the sun
Take over the sourcing world

They’re Pinky and the Brain
Yes, Pinky and the Brain
Their twilight campaign
Is easy to explain
To prove their sourcing grace
They’ll overthrow the space
They’re dinky
They’re Pinky and the Brain, Brain, Brain, Brain,
Brain, Brain, Brain, Brain, NARF!

Pinky	Gee Brain, what are we going to do tonight?
Brain	Same thing we do every night, Pinky – try to take over the Sourcing World!
Pinky	How are we going to do that? Narf!
Brain	I don’t know, Pinky. I don’t know. Give me some time to think!
Pinky	Poit! Okay, Brain. Pinky goes back to reading the paper he has in his hands. Brain, this is really interesting.
Brain	What is it this time, Pinky? 64% of net surfers searched for information about Hannah Montana’s concert series last month?
Pinky	I don’t know … but I’d sure like to see her in concert. Narf! It’s this research report. Did you know that their research shows that international transportation processes are less likely to be automated than domestic transportation processes in 72% of firms? And that this is because there are more components to schedule, more trading partners to deal with, and more places where things can go wrong?
Brain	You do know those reports are riddled with statistical errors and misleading representations, don’t you?
Pinky	What do you mean, Brain?
Brain	For starters – did they ask the firms surveyed why their international transportation processes were less automated, or did they just ask them if their transportation processes were automated domestically and/or internationally?
Pinky	Uhm, I don’t know.
Brain	Was the report full of questions?
Pinky	Yes.
Brain	After reading that entire report, did you see a single question that asked why the transportation processes were less automated, and, more specifically, were there any sets of potential answers with associated response counts?
Pinky	Uhm, no.
Brain	Then they’re making a common mistake found in many research reports. They’re fabricating reasons for outcomes without actually studying those reasons. You see, Pinky, statistics, by nature, isn’t definitive. You cannot prove anything with statistics, merely indicate correlation and statistically likely causation, if the right questions are asked and follow up studies are performed to test hypotheses.
Pinky	So, their report suggests reasons, with the implication that they’re correct, but there’s really no proof at all that their reasons have anything to do with the results?
Brain	That’s right, Pinky. And we pay them money for the privilege of reading the report!
Pinky	But doesn’t everyone in the media jump to conclusions this way? And the analyst firm never really says that their results prove their hypotheses, so they’re not that bad, right?
Brain	Yes, Pinky. For the most part, all media misleads people in the same way, and in that respect, they’re not that bad. But the reports are still riddled with statistical errors, false conclusions, and misleading representations.
Pinky	What do you mean, Brain?
Brain	Read me something else.
Pinky	Did you know that 22% of best-in-class companies have the capability to divert goods in transit? As compared to average performers and laggards, where only 17% and 18%, respectively, can divert goods in transit?
Brain	The issue here is much more subtle. They’re breaking the statistics down into three groups almost arbitrarily, probably by a measurement of “spend under management.” But is this really the best way to break the groups down? Did they do an exhaustive study and determine this is the best way to categorize companies? Maybe, due to different spend categories in different industries, it’s better to leave certain types of spend to the local business units rather than manage them centrally. And what does “spend under management” mean, anyway? Is it rigidly defined? But I digress. This is potentially misleading because they compare best-in-class to average performers and laggards and not to the overall set. Doing the math, this means that, according to their arbitrary class sizes, approximately 18.3% of companies overall can divert goods in transit. Thus, whereas best-in-class companies are about 29.4% better than average performers (22% versus 17%), they are only about 20% better than the overall set (22% versus 18.3%). Thus, they are artificially inflating the statistics for best-in-class.
Pinky	Oh. Poink. But it’s still not wrong, right?
Brain	No, the math is perfectly valid, as long as you accept their methodology of division and believe the division is significant. But still misleading.
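Brain's arithmetic can be checked with a short sketch. The class shares used here (20% best-in-class, 50% average, 30% laggard) are an assumption; the dialogue never states them, and they are chosen only because they reproduce the overall figure of roughly 18.3%. The function and variable names are illustrative.

```python
# Shares of each class in the sample: an ASSUMPTION (not given in the dialogue),
# chosen because it reproduces Brain's overall figure of about 18.3%.
ASSUMED_SHARES = {"best_in_class": 0.20, "average": 0.50, "laggard": 0.30}

# Share of each class that can divert goods in transit, as read out by Pinky.
DIVERT_RATES = {"best_in_class": 0.22, "average": 0.17, "laggard": 0.18}


def overall_rate(shares, rates):
    """Weighted average of the per-class rates."""
    return sum(shares[k] * rates[k] for k in shares)


def relative_lift(rate, baseline):
    """How much higher `rate` is than `baseline`, as a fraction of `baseline`."""
    return rate / baseline - 1.0


overall = overall_rate(ASSUMED_SHARES, DIVERT_RATES)
lift_vs_average = relative_lift(DIVERT_RATES["best_in_class"], DIVERT_RATES["average"])
lift_vs_overall = relative_lift(DIVERT_RATES["best_in_class"], overall)

print(f"overall rate:    {overall:.1%}")          # 18.3%
print(f"lift vs average: {lift_vs_average:.1%}")  # 29.4%
print(f"lift vs overall: {lift_vs_overall:.1%}")  # 20.2%
```

Under these assumed shares, the same 22% looks roughly half again as impressive when it is measured against the weakest comparison group instead of the whole sample.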
Pinky	So, this is everything wrong with the reports, right?
Brain	Far from it. Read me something else.
Pinky	62% of best-in-class companies that use third party logistics providers use several of them as opposed to a single company, so companies would do well to forge relationships with several partners based on their services and areas of operation.
Brain	That’s the one I was waiting for!
Pinky	What do you mean, Brain? Zoit!
Brain	They make that error at least once in every report I’ve ever read. It’s probably the most common statistical error in existence. They’re confusing correlation with causation. Just because the use of multiple logistics providers is correlated with better logistics performance, this doesn’t mean that the use of multiple logistics providers is the reason that these companies achieve better logistics performance.
Pinky	But it sure sounds convincing. Narf!
Brain	Remember last night when I talked about dropping a hammer on your foot?
Pinky	Gulp! Yes …
Brain	And remember how I said that you would yelp and hop on one foot?
Pinky	Pinky cowers and covers his feet. Gulp! Gulp! Yes …
Brain	And I said …
Pinky	That the dropping of the hammer, the yelp, and the hop were all correlated, but there were only two causal relationships. The dropping of the hammer caused the yelp and the hop, the reverse didn’t hold true, and the yelp didn’t cause the hop and the hop didn’t cause the yelp. Please don’t drop the hammer on my foot!
Brain	As long as you remember your lessons, Pinky, there’s no need.
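The hammer lesson can be sketched as a small simulation. Everything numeric here is an assumption made for illustration: the chance that the hammer drops on a given night and the 90% chance of each reaction. The only structure taken from the dialogue is that the hammer causes both the yelp and the hop, and that neither reaction causes the other.

```python
import random


def simulate_nights(n_nights, p_hammer=0.3, seed=0):
    """Simulate nights where only the hammer causes a yelp or a hop.

    p_hammer and the 0.9 reaction probabilities are illustrative assumptions.
    """
    rng = random.Random(seed)
    nights = []
    for _ in range(n_nights):
        hammer = rng.random() < p_hammer
        # Yelp and hop each depend on the hammer only, never on each other.
        yelp = hammer and rng.random() < 0.9
        hop = hammer and rng.random() < 0.9
        nights.append((hammer, yelp, hop))
    return nights


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences of 0/1 values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / (var_x * var_y) ** 0.5


nights = simulate_nights(10_000)
corr_all = correlation([int(y) for _, y, _ in nights],
                       [int(h) for _, _, h in nights])

# Hold the cause fixed: look only at nights when the hammer dropped.
hammer_nights = [night for night in nights if night[0]]
corr_given_hammer = correlation([int(y) for _, y, _ in hammer_nights],
                                [int(h) for _, _, h in hammer_nights])

print(f"corr(yelp, hop), all nights:    {corr_all:.2f}")
print(f"corr(yelp, hop), hammer nights: {corr_given_hammer:.2f}")
```

Across all nights the yelp and the hop are strongly correlated, yet once the common cause is held fixed the correlation should fall to roughly zero, which is the pattern Pinky describes.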
Pinky	So that’s everything wrong with the research reports, then?
Brain	Not even close.
Pinky	But what else could you possibly have a problem with?
Brain	How were the respondents selected?
Pinky	It says here they advertised the survey through e-mail and their web-site and used the results of those who responded.
Brain	And is that representative of the population as a whole?
Pinky	They’re studying companies, Brain. Not people.
Brain	Pinky!!!!
Pinky	But …
Brain	The term population is used in statistics to refer to the universe of entities under study. In this case, corporations.
Pinky	Oh.
Brain	And the answer is a definitive no.
Pinky	I don’t follow.
Brain	Of course not. First of all, not all companies necessarily use the web, or pay attention to the research company’s web site even if they do. Secondly, not all companies have fluent English-speaking representatives. Thirdly, not all companies are aware of the research company conducting the survey, and therefore may disregard its emails. And, most importantly, the respondents are self-selecting. There’s no guarantee that the self-selecting population is even representative of those companies, let alone the population as a whole!
Pinky	But the respondents respond randomly, and that’s the core requirement, right?
Brain	Yes, but it does not mean that they constitute an appropriate random set of the entire population of companies that use logistics companies. At best it’s a random set of self-selecting companies that use the internet that have fluent English capabilities that are aware of the research company.
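A minimal sketch of why self-selection matters, under made-up numbers: suppose 30% of all companies have automated processes, and automated companies are four times as likely to answer the survey as the rest. None of these probabilities come from the report Pinky is reading; they are assumptions chosen to make the effect visible.

```python
import random


def survey_self_selected(n_companies=100_000, seed=1):
    """Compare the true automation rate with the rate among self-selected respondents.

    All probabilities below are made-up assumptions for illustration.
    """
    rng = random.Random(seed)
    true_rate = 0.30          # assumed share of companies with automated processes
    p_respond_auto = 0.02     # assumed response probability if automated
    p_respond_manual = 0.005  # assumed response probability if not automated

    automated_total = 0
    respondents = 0
    automated_respondents = 0
    for _ in range(n_companies):
        automated = rng.random() < true_rate
        automated_total += automated
        p_respond = p_respond_auto if automated else p_respond_manual
        if rng.random() < p_respond:
            respondents += 1
            automated_respondents += automated
    return (automated_total / n_companies,
            automated_respondents / respondents,
            respondents)


population_rate, sample_rate, n_respondents = survey_self_selected()
print(f"population: {population_rate:.1%} automated")
print(f"respondents ({n_respondents}): {sample_rate:.1%} automated")
```

Every respondent answers honestly in this sketch, and the respondent pool is still far more automated than the population it is supposed to describe. Collecting more responses the same way would not close that gap.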
Pinky	But they’re only reporting the significant results. That makes up for any variation in the set of respondents, right?
Brain	Well, there are two issues there. First of all, how do you judge what’s significant? Let’s say they found that the same number of best-in-class, average, and laggard companies used a commercial WMS but failed to report this. That would be very significant, since it suggests that there is no correlation between use of a WMS and being best-in-class. And when compared against the earlier statistic you quoted, it implies that if you are best-in-class, you are more likely to be using a TMS than a WMS.
Pinky	But aren’t you then assuming causation and making the same mistake you’re accusing them of making?
Brain	No, Pinky. I’m simply pointing out that there is a stronger correlation between TMS and best-in-class than WMS and best-in-class. This is important because it would tell us that best-in-class and TMS tend to go together while WMS and best-in-class do not. Although you don’t know if one causes the other, you do know that they are correlated and, thus, if you have a choice, the better choice is a TMS. It might not make you best-in-class, but you know that if you were, you’d likely be using it anyway.
Pinky	But it doesn’t say anything about WMS, Brain.
Brain	And that’s my point. What is it leaving out?
Pinky	I don’t know.
Brain	Precisely, Pinky. Precisely. And back to your second point where you said that their process makes up for any variation in the set of respondents. Now you’re making an error – and a big one. If your sample set isn’t an adequate representation of the entire population, there’s nothing you can do to make up for it. Your research is flawed from the start. There’s no way to cancel an error in statistics. All you can do is propagate the error and make it worse.
Pinky	Oh. So that means …
Brain	The point I was trying to make in response to your previous query holds. The results only apply to the population the sample represents, and that might not be the entire population.
Pinky	And that means …
Brain	The applicability and usefulness of the results might not be all that broad or what you hoped for.
Pinky	But they got over 200 responses.
Brain	This is another very common error. 100 responses is 100 responses …
Pinky	200 …
Brain	A look of extreme impatience appears on Brain’s face. Okay! Have it your way! 200 responses is 200 responses, but how many people did they attempt to survey? How many e-mails did they send out? How many people saw the advertisement for the survey on their web-page? I’m betting it was over 2,000. In fact, I’m betting it was over 20,000. Did they say?
Pinky	Not that I can see.
Brain	Let’s say, conservatively, that it was only 20,000. That would mean their response rate is 1%. That tells us something.
Pinky	Like what?
Brain	That only 1% of self-selecting companies who had previously expressed interest in the firm’s research cared enough about the topic to respond to the survey. I’ll let you draw your own conclusions from that.
Pinky	Oh. But …
Brain	What else are they leaving out? I don’t know. I’m betting they didn’t include the original survey. That’s important. Psychologists have found that how you ask the question can be more important than what you are asking. For example, let’s take research into risk aversion. If you ask someone how much risk they are willing to take, dollar-wise, versus giving them a set of questions with two alternatives, you’ll get two different answers that become clear when you translate them into economic utility functions. You can even give them the same expected returns or losses, but just phrasing one question positively and one question negatively can lead to different results. People are more likely to settle for a fixed gain and more likely to risk a variable loss. I bet they didn’t provide the entire data set either. Depending on what statistical distribution, or even what statistical separation technique (such as the one they used to define best-in-class), you applied to the data, I’m betting you could come up with noticeably different results.
Pinky	But they usually have more respondents responding to their surveys than private companies or smaller firms do, so that must average out some of the error, right?
Brain	No. Again, Pinky – you can’t “fix” or “average out” errors. Only propagate them. And a bigger sample is not always better. If a sample gets too big, then even trivial conclusions can become “statistically significant,” even if they have no practical value. It’s just the way the math works.
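Brain's point about oversized samples can be illustrated with a pooled two-proportion z-test. The observed rates (50.0% versus 50.5%) and the sample sizes are hypothetical; the only thing that changes between runs is the number of respondents per group.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z = abs(p1 - p2) / std_err
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


# Hypothetical observed rates: 50.0% vs 50.5%, a gap few would call meaningful.
for n in (1_000, 100_000, 10_000_000):
    p_value = two_proportion_p_value(0.500, 0.505, n)
    print(f"n per group = {n:>10,}: p-value = {p_value:.3g}")
```

The half-point gap is nowhere near significant with 1,000 respondents per group, crosses the conventional 0.05 threshold at 100,000, and becomes overwhelmingly "significant" at ten million, even though its practical importance never changed.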
Pinky	Oh. So statistically significant conclusions drawn from super-large samples can be totally meaningless?
Brain	Yes. But … An evil grin appears on Brain’s face. Pinky, are you pondering what I’m pondering?
Pinky	I think so, Brain. But last time we went ice skating I slipped and fell and bruised my tail-bone.
Brain	No, Pinky. This is one misconception that everyone shares! Even though a few people may understand that there’s a difference between correlation and causation, almost no one realizes that super-large sample sizes can produce meaningless results.
Pinky	And how does that help us?
Brain	Remember last night?
Pinky	Where you tried to steal the perfect survey? Narf!
Brain	Where I tried to obtain the perfect survey. Well, I don’t need the perfect survey. I just need the survey on the world’s largest data set.
Pinky	And how will that help you? Poit!
Brain	Because the masses will believe that whatever the results suggest must not only be right, but indisputable!
Pinky	And how does that help you take over the sourcing world?
Brain	Because we’ll twist the results into suggesting that I must rule the sourcing world.
Pinky	And how will we do that?
Brain	We’ll rely on the fact that the majority of our readers don’t understand the difference between correlation and causation.
Pinky	Wow! I get it! What a great plan! But how are we going to get enough responses to create the world’s largest data set?
Brain	We’re not! And that’s the beauty of it!
Pinky	Huh?
Brain	We’re going to create the world’s largest meta-survey!
Pinky	And how are we going to do that?
Brain	We’re going to use that huge stack of reports you’ve been collecting for the past 20 years and, just like academics, construct a meta-survey that we will answer using the responses to all of those reports.
Pinky	We’re going to aggregate the results?
Brain	Sort of. The mathematics involved to do a proper statistical meta-survey require more than just simple aggregation, but I won’t trouble your feeble brain with the details. Suffice it to say that our report will be a definitive guide to choosing the new leader of the sourcing world.
Pinky	Who is …
Brain	Me, you imbecile!
Pinky	Narf! That’s wonderful, Brain!
Brain	Yes, it is.
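Brain's remark that a proper meta-survey takes more than simple aggregation can be sketched with inverse-variance (fixed-effect) pooling, one standard way of combining estimates. The three survey results below are invented for illustration, and nothing in the dialogue says which method Brain or the analyst firms would actually use.

```python
import math


def pool_fixed_effect(studies):
    """Inverse-variance (fixed-effect) pooling of proportions.

    `studies` is a list of (proportion, sample_size) pairs. Each study is
    weighted by 1 / variance, where variance = p * (1 - p) / n.
    """
    weights = [n / (p * (1 - p)) for p, n in studies]
    pooled = sum(w * p for w, (p, _) in zip(weights, studies)) / sum(weights)
    std_err = math.sqrt(1 / sum(weights))
    return pooled, std_err


# Hypothetical survey results: (share answering "yes", number of respondents).
studies = [(0.22, 200), (0.30, 50), (0.18, 1_000)]

naive = sum(p for p, _ in studies) / len(studies)
pooled, std_err = pool_fixed_effect(studies)
print(f"simple average of reported percentages: {naive:.3f}")
print(f"inverse-variance pooled estimate:       {pooled:.3f} (SE {std_err:.3f})")
```

The simple average treats a 50-response survey the same as a 1,000-response one, while the pooled estimate leans toward the larger study. Neither repairs biased inputs: if the underlying surveys draw on self-selected respondents, pooling only produces a more precise estimate of the wrong quantity.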
Pinky	And since you’re going to be doing approximately the same thing that the analyst firms will be doing in the creation of their State of the Market reports, there are plenty of precedents. I just know this plan will work!
Brain	What?!?
Pinky	The analyst firm that produced this paper. They’re doing their own meta-survey. It’s going to be the largest ever! I have the invite here somewhere… Pinky dives into his pile of papers and emerges a few seconds later. Here it is, Brain!
Brain	Brain takes the invite from Pinky. A look of extreme disappointment crosses his face. *No!*
Pinky	What’s wrong, Brain? Narf!
Brain	My plan. It’s useless.
Pinky	What do you mean? I think it’s brilliant.
Brain	We can’t do our own meta-survey now!
Pinky	Why not? Zoit?
Brain	We’ll be the laughing stock of the sourcing world! Not only will we look like lame copycats, but once the analyst firm releases their report and people scrutinize it and realize that it doesn’t contribute any significant new information, they’ll be turned off from the meta-survey approach until they forget about it. Considering the length of time this analyst firm advertises their research, it will be close to a year before we can even think about trying this again.
Pinky	But I thought you said they’d believe it?
Brain	They’ll believe it. But since it won’t say anything new, they’ll judge it as a waste of effort.
Pinky	Oh. Poit.
Brain	Yes. Poit. Well, I guess it’s time to retire back to the marketing cage.
Pinky	Why, Brain?
Brain	To prepare for tomorrow night.
Pinky	What are we going to do tomorrow night? Narf?
Brain	The same thing we do every night, Pinky. Try to take over the sourcing world!

Before each night is done
Their plan will be unfurled
By the dawning of the sun
Take over the sourcing world

Sourcing Innovation

Next Generation Supply Management Defined
