Daily Archives: April 18, 2008

Spend Analysis: Another Book Review … And This One’s NOT Positive!

Pandit and Marmanis recently published a book titled Spend Analysis: The Window into Strategic Sourcing that has received a fair amount of praise from prince and pauper alike. Since I am currently in the process of co-authoring a text on the subject (now that my first book, The e-Sourcing Handbook [free e-book version], is almost ready to go to press), I figured that I should do proper due diligence, obtain their book, and read it cover to cover. I did – and I was disappointed.

Although the book would have been interesting ten years ago, good seven years ago, and still somewhat relevant five years ago, today it adds very little insight. In fact, the book is filled with fallacies, incorrect definitions, and poor advice.

Problems start to surface as early as the third paragraph (page 5) where the authors attempt to ‘simplify’ the definition of Spend Analysis, stating that “spend analysis is a process of systematically analyzing the historical spend (purchasing) data of an organization in order to answer the following types of questions”. There are at least three problems with this ‘simplification’:

  • Spend analysis is NOT systematic. Sure, each analysis starts out the same … build a cube … run some basic reports to analyze spend distribution by common dimensions … dive in. However, after this point, each analysis diverges. Good analysts chase the data, look for anomalies, and try to identify patterns that haven’t been identified before. If a pattern isn’t known, it can’t be systematized. Every category sourcing expert will tell you that real savings levers vary by commodity, by vendor, by contract, and by procuring organization — to name just a few parameters.
  • Good spend analysis analyzes more than A/P spend. It also analyzes demand, costs, related metrics, contracts, invoices, and any other data that could lead to a cost saving opportunity.
  • The questions the authors provide are narrow, focused, and only cover low hanging fruit opportunities. You don’t know a priori where savings are going to come from, and no static list of questions will ever permit you to identify more than a small fraction of opportunities.

From here, problems quickly multiply. But I’m going to jump ahead to the middle of the book (page 101) where the authors (finally) present their thesis to us, which they summarize as follows:

A complete and successful spend analysis implementation requires four modules:

  • DDL : Data Definition and Loading
  • DE : Data Enrichment
  • KB : Knowledge Base
  • SA : Spend Analytics

Huh? I don’t know about you, but I always thought that spend analysis was about, well, THE ANALYSIS! A colleague of mine likes to say, when aggravated, “it’s the analysis, stupid”. And I agree. A machine can only be programmed to detect previously identified opportunities. And guess what? Once you’ve identified and fixed a problem, it’s taken care of. Unless your personnel are incompetent, the same problem isn’t going to crop up again next month … and if it does, you need a pink slip, not a spend analysis package. DDL? Okay – you need to load the data into the tool – but if you don’t know what you’re loading, or you can’t come up with coherent spend data from your ERP system, you have a different problem entirely (again, you’re in pink slip territory). Enrichment? It’s nice – and can often help you identify additional opportunities, but if you can’t analyze the data you already have, you have problems that spend analysis alone isn’t going to solve. Knowledge base? Are the authors trying to claim that the process of opportunity assessment can be fully automated, and that sourcing consultants and commodity managers should pack their bags and head for the hills? Last time I checked, sourcing consultants and commodity managers seem to have no difficulty finding work.

So let’s focus on the analysis. According to the authors,

an application that addresses the SA stage of spend analysis must be able to perform the following functions:

  • web-centric application with Admin & restricted access privileges
  • specialized visualization tools
  • reporting and dashboards
  • feedback submission for suggesting changes to dimensional hierarchy
  • feedback submission for suggesting changes to classification
  • immediate update of new data
  • ‘what-if’ analysis capability

I guess I’ll just take these one-by-one.

  • Web-centric? If the authors meant that users should be able to share data over the web, then I’d give them this one … but the rest of the book strongly implies that they are referring to their preferred model, which is web-based access to a central cube. I’m sorry, that is not analysis. That is simply viewing standardized reports on a central, inflexible warehouse. We’ll get back to this point later.
  • Specialized visualization tools? They got this one right. However, the most specialized “visualization tool” they discuss in their book is a first generation tree-map … so maybe it was just luck they got this one right.
  • Reporting is a definite must – as long as it includes ad-hoc and user-driven analyses and models. Dashboards? How many times do I have to repeat that today’s dashboards are dangerous and dysfunctional?
  • Feedback submission for suggesting changes? There’s a big “oops!” Where’s the analysis if you can’t adjust the data organization yourself, right now, in real time? And if you have to give “feedback” which goes to a “committee” where everyone else has to agree on the change, which typically negates or generalizes the desired change – guess what? That’s right! The change never actually happens, or if it does happen, the delay has caused it to become irrelevant.
  • Feedback submission for suggesting fixes to the data? How can you do a valid analysis if you can’t fix classification errors, on the fly, in real time?
  • If the authors meant immediate update of new data as soon as it was available, then I’d give them this one. But it seems that what they really mean is that “the analysis cube should be updated as soon as the underlying data warehouse is updated”. Considering that they state on page 182, “in our opinion, there is no need for a frequent update of the cube” (note the singular case, which I’ll return to later), and then go on to state that quarterly warehouse updates are usually sufficient, I can’t give them this one either.
  • I agree that what-if analysis capability is a must – but how can you do “what if” analysis if you can’t change the data organization or the data classification, or even build an entirely new cube, on the fly?

The authors then dive into the required capabilities of the analytics module, which, in their view, should be:

  • OLAP tool capable of answering questions with respect to several, if not all, of the dimensions of your data
  • a reporting tool that allows for the creation of reports in various formats; cross-tabulation is very important in the context of reporting
  • search enabled interface

Which, at first glance, seems to be on the mark, except for the fact that the authors’ world-view does not include real-time dimension and data re-classification, which means that any cross-tabs that are not supported by the current data organization of the warehouse are impossible. Furthermore, it’s not the format of the reports that matters, but the data the user can include in them. Users should be able to create and populate any model they need, whether it’s cross-tabular or not. Finally, we’re talking about spend analysis, not a web search engine. Search is important in any good BI tool, but if it’s one of the three fundamental properties that is supposed to make the tool ‘unique’, I’m afraid that’s a pretty ordinary tool indeed.
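To make “any model they need” a little more concrete, here is a minimal sketch of a cross-tab whose dimensions are ordinary functions, so a grouping invented a minute ago works just as well as a raw field. The sample transactions, field names, and the `gl_family` grouping are all hypothetical, and this is an illustration of the idea, not a description of any particular tool.

```python
from collections import defaultdict

# Hypothetical sample transactions; field names are illustrative only.
transactions = [
    {"vendor": "Acme", "gl": "6100", "cost_center": "East", "amount": 1200.0},
    {"vendor": "Acme", "gl": "6200", "cost_center": "West", "amount": 800.0},
    {"vendor": "Globex", "gl": "6100", "cost_center": "East", "amount": 500.0},
    {"vendor": "Initech", "gl": "6300", "cost_center": "West", "amount": 300.0},
]


def cross_tab(rows, row_dim, col_dim):
    """Sum amounts for every (row_dim, col_dim) pair.

    Each dimension is a function of the transaction, so the analyst can
    pass a raw field or a grouping defined on the spot.
    """
    table = defaultdict(float)
    for row in rows:
        table[(row_dim(row), col_dim(row))] += row["amount"]
    return dict(table)


def by_vendor(row):
    return row["vendor"]


def by_center(row):
    return row["cost_center"]


def gl_family(row):
    # An ad hoc dimension (assumed grouping): no schema change required.
    return "Facilities" if row["gl"] in {"6100", "6200"} else "Other"


print(cross_tab(transactions, by_vendor, by_center))
print(cross_tab(transactions, gl_family, by_center))
```

The second call uses a dimension that exists nowhere in the source data, which is exactly the kind of cross-tab a fixed warehouse organization cannot serve.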

The authors apparently don’t understand that spend analysis is separate from, and does not need to be based on, a data warehouse. Specifically, they state (on page 12) that “data warehousing involves pulling periodic transactional data into a dedicated database and running analytical reports on that database … it seems logical to think that this approach can be used effectively to capture and analyze purchasing data … indeed … using this approach is possible”.

It’s possible to build a warehouse, but it’s not a good idea for spend analysis. The goal of warehousing is to centralize and normalize all of the data in your organization in one, and only one, common format that is supposed to be useful to everyone. Unfortunately, and this is the dirty little secret with data warehouses, this process ends up being useful to no one in the organization, which is why most analysts simply download raw transactions to their desktops for private analysis, and ignore the warehouse. But the authors don’t stop there. In a later chapter, they go on to imply that the schema is very important and that the target schema for spend analysis should be carefully chosen based on several considerations (page 177), namely:

  1. are your domains adequately represented?
  2. will your schema be evolving to support a centralized PIM system?
  3. is your company global? is internationalization an important requirement?
  4. is any taxonomy already implemented at a division level?
  5. has the schema been maintained in recent months?

To this, all I can say is:

  1. Doesn’t matter. What matters is that the analyst has the data she needs for the analysis she is currently conducting.
  2. Who cares? There should be no link between your PIM and your SA system. PIM is just another potential data source to use, or ignore, as your analysts see fit.
  3. Whatever. If you have a good ETL tool, you can define a few rules to do language and currency mapping on import.
  4. Irrelevant. We’re talking SA, not ERP.
  5. I would think it would have been, since the only way in the authors’ worldview to change spend data representations is to change the underlying schema of the warehouse!
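To make point 3 concrete, here is a minimal sketch of what “a few rules” for currency and language mapping on import might look like. The exchange rates, the label translations, and the record fields are all hypothetical placeholders; a real import would take its rates from whatever source finance approves.

```python
# Hypothetical fixed rates to a reporting currency (USD). Placeholder values.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.5, "GBP": 2.0}

# Hypothetical translations of source-system category labels.
LABEL_MAP = {
    "Bürobedarf": "Office Supplies",
    "Fournitures de bureau": "Office Supplies",
}


def normalize(record):
    """Return a copy of the record in the reporting currency and language."""
    out = dict(record)
    rate = RATES_TO_USD.get(record["currency"])
    if rate is None:
        # Keep the record and flag it rather than dropping it.
        out["amount_usd"] = None
        out["import_note"] = "unknown currency"
    else:
        out["amount_usd"] = round(record["amount"] * rate, 2)
    # Labels with no translation rule pass through unchanged.
    out["category"] = LABEL_MAP.get(record["category"], record["category"])
    return out


print(normalize({"amount": 100.0, "currency": "EUR", "category": "Bürobedarf"}))
```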

The authors cheerily state (on page 14) that “a good commodity schema is at the heart of a good spend analysis program because most opportunities are found by commodity managers”. But hold on just a minute! If most of your opportunities are being found by your commodity managers using a basic A/P spend cube, then they’re limiting themselves to very simple low hanging fruit – which is picked clean in the first few months in a typical organization that makes a commitment to spend analysis. That’s why the traditional spend analysis value curve drops to almost zero within a year – meaning that if you don’t recover the cost of the effort in the first three months, you’ll never recover it. An A/P cube is just the beginning of the discovery process, not the endpoint.

The authors also make a strong argument for auto-classification, stating that (on page 100) “the reader must note that classifying millions of transactions is a task that should be done by using auto-classification and rules-based classification technology” and that “unless you license spend analysis applications, data scrubbing can be a very manual time consuming activity which requires a team of content specialists”.

Actually, nothing about rules-based classification mandates that the rules must be built by a robot, and there are many reasons why that can be a bad idea (not the least of which is the fact that robots are far from infallible). Classification rules can be built easily and effectively by hand … by a clerk … even in a very large organization with many disparate business units. Once built, this set of rules can then be applied in a fully automated way to every new transaction added to the system. So let’s not confuse “automation of creation” with “automation of application,” please. Of course, you do need a good, modern, spend analysis tool that allows for the creation of rules groups of different priorities, and you need a rules creation mechanism that’s easy to use and easy to understand.

Have you ever wondered why skilled consultants can build and map a spend cube to 90% accuracy very quickly? Well, here’s one tried-and-true “manual” methodology that builds terrific “automated” rules:

  1. map the top 200 GL codes
  2. map the top 200 vendors
  3. map the GL code + Vendor for vendors who sell you more than one item, or items in more than one category, depending on the level of detail you need
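As a rough illustration of how those three steps turn into rules that apply themselves, here is a minimal sketch with three rule groups checked from most specific to least specific. The GL codes, vendors, and categories are made up, and the priority order (GL + vendor, then vendor, then GL) is my assumption for the example; a real tool would let you set the priorities yourself.

```python
# Hypothetical mapping tables built by hand; all values are made up.
GL_RULES = {"6100": "Office Supplies", "6200": "IT Hardware"}
VENDOR_RULES = {"Acme": "Office Supplies", "Globex": "IT Hardware"}
# Overrides for vendors that sell into more than one category.
GL_VENDOR_RULES = {("6200", "Acme"): "IT Hardware"}


def classify(txn):
    """Return a category using the most specific rule that matches."""
    key = (txn["gl"], txn["vendor"])
    if key in GL_VENDOR_RULES:         # highest priority: GL code + vendor
        return GL_VENDOR_RULES[key]
    if txn["vendor"] in VENDOR_RULES:  # next: vendor-level rule
        return VENDOR_RULES[txn["vendor"]]
    if txn["gl"] in GL_RULES:          # last: GL-level rule
        return GL_RULES[txn["gl"]]
    return "Unmapped"                  # nothing matched: keep it visible


for txn in [
    {"gl": "6200", "vendor": "Acme"},
    {"gl": "6100", "vendor": "Acme"},
    {"gl": "6200", "vendor": "NewCo"},
    {"gl": "9999", "vendor": "NewCo"},
]:
    print(txn, "->", classify(txn))
```

The rules are written once by a person; applying them to every new transaction is the automated part.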

If you want to, you can get to 95-97% accuracy by extending to the top 1000 GL codes and the top 1000 vendors — if you really believe you are going to source 1000 vendors (and of course you’re not). To check your work, you’ll need to run reports that show you:

  • top GL’s and top commodities by vendor
  • top vendors and top GL’s by commodity
  • top vendors and top commodities by GL

Simply keep mapping until all three reports are consistent, and you are as accurate as you’ll need to be — and you’ll have the advantage of having built your own mapping rules, that you understand. The alternative, which is error-checking the work of an automaton (a process that must be done, because no robot is perfect), is difficult, tedious, and error-prone — and it must be repeated on every data refresh.
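The three check reports are all the same shape: group by one dimension and rank another by spend within each group. A minimal sketch follows, assuming transactions that already carry a `commodity` assigned by your rules; the data and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-classified transactions.
rows = [
    {"vendor": "Acme", "gl": "6100", "commodity": "Office Supplies", "amount": 900.0},
    {"vendor": "Acme", "gl": "6200", "commodity": "IT Hardware", "amount": 400.0},
    {"vendor": "Globex", "gl": "6200", "commodity": "IT Hardware", "amount": 700.0},
    {"vendor": "Globex", "gl": "6100", "commodity": "IT Hardware", "amount": 50.0},
]


def top_by(rows, group, detail, n=5):
    """For each value of `group`, list the top-n `detail` values by spend."""
    totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        totals[r[group]][r[detail]] += r["amount"]
    return {
        g: sorted(d.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for g, d in totals.items()
    }


reports = {
    "top GLs by vendor": top_by(rows, "vendor", "gl"),
    "top commodities by vendor": top_by(rows, "vendor", "commodity"),
    "top vendors by commodity": top_by(rows, "commodity", "vendor"),
    "top GLs by commodity": top_by(rows, "commodity", "gl"),
    "top vendors by GL": top_by(rows, "gl", "vendor"),
    "top commodities by GL": top_by(rows, "gl", "commodity"),
}

for name, report in reports.items():
    print(name, report)
```

In this toy data, GL 6100 shows up under two commodities. That is the kind of inconsistency the reports are meant to surface: either a rule is wrong, or you need a GL + vendor override.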

The authors state (on page 116) that “manual editing is sufficient, but it is also extremely inefficient … it is not scalable with respect to the size of the data”. This is flatly untrue. The creation of dimensional mapping rules is wholly unrelated to the volume of the transactions: the same effort is required for 1M transactions as for 100M, and most spend datasets can be mapped very effectively with dimensional rules only. The only exception is datasets whose only component is a text description; and here, too, the authors’ “scalability” argument falls apart, since human-directed phrase mapping can divide-and-conquer quite effectively.
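For the text-description case, here is one way human-directed phrase mapping might look; this is my own minimal sketch, not a method taken from the book. A person writes a short ordered list of phrases, and the leftover descriptions are ranked by spend so the next phrase rule goes after the most money first. The phrases, categories, and descriptions are hypothetical.

```python
# Hypothetical phrase rules, checked in order; the first match wins.
PHRASE_RULES = [
    ("toner", "Office Supplies"),
    ("laptop", "IT Hardware"),
    ("freight", "Logistics"),
]


def map_description(text):
    """Classify a free-text description by its first matching phrase."""
    lowered = text.lower()
    for phrase, category in PHRASE_RULES:
        if phrase in lowered:
            return category
    return "Unmapped"


def unmapped_by_spend(rows):
    """Rank still-unmapped descriptions by total spend, largest first."""
    totals = {}
    for description, amount in rows:
        if map_description(description) == "Unmapped":
            totals[description] = totals.get(description, 0.0) + amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = [
    ("HP Toner Cartridge 64A", 120.0),
    ("Laptop docking station", 250.0),
    ("Ocean freight, container 40ft", 3100.0),
    ("Consulting services Q1", 9000.0),
    ("Misc", 15.0),
]

print(unmapped_by_spend(rows))
```

The number of rules depends on how many distinct phrases matter, not on how many transactions contain them.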

To top it all off, the authors go on to violate the first rule of spend analysis, which is “NEVER, EVER, EVER EXCLUDE DATA”. They take great pains to classify all of the errors that can occur in the ETL process and then bluntly state that (on page 109) “if you have errors in category iv (root cause is undocumented and cannot be inferred), then you have two alternatives … the first alternative, if possible, is to exclude these data from your sources … errors of category iv are unacceptable and could jeopardize your entire analysis … so they should be eliminated”.

No, NO, NO! YOU MUST ACCEPT ALL OF THE RECORDS and YOU MUST DO SOMETHING SENSIBLE with the records that don’t fit into your notion of reality. For example, create a new Vendor ID, and family it automatically under Not Found, or Missing. Dropping data jeopardizes your analysis much more than creating an “Uncategorized” or “Missing” data node. What if errors represent 15% of your spend? Then you’d be reporting that you are spending 85M on a category when you are spending 100M. Your numbers won’t add up … and when the CFO makes an SEC filing based on data that is later found to be incomplete by the auditors, guess whose head is going to roll?
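The arithmetic is easy to see in a minimal sketch. The vendor IDs and amounts below are hypothetical, chosen so that the unmatched records make up 15% of a 100M total.

```python
# Hypothetical vendor master and transactions.
KNOWN_VENDORS = {"V001": "Acme", "V002": "Globex"}

transactions = [
    {"vendor_id": "V001", "amount": 50_000_000},
    {"vendor_id": "V002", "amount": 35_000_000},
    {"vendor_id": None, "amount": 10_000_000},    # vendor ID missing
    {"vendor_id": "V999", "amount": 5_000_000},   # vendor ID not in the master
]


def spend_by_vendor(rows, drop_unmatched):
    """Total spend per vendor, either dropping or bucketing unmatched records."""
    totals = {}
    for r in rows:
        name = KNOWN_VENDORS.get(r["vendor_id"])
        if name is None:
            if drop_unmatched:
                continue          # the "exclude these data" alternative
            name = "Missing"      # keep the record under a visible node
        totals[name] = totals.get(name, 0) + r["amount"]
    return totals


dropped = spend_by_vendor(transactions, drop_unmatched=True)
kept = spend_by_vendor(transactions, drop_unmatched=False)

print(sum(dropped.values()))  # 85000000
print(sum(kept.values()))     # 100000000
print(kept["Missing"])        # 15000000
```

With the “Missing” node the totals still tie back to the ledger, and the 15M sits in plain view waiting for someone to map it.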

And before I forget, let’s get back to that web-centric requirement where the authors imply that all of this means web-based access to a central cube (singular case). Throughout the entire book they refer to “the cube” (such as when they state that “in our opinion, there is no need for a frequent update of the cube”) as if there’s only ever one cube to be built. Turns out there isn’t just one cube to be built; there are dozens of cubes to be built. Some power analysts build 30 or 40 commodity-specific invoice-level cubes (what are those? you won’t learn that from Pandit and Marmanis), and regularly maintain a dozen of these every month, not every quarter (as the authors recommend).

The only real hint that the authors give that multiple cubes might be useful is where they state (on page 51) that “some companies are taking the approach of creating different cubes for different uses, rather than packing all possible information in a single cube for all users … for example, all users might not be interested in auditing P-Card information … rather than include all of the details related to P-Card transactions in the main cube, you can simply model the top-level info (supplier, cost center) in the main cube … then … create a separate ‘microcube’ that has all of the detailed transactional information … the two cubes can be linked, and the audit team can be granted access to the microcube … the microcube approach can be rolled out in a phased manner”. Or, in short form, you can have multiple cubes if you have too much data, and the way you do it is to create ONE main cube, and then micro-cube drill-downs for relatively unimportant data. I don’t even know how to verbalize how wrong this is: it completely inverts the value proposition. (Now, to be fair, they also state that “ideally, the cubes should be replicated on the user machine for the purposes of making any data changes”, but they give no definition as to what form these cubes should take or what changes are to be permitted, so we are left assuming their previous definition, which is secondary micro-cubes and only minor, meaningless, alterations, since the dimensional and rule-based classifications require “approval”).

At this point, you’re probably asking yourself – did the authors get anything right? Sure they did! Specifically:

  • Chapter 4 on opportunity identification had a good list of opportunities to start with. Too bad most of them are the low-hanging fruit opportunities easily identified with out-of-the-box reporting and that there’s no real insight on how to do serious untapped opportunity identification when there isn’t a pre-canned report available.
  • Chapter 5 on the anatomy of spend transactions had a good overview of the formats used in various systems … but if you’re a real analyst, you probably know all this stuff anyway.
  • Chapter 7 on taxonomy considerations had good, direct, simple introductions to UNSPSC, eOTD, eCl@ss, and RUS. It’s too bad these schemas are relatively useless when it comes to sourcing-based opportunity identification.
  • When the authors point out (on page 8) that adoption of spend analysis is still low, they are correct … but when they state that it’s because we’re talking tens or hundreds of millions of transactions, they are wrong, and the point is irrelevant anyway. For any specific analysis, there are probably only a few million or tens of millions of transactions that are relevant, and a real spend analysis tool on today’s desktops and laptops can operate on that number of transactions without issues. There is no need for a mainframe.
  • When they state that the categorization of errors is critical because not all errors are equally costly to fix, they’re right … but the data warehouse is irrelevant. Just add a new mapping rule and you’re done. Two minutes, tops. What’s the big deal? Oh, I forgot — in the authors’ world, you can’t add a new mapping rule on the fly.

To sum up, when the authors state in their preface (on page xv) that “if implemented incorrectly, a spend analysis implementation program can become easily misdirected and fall short of delivering all of the potential savings”, I wholeheartedly agree. Unfortunately, the authors themselves provide a road map for falling short.