Category Archives: Spend Analysis

Analytics I: Optimization Comes of Age

Today’s post is by Eric Strovink of BIQ.

I remember my first experience with optimization. I was taken to a guidance counsellor’s office at my local high school, where a special terminal was set up. This terminal was connected to a system that would allegedly try to find the “best” college for me. It asked many questions. Questions like, “Would you prefer a warm climate?” and “Would you prefer an academic setting with equal numbers of men and women?” Well, duh. Those were easy answers.

My goal was to attend one of the premier engineering schools in the US. I wanted MIT or CalTech or Stanford or Carnegie Mellon. I’d be happy with Rice. If my grades or scores weren’t good enough for the snooty super-competitive schools, I’d try for Rensselaer or Northeastern.

The system ended up choosing an entirely unsuitable school, evidently equally weighing my academic preferences and my social and geographic preferences.

What’s my point? Well, in a microcosm, this has been the essential problem with optimization. When you provide a “constraint” — and let’s be precise, here, the term really is “constraint” — an optimizer will not look outside that constraint for options. It cannot. It is a mathematical engine, and it can’t read your mind and figure out which is a “soft” requirement and which is a “hard” requirement. As far as it’s concerned, they’re all requirements, and, by whatever God you (don’t) believe in, it will find a solution that fits those requirements, if there is one.
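To make that concrete, here is a minimal sketch in Python, using made-up options and constraints (nothing from any real optimizer), of how an engine treats every constraint as a hard requirement: anything that violates a constraint is simply removed from consideration, even if it is the best option on every other measure.

    # Hypothetical options and constraints, for illustration only.
    options = [
        {"name": "A", "cost": 100, "lead_time": 5,  "domestic": True},
        {"name": "B", "cost": 80,  "lead_time": 12, "domestic": True},   # cheapest of all
        {"name": "C", "cost": 90,  "lead_time": 6,  "domestic": False},
    ]

    # Both constraints are treated as hard requirements; the engine cannot
    # guess that "lead_time <= 7" was really just a preference.
    feasible = [o for o in options if o["lead_time"] <= 7 and o["domestic"]]

    best = min(feasible, key=lambda o: o["cost"])
    print(best["name"])   # "A" -- option B, the cheapest, was never even considered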

That’s one reason why optimization has struggled to find its way.

I was listening to my wife talking to a survey telemarketer the other day. She said, “I really don’t have an opinion about Blue Cross’s responsiveness to patient needs. I’ve never had Blue Cross.” There was a pause. Then she said, “But how can I have an opinion on a 1 to 10 scale, if I’ve never used them?” There was another pause. She said, “OK, but ….” There was another pause. She sighed, and said, “OK, 5.”

What’s my point? Well, do you really know the answer to what kind of constraints you should impose on your optimization model? Or are you supplying an answer because you don’t know the answer, but you have to supply something? And after the optimization model has solved, can you remember all the places where you guessed, but you didn’t really know? What if you forgot one of those places? And what if that one guess caused the model to solve in a really non-optimal way (non-optimal from your perspective, not its)?

That’s another reason why optimization has struggled to find its way.

The breakthrough has come with what I’ll term “guided optimization”. If you hike in the White Mountains of New Hampshire, for example, you have a large number of excellent trails to choose from. Many of them are safe climbs that lead to outstanding views and vistas; but others lead up steep, often wet cliffs that are unsuitable for casual hiking. You need a guide; in this case, any of the excellent guide books from the Appalachian Mountain Club. In the case of optimization, your guide usually needs to be an experienced practitioner who can help you set up your model, show you how to move constraints to find inflection points in your model, and so on. (The good news is that lots of vendors provide guided services now, and it isn’t that expensive. Especially when you consider that optimization can be incredibly valuable.)
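What does "moving constraints to find inflection points" look like in practice? A hedged sketch, again with made-up numbers: re-solve the model at several constraint levels and watch for the point where the objective suddenly improves, because that is the requirement that is really costing you money.

    # Hypothetical data; the pattern is what matters, not the numbers.
    options = [
        {"name": "A", "cost": 100, "lead_time": 5},
        {"name": "B", "cost": 80,  "lead_time": 12},
        {"name": "C", "cost": 90,  "lead_time": 6},
    ]

    for max_lead in range(5, 13):
        feasible = [o for o in options if o["lead_time"] <= max_lead]
        best = min(feasible, key=lambda o: o["cost"])
        print(f"lead time <= {max_lead:2d} days: best cost {best['cost']}")
    # Cost holds at 100, drops to 90 at 6 days, and to 80 at 12 days --
    # those are the inflection points worth discussing with stakeholders.

An experienced guide knows which constraints are worth sweeping like this, and which are genuinely fixed.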

Companies that provide guided optimization services, like Trade Extensions, have enjoyed solid growth and have left a legacy of satisfied customers. You can always use optimization software on your own (Trade Extensions is no exception); but until you really understand what you’re doing, it can be unwise.

Optimization vendors have claimed for years that their systems are usable by novices. I don’t dispute that there are cases where this is true, and has been true. But for me, it’s a case of crying wolf: there have been so many claims, for so many years, with so many tears, that I’m solidly in the “get a guide” camp. I do hope, though, that optimization vendors will take additional steps to make guidance unnecessary. the doctor has assembled a pretty comprehensive list of what needs to happen.

At the end of the day, if you can’t do analysis yourself, you’re less likely to do it at all, which, as you’ll see in the next installment, is the theme of this series.

Next: Analytics II: What is Analysis?


An Analyst Finally Gets BI Right!

After reading report after report after report from analyst firm after analyst firm after analyst firm for the last decade or so on how more BI is the answer (it’s usually not), I was very pleasantly surprised by this recent post by Lora Cecere (ex-AMR) of the Altimeter Group over on Supply Chain Shaman on why you should free the data to answer the questions that you don’t know to ask.

The first paragraph captures the situation perfectly:

It happens all the time. IT says to line of business leaders, “Tell me what you need for Business Intelligence (BI), and I will go find the right technologies”. The issue is that we don’t know, and we will not know soon. We only know that applications are changing and that the data is growing exponentially. The answer to the question of: “What is the right data architecture for demand-driven value networks?” is “It is evolving. We don’t know”.

This means that no fixed BI or OLAP solution is ever going to solve your problem! I don’t care if it can handle and/or is designed for geo-spatial data, sentiment analysis, loyalty programs, POS, or CRM. It won’t work. That’s why, as I keep stressing, you need a real data analysis solution that can cube, dimension, map, slice, dice, augment, expand, re-map, re-cube, and start again on data sets of millions of transactions in real time on your high end laptop or workstation. Until each of your analysts has this type of solution on their desktops, they’ll never get the intelligence they’re looking for.
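As a rough illustration of what that looks like in practice, here is a minimal pandas sketch (the file and column names are hypothetical) of the cube, slice, re-map, re-cube cycle on raw transactions:

    import pandas as pd

    # Hypothetical transaction file: vendor, gl_code, cost_center, commodity, amount
    tx = pd.read_csv("transactions.csv")

    # Cube: spend by commodity and cost center
    cube = tx.pivot_table(index="commodity", columns="cost_center",
                          values="amount", aggfunc="sum", fill_value=0)

    # Slice and dice: one commodity, top vendors by spend
    office = tx[tx["commodity"] == "Office Supplies"]
    print(office.groupby("vendor")["amount"].sum().nlargest(10))

    # Re-map and re-cube in seconds: reassign one vendor and rebuild
    tx.loc[tx["vendor"] == "Acme Corp", "commodity"] = "MRO"
    cube = tx.pivot_table(index="commodity", columns="cost_center",
                          values="amount", aggfunc="sum", fill_value=0)

Every step is an in-memory operation the analyst controls directly; there is no change request to IT and no overnight refresh.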


Auto-Classification is NOT the Answer, Part II

Today’s post is co-authored by Eric Strovink of BIQ.

In Part I, we covered four of the major reasons why auto-classification is not the answer, namely:

  1. Automatically Generated Rule Sets are Difficult (if not Impossible) to Maintain: after a few days of trying, you might just go mad
  2. The Mapping is Rife with Errors: running a simple “Commodity Summary Report” after the first “auto-mapping” pass will reveal so many errors that it will knock you off your seat
  3. Automated Analysis is NOT Analysis: all an “automated” analysis can do is run a previously defined report
  4. True Analysis Goes Well Beyond AP Data: there are considerably more opportunities in PxQ (price x quantity) data

However, even if all of this weren’t true, there is still one very good reason not to use automated classification, and that is:

5. Classification is Easy

The “secret sauce” of Commodity mapping has been known for over two decades. Create a hierarchical rules overlay, where each successive set of rules overrides the ones before it, as follows:

  1. Map the GL codes
  2. Map the top Vendors
  3. Map the Vendor + GL codes (for top Vendors who sell more than one Commodity)
  4. Map the Exceptions (for example, GL codes that always map to a particular Commodity)
  5. Map the Exceptions to the Exceptions

Why does this method work? It works because the “tail” of the distribution, which is spend you can’t afford to source and don’t care about, ends up being weakly mapped via GL codes by group 1. The vendors you actually care about, in a first-order mapping exercise perhaps the top 500 or 1000 by spend, are very carefully mapped in group 2; and if they provide more than one Commodity (service or product), they are mapped even more carefully again in group 3. Groups 4-N cover exceptions — such as the case where a particular vendor in a particular geography is “known” to provide only one Commodity. Note that this type of knowledge is known only by you — no automatic classifier could possibly know this, and therefore no automatic classifier can take advantage of such knowledge.
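To see why the layering stays manageable, here is a minimal sketch in Python (hypothetical rule tables and field names, not any vendor’s actual format) in which each successive group simply overrides whatever the earlier groups decided:

    # Hypothetical rule tables and field names, for illustration only.
    GL_RULES     = {"6510": "Office Supplies", "6200": "Travel"}
    VENDOR_RULES = {"STAPLES": "Office Supplies", "IBM": "IT Services"}
    VENDOR_GL    = {("IBM", "6400"): "Hardware"}               # vendor sells more than one Commodity
    EXCEPTIONS   = {("IBM", "6400", "UK"): "Leased Hardware"}  # exception to the exception

    def classify(tx):
        commodity = "Unmapped"
        if tx["gl"] in GL_RULES:                      # group 1: GL codes (weak, catches the tail)
            commodity = GL_RULES[tx["gl"]]
        if tx["vendor"] in VENDOR_RULES:              # group 2: top vendors override GL
            commodity = VENDOR_RULES[tx["vendor"]]
        if (tx["vendor"], tx["gl"]) in VENDOR_GL:     # group 3: vendor + GL overrides vendor
            commodity = VENDOR_GL[(tx["vendor"], tx["gl"])]
        key = (tx["vendor"], tx["gl"], tx.get("geo"))
        if key in EXCEPTIONS:                         # groups 4-N: explicit exceptions win
            commodity = EXCEPTIONS[key]
        return commodity

    print(classify({"vendor": "IBM", "gl": "6400", "geo": "UK"}))   # "Leased Hardware"

Because each group is a small, explicit table, it is always obvious which rule produced a given mapping and which group a correction belongs in.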

Note that errors do not creep into this process. It is hard to make a mistake, and it’s obvious where the mistake has been made when it is made. That’s why the work can be done, by hand, to over 97% accuracy by a clerk in just a few days even in the largest of Fortune 500s. Why? Because the clerk does not have to think, just map. And once the mapping is done, it’s done, and it’s accurate. The rules are saved and never have to be modified, and their interaction with each other is easy to understand. The only changes that will ever be required are

  1. new rules when new GL codes or vendors are introduced,
  2. archival of old rules when old GL codes or vendors are retired, and
  3. new exception rules when mapping errors are discovered.

And if by some chance a user can’t find the time to map spend, then “auto-mapping” (or an outsourced manual mapping effort) should be required to produce the above rule groups. That way, the user can add to and modify the rule groups by hand, using the same tools, when errors in the mapping are discovered. The tool should not be reclassifying data automatically to new rules generated on every cube refresh (which is what could happen if the classifier is, for example, using genetic algorithms for mapping rules).

Why use an error-ridden auto-classification process when you can do it error free the first time, by hand, in a few days, and get immeasurably better results?


Auto-Classification is NOT the Answer, Part I

Today’s post is co-authored by Eric Strovink of BIQ.

Not a month goes by these days without a new spend classification / consulting play hitting the market. Considering that true spend analysis is one of only two sourcing technologies proven to deliver double-digit percentage savings (averaging 11%), one would think this would be a good thing. But it’s not. Most of these new plays focus on automatic classification, analysis, and reporting — which is not what true spend analysis is. True spend analysis is intelligently guided analysis, and, at least until we have true AI, it can only be done by a human. So what’s wrong with the automatic approach?

1. Automatically Generated Rule Sets are Difficult to Maintain

Almost all of today’s auto-classifiers generate a single-level rule set that is so large that the size alone makes it unwieldy. This is because auto-classifiers depend on string-matching techniques to identify vendor names or line items. But when a new string-matching rule is added, what is its impact on the other rules? There is no way to know other than to replay the rules every time. This quickly exhausts the patience of anyone trying to maintain such a rule set, and produces errors that are difficult to track down and essentially impossible to fix. Worse, what happens when you delete a rule? The process is intrinsically chaotic and unstable. We get calls all the time from users who have thrown up their hands at this.

But with a layered rule set (more on this in Part II), where each rule group takes priority over the rule groups above it, the average organization can achieve a reasonable first-order mapping result with only a few hundred GL mapping rules and a few hundred vendor mapping rules, along with a handful of rules to map vendor + GL code combinations in the situations where a vendor supplies more than one Commodity (and an even smaller number of exception rules where a vendor product or service can map to a different Commodity depending upon spend or use). If finer resolution is required, map more GL codes and more vendors; or map just the GL codes and vendors that are relevant to the sourcing exercise you are contemplating. There’s a reason for the 80-20 rule; it makes sense. Mapping a vendor like Fred’s Diner is irrelevant. Mapping a vendor like IBM correctly and completely, with full manual oversight and control, is critical.

2. Finding Errors, Performing Q/A, Avoiding Embarrassment

How can a spend cube be vetted? It’s actually quite easy. Run a “Commodity Summary Report” (originally popularized by The Mitchell Madison Group, circa 1995). This report provides a multi-page book, one page per Commodity, showing top vendors, top GL codes, and top Cost Centers, ordered top-down by spend. Errors will jump out at you — for example, what is this GL doing associated with this Commodity? Does this Vendor really supply this Commodity? Does this Cost Center really use this Commodity?

Then invert the Commodity Summary Report into a book by Vendor, showing top GL codes, top Commodities, and top Cost Centers. Errors are obvious again; why is this Commodity showing up under this Vendor? What’s the story with this GL code being associated with this Vendor? Then invert the Commodity Summary Report into a book by GL code, showing top Vendors, top Commodities, and top Cost Centers. When you refine the rule set to the point where nothing jumps out at you using any of these three views, then congratulations: you have a consistent spend map that will hold up well to any outside examination. If someone crawls down into the weeds and finds an inaccurate GL mapping, simply add a rule to the appropriate group (probably Vendor), and the problem is solved. If the mapping tool is a real-time tool, as it ought to be, the problem can be solved immediately, in seconds.
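For concreteness, here is a hedged sketch of how the three views might be generated with pandas (file and column names are hypothetical); each loop prints one “page”, ordered top-down by spend, so anything out of place jumps out:

    import pandas as pd

    # Hypothetical mapped spend file: commodity, vendor, gl_code, cost_center, amount
    tx = pd.read_csv("mapped_spend.csv")

    def summary_page(df, by):
        return df.groupby(by)["amount"].sum().nlargest(10)

    # Book 1: one page per Commodity
    for commodity, page in tx.groupby("commodity"):
        print(f"=== {commodity}: total spend {page['amount'].sum():,.0f} ===")
        print("Top vendors:\n",      summary_page(page, "vendor"))
        print("Top GL codes:\n",     summary_page(page, "gl_code"))
        print("Top cost centers:\n", summary_page(page, "cost_center"))

    # Book 2: invert by Vendor (and, similarly, Book 3 by GL code)
    for vendor, page in tx.groupby("vendor"):
        print(f"=== {vendor} ===")
        print("Top commodities:\n",  summary_page(page, "commodity"))
        print("Top GL codes:\n",     summary_page(page, "gl_code"))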

[N.B. We encourage you to run the Commodity Summary Report on the results of your automatically-generated rules set. But please do it only if you are sitting down comfortably. We don’t want you to hurt yourself falling off the chair.]

3. Automated Analysis is NOT Analysis

All an automated system can do is repeat a previously identified analysis. Chances are that if the analysis was already done, the savings opportunity was already found and addressed. That means that after the analysis is done the first time, no more savings will be found. The only path to sustained savings is when a user manually analyzes their data in new and interesting ways that yield new and previously unnoticed patterns or general trends with outliers well outside the norm — as it is those outliers that represent the true savings opportunities. And sometimes the only way to find a novel savings opportunity is to allow the analyst to follow her hunches to uncover unusual spending patterns that could allow significant savings if normalized.
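As one example of the kind of hunch-following a canned report cannot do, an analyst might pull raw line-item data and flag unit prices that sit well outside the norm for the same item; a minimal pandas sketch (hypothetical file and columns):

    import pandas as pd

    # Hypothetical PO line file: item, vendor, unit_price, quantity
    lines = pd.read_csv("po_lines.csv")

    # Price norms per item, then flag lines priced well above the norm
    stats = lines.groupby("item")["unit_price"].agg(["mean", "std"])
    lines = lines.join(stats, on="item")
    outliers = lines[lines["unit_price"] > lines["mean"] + 2 * lines["std"]]

    # The biggest-volume outliers are the first savings conversations to have
    print(outliers.sort_values("quantity", ascending=False).head(20))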

4. True Analysis Goes Well Beyond AP Data

Last but not least, it must be pointed out that the bulk of the (dozens of) spend analysis cubes that need to be built by the average large company are on PxQ (price x quantity) data, not on A/P data. In the PxQ case, classification is totally irrelevant; yet PxQ analysis is where the real savings and real insights occur. More on that in an upcoming Spend Analysis Series.

In our next post, we’ll review the final reason that auto-classification is not the answer.


Clean Data Is Good …

but the ability to clean it on the fly is better!

Chain Link Research, which has been publishing some of the best thought leadership on Supply Chain Management in recent months, recently ran a piece on “contract and supplier management lessons” that summarized eight key lessons from their recent research. Seven of these are dead on and emphasize lessons I’ve been trying to impart for years (including a couple that still haven’t been learned by most of the space).

The eighth lesson, which states that data cleanliness cannot be overemphasized, is correct, but overlooks the fundamental problem associated with data — it will never be 100% clean. Even if you have one hundred bodies manually reviewing and cleansing the data (which is exactly what you get if you buy a certain vendor’s solution, since that’s their unwritten strategy for dealing with all the transactions that their automated mapping algorithm is unable to classify), you’re not going to get it all right. First of all, data is always being added to the system — you’ll never be 100% up to date. Secondly, classifications need to change over time. And, most importantly, humans make mistakes: while they’ll fix some errors correctly, they’ll screw up others and miss some entirely.

The real key to success is having a data analysis tool that allows you to fix an error in real time as soon as it’s spotted — not a traditional data warehouse where you have to wait weeks (or months) for the refresh. Then you can get away with 80% to 90% accuracy* (which is all you need to figure out where the problems really lie) because, if a supplier or customer spots an error in the data, you can say “sorry, let me fix that”, click on the transaction, click on the link that shows the rule that ultimately produced the mapping, and either (a) change the rule if it is wrong or (b) create a new exception (overlay) mapping rule if the mapping rule is normally right, but this is a special case. The report is updated, very little changes in the big picture, and you move on. That’s the way you do it; a minimal sketch of this flow appears after the footnotes below.

* You can achieve this level of mapping accuracy in a matter of days, creating rules by hand, no matter how much data you have. All you have to do is apply the secret sauce of:

  1. Map the GL codes
  2. Map the top Vendors
  3. Map the Vendor + GL codes (for top Vendors who sell more than one Commodity)
  4. Map the Exceptions (for example, GL codes that always map to a particular Commodity)
  5. Map the Exceptions to the Exceptions**

** If your data is really bad or you have a really sophisticated categorization scheme.
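And here is the promised sketch of the fix-it-on-the-spot flow, in Python with hypothetical rule structures (not any particular vendor’s API): trace which rule produced a transaction’s mapping, then drop an exception rule on top of it, with no warehouse refresh required.

    # Hypothetical rule structures, evaluated in order; the last match wins.
    rules = [
        {"group": "gl",     "match": {"gl": "6510"},    "commodity": "Office Supplies"},
        {"group": "vendor", "match": {"vendor": "IBM"}, "commodity": "IT Services"},
    ]

    def classify(tx):
        winner = None
        for rule in rules:
            if all(tx.get(k) == v for k, v in rule["match"].items()):
                winner = rule
        return (winner["commodity"] if winner else "Unmapped"), winner

    tx = {"vendor": "IBM", "gl": "6510"}
    commodity, rule = classify(tx)
    print(commodity, "<-", rule["group"])        # shows exactly which rule fired

    # A supplier points out this line is really hardware: add one exception rule
    rules.append({"group": "exception",
                  "match": {"vendor": "IBM", "gl": "6510"},
                  "commodity": "Hardware"})
    print(classify(tx)[0])                       # remapped immediately, no refresh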
