Auto-Classification is NOT the Answer, Part II

Today’s post is co-authored by Eric Strovink of BIQ.

In Part I, we gave an overview of four of the major reasons why auto-classification is not the answer, namely:

  1. Automatically Generated Rule Sets are Difficult (if not Impossible) to Maintain: after a few days of trying, you might just go mad
  2. The Mapping is Rife with Errors: running a simple “Commodity Summary Report” after the first “auto-mapping” pass will reveal so many errors that it will knock you off your seat
  3. Automated Analysis is NOT Analysis: all an “automated” analysis can do is run a previously defined report
  4. True Analysis Goes Well Beyond AP Data: there are considerably more opportunities in PxQ (price-by-quantity) data

However, even if all of this weren’t true, there is still one very good reason not to use automated classification, and that is:

5. Classification is Easy

The “secret sauce” of Commodity mapping has been known for over two decades. Create a hierarchical rules overlay, where each set of rules overrides the ones before it, as follows (a code sketch follows the list):

  1. Map the GL codes
  2. Map the top Vendors
  3. Map the Vendor + GL codes (for top Vendors who sell more than one Commodity)
  4. Map the Exceptions (for example, GL codes that always map to a particular Commodity)
  5. Map the Exceptions to the Exceptions
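
By way of illustration, here is a minimal Python sketch of such a hierarchical rules overlay. Everything in it is a hypothetical assumption for illustration only: the Rule class, the classify function, and the sample GL codes and vendors are not the interface or data of any particular spend analysis tool.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Rule:
        commodity: str
        gl_code: Optional[str] = None   # match on GL code, if set
        vendor: Optional[str] = None    # match on vendor, if set

        def matches(self, txn: dict) -> bool:
            if self.gl_code is not None and txn.get("gl_code") != self.gl_code:
                return False
            if self.vendor is not None and txn.get("vendor") != self.vendor:
                return False
            return True

    # One list per rule group, in the order given above.
    rule_groups = [
        [Rule("Office Supplies", gl_code="6100")],                       # 1. GL codes (weak, catch-all)
        [Rule("IT Hardware", vendor="Acme Computers")],                  # 2. top vendors
        [Rule("IT Services", vendor="Acme Computers", gl_code="6200")],  # 3. vendor + GL code
        [],                                                              # 4. exceptions
        [],                                                              # 5. exceptions to the exceptions
    ]

    def classify(txn: dict) -> str:
        # Later (more specific) groups override earlier (weaker) ones,
        # so scan the groups in reverse order.
        for group in reversed(rule_groups):
            for rule in group:
                if rule.matches(txn):
                    return rule.commodity
        return "Unmapped"

    print(classify({"vendor": "Acme Computers", "gl_code": "6200"}))  # IT Services (group 3)
    print(classify({"vendor": "Acme Computers", "gl_code": "6100"}))  # IT Hardware (group 2)
    print(classify({"vendor": "Mom & Pop Shop", "gl_code": "6100"}))  # Office Supplies (group 1)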

Why does this method work? It works because the “tail” of the distribution, which is spend you can’t afford to source and don’t care about, ends up being weakly mapped via GL codes by group 1. The vendors you actually care about, perhaps the top 500 or 1000 by spend in a first-order mapping exercise, are very carefully mapped in group 2; and if they provide more than one Commodity (service or product), they are mapped even more carefully in group 3. Groups 4-N cover exceptions, such as the case where a particular vendor in a particular geography is “known” to provide only one Commodity. Note that this type of knowledge resides only with you; no automatic classifier could possibly know it, and therefore no automatic classifier can take advantage of it.
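
A small hypothetical helper makes the group 2 step concrete: rank vendors by total spend and take the top N for careful hand-mapping, letting the tail fall through to the weak GL-code rules of group 1. The transaction fields and the function name are illustrative assumptions, not any tool’s API.

    from collections import Counter

    def top_vendors(transactions, n=1000):
        """Rank vendors by total spend; the top n go into group 2 for
        careful hand-mapping, and the tail falls through to GL rules."""
        spend = Counter()
        for txn in transactions:
            spend[txn["vendor"]] += txn["amount"]
        return [vendor for vendor, _ in spend.most_common(n)]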

Note that errors do not creep into this process. It is hard to make a mistake, and when a mistake is made, it is obvious where it was made. That’s why the work can be done, by hand, to over 97% accuracy by a clerk in just a few days, even in the largest of Fortune 500s. Why? Because the clerk does not have to think, just map. And once the mapping is done, it’s done, and it’s accurate. The rules are saved and never have to be modified, and their interaction with each other is easy to understand. The only changes that will ever be required are the three below (see the code sketch after the list):

  1. new rules when new GL codes or vendors are introduced,
  2. archival of old rules when old GL codes or vendors are retired, and
  3. new exception rules when mapping errors are discovered.
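
Continuing the earlier sketch, these three changes reduce to trivial operations on the rule groups. The function names are illustrative, and rule_groups is assumed to be the list-of-lists structure from the sketch above.

    def add_rule(rule_groups, group_index, rule):
        """1. Add a rule when a new GL code or vendor is introduced."""
        rule_groups[group_index].append(rule)

    def archive_rule(rule_groups, group_index, rule):
        """2. Archive a rule when its GL code or vendor is retired."""
        rule_groups[group_index].remove(rule)

    def add_exception(rule_groups, rule):
        """3. Patch a discovered mapping error with an exception rule;
        it lands in the last (highest-precedence) group, so no
        existing rule needs to be touched."""
        rule_groups[-1].append(rule)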

And if by some chance a user can’t find the time to map spend, then “auto-mapping” (or an outsourced manual mapping effort) should be required to produce the above rule groups. That way, when errors in the mapping are discovered, the user can add to and modify the rule groups by hand, using the same tools. The tool should not be automatically reclassifying data against new rules generated on every cube refresh (which is what could happen if the classifier is, for example, using genetic algorithms to generate mapping rules).
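
One way to meet that requirement, sketched here as an assumption rather than any vendor’s feature, is for the auto-mapping pass to emit its output as the same editable rule groups, for example a flat CSV the user can audit and correct by hand. The file layout below is an illustrative choice, not a standard.

    import csv

    def export_rules(rule_groups, path):
        """Write the rule groups to a CSV the user can audit and hand-edit."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["group", "gl_code", "vendor", "commodity"])
            for i, group in enumerate(rule_groups, start=1):
                for rule in group:
                    writer.writerow([i, rule.gl_code or "", rule.vendor or "", rule.commodity])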

Why use an error-ridden auto-classification process when you can do it error-free the first time, by hand, in a few days, and get immeasurably better results?
