Category Archives: Spend Analysis

The Time for One Vision is … Not Any Time Soon

Today’s guest post is from Eric Strovink of BIQ.

the doctor has asked “Has the time for one vision arrived?” The point of his post is contained in the last paragraph¹, and boils down to whether “Best of Breed” solutions in the supply chain space are “good” or “bad” from an “integrated data” perspective.

It is certainly the case that there are a lot of software solutions that boil down to nothing more than a custom database fronted by a UI and a report writer. Such systems are rather easy to build; in theory, they can be built (as the doctor has pointed out in jest previously) by a VBA programmer with a Microsoft Access database. Jesting aside, home-brew solutions can exceed both the functionality and usability of so-called “enterprise” solutions. For example, Access-based 1990s-era spend analysis leave-behinds from consulting organizations such as the old Mitchell Madison Group are still running in some large companies today, and, I daresay, are still superior to many current solutions.

Since there’s a pretty low technical bar to producing YAS (Yet Another Solution), there are grounds for hand-wringing when trying to keep track of them all, and of all the disparate data they are managing.

But it really doesn’t matter whether a software product is built by in-house resources using Access and VBA, or by an international team of professional programmers using J2EE/Flex/Silverlight/Ajax/etc. and delivered via the browser, because the answer to the doctor’s question is simple:

Until there is a major, earthshaking change in the technology of database systems, the notion of an “integrated” enterprise-wide data store is pure fantasy.

Why? Because, as the post points out, “each data source [is using] a different coding and indexing scheme, [and] there is no common framework that connects the applications.” And that’s all she wrote, folks. You can’t store eggnog in a fruit basket. It’s just not going to work.

Now, kudos to Coupa and others for “opening their API” (meaningful for programmers, not so much for ordinary humans) and so forth, but there is at present no way to integrate disparate, unrelated data into some centralized data store, without losing all the detail in the process. I don’t care if it’s all “spend” data, either. Slapping a label on something doesn’t make it homogeneous. I’m a bit of an expert on spend data, and I can assure you that spend data comes in all shapes and sizes and is certainly not homogeneous, whether you run an e-procurement system or you do not.²

And, of course, both old and new database vendors have been claiming for years to be able to integrate disparate data sources across the enterprise. Sure, if you want to join a few records across disparate databases that share common keys, there’s demo-ware that they can show you. It works great. But try a multi-way join across millions of records across disparate databases, and I’ll join you for a beer in the year 2025 when the query finishes.

So, I’ll steal the thunder from the “future post” mentioned by the doctor and jump right to the conclusion:

Don’t worry about “integrating” data, because it’s not going to work the way you hope it will. At best, you will end up with inadequate compromises and uselessly generic data, like a design-by-committee spend cube that is shelf-ware after six months.
Do worry about being able to move data easily in and out of the systems that you have. Don’t allow vendors to “lock up” your data; you should be able to change platforms easily, whenever you want to.
Do worry about flexibility and adaptability in your analysis system. You should be able to operate it yourself, for example. If your data is locked up behind some SQL database that only IT drones can access, it isn’t doing you any good at all.
Do worry about being able to move data from [anywhere] to your analysis system, quickly and easily.³

Let’s see what the doctor thinks, when he gets around to it.

¹Apparently the doctor has never taken Journalism 101. But we can forgive him, since he doesn’t pretend to be a journalist.

²There are new ideas like “semantic database systems”; but a quick glance at recent history will show how well that works out in practice (Jason Busch over at Spend Matters, for example, made the mistake of drinking the semantic search Kool Aid with the now-defunct Spend Matters Navigator).

³Also, it’s important to clarify the notion that “real time” access to data is required for procurement decisions. No procurement decision needs to be made in real time. Is this a Hollywood science fiction movie where we need to dodge laser blasts from TIE fighters zooming in from all angles? No, it’s the real world, and decisions can and should be made thoughtfully and carefully. When the doctor says “real time,” I would hope that he means that there should be access to the data and answers to questions without waiting a week or a month for some analyst to write software.


Analytics IV: OLAP: The Imperfect Answer

Today’s post is by Eric Strovink of BIQ.

When relational database technology breaks down, as it does on any sizeable transaction-oriented dataset when multiway joins on millions of records are required, the answer is, essentially, to “cheat”. At the risk of dumbing down some pretty complex technology (and the work of some extremely smart people), the usual idea¹ is to pre-aggregate totals in advance of the query, so that most of the work of the multi-way joins has been done in advance. This is called “OLAP” — an acronym that is unfortunate at best (“OnLine Analytical Processing”).

OLAP databases solve some data analysis problems, in particular slow joins, but only for certain columns in the dataset, and only for datasets that contain transactional data. So, many of the intrinsic problems of the data warehouse are exacerbated in an OLAP database, because the OLAP database is even more special purpose, and its schema very rigorously constrains the queries that can be expected to work efficiently.

Building OLAP databases is therefore harder than building general-purpose relational databases, and thinking in OLAP terms is also harder than thinking relationally. Deciding what columns are “interesting” is challenging as well, and also time-consuming; by the time the OLAP dataset is built, and you decide the column is “uninteresting”, you may have wasted considerable effort.

But OLAP datasets do provide one major advantage, and that’s the ability to “slice and dice” data rapidly, with visual impact and in human-understandable terms. OLAP viewers give users great visibility into data relationships, and enable exploration of large datasets without any need for IT expertise. That’s primarily what “Business Intelligence” or “BI” tools bring to the party: the ability to navigate OLAP datasets for insight.

So, OLAP solves some problems, but fails to solve others. Here is a short (and incomplete) list of significant issues:

A dependence (typically) on a(nother) fixed database schema
Another level of schema complexity to manage, in addition to the underlying database schema
Another level of inflexibility, in that changing the OLAP database organization is often even more difficult than changing the underlying database schema
Another level of query complexity, in that a multidimensional query language (called “multidimensional SQL”, or MDX) must be used, one that is much harder to comprehend than ordinary SQL.

In the procurement space, OLAP databases are often used for “spend analysis,” but more on that topic in part V.

Previous: Analytics III: The Data Expert and His Warehouse

Next: Analytics V: Spend “Analysis”

¹There are many approaches to OLAP.


Spend Analysis Is Not Strategic. It Isn’t Always Strategic! Part II

That’s right, in and of itself spend analysis is not strategic. This isn’t to say that spend analysis isn’t one of the most important actions that your supply chain can take in its effort to reduce costs, improve efficiency, and make the most effective use of business resources, but that the act of simply doing a spend analysis is not strategic.

Spend analysis provides a picture of the products and services the organization is spending money on, whom the products and services are being bought from, the organizational buyers who are spending the money, where the products and services are being bought from, and where the products and services are being shipped to and/or utilized. But this process is not strategic; it’s tactical. Furthermore, this information alone is not strategic. Let’s say the organization is spending 2M on computing equipment. So what? On its own, this information is not strategic. And unless the spend is significant (at least 1% of organizational spend) and the number one goal is to reduce total organizational spend by 5%, or the equipment needs to be unique (the organization’s proprietary trading platform only runs on hardware that natively supports AIX Unix), it’s not going to be used strategically. If the analyst compares spend to market prices and determines that reasonable savings are available (5% to 15%), the decision might be to run a sourcing event, but if it’s just another cookie-cutter RFX/Reverse Auction and/or TCO optimization with the same supplier base, it’s not strategic.

And then there’s the most common use of spend analysis in an organization that knows how to use it: ad-hoc queries to determine if an invoice is being paid twice, if the wrong amount was paid to a vendor, if a department is on budget, if a category has enough spend to warrant a sourcing event, etc. Not strategic. Very important, but not strategic.
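To make the first of those ad-hoc queries concrete, here is a minimal Python sketch of a duplicate-payment check. The payment records and the `find_possible_duplicates` helper are hypothetical, and exact matching on vendor, invoice number, and amount is an assumption made for brevity; real data would likely need looser matching.

```python
from collections import defaultdict

# Hypothetical payment records: (payment_id, vendor, invoice_number, amount).
payments = [
    ("P1", "Acme", "INV-100", 1200.00),
    ("P2", "Globex", "INV-7", 560.00),
    ("P3", "Acme", "INV-100", 1200.00),  # same invoice paid a second time
    ("P4", "Acme", "INV-101", 1200.00),  # same amount, but a different invoice
]


def find_possible_duplicates(rows):
    """Return groups of payments sharing vendor, invoice number, and amount."""
    groups = defaultdict(list)
    for payment_id, vendor, invoice_number, amount in rows:
        groups[(vendor, invoice_number, amount)].append(payment_id)
    # Only groups with more than one payment are suspicious.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


duplicates = find_possible_duplicates(payments)
```

Here `duplicates` flags payments P1 and P3 against Acme invoice INV-100 and leaves the others alone. Useful, quick, and entirely tactical.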

The reality is that very few events are strategic, because very few categories are strategic. Unless it’s a unique product or service, unless the spend is a significant percentage of organizational spend, unless the product or service directly relates to a (long-term) organizational goal, or unless you’re looking for a strategic partner to share in development, production, costs, or risk (mitigation), it’s probably not strategic. It’s probably still important, because every cent and resource counts in today’s economy, but let’s stop confusing tactical with strategic.


Analytics III: The Data Expert and His Warehouse

Today’s post is by Eric Strovink of BIQ.

Nothing is potentially more dangerous to an enterprise than the “data expert” and his data warehouse. In the data expert’s opinion, every question can be answered with an SQL query; any custom report can be written easily; and any change to the database schema can be accomplished as necessary. Everything of interest can be added to the warehouse; the warehouse will become the source of all knowledge and wisdom for the company; life will be good.

How many times have we heard this, and how many times has this approach failed to live up to expectations? Problem is, business managers usually feel that they don’t have the background or experience to challenge IT claims. There’s an easy way to tell if you’re being led down the rose-petaled path by a data analysis initiative, and it’s this: if your “gut feel” tells you that the initiative’s claims are impossibly optimistic, or if common sense tells you that what you’re hearing can’t possibly be true (because, for example, if it’s that easy, then why isn’t everyone else doing it), then go with your gut.

Sometimes the reflexive response of management to an IT claim is to say, “OK, prove it”. Unfortunately, challenging a data expert to perform a particular analysis is pointless, because any problem can be solved with sufficient effort and time. I recall an incident at a large financial institution, where an analyst (working for an outsourcer who shall remain nameless) made the claim that he could produce a particular complex analysis using (let’s be charitable and not mention this name, either) the XYZ data warehouse. So, sure enough, he went away for a week and came back triumphantly waving the report.

Fortunately for the enterprise, the business manager who issued the challenge was prepared for that outcome. He immediately said, “OK, now give me the same analysis by …”, and he outlined a number of complex criteria. The analyst admitted that he’d need to go away for another week for each variant, and so he was sent packing.

It’s not really the data expert’s fault. Most computer science curricula include “Introduction to Database Systems” or some analog thereof; and in this class, the wonders and joys of relational database technology are employed to tackle one or more example problems. Everything works as advertised; joins between tables are lickety-split; and the new graduate sallies forth into the job market full of confidence that the answer to every data analysis problem is a database system.

In so many applications this is exactly the wrong answer. The lickety-split join on the sample database that worked so well during “Introduction to Database Systems,” in the real world turns into a multi-hour operation that can bring a massive server to its knees. The report that “only” takes “a few minutes” may turn out to need many pages of output, each one a variant of the original; so the “few minutes” turns into hours.

Consider the humble cash register at your local restaurant. Is it storing transactions in a database, and then running a report on those transactions to figure out how to cash out the servers? No, of course it isn’t. Because if it did, the servers would be standing in line at the end of the night, waiting for the report to be generated. A minute or two per report — not an unreasonable delay for a database system chewing through transactional data on a slow processor — means an unacceptable wait. That’s why that humble restaurant cash register is employing some pretty advanced technology: carefully “bucketizing” totals by server, on the fly, so that it can spit out the report at the end of the night in zero time.
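A toy version of that register might look like the following Python sketch. The `Register` class and the server names are invented for illustration; the point is only that the totals are updated at the moment of sale, so the end-of-night report is a lookup rather than a pass over the transactions.

```python
from collections import defaultdict


class Register:
    """Toy cash register that keeps a running total ("bucket") per server."""

    def __init__(self):
        self.totals = defaultdict(float)

    def ring_up(self, server, amount):
        # Update the server's bucket as the sale happens.
        self.totals[server] += amount

    def cash_out(self, server):
        # No transaction log to scan: the answer is already computed.
        return self.totals.get(server, 0.0)


register = Register()
register.ring_up("Dana", 42.50)
register.ring_up("Lee", 18.00)
register.ring_up("Dana", 27.25)

dana_total = register.cash_out("Dana")
lee_total = register.cash_out("Lee")
```

Note what the sketch cannot do: once the sale is folded into a bucket, any question that was not anticipated (sales by table, say) has no answer.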

We’ll talk about “bucketizing” — otherwise known as “OLAP” — in part IV.

Previous: Analytics II: What is Analysis?

Next: Analytics IV: OLAP: The Imperfect Answer


Analytics II: What is Analysis?

Today’s post is by Eric Strovink of BIQ.

Ask a statistician or an applied mathematician, and she’ll probably tell you that analysis is either (1) building predictive models based on historical data, or (2) deciding whether past events are statistically significant (i.e., ascertaining whether what actually happened is sufficiently different than what might have happened by random chance).

But most of us aren’t applied mathematicians or statisticians, so we can get into trouble very easily. For example, we typically haven’t got a particular hypothesis to test (which is critical), and that means any patterns we might “find” are immediately suspect. That’s because in any dataset one can always come up with a hypothesis that generates significant results if one looks hard enough. With regard to predictions, we generally aren’t confident about the predictive power of our models, because we are neither facile with advanced predictive modeling techniques, nor do we have access (in general) to a sufficiently large sample of “known outcomes” to which to compare our predictions. Without a massive dataset like that provided by the Netflix Prize competition, there is no hope of refining a solution.
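The danger of looking hard enough can be put in rough numbers. The Python sketch below assumes that each hypothesis tried is an independent test at the conventional 5% significance level, which is a simplification (hypotheses tried against the same dataset are rarely independent); the function name is made up for the example.

```python
def chance_of_false_find(num_hypotheses, alpha=0.05):
    """Probability that at least one of num_hypotheses independent tests
    looks significant at level alpha when nothing real is going on."""
    return 1.0 - (1.0 - alpha) ** num_hypotheses


one_look = chance_of_false_find(1)          # about 0.05
twenty_looks = chance_of_false_find(20)     # about 0.64
hundred_looks = chance_of_false_find(100)   # above 0.99
```

Under those assumptions, trying twenty different hypotheses gives roughly a 64% chance of turning up at least one “significant” pattern by luck alone.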

Of course, practical analysis work can be done without any advanced statistical or modeling techniques. Practical analysis boils down to “finding stuff in your data” that you either didn’t know about, or weren’t sufficiently aware of. That’s the basis of what business analysts do every day. Which salespeople are selling, and which aren’t? What products are selling where, and what aren’t? What was their profit margin, and why? What are the costs associated with running the business, and are they reasonable or unreasonable? And so on.

What’s required in order to come up with these answers is well understood:

1. Acquire data from one or more sources.
2. Transform like data sources into a common format, and link unlike-but-related data sources together with common keys (or computed expressions that result in common keys).
3. Create a schema for the data sources, obeying the conventions of a [selected] database system.
4. Load the data sources into the database system.
5. Issue queries against the database, and, when useful, format the results into reports.
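For a sense of what those five steps involve even in a trivial case, here is a minimal Python sketch using the standard `csv` and `sqlite3` modules. The two data sources, their column names, and the choice of SQLite are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Step 1: acquire data from two hypothetical sources (inline CSV text here).
ap_csv = "vendor_id,amount\nV001,1200.00\nV002,560.00\nV001,300.00\n"
vendor_csv = "VendorID,Name\nv001,Acme\nv002,Globex\n"
ap_rows = list(csv.DictReader(io.StringIO(ap_csv)))
vendor_rows = list(csv.DictReader(io.StringIO(vendor_csv)))

# Step 2: transform into a common format. The vendor IDs differ in case,
# so the common key is a computed expression (the upper-cased ID).
payments = [(r["vendor_id"].upper(), float(r["amount"])) for r in ap_rows]
vendors = [(r["VendorID"].upper(), r["Name"]) for r in vendor_rows]

# Step 3: create a schema in the selected database system (SQLite, in memory).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor (vendor_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE payment (vendor_id TEXT, amount REAL)")

# Step 4: load the data.
conn.executemany("INSERT INTO vendor VALUES (?, ?)", vendors)
conn.executemany("INSERT INTO payment VALUES (?, ?)", payments)

# Step 5: issue a query and format the result into a report.
spend_by_vendor = conn.execute(
    "SELECT v.name, SUM(p.amount) FROM payment p "
    "JOIN vendor v ON v.vendor_id = p.vendor_id "
    "GROUP BY v.name ORDER BY v.name"
).fetchall()
report = "\n".join(f"{name}: {total:.2f}" for name, total in spend_by_vendor)
```

Even this toy needs a key-normalization decision, a schema, and a join, and none of it is reusable for the next question without more of the same.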

Steps 1 through 4 are accomplished out-of-the-box by every ERP or accounting system, although only for a small subset of the useful data in an organization. Step 5 is also accomplished by ERP or accounting systems, on that same subset of data, but (historically) rather poorly. That’s why there has been such a large market for “Business Intelligence” or “BI” tools that put some necessary functionality back into Step 5.

However, when the data aren’t generated by the system that’s reporting on them, or aren’t resident in one of a handful of ERP systems to which a BI system can attach automatically, then we hit the essential problem with business data analysis. This problem is either ignored or deliberately misunderstood by most IT organizations, and it’s simply this: business analysts, in general, are either unwilling or unable to accomplish the following:

Transform data;
Create database schemata;
Load database tables;
Issue SQL queries.

And, even if they can accomplish those steps, exploratory analysis usually can’t be justified by management because the above process takes too long (and therefore costs too much, causing the expected value of the analysis to be negative). Which means, IT departments, that you can buy the business people all the data warehouse tools you want, and it won’t make a whisker of difference with respect to their ability to analyze data. Sure, you could hire a data expert to help them, but that won’t work either (I’ll save that explanation for part III).

Previous: Analytics I: Optimization Comes of Age

Next: Analytics III: The Data Expert and His Warehouse
