Category Archives: Spend Analysis

Analytics IV: OLAP: The Imperfect Answer

Today’s post is by Eric Strovink of BIQ.

When relational database technology breaks down, as it does on any sizeable transaction-oriented dataset when multi-way joins on millions of records are required, the answer is, essentially, to “cheat”. At the risk of dumbing down some pretty complex technology (and the work of some extremely smart people), the usual idea [1] is to pre-aggregate totals in advance of the query, so that most of the work of the multi-way joins has already been done. This is called “OLAP”, an acronym that is unfortunate at best (“OnLine Analytical Processing”).
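To make the “cheat” concrete, here is a minimal Python sketch of the pre-aggregation idea (not how any particular OLAP engine implements it, and the transaction rows are hypothetical): totals are rolled up by dimension ahead of time, so a later query reads a small summary instead of re-joining and re-scanning millions of transaction rows.

```python
from collections import defaultdict

# Hypothetical transaction rows; a real dataset would have millions of these.
transactions = [
    {"supplier": "Acme",   "category": "IT Hardware",     "month": "2009-01", "amount": 1200.0},
    {"supplier": "Acme",   "category": "IT Hardware",     "month": "2009-02", "amount": 800.0},
    {"supplier": "Globex", "category": "Office Supplies", "month": "2009-01", "amount": 150.0},
]

# Pre-aggregation pass (done once, in advance): roll totals up by the
# dimension combinations we expect queries to ask about.
cube = defaultdict(float)
for t in transactions:
    cube[(t["supplier"], t["category"], t["month"])] += t["amount"]

# At query time, "spend by supplier" is a cheap scan of the small summary,
# not a multi-way join over the raw transactions.
def spend_by_supplier(supplier):
    return sum(v for (s, _, _), v in cube.items() if s == supplier)

print(spend_by_supplier("Acme"))  # 2000.0
```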

OLAP databases solve some data analysis problems, in particular slow joins, but only for certain columns in the dataset, and only for datasets that contain transactional data. So, many of the intrinsic problems of the data warehouse are exacerbated in an OLAP database, because the OLAP database is even more special purpose, and its schema very rigorously constrains the queries that can be expected to work efficiently.

Building OLAP databases is therefore harder than building general-purpose relational databases, and thinking in OLAP terms is also harder than thinking relationally. Deciding which columns are “interesting” is challenging and time-consuming; if, by the time the OLAP dataset is built, you decide a column is “uninteresting” after all, you may have wasted considerable effort.

But OLAP datasets do provide one major advantage, and that’s the ability to “slice and dice” data rapidly, with visual impact and in human-understandable terms. OLAP viewers give users great visibility into data relationships, and enable exploration of large datasets without any need for IT expertise. That’s primarily what “Business Intelligence” or “BI” tools bring to the party: the ability to navigate OLAP datasets for insight.
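As a rough illustration of what “slice and dice” means in practice, here is a small Python/pandas sketch (pandas is only standing in for an OLAP viewer, and the data are hypothetical) that pivots spend by two dimensions at once:

```python
import pandas as pd

# Hypothetical spend transactions.
df = pd.DataFrame([
    {"supplier": "Acme",   "category": "IT Hardware",     "month": "2009-01", "amount": 1200.0},
    {"supplier": "Acme",   "category": "IT Hardware",     "month": "2009-02", "amount": 800.0},
    {"supplier": "Globex", "category": "Office Supplies", "month": "2009-01", "amount": 150.0},
])

# "Slicing" spend by category and month; an OLAP viewer lets a business user
# do this interactively, swapping dimensions without writing any code.
pivot = df.pivot_table(index="category", columns="month",
                       values="amount", aggfunc="sum", fill_value=0)
print(pivot)
```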

So, OLAP solves some problems, but fails to solve others. Here is a short (and incomplete) list of significant issues:

  • A dependence (typically) on a(nother) fixed database schema
  • Another level of schema complexity to manage, in addition to the underlying database schema
  • Another level of inflexibility, in that changing the OLAP database organization is often even more difficult than changing the underlying database schema
  • Another level of query complexity, in that a multidimensional query language (MDX, or “MultiDimensional Expressions”) must be used, which is much harder to comprehend than ordinary SQL

In the procurement space, OLAP databases are often used for “spend analysis,” but more on that topic in part V.

Previous: Analytics III: The Data Expert and His Warehouse

Next: Analytics V: Spend “Analysis”

[1] There are many approaches to OLAP.


Spend Analysis Is Not Strategic. It Isn’t Always Strategic! Part II

That’s right, in and of itself spend analysis is not strategic. This isn’t to say that spend analysis isn’t one of the most important actions your supply chain can take in its effort to reduce costs, improve efficiency, and make the most effective use of business resources; it’s just that the act of simply doing a spend analysis is not strategic.

Spend analysis provides a picture of the products and services the organization is spending money on, whom the products and services are being bought from, the organizational buyers who are spending the money, where the products and services are being bought from, and where the products and services are being shipped to and/or utilized. But this process is not strategic; it’s tactical. Furthermore, this information alone is not strategic. Let’s say the organization is spending $2M on computing equipment. So what? On its own, this information is not strategic. And unless the spend is significant (at least 1% of organizational spend) and the number one goal is to reduce total organizational spend by 5%, or the equipment is unique (the organization’s proprietary trading platform only runs on hardware that natively supports AIX Unix), it’s not going to be used strategically. If the analyst compares spend to market prices and determines that reasonable savings are available (5% to 15%), the decision might be to run a sourcing event, but if it’s just another cookie-cutter RFX/Reverse Auction and/or TCO optimization with the same supplier base, it’s not strategic.
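As a back-of-the-envelope illustration of the thresholds above (all figures are hypothetical), here is the kind of arithmetic that decides whether a category is even a candidate for strategic attention:

```python
# Hypothetical figures, illustrating the thresholds in the paragraph above.
total_org_spend = 150_000_000   # total organizational spend
category_spend = 2_000_000      # e.g. computing equipment
savings_low, savings_high = 0.05, 0.15  # "reasonable savings" vs. market

share_of_spend = category_spend / total_org_spend
print(f"Category is {share_of_spend:.2%} of total spend")            # 1.33%
print(f"Clears the 1% significance threshold: {share_of_spend >= 0.01}")

low, high = category_spend * savings_low, category_spend * savings_high
print(f"Reasonable savings range: ${low:,.0f} to ${high:,.0f}")
# $100,000 to $300,000: worth a sourcing event, but not, by itself, strategic.
```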

And then there’s the most common use of spend analysis in an organization that knows how to use it: ad-hoc queries to determine if a duplicate invoice is about to be paid, if the wrong amount was paid to a vendor, if a department is on budget, if a category has enough spend to warrant a sourcing event, and so on. Not strategic. Very important, but not strategic.
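To show how ordinary (and how useful) these ad-hoc checks are, here is a minimal pandas sketch of a duplicate-payment check; the column names and rows are assumptions, not any particular AP system’s layout:

```python
import pandas as pd

# Hypothetical accounts-payable extract.
invoices = pd.DataFrame([
    {"vendor": "Acme",   "invoice_no": "INV-1001", "amount": 5000.0, "paid_date": "2009-03-01"},
    {"vendor": "Acme",   "invoice_no": "INV-1001", "amount": 5000.0, "paid_date": "2009-03-15"},
    {"vendor": "Globex", "invoice_no": "77-442",   "amount": 912.5,  "paid_date": "2009-03-02"},
])

# Flag invoices that appear to have been paid more than once:
# same vendor, same invoice number, same amount.
dupes = invoices[invoices.duplicated(subset=["vendor", "invoice_no", "amount"], keep=False)]
print(dupes)
```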

The reality is that very few events are strategic, because very few categories are strategic. Unless it’s a unique product or service, unless the spend is a significant percentage of organizational spend, unless the product or service directly relates to a (long-term) organizational goal, or unless you’re looking for a strategic partner to share in development, production, costs, or risk (mitigation), it’s probably not strategic. It’s probably still important, because every cent and resource counts in today’s economy, but let’s stop confusing tactical with strategic.


Analytics III: The Data Expert and His Warehouse

Today’s post is by Eric Strovink of BIQ.

Nothing is potentially more dangerous to an enterprise than the “data expert” and his data warehouse. In the data expert’s opinion, every question can be answered with an SQL query; any custom report can be written easily; and any change to the database schema can be accomplished as necessary. Everything of interest can be added to the warehouse; the warehouse will become the source of all knowledge and wisdom for the company; life will be good.

How many times have we heard this, and how many times has this approach failed to live up to expectations? Problem is, business managers usually feel that they don’t have the background or experience to challenge IT claims. There’s an easy way to tell if you’re being led down the rose-petaled path by a data analysis initiative, and it’s this: if your “gut feel” tells you that the initiative’s claims are impossibly optimistic, or if common sense tells you that what you’re hearing can’t possibly be true (because, for example, if it’s that easy, then why isn’t everyone else doing it), then go with your gut.

Sometimes the reflexive response of management to an IT claim is to say, “OK, prove it“. Unfortunately, challenging a data expert to perform a particular analysis is pointless, because any problem can be solved with sufficient effort and time. I recall an incident at a large financial institution, where an analyst (working for an outsourcer who shall remain nameless) made the claim that he could produce a particular complex analysis using (let’s be charitable and not mention this name, either) the XYZ data warehouse. So, sure enough, he went away for a week and came back triumphantly waving the report.

Fortunately for the enterprise, the business manager who issued the challenge was prepared for that outcome. He immediately said, “OK, now give me the same analysis by …“, and he outlined a number of complex criteria. The analyst admitted that he’d need to go away for another week for each variant, and so he was sent packing.

It’s not really the data expert’s fault. Most computer science curricula include “Introduction to Database Systems” or some analog thereof; and in this class, the wonders and joys of relational database technology are employed to tackle one or more example problems. Everything works as advertised; joins between tables are lickety-split; and the new graduate sallies forth into the job market full of confidence that the answer to every data analysis problem is a database system.

In so many applications this is exactly the wrong answer. The lickety-split join on the sample database that worked so well during “Introduction to Database Systems” turns, in the real world, into a multi-hour operation that can bring a massive server to its knees. The report that “only” takes “a few minutes” may turn out to need many pages of output, each one a variant of the original, so the “few minutes” turns into hours.

Consider the humble cash register at your local restaurant. Is it storing transactions in a database, and then running a report on those transactions to figure out how to cash out the servers? No, of course it isn’t. Because if it did, the servers would be standing in line at the end of the night, waiting for the report to be generated. A minute or two per report — not an unreasonable delay for a database system chewing through transactional data on a slow processor — means an unacceptable wait. That’s why that humble restaurant cash register is employing some pretty advanced technology: carefully “bucketizing” totals by server, on the fly, so that it can spit out the report at the end of the night in zero time.
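Here is a minimal sketch of the register’s trick, in Python rather than whatever the register actually runs: keep running totals keyed by server as each sale is rung up, so cashing out is a lookup, not a report run over the night’s transactions.

```python
from collections import defaultdict

# Running totals, updated as each sale is rung up ("bucketized" on the fly).
totals_by_server = defaultdict(float)

def ring_sale(server, amount):
    totals_by_server[server] += amount   # O(1) work per transaction

# A night's worth of sales trickles in...
ring_sale("Alice", 42.50)
ring_sale("Bob", 18.00)
ring_sale("Alice", 63.25)

# End of night: no scan over the transactions, just read the buckets.
for server, total in totals_by_server.items():
    print(f"{server}: ${total:.2f}")
```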

We’ll talk about “bucketizing” — otherwise known as “OLAP” — in part IV.

Previous: Analytics II: What is Analysis?

Next: Analytics IV: OLAP: The Imperfect Answer


Analytics II: What is Analysis?

Today’s post is by Eric Strovink of BIQ.

Ask a statistician or an applied mathematician, and she’ll probably tell you that analysis is either (1) building predictive models based on historical data, or (2) deciding whether past events are statistically significant (i.e., ascertaining whether what actually happened is sufficiently different from what might have happened by random chance).

But most of us aren’t applied mathematicians or statisticians, so we can get into trouble very easily. For example, we typically haven’t got a particular hypothesis to test (which is critical), and that means any patterns we might “find” are immediately suspect. That’s because, in any dataset, one can always come up with a hypothesis that generates significant results if one looks hard enough. With regard to predictions, we generally aren’t confident about the predictive power of our models, because we are not facile with advanced predictive modeling techniques, nor do we usually have access to a sufficiently large sample of “known outcomes” against which to compare our predictions. Without a massive dataset like the one provided by the Netflix Prize competition, there is no hope of refining a solution.
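A small simulation makes the first point concrete: test enough hypotheses against pure noise and some of them will look “significant” at the usual 5% level. (A sketch only; the exact count depends on the random seed.)

```python
import random

random.seed(42)

# 100 "metrics" of pure noise: 50 coin flips each, with no real effect anywhere.
n_metrics, n_flips = 100, 50
suspicious = 0
for _ in range(n_metrics):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    # Crude two-sided check: flag anything far from the expected 25 heads
    # (roughly the 5% tails of a fair binomial with n=50).
    if heads <= 18 or heads >= 32:
        suspicious += 1

# Expect a handful of "findings" even though nothing real is going on.
print(f"{suspicious} of {n_metrics} random metrics look 'significant'")
```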

Of course, practical analysis work can be done without any advanced statistical or modeling techniques. Practical analysis boils down to “finding stuff in your data” that you either didn’t know about, or weren’t sufficiently aware of. That’s the basis of what business analysts do every day. Which salespeople are selling, and which aren’t? What products are selling where, and what aren’t? What was their profit margin, and why? What are the costs associated with running the business, and are they reasonable or unreasonable? And so on. What’s required in order to come up with these answers is well understood (a small sketch of the whole pipeline follows the list):

  1. Acquire data from one or more sources.
  2. Transform like data sources into a common format, and link unlike-but-related data sources together with common keys (or computed expressions that result in common keys).
  3. Create a schema for the data sources, obeying the conventions of a [selected] database system.
  4. Load the data sources into the database system.
  5. Issue queries against the database, and, when useful, format the results into reports.
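Here is what those five steps look like in miniature, with Python and SQLite standing in for a “real” database and a hypothetical CSV extract as the source (a sketch of the workflow, not a tooling recommendation):

```python
import csv, io, sqlite3

# 1. Acquire data (here, a hypothetical CSV extract held in a string).
raw = io.StringIO("supplier,amount_usd\nAcme,1200\nGlobex,150\n")

# 2. Transform into a common format / common keys.
rows = [(r["supplier"].strip().upper(), float(r["amount_usd"]))
        for r in csv.DictReader(raw)]

# 3. Create a schema obeying the database's conventions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE spend (supplier TEXT, amount REAL)")

# 4. Load the data.
db.executemany("INSERT INTO spend VALUES (?, ?)", rows)

# 5. Issue queries and format the results.
for supplier, total in db.execute(
        "SELECT supplier, SUM(amount) FROM spend GROUP BY supplier"):
    print(f"{supplier:10s} {total:>10.2f}")
```

Trivial at this scale; the trouble, as discussed below, is who does this work and how long it takes when the data are real.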

Steps 1 through 4 are accomplished out-of-the-box by every ERP or accounting system, although only for a small subset of the useful data in an organization. Step 5 is also accomplished by ERP or accounting systems, on that same subset of data, but (historically) rather poorly. That’s why there has been such a large market for “Business Intelligence” or “BI” tools that put some necessary functionality back into Step 5.

However, when the data aren’t generated by the system that’s reporting on them, or aren’t resident in one of a handful of ERP systems to which a BI system can attach automatically, then we hit the essential problem with business data analysis. This problem is either ignored or deliberately misunderstood by most IT organizations, and it’s simply this: business analysts, in general, are either unwilling or unable to accomplish the following:

  • Transform data;
  • Create database schemata;
  • Load database tables;
  • Issue SQL queries.

And, even if they can accomplish those steps, exploratory analysis usually can’t be justified by management because the above process takes too long (and therefore costs too much, causing the expected value of the analysis to be negative). Which means, IT departments, that you can buy the business people all the data warehouse tools you want, and it won’t make a whit of difference with respect to their ability to analyze data. Sure, you could hire a data expert to help them, but that won’t work either (I’ll save that explanation for part III).

Previous: Analytics I: Optimization Comes of Age

Next: Analytics III: The Data Expert and His Warehouse


Analytics I: Optimization Comes of Age

Today’s post is by Eric Strovink of BIQ.

I remember my first experience with optimization. I was taken to a guidance counsellor’s office at my local high school, where a special terminal was set up. This terminal was connected to a system that would allegedly try to find the “best” college for me. It asked many questions. Questions like, “Would you prefer a warm climate?” and “Would you prefer an academic setting with equal numbers of men and women?” Well, duh. Those were easy answers.

My goal was to attend one of the premier engineering schools in the US. I wanted MIT or CalTech or Stanford or Carnegie Mellon. I’d be happy with Rice. If my grades or scores weren’t good enough for the snooty super-competitive schools, I’d try for Rensselaer or Northeastern.

The system ended up choosing an entirely unsuitable school, evidently weighing my academic preferences equally with my social and geographic preferences.

What’s my point? Well, in a microcosm, this has been the essential problem with optimization. When you provide a “constraint” — and let’s be precise, here, the term really is “constraint” — an optimizer will not look outside that constraint for options. It cannot. It is a mathematical engine, and it can’t read your mind and figure out which is a “soft” requirement and which is a “hard” requirement. As far as it’s concerned, they’re all requirements, and, by whatever God you (don’t) believe in, it will find a solution that fits those requirements, if there is one.
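A tiny worked example of that behavior, with hypothetical numbers and scipy’s linprog standing in for a real sourcing optimizer: a hard constraint simply removes cheaper awards from consideration, and the solver won’t tell you what the rule cost you unless you ask.

```python
from scipy.optimize import linprog

# Hypothetical two-supplier award: buy 100 units at minimum cost.
prices = [9.0, 10.0]          # unit price from supplier A and supplier B
A_eq, b_eq = [[1, 1]], [100]  # x_A + x_B == 100 (demand must be met)

# With no other rules, everything goes to the cheaper supplier.
free = linprog(prices, A_eq=A_eq, b_eq=b_eq,
               bounds=[(0, None), (0, None)], method="highs")

# Add a "hard" business rule: no supplier may be awarded more than 60 units.
capped = linprog(prices, A_eq=A_eq, b_eq=b_eq,
                 bounds=[(0, 60), (0, 60)], method="highs")

print(f"Without the rule: award {free.x}, cost {free.fun:.0f}")      # [100, 0], 900
print(f"With the rule:    award {capped.x}, cost {capped.fun:.0f}")  # [60, 40], 940
# The optimizer never questions the rule; the $40 difference is on you to notice.
```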

That’s one reason why optimization has struggled to find its way.

I was listening to my wife talking to a survey telemarketer the other day. She said, “I really don’t have an opinion about Blue Cross’s responsiveness to patient needs. I’ve never had Blue Cross.” There was a pause. Then she said, “But how can I have an opinion on a 1 to 10 scale, if I’ve never used them?” There was another pause. She said, “OK, but ….” There was another pause. She sighed, and said, “OK, 5.”

What’s my point? Well, do you really know the answer to what kind of constraints you should impose on your optimization model? Or are you supplying an answer because you don’t know the answer, but you have to supply something? And after the optimization model has solved, can you remember all the places where you guessed, but you didn’t really know? What if you forgot one of those places? And what if that one guess caused the model to solve in a really non-optimal way (non-optimal from your perspective, not its)?

That’s another reason why optimization has struggled to find its way.

The breakthrough has come with what I’ll term “guided optimization”. If you hike in the White Mountains of New Hampshire, for example, you have a large number of excellent trails to choose from. Many of them are safe climbs that lead to outstanding views and vistas; but others lead up steep, often wet cliffs that are unsuitable for casual hiking. You need a guide; in this case, any of the excellent guide books from the Appalachian Mountain Club. In the case of optimization, your guide usually needs to be an experienced practitioner who can help you set up your model, show you how to move constraints to find inflection points in your model, and so on. (The good news is that lots of vendors provide guided services now, and it isn’t that expensive. Especially when you consider that optimization can be incredibly valuable.)
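One thing a guide will typically show you is how to sweep a constraint and watch the optimal cost move, so you can see where the model’s cost curve bends. A minimal sketch of that idea, continuing the hypothetical two-supplier example and again using scipy’s linprog as a stand-in:

```python
from scipy.optimize import linprog

prices = [9.0, 10.0]   # hypothetical unit prices for suppliers A and B
demand = 100

# Sweep the per-supplier award cap and record the optimal cost at each setting.
for cap in (120, 100, 80, 60, 50):
    res = linprog(prices, A_eq=[[1, 1]], b_eq=[demand],
                  bounds=[(0, cap), (0, cap)], method="highs")
    print(f"cap={cap:3d}  cost={res.fun:7.1f}")

# Cost is flat while the cap is slack, then climbs once the cap binds; that
# bend is the inflection point worth a conversation, not silent acceptance.
```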

Companies that provide guided optimization services, like Trade Extensions, have enjoyed solid growth and have left a legacy of satisfied customers. You can always use optimization software on your own (Trade Extensions is no exception); but until you really understand what you’re doing, it can be unwise.

Optimization vendors have claimed for years that their systems are usable by novices. I don’t dispute that there are cases where this is true, and has been true. But for me, it’s a case of crying wolf: there have been so many claims, for so many years, with so many tears, that I’m solidly in the “get a guide” camp. I do hope, though, that optimization vendors will take additional steps to make guidance unnecessary. the doctor has assembled a pretty comprehensive list of what needs to happen.

At the end of the day, if you can’t do analysis yourself, you’re less likely to do it at all; which, as you’ll see in the next installment, is the theme of this series.

Next: Analytics II: What is Analysis?
