Spend Analysis VI: New Horizons (Part 2)

Today I'd like to welcome back Eric Strovink of BIQ who, as I indicated in Part I of this series, is authoring this installment on next generation spend analysis and why it is more than just basic spend visibility. Much, much more!

Federation

One of the most serious limitations of OLAP analysis is
the schema structure itself — typically a “star” schema,
where a voluminous “fact” or “transaction” file is surrounded
by supporting files, or “dimensions.” In the case of spend
analysis, dimensions are Supplier, Cost Center, Commodity,
and so on; transactions are typically AP records.
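
To make the shape of that concrete, here is a minimal sketch in pandas; the table and column names (supplier_id, amount, and so on) are invented for illustration, since the post doesn't specify an actual schema:

```python
import pandas as pd

# Dimension tables -- small, descriptive lookup files (illustrative columns).
suppliers = pd.DataFrame({
    "supplier_id":   [1, 2],
    "supplier_name": ["Acme Corp", "Globex"],
})
commodities = pd.DataFrame({
    "commodity_id":   [10, 20],
    "commodity_name": ["Office Supplies", "IT Hardware"],
})

# Fact table -- the voluminous AP transaction file, holding keys and measures.
ap = pd.DataFrame({
    "invoice_no":   ["A100", "A101", "A102"],
    "supplier_id":  [1, 2, 1],
    "commodity_id": [10, 20, 20],
    "amount":       [1200.00, 560.50, 75.25],
})

# A typical star-schema query: total spend by supplier and commodity,
# resolved by joining the fact table out to its dimensions.
report = (ap
          .merge(suppliers, on="supplier_id")
          .merge(commodities, on="commodity_id")
          .groupby(["supplier_name", "commodity_name"])["amount"]
          .sum())
print(report)
```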

Why is this schema limiting? Because there are only certain ways
that a dimension file can be linked to transaction files, and it
isn’t always clear which file ought to be the transaction file
and which files ought to be dimensions. For example, suppose
that the transaction file consists of AP transactions, and a
dimension file consists of invoice line items. The problem is that the invoice line item
file is “moving faster” than the AP file; i.e. for every invoice number
that appears in AP, there are multiple invoice lines that match.
Which invoice line item should we link to?
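
Here is the ambiguity in miniature, with made-up invoice numbers and amounts: one AP record per invoice, several line items per invoice, so any naive link fans the AP record out across its lines.

```python
import pandas as pd

# One AP record per invoice number...
ap = pd.DataFrame({
    "invoice_no": ["A100", "A101"],
    "amount":     [1200.00, 560.50],
})

# ...but several invoice line items per invoice number.
lines = pd.DataFrame({
    "invoice_no":  ["A100", "A100", "A100", "A101"],
    "line_no":     [1, 2, 3, 1],
    "line_amount": [500.00, 400.00, 300.00, 560.50],
})

# Treating invoice lines as a "dimension" of AP fans every AP record out
# across its lines -- and summing the AP measure now triple-counts A100.
joined = ap.merge(lines, on="invoice_no")
print(joined)
print("AP total:", ap["amount"].sum())               # 1760.50
print("Fanned-out total:", joined["amount"].sum())   # 4160.50 -- wrong
```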

Well, we could invert the problem and build the dataset from the invoice detail file
instead, except that we typically won’t have invoice detail for
every AP record, so that probably won't work. Here are a couple of ideas
that will work: (1) we could build a separate measure column for invoice
line items, and include them as AP record equivalents (coercing
the two record types into a common format); (2) we could drop the
associated AP record whenever we have invoice line item data,
and include the AP information inside those line items, redundantly.

There are other options, too.
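
Both of those ideas are easy to sketch. The columns below are still hypothetical, and this is just one way such a coercion might look, not how any particular product does it:

```python
import pandas as pd

ap = pd.DataFrame({
    "invoice_no":  ["A100", "A101", "A102"],
    "supplier_id": [1, 2, 1],
    "ap_amount":   [1200.00, 560.50, 75.25],
})
lines = pd.DataFrame({
    "invoice_no":  ["A100", "A100", "A101"],
    "line_no":     [1, 2, 1],
    "line_amount": [700.00, 500.00, 560.50],
})

# Option 1: coerce both record types into one transaction file with
# separate measure columns; each row is either an AP record or a line item.
ap_rows = ap.assign(line_no=None, line_amount=None)
line_rows = (lines
             .merge(ap[["invoice_no", "supplier_id"]], on="invoice_no")
             .assign(ap_amount=None))
option1 = pd.concat([ap_rows, line_rows], ignore_index=True)

# Option 2: wherever line-item detail exists, drop the AP record and
# carry the AP fields (redundantly) onto the line items.
has_detail = ap["invoice_no"].isin(lines["invoice_no"])
option2 = pd.concat([
    ap[~has_detail],                    # invoices with AP data only
    lines.merge(ap, on="invoice_no"),   # invoices with line-item detail
], ignore_index=True)

print(option1)
print(option2)
```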

But the essential problem is that we have two separate
datasets, and we’re trying to join them at the hip. There is an
AP dataset, and there is an invoice line item dataset, and
never the twain shall meet, except artificially.
Even when there is no granularity issue at all, and when one dataset
can be normalized or snowflaked such that every matching line
item can be joined through from the other, the amount of effort
required to set up the index->index->index relationships can be daunting.

Instead, why not create two separate datasets, efficiently
and quickly; and then as a final step, federate them together on
a common dimension? Suppose the federation logic was “join” — in
that case, we’d drill on an element in dataset A; dataset B would
drill on the common dimension from A; and then A would drill again
on the common dimension from B. What we’d see is the perfect join
of all of the records from A and from B that shared a common key
in the common dimension; and we’d have the ability to reference
all data from any dimension of both A and B.
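
Here is a rough sketch of what "join" federation amounts to at the record level, leaving aside the drill mechanics of the viewer; the two datasets, the supplier_id common dimension, and the federate_join helper are all hypothetical:

```python
import pandas as pd

# Dataset A: AP transactions; dataset B: invoice line items.
# "supplier_id" is the assumed common dimension they are federated on.
A = pd.DataFrame({
    "supplier_id": [1, 1, 2, 3],
    "ap_amount":   [1200.00, 75.25, 560.50, 90.00],
})
B = pd.DataFrame({
    "supplier_id": [1, 2, 2, 4],
    "line_amount": [700.00, 300.00, 260.50, 42.00],
})

def federate_join(a, b, key, selected_keys):
    """Drill on elements of A; B drills on the common dimension from A;
    A drills again on the common dimension from B -- leaving the records
    from both sides that share a key in the common dimension."""
    a_drill = a[a[key].isin(selected_keys)]              # 1. drill in A
    b_drill = b[b[key].isin(a_drill[key])]               # 2. B follows A's keys
    a_final = a_drill[a_drill[key].isin(b_drill[key])]   # 3. A follows B back
    return a_final, b_drill

a_view, b_view = federate_join(A, B, "supplier_id", selected_keys=[1, 3])
print(a_view)   # supplier 1 only -- supplier 3 has no match in B
print(b_view)   # supplier 1's line items
```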

There are many forms of federation in addition to join —
for example, “master-slave,” where we drill on A, and B shows us
its common nodes but does not feed those back to A. That relationship
can go the other way, as well, from B to A. In addition, there’s a
“disjoint” operation — show me all the nodes in B that don’t share a
key in the common dimension with A (and vice versa).
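
Under the same assumptions as the join sketch above, master-slave and disjoint are small variations:

```python
import pandas as pd

# Same hypothetical datasets as in the join sketch.
A = pd.DataFrame({"supplier_id": [1, 1, 2, 3],
                  "ap_amount":   [1200.00, 75.25, 560.50, 90.00]})
B = pd.DataFrame({"supplier_id": [1, 2, 2, 4],
                  "line_amount": [700.00, 300.00, 260.50, 42.00]})

def federate_master_slave(master, slave, key, selected_keys):
    """Drill on the master; the slave shows its common nodes,
    but feeds nothing back to the master."""
    m_drill = master[master[key].isin(selected_keys)]
    s_drill = slave[slave[key].isin(m_drill[key])]
    return m_drill, s_drill          # the master view is unaffected by the slave

def federate_disjoint(a, b, key):
    """All records in B whose common-dimension key never appears in A
    (swap the arguments for the other direction)."""
    return b[~b[key].isin(a[key])]

m_view, s_view = federate_master_slave(A, B, "supplier_id", selected_keys=[1, 3])
print(m_view)                                    # suppliers 1 and 3 -- 3 is kept, unlike the join
print(federate_disjoint(A, B, "supplier_id"))    # supplier 4's line items only
```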

Federation represents a key productivity enhancer
for dataset creation, as well as a simplification to
the dataset building process in general. Federation also passes the
“usability” litmus test, in that the resulting datasets are much
easier to understand than massive levels of index indirection and
snowflaking, and have the potential to produce richer results.

The technical challenges for federation are considerable: maintaining
multiple connections to multiple datasets; representing multiple data
dimensions inside the context of a single data viewer; providing mechanisms
for pulling data seamlessly from multiple datasets for reports and
analyses; and last but not least, augmenting the OLAP engine to perform
federation operations effectively and quickly.

Is federation worth it?
I think so, emphatically.

This brings to an end our initial Spend Analysis series.
Thanks for the opportunity, Michael; and thanks to everyone who
took the time to wade through it.

As Eric said, this ends Sourcing Innovation’s initial series on spend analysis. I’d like to thank Eric for his enlightening posts and hope that you learned something from them.