Today I’d like to welcome back Eric Strovink of BIQ (acquired by Opera Solutions, rebranded ElectifAI) who, as I indicated in part I of this series, is authoring the first part of this series on next generation spend analysis and why it is more than just basic spend visibility. Much, much more!
Federation
One of the most serious limitations of OLAP analysis is the schema structure itself — typically a “star” schema, where a voluminous “fact” or “transaction” file is surrounded by supporting files, or “dimensions.” In the case of spend analysis, dimensions are Supplier, Cost Center, Commodity, and so on; transactions are typically AP records.
Why is this schema limiting? Because there are only certain ways that a dimension file can be linked to transaction files, and it isn’t always clear which file ought to be the transaction file and which files ought to be dimensions. For example, suppose that the transaction file consists of AP transactions, and a dimension file consists of invoice line items. The problem is that the invoice line item file is “moving faster” than the AP file; i.e. for every invoice number that appears in AP, there are multiple invoice lines that match. Which invoice line item should we link to?
Well, we could invert the problem and build the dataset from the invoice detail file instead, except that we typically won’t have invoice detail for every AP record, so that probably won’t work. Here’s a couple of ideas that will work: (1) we could build a separate measure column for invoice line items, and include them as AP record equivalents (coercing the two record types into a common format); (2) we could drop the associated AP record whenever we have invoice line item data, and include the AP information inside those line items, redundantly.
There are other options, too.
But the essential problem is that we have two separate datasets, and we’re trying to join them at the hip. There is an AP dataset, and there is an invoice line item dataset, and never the twain shall meet, except artificially. Even when there is no granularity issue at all, and when one dataset can be normalized or snowflaked such that every matching line item can be joined through from the other, the amount of effort required to set up the index->index->index relationships can be daunting.
Instead, why not create two separate datasets, efficiently and quickly; and then as a final step, federate them together on a common dimension? Suppose the federation logic was “join” — in that case, we’d drill on an element in dataset A; dataset B would drill on the common dimension from A; and then A would drill again on the common dimension from B. What we’d see is the perfect join of all of the records from A and from B that shared a common key in the common dimension; and we’d have the ability to reference all data from any dimension of both A and B.
There are many forms of federation in addition to join — for example, “master-slave,” where we drill on A, and B shows us its common nodes; but does not feed those back to A. That relationship can go the other way, as well, from B to A. In addition, there’s a “disjoint” operation — show me all the nodes in B that don’t share a key in the common dimension with A (and vice versa).
Federation represents a key productivity enhancer for dataset creation, as well as a simplification to the dataset building process in general. Federation also passes the “usability” litmus test, in that the resulting datasets are much easier to understand than massive levels of index indirection and snowflaking, and have the potential to produce richer results.
The technical challenges for federation are considerable: maintaining multiple connections to multiple datasets; representing multiple data dimensions inside the context of a single data viewer; providing mechanisms for pulling data seamlessly from multiple datasets for reports and analyses; and last but not least, augmenting the OLAP engine to perform federation operations effectively and quickly.
Is federation worth it? I think so, emphatically.
This brings to an end our initial Spend Analysis series. Thanks for the opportunity, Michael; and thanks to everyone who took the time to wade through it.
As Eric said, this ends Sourcing Innovation’s initial series on spend analysis. I’d like to thank Eric for his enlightening posts and hope that you learned something from them.