Today’s post is by Eric Strovink of BIQ.
When relational database technology breaks down, as it does on any sizeable transaction-oriented dataset when multiway joins on millions of records are required, the answer is, essentially, to “cheat”. At the risk of dumbing down some pretty complex technology (and the work of some extremely smart people), the usual idea1 is to pre-aggregate totals in advance of the query, so that most of the work of the multi-way joins has been done in advance. This is called “OLAP” — an acronym that is unfortunate at best (“OnLine Analytical Processing”).
OLAP databases solve some data analysis problems, in particular slow joins, but only for certain columns in the dataset, and only for datasets that contain transactional data. So, many of the intrinsic problems of the data warehouse are exacerbated in an OLAP database, because the OLAP database is even more special purpose, and its schema very rigorously constrains the queries that can be expected to work efficiently.
Building OLAP databases is therefore harder than building general-purpose relational databases, and thinking in OLAP terms is also harder than thinking relationally. Deciding what columns are “interesting” is challenging as well, and also time-consuming; by the time the OLAP dataset is built, and you decide the column is “uninteresting”, you may have wasted considerable effort.
But OLAP datasets do provide one major advantage, and that’s the ability to “slice and dice” data rapidly, with visual impact and in human-understandable terms. OLAP viewers give users great visibility into data relationships, and enable exploration of large datasets without any need for IT expertise. That’s primarily what “Business Intelligence” or “BI” tools bring to the party: the ability to navigate OLAP datasets for insight.
So, OLAP solves some problems, but fails to solve others. Here is a short (and incomplete) list of significant issues:
- A dependence (typically) on a(nother) fixed database schema
- Another level of schema complexity to manage, in addition to the underlying database schema
- Another level of inflexibility, in that changing the OLAP database organization is often even more difficult than changing the underlying database schema
- Another level of complexity in SQL queries (called “multidimensional SQL”, or MDX) must be used, that is much harder to comprehend than ordinary SQL.
In the procurement space, OLAP databases are often used for “spend analysis,” but more on that topic in part V.
Previous: Analytics III: The Data Expert and His Warehouse
Next: Analytics V: Spend “Analysis”
1There are many approaches to OLAP.