Originally posted on the Synertrade blog in April, 2018.
Right now you’re probably thoroughly confused, but that’s okay. If I had simply titled this article “Data Harmonization”, after its core topic, you’d be equally confused. And at least this title asks a relevant question.
So what is data harmonization? Technically, it refers to the effort to combine data from different sources into one consistent view that can be provided to a user in a comprehensible, and sensible, fashion. And that sounds simple, until you try to dissect what that means and what it entails.
Let’s start with the “combining”. This is easier said than done. Data can be fully structured (as it is in a well-defined relational record), semi-structured (as it is in a relational record with metadata fields and blob fields, or in object records with ids and unstructured data), or fully unstructured (such as full text articles, audio and video files, etc.). So if one source is structured and another is unstructured, how do you combine them?
Also, data can be stored in different types of systems: old school flat file databases and mega Excel sheets, slightly more refined column-oriented databases, traditional relational database systems, newer object oriented databases, and modern NoSQL databases are just a few of the more common examples.
And even if you can figure out how to mesh all of these, you have to remember that each database can be stored in a different character set (ANSI, ISO, Unicode, etc.) and use different base formats for its integer and real numbers (even if both databases call the data type float, there can be different bit counts AND different uses for the leading or trailing bit, one of which will usually signify sign).
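To make the character set and number format problem concrete, here is a minimal Python sketch. The supplier name, the price, and the two source encodings are made-up assumptions for illustration; a real integration would read the declared encoding and numeric type from each source system.

```python
import struct

# Hypothetical: the same supplier name exported by two systems
# that use different character sets.
latin1_bytes = "Müller GmbH".encode("latin-1")  # legacy single-byte export
utf8_bytes = "Müller GmbH".encode("utf-8")      # modern Unicode export

def to_text(raw: bytes, source_encoding: str) -> str:
    """Decode raw bytes using the encoding declared for the source system."""
    return raw.decode(source_encoding)

# The raw bytes differ, but the decoded text is identical.
bytes_differ = latin1_bytes != utf8_bytes
same_name = to_text(latin1_bytes, "latin-1") == to_text(utf8_bytes, "utf-8")

# Hypothetical: the same price stored as a 32-bit float in one system
# and as a 64-bit float in another.
price = 19.99
as_float32 = struct.unpack("<f", struct.pack("<f", price))[0]
as_float64 = struct.unpack("<d", struct.pack("<d", price))[0]

# The 32-bit copy has lost precision, so a naive equality check fails
# until both values are brought to one agreed precision.
naive_match = as_float32 == as_float64
normalized_match = round(as_float32, 2) == round(as_float64, 2)
```

The point of the sketch is only that two "identical" values can look different at the byte level until both sides are converted to one agreed text encoding and one agreed numeric precision.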
And even once your data jockeys figure all of this out and can take all of your structured, semi-structured, and unstructured data from a variety of record (row) oriented, column oriented, relational, object, and NoSQL systems and put it all into one massive (virtual) data warehouse (through a super view) with appropriately mapped data elements that all use the same, commonly defined, data types in the same character set, that’s just the entrance exam. They still haven’t passed the course, and you still can’t do anything. Why?
Just because you can get the data into one data store doesn’t mean you can actually do anything useful with it. Why? When you have data across ten systems, you tend to have duplicate data across all ten, and until you can merge all copies into one de-duplicated, correct data record, any report you run will be incorrect and useless.
Why would you have ten (10) copies? Think about it. You will be keeping, and working with, data on a supplier in the MRP (manufactured / custom product details), ERP (locations and some master data), AP (for payments), Catalog (for finished products), e-Sourcing application (where data is collected), SRM (where performance data is tracked and relationships are managed), SCAR (to solve problems), CLM (where the contracts go), e-Invoicing (which processes all the invoices), and Risk Management (where you collect internal metrics and third party data), and maybe more.
And it’s not as simple as merging the 10 supplier records into one mega record, because all will have a supplier id, supplier name, basic contact information, etc. This means that you will have up to 10 copies of a piece of information when you only want one, and, moreover, not all will be the same. Why? Some systems will use different ids, and some systems will have incorrect, misspelled, or abbreviated data. You will need to identify all copies of the data, correct and incorrect, and replace them with one up-to-date, correct copy, and do this for every single data record, which could be tens of thousands for suppliers, hundreds of thousands in your employee database, and millions in your product database. Not an easy task.
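As a rough illustration of why matching is harder than comparing ids, the following Python sketch groups supplier records whose names are nearly the same after normalization. The records, the list of legal suffixes, and the 0.9 similarity threshold are all assumptions chosen for the example; production matching would typically also compare tax ids, addresses, and other fields.

```python
import re
from difflib import SequenceMatcher

# Hypothetical supplier records from four systems, each with its own id scheme.
records = [
    {"system": "ERP", "id": "S-001", "name": "Acme Industrial Supply, Inc."},
    {"system": "AP", "id": "10045", "name": "ACME Industrial Supply"},
    {"system": "SRM", "id": "acme-01", "name": "Acme Indutrial Supply Inc"},  # misspelled
    {"system": "CLM", "id": "C-778", "name": "Globex Corporation"},
]

# Assumed list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "gmbh", "corp", "corporation", "co"}

def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop legal-form suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def cluster(items, threshold=0.9):
    """Greedily group records whose normalized names are similar enough."""
    clusters = []
    for record in items:
        key = normalize(record["name"])
        for group in clusters:
            representative = normalize(group[0]["name"])
            if SequenceMatcher(None, key, representative).ratio() >= threshold:
                group.append(record)
                break
        else:
            # No existing group was close enough, so start a new one.
            clusters.append([record])
    return clusters

clusters = cluster(records)
```

Here the three Acme variants, including the misspelled one, land in one group and Globex stays on its own. Choosing which copy within a group is the correct one is a separate step that this sketch does not attempt.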
But even if you manage to do this, you’ll find that you still have critical gaps in the data, such as risk and sustainability scores, product and service catalogs beyond what you currently buy, deep location data (all facility locations, warehouse size, global lane options, etc.), and so on. You will need to enrich your data with data from third party sources and feeds in order to make it usable. Then you have to consider all of the issues that go hand in hand with mapping yet another source into the central (virtual) data warehouse (since the EDI feed could have its own schema) and with handling duplicate, incorrect, and incomplete data (certain values will be repeated, and contact information could be out of date if the feed was last validated three months ago but you manually updated the contact one month ago). On top of that, no single feed will have all the data you’re missing. So enrichment can be a challenge as well.
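One way to picture the stale-contact problem is a field-by-field merge that keeps whichever value was verified most recently and uses the feed only to fill gaps. This is a minimal Python sketch under assumed field names, values, and dates; it is not a description of how any particular feed or product works.

```python
from datetime import date

# Hypothetical records: field -> (value, date the value was last verified).
internal = {
    "contact_email": ("new.buyer@example.com", date(2018, 3, 1)),  # manual update
    "risk_score": (None, None),                                    # gap to fill
}
feed = {
    "contact_email": ("old.contact@example.com", date(2018, 1, 1)),  # older validation
    "risk_score": (72, date(2018, 1, 1)),
}

def enrich(internal_record, feed_record):
    """Per field, keep the most recently verified non-empty value."""
    merged = {}
    for field in internal_record.keys() | feed_record.keys():
        candidates = [
            source[field]
            for source in (internal_record, feed_record)
            if field in source and source[field][0] is not None
        ]
        if candidates:
            merged[field] = max(candidates, key=lambda pair: pair[1])
        else:
            merged[field] = (None, None)
    return merged

merged = enrich(internal, feed)
```

With these inputs the newer manual contact survives and the missing risk score is filled from the feed. Fields that neither source provides stay empty, which is why more than one feed is usually needed.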
However, even when you manage to
- (virtually) centralize,
- de-duplicate and cleanse, and
- enrich
the data, you’ll still find that your data is not harmonized. In order for data to be harmonized, it has to be structured for use. While you will be mapping each piece of data you import to a well defined record format, just having good record formats is not that useful. To do detailed analysis and reporting, you need well defined, well structured cubes and views on those cubes.
And even though analytics is the first application you think of that needs structure, it’s not the only one. The internal catalog also needs appropriately structured data to allow for fast searching and retrieval. The tax analysis system needs it to review payments by country, vendor, and agreement to determine if there are reclamation options. And so on.
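To show what "structured for use" can mean in practice, here is a small Python sketch that rolls payment facts up along chosen dimensions, which is the basic operation behind a cube view. The dimension names, vendors, and amounts are invented for the example, and a real cube would sit in an analytics engine rather than in a dictionary.

```python
from collections import defaultdict

# Hypothetical payment facts: three dimensions and one measure.
payments = [
    {"country": "DE", "vendor": "Acme", "agreement": "A-1", "amount": 1200.0},
    {"country": "DE", "vendor": "Acme", "agreement": "A-2", "amount": 800.0},
    {"country": "FR", "vendor": "Globex", "agreement": "G-1", "amount": 500.0},
    {"country": "FR", "vendor": "Acme", "agreement": "A-1", "amount": 300.0},
]

def roll_up(facts, dimensions, measure="amount"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)

# Two views over the same facts.
by_country = roll_up(payments, ["country"])
by_country_vendor = roll_up(payments, ["country", "vendor"])
```

The same facts answer different questions depending on which dimensions are chosen, but only if every record carries consistent country, vendor, and agreement values in the first place.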
In other words, data harmonization is not a simple effort, and requires a quartet of capabilities to get it right:
- schema and data format normalization for (virtual) centralization,
- de-duplication and cleansing,
- enrichment, and
- structuring.
And unless all of these capabilities are sound and in harmony, an organization will never achieve data harmonization, which is critical for Supply Management success.
And this is why it is so important to understand not only why you need harmonization, but also what’s involved and why you might need a provider that understands the intricacies. In this article, we’ve tried to give you a sense of what’s involved so you can work with your IT department to ask potential vendors the right questions and separate those who talk the talk from those who actually walk the walk. The reality is that while every vendor understands the importance of harmonization, not every vendor can do it well, especially in an efficient and timely manner.