Why Are We Still Hyping Big Data When We Haven’t Mastered Small Data?

The end of the year is coming, everyone is looking for next year’s tech, and everyone wants cognitive / AI that works on Big Data. But there are two real problems with this:

1. With the hard drive and memory capacities now available in a single machine, our typical back-office functions don’t generate enough data to fill it (at least with efficient encodings), or to cause unbearable computation times on a quad-core (with efficient algorithm implementations).

2. When it comes to learning, the biggest sets we have for training are typically quite small!

We’ll start with the second point. Consider spend analytics, where the primary task is to map transactions to a categorization hierarchy. If you want to use a “deep learning” AI, then you need a big data set to train that AI. But how many transactions will the average organization have that have been mapped and verified individually by a human? Maybe 10K or 20K. Even a spend analysis provider will typically have verified only a few hundred thousand, or maybe a couple of million, compared to the tens or hundreds of millions of transactions its big customers will throw at it each year.

The situation is even worse for contract analytics. A large multi-national might have 20K contracts, but how many have been properly indexed with metadata at the clause and term level? If you have 2K, you’ve struck gold. A contract analytics provider might struggle to cobble together a data set of 20K contracts. This is even smaller data.

Now for the first point. Even though most first (and even second) generation spend analytics applications are slow, (Opera) BIQ has, for almost ten years, been able to process and re-categorize a million transactions in under a minute on a dual-core or better laptop with 8GB of memory, and its most recent version can handle a million transactions in a little over a second on a modern quad-core laptop with 16GB of memory! In fact, it can handle ten million transactions in less time than it takes you to enjoy two sips of your coffee. And when you consider that most analyses cover only a set of related categories over a relatively short time period (at most 3 years), the number of transactions is typically only a few hundred thousand for a large company and in the tens of thousands for a mid-size company. That’s not only small data, but data that can, these days, even be processed in your browser (as a new analytics offering will prove next year).
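If you want to sanity-check the memory side of this for yourself, a back-of-envelope calculation is enough. The Python sketch below is only an illustration: the 200 bytes per transaction is an assumed figure for a compactly encoded record (IDs, dates, an amount, category codes, a short description), not a number from any vendor, and the transaction counts are simply picked from the ranges mentioned above.

```python
# Back-of-envelope memory estimate for transaction data.
# ASSUMPTION: 200 bytes per transaction is a hypothetical size for a
# compactly encoded record; measure your own data before relying on it.
BYTES_PER_TRANSACTION = 200
LAPTOP_RAM_GB = 16  # the quad-core laptop mentioned in the text


def footprint_gb(num_transactions: int,
                 bytes_per_row: int = BYTES_PER_TRANSACTION) -> float:
    """Approximate raw size of the data set in gigabytes (10^9 bytes)."""
    return num_transactions * bytes_per_row / 1e9


# Illustrative counts taken from the ranges discussed in the text.
scenarios = [
    ("mid-size company analysis", 50_000),
    ("large company analysis", 500_000),
    ("one million transactions", 1_000_000),
    ("ten million transactions", 10_000_000),
]

for label, count in scenarios:
    size = footprint_gb(count)
    share = 100 * size / LAPTOP_RAM_GB
    print(f"{label:>26}: {size:6.3f} GB ({share:5.2f}% of {LAPTOP_RAM_GB} GB)")
```

Under that assumption, even ten million transactions take up only a small fraction of the laptop’s memory. Double or triple the assumed record size and the conclusion does not change.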

So before you go goo-goo-ga-ga over big data, understand how big your data really is and get the application that works best for your data, which will more likely be at a cost point that works best for Finance as well.