A lot of vendors will tell you a lot of what they do is so hard and took thousands of hours of development and that no one else could do it as good or as fast or as flexible when the reality is that much of what they do is easy, mostly available in open source, and can be replicated in modern Business Process Management (BPM) configuration toolkits in a matter of weeks. In this series we are tackling the suite of supply management applications and pointing out what is truly challenging and what is almost as easy as cut-and-paste.
In our first three posts we discussed basic sourcing, basic procurement, and supplier management where few technical challenges truly exist. Then, yesterday, we started a discussion of spend analysis, where there are deep technical challenges (that we discuss today), but also technical by-gones that vendors still perpetuate as challenges and stumpers that are not true challenges but are to many vendors who didn’t bother to spend the time it takes to hire a development team that can figure them out. Yesterday we discussed the by-gones and the first-stumper. Today, we discuss the second big stumper and the true challenges.
Technical Stumper: Multi-Cube
The majority of applications support one, and only one, cube. As SI has indicated again and again (and again and again), the power of spend analysis resides in the ability to quickly create a cube on a hunch, on any schema of interest, analyze the potential opportunity, throw it away, and continue until a new value opportunity is found. This also needs to be quick and easy, or the best opportunities will never be found.
But even today, many applications support ONE cube. And it makes absolutely no sense. Especially when all one has to do to create a new cube is just create a copy of the data in a set of temporary tables designed just for that and update the indexes. In modern databases, it’s easy to dynamically create a table, bulk copy data from an existing table to the new table, and then update the necessary index fields. The cube can be made semi-persistent by storing the definition in a set of meta-tables and associating it with the user (which is exactly how databases track tables anyway).
Apparently vendors are stumped on how easy this is, otherwise, the doctor is stumped as to why most vendors do not support such basic capability.
Technical Challenge: Real-time (Collaborative) Reclassification
This is a challenge. Considering that the reclassification of a data set to a new hierarchy or schema could require processing every record, and that many data sets will contain millions of transaction records, and modern processors can only do so many instructions per second, this will likely always be a challenge as big data gets bigger and bigger. As a result even the best algorithms can generally only handle a few million records on a high end PC or laptop in real time. And while you can always add more cores to a rack, there’s still a limit as to how many cores can be connected to an integrated memory bank through a high-speed bus … and as this is the key to high-speed data processing, even the best implementations will only be able to process so many transactions a second.
Of course, this doesn’t explain why some applications can re-process a million transactions in real time and some crash before you load 100,000. This is just bad coding. This might be a challenge, but it’s still one that should be handled as valiantly as possible.
Technical Challenge: Exploratory 3-D Visualization
There’s a reason that Nintendo, X-box, and PlayStation keep releasing new hardware. They need to support faster rendering as the generation of realistic 3-D graphics in real-time requires very powerful processing. And while there is no need to render realistic graphics in spend analysis, creating 3-D images that can be rotated in real-time, blown up, shrunk down, drilled into to create a new 3-D image, which can again be rotated, blown-up, shrunk, drilled into, etc. in real time is just as challenging. This is because you’re not just rendering a complex image (such as a solar system, 3-D heated terrain map, etc.) but also annotating it with derived metrics that require real-time calculation, storing the associated transactions for tabular pop-up, etc. — and we already discussed how hard it is to reclassify (and re-calculate derived metrics on) millions of transactions in real time.
Technical Challenge: Real-time Hybrid “AI”
First of all, while there is no such thing as “AI”, because machines are not intelligent, there is a such thing as “automated reasoning” as machines are great at executing programmed instructions using whatever logic system you give them, and while there is no such thing as “machine learning” as it requires true intelligence to learn, there is a such thing as an “adaptive algorithm” and the last few years have seen the development of some really good “adaptive algorithms” that employ the best “automated reasoning” techniques that can, with training, over time improve to the point where classification accuracy can (quickly) get to 95% or better. And the best can be pre-configured with domain models that can jump-start the classification process and often get up to 80% accuracy with no training on reasonably clean data.
But the way these algorithms typically work is that data is fed into a neural network or cluster-machine, the outputs compared to a domain model, and where the statistical-based technique fails to generate the right classification, the resulting score is analyzed and the statistical weights or boundaries of the cluster modified, and the network or cluster machine re-run until the classification accuracy reaches a maximum. But in reality, what needs to happen is that as users correct classifications in real-time when doing ad-hoc analysis in derived spend cubes, the domain models need to be modified and the techniques updated in real time, and the override mapping remembered until the classifier automatically classifies all similar future transactions correctly. This requires the implementation of leading edge “AI” (which should be called “AR”) that is seamlessly integrated with leading edge analytics.
In other words, while building any analytics application may have been a significant challenge last decade when the by-gones were still challenges and the stumpers required a significant amount of brain-power and coding to deal with, that’s not the case anymore. These days, the only real challenge is real-time reclassification, visualization, and reasoning on very large data sets … even with parallel processing, this is a challenge if a large number of records have to be reprocessed, re-indexed, and derived dimensions recalculated.
But, of course, the challenges, and lack of, don’t end with analytics. Stay tuned!