Apple Demonstrates AI Collapse

Not long ago, Apple released the results of its study of Large Reasoning Models (LRMs), which found that this form of AI faces a “complete accuracy collapse” when presented with highly complex tasks. See the summary in The Guardian.

We want to bring your attention to the following key statement:

Standard AI models outperformed LRMs in low-complexity tasks while both types of model suffered “complete collapse” with high-complexity tasks.

This point needs to be made crystal clear! As we keep saying, LLMs WERE NOT ready for prime time when they were released (they should never have escaped the basement lab) and they ARE NOT ready for the tasks they are being sold for. Basic reasoning would thus dictate that LRMs, built on this technology, are definitely not ready either. And this study proves it!

It’s always taken us about two decades to reach the point where we have enough understanding of a new type of AI technology, enough experience, enough data, and enough confidence to understand where it is not only commercially viable BUT commercially dependable. And then we need to figure out how to train the appropriate (expert) users to spot false positives and false negatives, and to improve the technology as needed.

Just like nine (9) women can’t have a baby in one month, billions of dollars can’t speed this up. Like wisdom, it takes time to develop. Typically, decades!

Moreover, while not saying it outright, the study is implying a key point that no one is getting: “our models of intelligence are fundamentally wrong”. First of all, we still don’t fully understand how the brain works. Secondly, if you map the compute of any XNN model we’ve devised against the compute of a human brain responding to the same question or task, completely different subsets light up, and those subsets will change as tasks become more complex, or you’ll see some back and forth. We can understand data, meta-data, meta-meta-data, and thus chaos. We can use clues that computers don’t, and can’t, know exist to infer context and which of the seven possible meanings of a word is the intended one. We can learn on shallow data. In contrast, these models stole ALL the data on the internet and still tell us to eat rocks!

This means what this site keeps leaning towards: if you want “autonomous agents”, go back to the rules-based RPA we have today, use classic AI tech that works for discrete tasks we understand, and link or “orchestrate” them together for more complex tasks. And if you really think natural language makes software easier and faster to use (for most complex tasks, it doesn’t, but we’ve also reached the point where no one can do design engineering any more, it seems), then use LLMs for the two things they are good for: faster, usually more accurate, semantic input processing, and system translation of output to natural language. Don’t keep pouring billions upon billions into fundamentally flawed tech to try and fix hallucinations that result from fundamental attributes that can’t be trained out, as this is an utter waste of time, money, and resources.

Vendors Have Lured Big Analyst Firms Astray Because Buyers Don’t Understand They Get What They Pay For!

About the same time we asked Why Aren’t ProcureTech Analysts Doing Their Jobs Anymore, THE REVELATOR asked, in a comment stream, “how did … the analyst consulting and ProcureTech solution providers lose their way by championing technology-led, equation-based modelling?”

Which is a fair question, as this ties into why we believe many ProcureTech analysts aren’t doing their jobs anymore. As per our previous post, we believe the firm is the problem (even if the firm doesn’t know it, though in most cases it should), and, more specifically, the primary reason is bad direction.

But let’s get back to THE REVELATOR‘s question. The answer is this:

At one point, the successors to the founders and/or the sales team took the easy way out and switched to vendor sponsorship.

As we grey beards, who have been around since the beginning of ProcureTech, will recall, there was a time buyers paid for research because they understood the value of unbiased research. But, like Project Assurance, that’s a hard sell when a buyer might spend 10K, 50K, or 100K with no guarantee they’ll identify a single viable solution among those covered in a report. Seasoned, well educated, and thoroughly experienced executives understand the value of risking 10K to 100K on a report or study before committing to a 100K or 1M+ annual investment, because losing 10K is much better than losing 100K or 1M, and can be chalked up as a cost of doing business. But many of today’s executives are uneducated in management and risk, and inexperienced, having been put in place because of their affiliation with investors or a perceived ability to run a business off of balance sheets alone (even though these MBAs are the reason so many high-tech companies are struggling and companies like Boeing are facing disaster after disaster; they don’t realize that you can’t run a business you don’t understand, which is why, in the first Industrial Revolution [and the Gilded Age the US is so desperately trying to bring back], Engineers ran the show, and not over-glorified accountants and lawyers). These executives don’t understand that value, or the risk of using vendor-funded reports to make a decision.

For these successors and sub-par sales people who just weren’t up to the task of the hard sell, when marketing organizations came along and, out of the blue, threw big money at them to sponsor a study, no sales effort required, they jumped on it. More vendors saw the success of the first vendors to adopt this approach and followed suit, the money started flowing in, and the model shifted. Unbiased researchers had to shift their studies to those aspects where the sponsors do well, or leave the firm. Moreover, the search for new hires now focuses on those with less experience or ethics (who can be easily swayed in the direction the big sponsors want). (So before accepting the results of any study, you should be echoing Mr. Klein and asking Who Paid For That Study?)

This means that, over time, instead of an industry-leading analyst firm, we get a marketing organization that echoes the “technology-led” approach or puts the product, rather than the solution, first.

Moreover, it’s going to stay this way until some big firms step up, say “enough is enough”, and stop vendor sponsorships altogether, and some big clients step up to fund the research. As Mr. Köse keeps saying, you get what you pay for.

Sourcing Innovation stands by its statement that the USA is …

math stupid that it made in its post explaining why the lack of adoption of analytics is NOT complicated. the doctor knows it ruffled a few feathers, but it’s not the doctor claiming that, it’s the OECD data (which is available here).

At least the doctor didn’t point out in that post that the USA is effectively failing across the board as it is below average in literacy, numeracy, and adaptive problem solving (and significantly below average in numeracy, as we pointed out in our last article), that there should be no reason for this when the USA is seventh in the world in nominal GDP per capita (beaten only by Iceland*, Singapore, Norway, Switzerland, Ireland, and Luxembourg*, where the * countries are not in the OECD rankings), and that the USA could afford to have the best educated people in the world if it desired (and it could allocate the budget if it desired, considering the percentage it spends on defence is more than twice the global average, and that’s before all of the foreign military aid).

However, he feels it is now very important that he does point this out, because too many Americans are heralding the budget cuts to the Federal Department of Education without a plan (on the basis that funding should be tied to performance, which is a justifiable goal, but the best way to do that needs to be carefully considered) instead of insisting that it be restructured to address the serious educational deficiencies or replaced with more state-level agencies (where funding is tied to specific focal points and not allowed to be disbursed on whims).

To nail these points home, here is the relevant data:

Literacy

Country    Rank   Score
Finland    1st    296
Canada     10th   271
Czechia    14th   260
AVERAGE           260
USA        16th   258

(which is a 12 point drop for the USA since the last OECD ranking!)

Numeracy

Country    Rank   Score
Finland    1st    294
Canada     12th   271
AVERAGE           263
Croatia    21st   254
USA        25th   249

(which is a 7 point drop for the USA since the last OECD ranking)

Adaptive Problem Solving

Country          Rank   Score
Finland          1st    276
Canada           10th   259
AVERAGE                 250
Slovak Republic  19th   247
USA              19th   247

I’m old enough to remember when the US education system was the envy of the world (even though the US scored in the lower half, and sometimes the bottom, of the FIMS, FISS, and other IEA studies, which measured the global performance of primary and secondary education systems across 12 to 20 countries from the 1960s through the 1980s), because, post Sputnik, the US poured money into public education in an attempt to produce the best students in the world to enter post-secondary STEM programs and become the best engineers in the world … and its Universities took prominence as the Universities you wanted to be admitted to (surpassing centuries-old Universities in the UK and Europe in popularity).

Now it’s true that the US should have improved substantially based on this investment (which means there are fundamental issues that have never been addressed), but just saying “it doesn’t work” and attempting to tear it down without a plan to put something better in place is not only unhelpful but sends a message to the world that the US no longer values having the best education system. I’m afraid this will have ripple effects on the popularity of US institutions, which rely heavily on full-tuition foreign students to maintain their top-tier programs, and lead to further degradation in adult literacy, numeracy, and problem solving skills (which are now barely on par with those of countries North Americans grew up believing, partially thanks to propaganda, to be significantly below us).

For those of you who not only want your American-based companies to continue to be the best in the world, but also want America to attract global headquarters (or at least regional headquarters) of more multi-nationals, the sincere hope is that you will fix this. In this increasingly unstable global economy (thanks to natural and man-made disasters), the winners will be those with the best educated people, who have the skills to use the best tools at their disposal to make the best decisions fast enough to survive. As a result, companies that want to weather the storms should now be more inclined to choose the Nordics, Japan, or Canada (which top the adaptive problem solving list with high literacy and numeracy scores, and don’t have the energy issues Germany is dealing with or the lack of local population that Estonia is dealing with). Now, while that last option is good for the doctor, let’s face it, for the past eighty years, the market dynamics worked best when the biggest companies were in America and, through mutual trade agreements (NAFTA, then USMCA), Canada supported them.*

* Although it must be admitted that maybe the time of American dominance with Canadian and Mexican support has, unfortunately, come to an end, especially since Canada is still “Open” on the Civicus Human Rights Watchlist and not one of the two countries that recently had their rating narrowed significantly in the March 2025 update. While research needs to be done on the subject, when you consider that 17 of the top 31 countries are “open” and 11 are “narrowed” in terms of human rights and civic freedoms on the Civicus rating scale, there does seem to be a high correlation between civic freedom and average educational level, as only 2 countries are “obstructed” and only 1 country is “repressed”. And while the repressed country of Singapore comes in high at #13 if you take the average across the 3 scores, the two “obstructed” countries come in low at 22 and 26 respectively.

There Are Best-in-Class Solutions for End-to-End Indirect Sourcing Processes …

… you just have to do your research!

A while ago I posted that your standard sourcing solution doesn’t work for direct (because it doesn’t, and, relatively speaking, very few sourcing solutions do work for direct), and one of the comments I received implied that it doesn’t even work for indirect. And while some of the solutions out there are so minimal / antiquated / poorly designed that this can be considered a fair question (as there are certainly a number of solutions that would never make a recommendation list by the doctor under any circumstance), the reality is that there are lots of solutions that work well for indirect sourcing.

Now, if you are thinking about a best-in-class 7-step sourcing process, then it’s true you might need, on the supplier side:

  • one module for supplier discovery
  • one for extensive supplier qualification from a 360-degree risk, compliance, sustainability, quality, service, etc. perspective
  • one for supplier onboarding and communications
  • one for supplier performance management and development

As well as a sourcing platform that supports:

  • multiple RFX Formats
  • fluid multi-round events
  • strategic sourcing decision optimization so you understand baselines, the cost of business rules, etc.

And possibly a separate best-in-class analytics solution that:

  • lets you dig deep into costs, trends, and outliers

And then an “orchestration” platform that

  • helps you integrate them all so that all data is available in all platforms all of the time

So while you could need as many modules as steps, they exist, and you can build a fantastic solution for your organization and process and get great results. Just don’t expect it from an average suite (that won’t be BiC across the board, will only be tailored for large enterprise, and may only be super appropriate for certain industries). The sheer number of companies in this space (see the Mega Map) means that the odds of you not being able to put together a good solution are small (although it also means that the workload of finding those solutions is quite large, as it takes work to weed through 666 solutions, which is a number that is ruining Procurement, as Joël Collin-Demers indicates).

The lack of solutions for indirect is not the problem; the lack of solutions that not only allow, but can be configured to enforce, a good process is!

More specifically, we are talking about mandatory dual-sourcing! Which, sadly, is still not being done in direct, even though JIT supply chains have been out the window at least since Eyjafjallajökull (remember that? it should have been the first push to start properly dual sourcing), with the situation getting progressively worse (on a sometimes daily basis) since March, 2020. (Five years of natural and man-made disasters should be more than enough of a wake-up call, right?)

This is not something indirect has normally done, because the view has been “it’s a standard finished off-the-shelf product, I’ll just get it from someone else if I need to”, not recognizing that, for some products, 90% still ultimately come from a single country of origin (which is often China), and any disruption to that country (pandemics [as China’s, often impossible, zero-tolerance policy will close entire cities for months without any regard to the consequences for the rest of the world], border closings on key land routes, port strikes, and now extremely high [never seen before] tariffs) will jeopardize almost all supply. And for other products, buyers have chosen a smaller supplier with limited scalability and no nearby options (and resourcing will take time, as it will also involve rerouting and ripple effects through the supply chain, which could also add to cost).
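Concentration risk of this kind is straightforward to surface from award data. Here is a minimal sketch (in Python, with entirely illustrative supplier, country, and category names, and an assumed 90% threshold) of flagging categories where one supplier, or one country of origin, holds nearly all of the awarded spend:

```python
from collections import defaultdict

def flag_single_source_risk(awards, share_threshold=0.9):
    """Flag categories where one supplier (or one origin country) holds
    more than share_threshold of the total awarded spend.

    awards: list of (category, supplier, origin_country, spend) tuples.
    Returns {category: reason} for the at-risk categories.
    """
    by_supplier = defaultdict(lambda: defaultdict(float))
    by_country = defaultdict(lambda: defaultdict(float))
    totals = defaultdict(float)
    for cat, supplier, country, spend in awards:
        by_supplier[cat][supplier] += spend
        by_country[cat][country] += spend
        totals[cat] += spend

    risks = {}
    for cat, total in totals.items():
        top_sup, sup_spend = max(by_supplier[cat].items(), key=lambda kv: kv[1])
        top_cty, cty_spend = max(by_country[cat].items(), key=lambda kv: kv[1])
        if sup_spend / total > share_threshold:
            risks[cat] = f"supplier {top_sup} holds {sup_spend / total:.0%} of spend"
        elif cty_spend / total > share_threshold:
            risks[cat] = f"country {top_cty} holds {cty_spend / total:.0%} of spend"
    return risks

# Toy data: the MRO category is effectively single-sourced.
awards = [
    ("MRO", "SupplierA", "China", 900.0),
    ("MRO", "SupplierB", "Vietnam", 50.0),
    ("Office", "SupplierC", "USA", 400.0),
    ("Office", "SupplierD", "Mexico", 350.0),
]
print(flag_single_source_risk(awards))
```

A real platform would of course pull this from live award and shipment data and layer on tier-2 origin tracing, but even this level of reporting would expose most of the single-source exposure described above.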

At the end of the day, the platform has to allow you to understand, track, and address your biggest risks, or, as we wrote sixteen (16) years ago (and stand by it to this day), your platform will be your biggest risk because it’s the unexpected that you don’t plan for that kills you, not the expected, no matter how severe.

And while this is not a risk-centric post (as we have written series on that), the largest cause of risk is not natural disasters (even though we are now seeing dozens of major disasters every year, the reality is that most are still localized) or pandemics (while epidemics are increasing, true pandemics still work out to be only a twice-a-century event [although if we don’t step up our global management thereof, the rate will double]), but human-generated risks. Stupid humans create more risk and chaos than the planet does!

A Shiny New SaaS or AI Wrapper Doesn’t Make Tech Any Better

Just like painting a hammer bright shiny pink doesn’t change its fundamental function, putting a shiny new SaaS wrapper on a traditional desktop application or adding a Gen-AI interface to allow for “conversational” interaction doesn’t fundamentally change what the application can do.

What an application can do depends upon the data model it supports, the core algorithms that process that data, and the workflows that connect them together to take raw inputs and produce necessary outputs. If the data model is not sufficient, the algorithms not appropriate, or the workflow lacking, a shiny new wrapper won’t change anything … the software will be no more effective than the software that is being replaced.

Pick any significant application, and the best results usually depend on intense or complex calculations, using a proper algorithm that works on a proper model populated by the right inputs; if any piece is missing, the solution doesn’t work. In our area, it’s Source to Pay, and that starts with sourcing.

In sourcing, the right decision is the one that results not in the lowest bid, but the lowest lifecycle cost of the purchase. That takes into account not just unit costs, and not just shipping, tariffs, and interim warehousing costs for landed costs, but also utilization/waste costs, local warehousing and inventory costs, (amortized) service costs, disposal costs, and even carbon costs if they vary by option. It considers all of the available product/SKU options, plants, shipping routes, and localized plant/warehouse/store needs, and uses optimization and analytics to identify the optimal award that minimizes the overall cost while maintaining service levels and minimizing risk.

If the solution doesn’t allow you to build the right models, collect all the options, identify the plants and routes, and determine optimal mixes that meet your criteria, then it’s not a modern sourcing solution no matter how SaaSy it is, how new it is, or how much BS Gen-AI gets shoved into it. A good application solves your core problem. If it doesn’t do that, it’s not good. And at the end of the day, it doesn’t matter how slick and SaaSy it is, because if the only application that gets it right is a green screen desktop application, then that is the best solution to your problem.
(We hope it’s not, but given how little there is behind many of these SaaS apps, which are built to look good by developers with little to no knowledge of the domain they think they can satisfy with simple algorithms, and which are sometimes just fancy interfaces to a classic desktop application wrapped in a web container that slaps a web-friendly API onto the classic app and classic algorithm, we can’t say you won’t have to keep using that decades-old green screen application.)
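To make the lifecycle-cost point concrete, here is a toy sketch in Python. The cost components mirror the ones listed above; the field names and the numbers are purely illustrative, not any vendor’s schema. The point is that the lowest bid (option A) is not the lowest lifecycle cost (option B):

```python
def lifecycle_unit_cost(option):
    """Total lifecycle cost per unit for a sourcing option:
    landed cost (unit + shipping + tariff + interim warehousing)
    plus ongoing costs (waste, inventory, service, disposal, carbon)."""
    landed = (option["unit_cost"]
              + option["shipping_per_unit"]
              + option["unit_cost"] * option["tariff_rate"]
              + option["interim_warehousing_per_unit"])
    ongoing = (option["utilization_waste_per_unit"]
               + option["local_inventory_per_unit"]
               + option["amortized_service_per_unit"]
               + option["disposal_per_unit"]
               + option["carbon_cost_per_unit"])
    return landed + ongoing

option_a = {  # the lowest bid ...
    "unit_cost": 10.00, "shipping_per_unit": 1.50, "tariff_rate": 0.25,
    "interim_warehousing_per_unit": 0.40, "utilization_waste_per_unit": 1.20,
    "local_inventory_per_unit": 0.60, "amortized_service_per_unit": 1.00,
    "disposal_per_unit": 0.50, "carbon_cost_per_unit": 0.30,
}
option_b = {  # ... is not the lowest lifecycle cost
    "unit_cost": 11.00, "shipping_per_unit": 0.50, "tariff_rate": 0.0,
    "interim_warehousing_per_unit": 0.10, "utilization_waste_per_unit": 0.40,
    "local_inventory_per_unit": 0.30, "amortized_service_per_unit": 0.80,
    "disposal_per_unit": 0.30, "carbon_cost_per_unit": 0.10,
}
print(lifecycle_unit_cost(option_a))  # option A: higher lifecycle cost
print(lifecycle_unit_cost(option_b))  # option B: lower, despite the higher bid
```

A platform that can’t model every one of those components per option can’t tell you that option B wins, no matter how shiny its interface is.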

At the end of the day, it’s algorithms that work, and the reality is that these are often the algorithms that were developed decades ago by leading minds, stress tested and sharpened by brilliant minds, proven to work, and just waiting for the computing power to catch up so they could shine. (The best data structures and algorithms textbook ever written is over 35 years old. Most of the revolutionary developments came between the 70s and the 90s.) MILP is decades old, but we really didn’t have the computing power to solve large, complex, real-world models until about two decades ago (and then only if you didn’t mind waiting a few hours to a few days for a scenario to solve). But now we can solve them in minutes, if not seconds, and that allows for next-generation strategic analysis and planning, as long as you have a modern platform that uses a modern algorithm that can take advantage of multi-core cloud processing capabilities, the right data model, and the data inputs you need.
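To illustrate what an award-optimization model actually decides, here is a deliberately tiny sketch. Real platforms formulate this as a MILP and hand it to a solver; at this toy size (three lanes, two suppliers, all names and numbers made up for illustration) exhaustive enumeration shows the same idea: minimize total cost subject to supplier capacity and a business rule forbidding a single-source award.

```python
from itertools import product

lanes = ["lane1", "lane2", "lane3"]
demand = {"lane1": 100, "lane2": 80, "lane3": 120}   # units per lane
unit_cost = {  # (supplier, lane) -> delivered cost per unit
    ("S1", "lane1"): 9.0, ("S1", "lane2"): 8.5, ("S1", "lane3"): 9.5,
    ("S2", "lane1"): 9.2, ("S2", "lane2"): 8.0, ("S2", "lane3"): 9.1,
}
capacity = {"S1": 200, "S2": 220}                    # units per supplier
max_lanes_per_supplier = 2  # business rule: no single-source award

best_cost, best_award = None, None
for assignment in product(["S1", "S2"], repeat=len(lanes)):
    award = dict(zip(lanes, assignment))
    # capacity constraint: total units awarded to each supplier
    used = {s: 0 for s in capacity}
    for lane, s in award.items():
        used[s] += demand[lane]
    if any(used[s] > capacity[s] for s in capacity):
        continue
    # business-rule constraint: cap lanes per supplier
    if max(assignment.count(s) for s in capacity) > max_lanes_per_supplier:
        continue
    cost = sum(unit_cost[(s, lane)] * demand[lane] for lane, s in award.items())
    if best_cost is None or cost < best_cost:
        best_cost, best_award = cost, award

print(best_award, best_cost)
```

Note that the unconstrained cheapest choice per lane is not always feasible: the capacity and business-rule constraints are exactly why you need optimization rather than a lowest-bid sort, and why, at real-world scale (thousands of SKUs, dozens of suppliers and routes), a proper MILP solver is required instead of enumeration.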

And therein lies the hitch — it all comes down to the data model, algorithm, and application design — not the UX, the intake and orchestration, or the “conversational” Gen-AI interface.

Remember this the next time someone tries to sell you a shiny new interface or an upgrade to what you have. Remember that most upgrades happen because software stacks change, functionality that should have been in the last release is finally added (since many SaaS companies now release untested alphas), or major security or performance issues are resolved. Now, you need the fixes for sure, but you shouldn’t be paying any more than the maintenance fee for those. If the vendor rolls them into “functionality updates”, you should insist you get those for free. And if you got by without the missing functionality (either because you had complementary systems or added it yourself), then do you really need more untested functionality now?

And at the end of the day, the primary reason software stacks change is that if they didn’t, you’d have to buy a lot less tech, and then the investors wouldn’t make money. Not all tech stacks offer significant improvements in functionality, or even security. They just allow developers to work on the new hotness and vendors to force you into spending more money, without any guarantee of more value in what you’re delivered.

So don’t get fooled by new tech. Do your homework. Sometimes the best tech is the old busted hotness.

P.S. Yes, Joël, the number 666 is ruining Procurement*, but not necessarily, or just, in the way you appear to believe it is.

* see the Mega Map