Not long ago, Apple released the results of its study of Large Reasoning Models (LRMs), which found that this form of AI faces a “complete accuracy collapse” when presented with highly complex problems. See the summary in The Guardian.
We want to bring your attention to the following key statement:
Standard AI models outperformed LRMs in low-complexity tasks while both types of model suffered “complete collapse” with high-complexity tasks.
This point needs to be made crystal clear! As we keep saying, LLMs WERE NOT ready for prime time when they were released (they should never have escaped the basement lab) and they ARE NOT ready for the tasks they are being sold for. Basic reasoning would thus dictate that LRMs, built on this technology, are definitely not ready either. And this study proves it!
It’s always taken us about two decades to get to the point where we have enough understanding of a new type of AI technology, enough experience, enough data, and enough confidence to know where it is not only commercially viable BUT commercially dependable. And then we need to figure out how to train the appropriate (expert) users to spot any false positives and false negatives, and to improve the technology as needed.
Just like nine (9) women can’t have a baby in one (1) month, billions of dollars can’t speed this up. Like wisdom, it takes time to develop. Typically, decades!
Moreover, while not saying it outright, the study implies a key point that no one is getting: “our models of intelligence are fundamentally wrong”. First of all, we still don’t fully understand how the brain works. Secondly, if you map the compute of any XNN model we’ve devised and map the activity of a human brain responding to the same question or task, completely different subsets light up, and those subsets change as the task becomes more complex, or you’ll see some back and forth between them. We can understand data, meta-data, meta-meta-data, and thus chaos. We can use clues that computers don’t, and can’t, know exist to grasp context and to pick which of the seven (7) possible meanings of a word is the intended one. We can learn from shallow data. In contrast, these models stole ALL the data on the internet and still tell us to eat rocks!
This brings us back to what this site keeps leaning towards: if you want “autonomous agents”, go back to the rules-based RPA we have today, use classic AI tech that works for the discrete tasks we understand, and link, or “orchestrate”, those pieces together for more complex tasks. And if you really think natural language makes software easier and faster to use (for most complex tasks, it doesn’t, but we’ve also reached the point where no one can do design engineering any more, it seems), then use LLMs for the two things they are actually good for: faster, usually more accurate, semantic input processing, and translation of system output into natural language. Do that instead of pouring billions upon billions into fundamentally flawed tech to try and fix hallucinations that result from fundamental attributes that can’t be trained out, as this is an utter waste of time, money, and resources. A minimal sketch of what that division of labour looks like follows below.
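To make the point concrete, here is a minimal sketch (in Python) of that division of labour, using a made-up invoice-approval task purely for illustration: discrete, auditable rule-based checks do the actual work and are orchestrated deterministically, while the LLM, if used at all, sits only at the boundaries. The llm_parse_request and llm_render_reply names are hypothetical placeholders for whatever model wrapper you already have; nothing here is a specific product’s API.

```python
# Sketch: deterministic, rules-based steps make the decision; an LLM (if
# used at all) only parses free-text input and renders the output as
# natural language. The two llm_* functions are hypothetical stubs.

from dataclasses import dataclass


@dataclass
class InvoiceRequest:
    supplier_id: str
    amount: float
    currency: str


def llm_parse_request(text: str) -> InvoiceRequest:
    """Hypothetical LLM boundary: turn free text into a structured request.
    In practice this would call your model and validate the result against
    the schema; it is stubbed here for illustration."""
    raise NotImplementedError("wire up your own model here")


def check_supplier(req: InvoiceRequest) -> bool:
    # Classic, auditable rule: supplier must be on the approved list.
    approved = {"SUP-001", "SUP-042"}
    return req.supplier_id in approved


def check_amount(req: InvoiceRequest) -> bool:
    # Classic, auditable rule: amount must be within the approval threshold.
    return 0 < req.amount <= 10_000


def approve_invoice(req: InvoiceRequest) -> dict:
    # Deterministic orchestration of discrete, rule-based checks; no model
    # is involved in the decision itself, so the outcome is reproducible.
    result = {
        "supplier_ok": check_supplier(req),
        "amount_ok": check_amount(req),
    }
    result["approved"] = all(result.values())
    return result


def llm_render_reply(result: dict) -> str:
    """Hypothetical LLM boundary: turn the structured decision back into
    natural language for the user. A plain template works just as well."""
    return f"Approved: {result['approved']} (details: {result})"


if __name__ == "__main__":
    req = InvoiceRequest(supplier_id="SUP-001", amount=2500.0, currency="USD")
    print(llm_render_reply(approve_invoice(req)))
```

The point of the sketch is that every decision path is a rule you can read, test, and audit; the language model never gets to decide anything, so a hallucination can at worst garble the phrasing of the reply, not the approval itself.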