Got a Headache? Don’t Take an Aspirin or Query an LLM!

Yesterday we provided a brief history of Aspirin, the first turn-of-the-century miracle drug, one that was both society’s salvation and its sorrow, though the latter wouldn’t be known for more than half a century. As we discussed, it was hailed as a miracle, life-saving drug that could be used for everything from the common cold to global pandemics. And it worked, for a price. That price, when it came due, was usually one of many, many side effects. Those side effects were often minor and insignificant compared to the perceived benefit the drug was bringing, except when they weren’t: when they inflamed ulcers and/or increased gastrointestinal bleeding and created a life-threatening situation, caused hyperventilation in a pneumonia patient, or induced a pulmonary edema and killed the patient. And while the death rate, even at the height of over-prescription, was likely only 3%, and is less than a tenth of that today, that is still not good.

The reason for this, as we elaborated in our last post, is that, like many breakthrough technologies before it, Aspirin was rolled out not only before the side effects, and more importantly the long-term effects, were well understood, but before even the proper use for the desired primary effects was well understood (as evidenced by the fact that the best physicians were routinely prescribing two to four times the maximum safe dosage during the Spanish Flu pandemic, almost 20 years after the drug first became available). There were benefits, but there were also consequences, some of them severe, and others deadly.

Medicine is as much a technology as a new mode of transportation (boat, automobile, airplane, etc.), a new piece of manufacturing equipment, a new computing device, or a new piece of software.

Now you see the point. Every breakthrough tech cycle is the same, whether it is medicine, farm machinery, the airplane, or modern software technology, and that includes AI and most definitely includes LLMs like ChatGPT.

As Aspirin proves, even if the first test seems successful, there’s always more beneath the surface, especially when the population numbers in the billions and every individual can react differently. Or, in the case of an LLM, when billions of people each have thousands of queries, the vast majority of which have never been tested and any of which could generate unknown results.
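To make that testing gap concrete, here is a quick back-of-envelope calculation; the user, query, and benchmark counts below are illustrative assumptions on our part, not measured figures:

```python
# Back-of-envelope scale of the untested LLM input space.
# All numbers are illustrative assumptions, not measurements.
users = 2_000_000_000          # "billions of people"
queries_per_user = 1_000       # "thousands of queries" each
distinct_queries = users * queries_per_user  # 2 trillion

benchmark_size = 10_000_000    # a generously sized test suite
fraction_tested = benchmark_size / distinct_queries

print(f"Distinct queries: {distinct_queries:,}")
print(f"Fraction tested:  {fraction_tested:.6%}")  # ~0.000500%
```

Even under these generous assumptions, formal testing would cover a vanishingly small fraction of what people actually ask.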

Moreover, there have been no significant, large-scale, independently funded academic studies that we can use to understand the true strengths and weaknesses, truths and hallucinations, and appropriate utilization of the technology. As Mr. Klein pointed out in a recent LinkedIn post asking who funded that study, over 80% of AI industry “studies” are funded by undisclosed sources, and most of them, like most industry studies these days (see Mr. Hembitski’s latest post), don’t contain good data on demographics, sample size, test material, or potential bias.

That would be the first step in getting a grip on this technology. The next step would be to create reasonable measures we could use to define technology categories and domains, and then identify tests and measures that would give us a level of confidence for a given population of inputs or usage. Consider a traditional (X)NN (Neural Network), which has a fixed set of outputs and is designed to process inputs from a known population. For such models we have developed methodologies to determine accuracy with high confidence: testing by random sampling with sufficiently sized data sets and appropriate statistical models. Furthermore, mathematicians have proven what those methodologies guarantee for a given population: if appropriate tests have demonstrated 90% accuracy with 98% confidence, then the model, when used properly on that population, really is 90% accurate with 98% confidence.
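As a minimal sketch of what such a test looks like in practice, the snippet below computes a Wilson score confidence interval for a model’s accuracy from a random sample of labeled inputs; the wilson_interval helper and the sample figures are our own illustration, not taken from any particular study:

```python
import math
from statistics import NormalDist

def wilson_interval(correct: int, n: int, confidence: float = 0.98):
    """Wilson score interval for a binomial proportion, e.g. the accuracy
    of a fixed-output model estimated from n test inputs sampled at random
    from the target population, `correct` of which were handled correctly."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided z-score
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# Illustrative numbers: 9,000 of 10,000 sampled inputs handled correctly.
low, high = wilson_interval(9_000, 10_000, confidence=0.98)
print(f"Accuracy in [{low:.4f}, {high:.4f}] with 98% confidence")
```

A guarantee like this holds only because the input population is fixed and can be sampled; an LLM’s open-ended query space gives us no equivalent footing.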

We have no such guarantees for LLMs, nor any proof that they are reliable. “It worked fine for me” is NOT proof. Vendors quoting nebulous client success stories (without client names or real data) is not proof. And the fact that they raised millions of dollars to bring this technology to market is definitely not proof. (All a raise proves is that the C-Suite sales team is charismatic, convincing, and great at selling a story. Nothing more. In fact, fundraising would be more honest if securities law allowed fundraising via poker and takeover protection via gunfighting, as imagined in the season two Sliders episode “The Good, the Bad, and the Wealthy”. At least then the shenanigans would be out in the open.)

The closest thing out there to a good industry study on LLMs and LRMs (Large Reasoning Models) is likely Apple’s newest study, as summarized in The Guardian, which finds that “standard AI models outperformed LRMs in low-complexity tasks while both types of model suffered ‘complete collapse’ with high-complexity tasks”.

The study also found that as LRMs neared performance collapse they began “reducing their reasoning effort”, and that if the problem was complex enough, the models failed even when provided with an algorithm that would solve it.

Still, we have to question this study, or more precisely its release (especially given the timing). Did Apple do it out of genuine academic interest, to get to the bottom of the technology claims? Or did it do it to cast doubt on rivals who claim Apple is behind in the AI race, focusing only on the negatives of the technology to show that the competition doesn’t have what it claims to have, and that Apple is therefore not behind?

The point is, we don’t understand this technology, and that fact should scream louder in your head every day. Look at all the bad stuff we’ve discovered so far; it’s likely we’re not even close to being done yet.

Yes, there is potential in the new technology, as there is with all discovery, but until we fully understand not only what that potential is, but also how to use it safely and, most importantly, how to prevent harm, we should approach it with extreme caution, and we should most definitely not let it tell us how to run our business or our lives. Otherwise, like an Aspirin overdose, it might just kill us. (And remember, Aspirin was studied for 18 years before it was made available without a prescription, and deadly side effects and prescribed overdoses still happened. In comparison, today’s LLMs and LRMs have barely been formally studied at all, yet the providers of this technology want you to run your business, and your life, off of them in next-generation agentic systems. Think about that! And when the migraine comes, remember: don’t take Aspirin!)