Category Archives: AI

When It Comes To Gen-AI, I’m NOT Yelling Enough! Part I

Deep dive into the comments of this LinkedIn post and you’ll see a comment that we should stop yelling at the tools. I strongly disagree!

As per a previous post, until the space is ready to admit that

  • Gen-AI/LLMs are not the be-all and end-all, having very limited uses
  • real progress still requires real blood, sweat, elbow grease, and tears
  • you can’t replace people as this tech is NOT intelligent

and, more importantly

  • that these tools are not what people need and
  • these tools cannot be used as the foundation for suitable solutions (although they can be [a small] part of those solutions if care is taken)

We need to keep yelling, and do so rather loudly.

Because, to build on the metaphor, it’s not a shiny new hammer. If it were just a shiny new hammer, we could depend on one of three things happening when we use the hammer to hit the nail:

  1. the nail goes some distance into the wood, depending on how hard we swing,
  2. the nail doesn’t go, because the hammer is too light, or
  3. if the handle is weak or the head not securely attached and we hit really hard and the nail doesn’t go in, in the absolute worst case the handle will crack or the head will fall off.

However, with the fancy new hammer equivalent of Gen-AI, we also have to worry about the possibility that:

  1. the hammer is super magnetized and pulls the nail out on the backswing,
  2. the hammer splits the nail in half,
  3. the hammer super heats the nail and melts it, or
  4. the hammer is packed with C4 and explodes, ripping our arm off our body!

Because, when you use Gen-AI, you accept the possible side effects of hallucinations, decreased code/application security, bad math, fraud, lawsuits, deadly diets, extremist views, sleeper behaviour, dependency and cognitive reduction, suicide, blackmail, hit lists, and murder, with many links summarized in this LinkedIn post.

And the worst part is this technology is being shoved into every nook and cranny, even those where we have technology that has worked great for over a decade (because the new generation of college-dropout script kiddies who believe that they can prompt engineer a solution to anything don’t even know the basics anymore).

It’s not just not solving our problems, it’s creating new ones, and they are often worse than the problems we have. We need to yell about this!

Apple Demonstrates AI Collapse

Not long ago, Apple released the results of its study of Large Reasoning Models (LRMs) that found that this form of AI faced a “complete accuracy collapse” when presented with highly complex problems. See the summary in The Guardian.

We want to bring your attention to the following key statement:

Standard AI models outperformed LRMs in low-complexity tasks while both types of model suffered “complete collapse” with high-complexity tasks.

This point needs to be made crystal clear! As we keep saying, LLMs WERE NOT ready for prime time when they were released (they should never have escaped the basement lab) and they ARE NOT ready for the tasks they are being sold for. Basic reasoning would thus dictate that LRMs, built on this technology, are definitely not ready either. And this study proves it!

It’s always taken us about two decades to get to the point where we have enough understanding of a new type of AI technology, enough experience, enough data, and enough confidence to understand where it is not only commercially viable BUT commercially dependable. And then we need to figure out how to train the appropriate (expert) users on how to spot any false positives and false negatives, and how to improve the technology as needed.

Just like nine (9) women can’t have a baby in 1 month, billions of dollars can’t speed this up. Like wisdom, it takes time to develop. Typically, decades!

Moreover, while not saying it, the study is implying a key point that no one is getting: “our models of intelligence are fundamentally wrong”. First of all, we still don’t fully understand how the brain works. Secondly, if you map the compute of any XNN model we’ve devised and map the compute of a human brain in response to a question task, completely different subsets light up, and those will change as tasks become more complex or you’ll see some back and forth. We can understand data, meta-data, meta-meta-data and thus chaos. We can use clues that computers don’t, and can’t, know exist to know context and which of the 7 possible meanings of a word is the intended one. We can learn on shallow data. In contrast, these models stole ALL the data on the internet and still tell us to eat rocks!

This points to what this site keeps leaning towards: if you want “autonomous agents”, go back to the rules-based RPA we have today, use classic AI tech that works for discrete tasks we understand, and link or “orchestrate” them together for more complex tasks. And if you really think natural language makes software easier and faster to use (for most complex tasks, it doesn’t, but we’ve also reached the point where no one can do design engineering any more, it seems), then use LLMs for the two things they are good for: faster, usually more accurate, semantic input processing, and system translation of output to natural language. Don’t pour billions upon billions into fundamentally flawed tech to try to fix problems from hallucinations that result from fundamental attributes that can’t be trained out, as this is an utter waste of time, money, and resources.

A Shiny New SaaS or AI Wrapper Doesn’t Make Tech Any Better

Just like painting a hammer bright shiny pink doesn’t change its fundamental function, putting a new shiny SaaS wrapper on a traditional desktop application or adding a Gen-AI interface to allow for a “conversational” interaction doesn’t fundamentally change what the application can do.

What an application can do depends upon the data model it can support, the core algorithms that process that data, and the workflows that connect them together to take raw inputs and produce necessary outputs. If the data model is not sufficient, the algorithms not appropriate, and the workflow lacking, a shiny new wrapper won’t change anything … the software will be no more effective than the software that is being replaced.

Pick any significant application, and the best results usually depend on intense or complex calculations, using a proper algorithm that works on a proper model populated by the right inputs, and if any piece is missing, the solution doesn’t work. In our area, it’s Source to Pay, and that starts with sourcing. In sourcing, the right decision is that which results not in the lowest bid, but the lowest lifecycle cost of the purchase, which takes into account not just unit costs, and not just shipping and tariffs and interim warehousing costs for landed costs, but also utilization/waste costs, local warehousing and inventory costs, (amortized) service costs, disposal costs, and even carbon costs if they vary by option. It considers all of the available product/SKU options, plants, shipping routes, and localized plant/warehouse/store needs and uses optimization and analytics to identify the optimal award that minimizes the overall cost while maintaining service levels and minimizing risk. If the solution doesn’t allow you to build the right models, collect all the options, identify the plants and routes, and determine optimal mixes that meet your criteria, then it’s not a modern sourcing solution no matter how SaaSy it is, how new it is, or how much BS Gen-AI gets shoved into it. A good application solves your core problem. If it doesn’t do that, it’s not good. And at the end of the day, it doesn’t matter how slick and SaaSy it is, because if the only application that gets it right is a green screen desktop application, then that is the best solution to your problem. 
(We hope it’s not — but given how little there is behind many of these SaaS apps, which are built to look good by developers with little to no knowledge of the domain they think they can satisfy with simple algorithms, and sometimes just fancy interfaces to a classic desktop application wrapped in a web container which slaps on a web-friendly API interface to the classic app and classic algorithm — we can’t say it’s not going to be the case that you have to keep using that decades old green screen application.)
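To make the “lowest bid versus lowest lifecycle cost” point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `Bid` fields, the two suppliers, and all of the numbers are hypothetical, and a real sourcing model would also cover routes, plants, inventory, carbon, and risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical per-unit cost components for one supplier's offer."""
    supplier: str
    unit_price: float         # the quoted bid
    freight_per_unit: float   # shipping, tariffs, interim warehousing
    waste_rate: float         # fraction of purchased units that are unusable
    service_per_unit: float   # amortized service cost
    disposal_per_unit: float  # end-of-life cost


def lifecycle_cost(bid: Bid, usable_units: int) -> float:
    """Total cost of obtaining `usable_units` good units from this bid."""
    # Buy extra units to cover the ones lost to waste.
    units_to_buy = usable_units / (1.0 - bid.waste_rate)
    per_unit = (bid.unit_price + bid.freight_per_unit
                + bid.service_per_unit + bid.disposal_per_unit)
    return units_to_buy * per_unit


# Illustrative numbers only.
bids = [
    Bid("A", 10.00, 1.50, 0.10, 0.50, 0.25),
    Bid("B", 10.60, 1.00, 0.02, 0.30, 0.10),
]

lowest_bid = min(bids, key=lambda b: b.unit_price)
lowest_lifecycle = min(bids, key=lambda b: lifecycle_cost(b, 1000))

print(lowest_bid.supplier)        # A wins on quoted price
print(lowest_lifecycle.supplier)  # B wins once waste, freight, service, and disposal count
```

In this toy data, supplier A has the lower quote, but supplier B is roughly 10% cheaper for 1,000 usable units. No interface, conversational or otherwise, changes that answer; only the data model and the calculation do.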

At the end of the day, it’s algorithms that work, and the reality is that these are often the algorithms that were developed decades ago by leading minds, stress tested and sharpened by brilliant minds, proven to work, and just waiting for the computing power to catch up to where they need it in order to shine. (The best data structures and algorithms text book ever written is over 35 years old. Most of the revolutionary developments were between the 70s and 90s.) MILP is decades old, but we really didn’t have the computing power to solve large, complex, real world models until about two decades ago (and then only if you didn’t mind waiting a few hours to a few days for a scenario to solve). But now we can solve them in minutes, if not seconds, and that allows for next-generation strategic analysis and planning, as long as you have a modern platform that uses a modern algorithm that can take advantage of multi-core cloud processing capabilities, the right data model, and the data inputs you need.

And therein lies the hitch — it all comes down to the data model, algorithm, and application design — not the UX, the intake and orchestration, or the “conversational” Gen-AI interface.

Remember this the next time someone tries to sell you a shiny new interface or an upgrade to what you have. Remember that most upgrades happen because software stacks change, functionality that should have been in the last release is finally added (since many SaaS companies now release untested alphas), or major security or performance issues are resolved. Now, you need the fixes for sure, but you shouldn’t be paying any more than the maintenance fee for those. If the vendor rolls them in with “functionality updates”, you should insist you get those for free. If you got by without the missing functionality (either because you had complementary systems or added it yourself), then do you really need more untested functionality now?

And at the end of the day, the primary reason software stacks change is that if they didn’t, you’d have to buy a lot less tech, and then the investors wouldn’t make money. Not all tech stacks offer significant improvements in functionality or even security. They just allow developers to work on the new hotness and enterprises to force you into spending more money, without any guarantee of more value in what you’re delivered.

So don’t get fooled by new tech. Do your homework. Sometimes the best tech is the old busted hotness.

P.S. Yes, Joel the number 666 is ruining Procurement*, but not necessarily, or just, in the way you appear to believe it is.

* see the Mega Map

When Someone Says “Real AI”, Ask For Details!

We shouldn’t have to remind you, but since too many people are falling for, and buying into, the hype and selecting tech that does not, and cannot ever, work, we are going to remind you yet again.

Computers do NOT think!

To think is to direct one’s mind … where one is an intelligent being, not a dumb box. Computers thunk … they compute using algorithms (which are hopefully advanced and encapsulate expert guidance and knowledge, but that is far from guaranteed).

Computers do NOT learn.

Appropriately selected and implemented probabilistic / statistical / machine learning algorithms will improve their performance over time as more data becomes available, but they do not learn. Learn is to acquire knowledge (or skill), and by definition, knowledge can only be acquired by an intelligent being.

Computer Programs Can Adapt …

but there’s no guarantee the adaptation is going to improve their performance under your definition, or even maintain their performance. Their performance could actually decrease over time.

What is critically important is that there are two primary types of algorithms that can be used to create an AI application:

Deterministic and Probabilistic

A deterministic algorithm is one that, by definition, given a particular input will, no matter what, always produce the same output, with the underlying machine always passing through the same sequence of states. As long as you don’t screw up the input, or the retrieval of the output, (and, of course, the hardware doesn’t fail), it is 100% reliable.

A probabilistic algorithm, in comparison, is an algorithm that incorporates randomness or unpredictability into its execution, and may or may not produce the same output given successive iterations of the same input. Nor is there even any guarantee that the algorithm will produce a correct, or even an acceptable, output a given percentage of the time. Well designed, these algorithms may allow for consistently faster computation, better identification of edge cases, or even a lower chance of error, on average, for a certain class of inputs (but with the caveat that other classes of inputs may suffer a higher error rate).
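The difference between the two definitions above can be sketched with a classic textbook pair of our own choosing: trial division (deterministic) and a single-round Fermat primality test (probabilistic). The function names are ours, and 561 is picked deliberately because it is a Carmichael number (3 × 11 × 17) that fools the Fermat test for most random bases.

```python
import random


def is_prime_deterministic(n: int) -> bool:
    """Trial division: same input, same steps, same (correct) answer every time."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def fermat_passes(n: int, a: int) -> bool:
    """True if base `a` fails to expose `n` as composite."""
    return pow(a, n - 1, n) == 1


def is_probably_prime(n: int, rng: random.Random) -> bool:
    """One-round Fermat test with a randomly chosen base."""
    if n < 4:
        return n in (2, 3)
    a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
    return fermat_passes(n, a)


n = 561  # composite: 3 * 11 * 17

print(is_prime_deterministic(n))  # False, on every run

# Same input, different random draws: the verdict is not stable.
answers = {is_probably_prime(n, random.Random(seed)) for seed in range(200)}
print(answers)  # contains both True and False
```

A well-designed probabilistic test (more rounds, a stronger test) can push the error rate very low, which is exactly the “well designed” caveat above. What it cannot do is offer the guarantee the deterministic version offers for free.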

Deterministic algorithms can be relied on to execute certain tasks and functions autonomously with no oversight and no worry. Probabilistic algorithms cannot. In other words, you cannot assign a probabilistic algorithm a task for autonomous computation unless you can live with the worst possible outcome of the algorithm getting it wrong. And this is what Gen-AI, and most of today’s “AI” tech, is based on.

This is the critical problem with today’s AI-tech and AI-Hype. Especially when a probabilistic system can, by definition, use any method it likes to determine a probability (which may or may not be at all appropriate, since a model is only valid if it accurately captures the “population” dynamics) and may, or may not, be accurate. For some of these situations, neither the company nor the provider of the system will have enough historical data (market situation and outcome) to even attempt to make a reasonable prediction, and there definitely won’t be enough data to know the accuracy, because standard measures of model accuracy (like the Brier Score) tend to require a lot of data, especially if you need to accurately identify rare events, as this could require 1,000 or more “data points” (which, in a typical market scenario, would require enough data to identify the market condition and then the unexpected change).

(And this is exacerbated by the reality that, for many of these situations, one could likely employ more traditional “statistical techniques” like trend analysis, clustering, classical machine learning, etc. to solve much of the problem at hand.)
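For readers who have not met the Brier Score mentioned above: it is the mean squared difference between the forecast probabilities and the 0/1 outcomes, with 0.0 being perfect. The sketch below uses made-up histories (a hypothetical disruption that occurs about 1% of the time) to show why rare events need so much data before the score tells you anything.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; lower is better.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need two non-empty sequences of equal length")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical histories for an event that truly happens about 1% of the time.
small_history = [0] * 100             # 100 periods, event never observed
large_history = [1] * 10 + [0] * 990  # 1,000 periods, 10 events observed


def never_model(n):
    """A useless model: always says the event cannot happen."""
    return [0.0] * n


def base_rate_model(n):
    """A calibrated but uninformative model: always says 1%."""
    return [0.01] * n


print(brier_score(never_model(100), small_history))        # 0.0, looks "perfect"
print(brier_score(never_model(1000), large_history))       # 0.01
print(brier_score(base_rate_model(1000), large_history))   # roughly 0.0099
```

On 100 data points, the model that says “this never happens” earns a perfect score. Even on 1,000 data points it trails the base-rate model by only about 0.0001, so telling a genuinely good rare-event model from a useless one takes a lot of history, which most buyers and vendors simply do not have.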

It’s important to remember that Gen-AI LLMs, which power most of the new (fake) agentic tech, are all probabilistic (and designed in such a way that hallucinations are a core function that CAN NOT be eliminated), and much of it is complete and utter garbage for what it was designed for, and even worse for tasks it wasn’t designed for (like math and complex analyses). (Every day we see a new example of complete and utter failure, often due to hallucinations, of this tech. For example, you can’t even get a list of real books out of it, as per a recent contribution to the Chicago Sun Times, which published a Summer Reading List of 15 books, of which only 5 actually exist. And then there are numerous examples of lazy lawyers getting raked over the coals by judges for using ChatGPT to do their homework and quoting fake cases!)

While we do need to augment purely deterministic tech with more adaptive tech that uses the best “statistical techniques” to more quickly adapt to situations, we need to spell out the techniques and restrict ourselves to what is now “classic machine learning” where the algorithms have been well researched and stress tested over decades (not modern Gen-AI powered agentic tech that has worse odds than your local casino). At least then we’ll have confidence and can enforce bounds on what the solution can actually do (to limit any potential damage).

Especially now that we finally have the computing power we need to effectively use tried-and-true “classic” ML/AI techniques that require large data stores and huge processing power for highly accurate predictions. The reality is that even though this tech has existed for at least 25 years, the computing power required made it totally impractical for all but the most critical situations. Twenty-five years ago, a large Strategic Sourcing Decision Optimization (SSDO) model would run all weekend. Today you can solve it in a few seconds on a large rack server (with 64 cores, GB of cache, and high-speed access to TB of storage). The fact that we finally have (near) real time capability means that this tech is not only finally usable in all situations, but finally effective.

[And if vendors actually hired real computer scientists, applied mathematicians, and engineers and built more of this tech, instead of script kiddies cobbling together LLMs they don’t understand, we would be a decade ahead of where we are today.]

A Very Brief History of “Safe” American Inventions and Products

More specifically, a brief history of inventions and products developed, or (primarily) adopted, in the USA as perfectly “Safe” for public use when they were anything but! From the late 1800s to the present day.

Asbestos: large scale mining began in the late 1800s when manufacturers and builders decided it was a great thermal and electrical insulator whose adverse effects on human health were not widely recognized and acknowledged until the (late) 1970s; even today exposure is still the #1 cause of work-related deaths in the world (with up to 15K dying annually in the US due to asbestos-related disease)

Aspirin: as per our previous post, invented in 1897, available over the counter in 1915, it was heavily promoted as the cure all in the 1920s through the 1940s and might have cost us over a hundred thousand lives due to overprescription during the Spanish Flu pandemic alone

Cocaine: from the late 1880s through the early 1910s, your physicians were big fans of the Victorian wonder drug (as per this Lloyd Manufacturing Ad archived on the NIH site) as it was effectively the first effective local anesthetic the western world knew about (which was endorsed by the Surgeon-General of the US Army in 1886), although the real popularity was in the public, with an estimated 200,000 cocaine addicts in the US by 1902; still, it was 1914 before it was restricted to prescription use, 1922 before tight regulations were put in place, and likely the late 1940s before prescription and dispensation finally came to an end; moreover, it was generally viewed as harmless and non-addictive until crack emerged in 1985 (even though the number of cocaine related deaths in the US climbed to 2 per 1,000 in 1981)

DDT: (this is particularly relevant to Gen-Z who are fully on-board the Gen-AI hype train) developed in the 1940s as the first modern synthetic insecticide, Gen Z’s grandparents and great-grandparents used to run through DDT clouds that were sprayed in the streets of your cities and towns in the 1940s through the 1960s, as the first health risks were not reported until roughly 1962 when Rachel Carson published Silent Spring, and it wasn’t until 1972 when the US banned it for adverse effects on human health (as well as the environment); to this day, we’re still not sure how many deaths it has contributed to, although the UN estimates 200K people globally still die from toxic exposure to pesticides, of which DDT was the first and the precursor to many newer derivations (Source)

PFAS, inc. PTFE (Teflon): developed by DuPont in 1938, spun off into Chemours, it found use as a lubricant and non-stick coating for pans, and was produced using PFOA (C8), which we now know (and should have known much sooner, but there was a massive PFAS cover up) is carcinogenic (but only for the last decade or so as it was only classified as such in 2013 even though we should have known by the late 1990s) but they still aren’t banned (even though legislation was proposed last year to phase them out over the next decade); because of the cover ups and lack of studies until recent times, we still don’t know how deadly this was, and is, but estimates are that PFAS likely killed 600K annually between 1999 and 2015 and 120K annually after that in the USA (Source) … WOW!

Tobacco: in the 1950s, cigarettes were advertised as good for you with Doctor (Camel Advertisement) and Dentist (Viceroy Advertisement) recommendations on the ads! Despite the fact that health risks were known since 1950 (when the first epidemiological study showing an association between smoking and lung cancer was published by Wynder and Graham), minors in the USA could still buy cigarettes until 2009 … even though Tobacco likely killed over 100 Million people globally in the 1900s (Source)

etc.

We could go on, but the point is this: like most cultures, the USA is not good at picking winning technology that is safe for everyday use, or at least safe enough under appropriately designated usage conditions.

There’s a reason that most countries have harsh regulations on the introduction of new consumer products and technologies that US lobbyists and CEOs scream about, and that’s because more mature countries (which have been around longer than a mere 249 years) understand that no matter how safe something seems, every advancement comes at a cost, every invention comes with a risk, and every convenience comes at a price — and until we know what we are paying, when we need to pay it, and how much we are going to pay, we shouldn’t rush in head first with blinders on.

And while we might still get it wrong, the reality is that we’re more likely to get it right if we take our time and properly evaluate a new technology or advancement first, and even if we get it partially wrong, as in the case of Aspirin, at least the gain should outweigh the cost. For example, even though it can be argued Aspirin was rushed to market, when used in proper doses, the side effects for the vast majority of the population are typically much less than the anti-inflammatory benefits as, for decades, there was no substitute. Even if it gave a person stomach irritation or minor ulcers, if it was life-saving, then that was a reasonable cost at the time.

However, in the cases of DDT, PFAS, and Tobacco, there was no excuse for the lack of research, and, in some cases, the prolonged cover up of research that indicated that maybe the products were not safe but, in fact, very deadly, and since they brought no significant life saving benefits (Malaria wasn’t a big concern in the USA; people were cooking with butter, lard, and oils for centuries; and, in small quantities, both alcohol and cannabis were known to not only be safer, but even medicinal in the right quantities), there was no need to rush them to market.

The simple fact of the matter is that no tech — be it chemical/medicinal, (electro-)mechanical, or computational — can be presumed safe without adequate testing over time, and that’s why we need regulations and proper application of the scientific method. A lack of apparent side effects doesn’t mean that there are none. That’s why we have the scientific method and mathematical proofs (for confidence and statistical certainty), which is something today’s generation doesn’t appear to know a thing about (especially if they just did a couple of years of college programming) as they’ve probably never been in a real lab [or played with uranium like their grandparents because it was legal in the USA to sell home chemistry kits with uranium samples to children in the 1950s, and these kits included the Gilbert U-238 Atomic Energy Lab] and more than likely don’t know the rule of thumb that you should generally add the acid to the base (and not vice versa because, otherwise, this could happen) and that you should definitely add the acid to whatever liquid [typically water] you are diluting it with.

Regulations exist for a reason, and that reason is to keep us safe. The Hippocratic Oath should not be restricted to doctors and the Obligation of the Order should not be restricted to engineers. Every individual in every organization bringing a product to market should be bound by the same, and regulations should exist to make sure that all organizations take reasonable care in the development and testing of every product brought to market, real or virtual. (This doesn’t mean that every product needs to be inspected, but that regulations and standards exist for organizations to follow, and those caught not following the regulations should be subject to fines that would ensure that not just the company, but the C-Suite personally, was bankrupted if the company was found to have ignored the regulations.)

While Gen Z might like the Wild Wild West (which the USA never grew out of) as much as Gen X who created the dot com boom, we need to remember that the dot com boom ended in the dot com bust in 2000, and that if this new generation continues to latch on to AI like Boomers would latch on to blankies and teddies, it just means they are doomed to repeat the mistakes of their grandparents (and will bring about a tech market crash that makes the dot com bust look like a blip). You’re supposed to learn from history, NOT repeat it!