AI Is Not Bad. But The Hype, False Claims, and Fake Tech That Many Vendors Are Trying To Pass Off As Real Are!

As per a post on LinkedIn, I am NOT against real AI. I AM against the hype, false claims, and fake tech today’s enterprise vendors are trying to pass off as real AI!

You may recall that, like Jon W. Hansen and Pierre Mitchell, I was an early fan of AI and what it could do for enterprise tech. As a PhD in Computer Science with a degree in applied math and specialties in multidimensional data structures and computational geometry (MSc and PhD) [think big data before that was a thing], analytics, optimization, and “classic” AI, I saw the real potential for next-generation tech as computing power advanced and data stores exploded.

I did a very deep dive in a 22-part series on Spend Matters in ’18/’19 on what should soon have been possible in our ProcureTech space, a series that started before the first LLM was released (despite the X-Files Warning). The research and implementation paths we were on were good, and the potential was great. It just required a lot of blood, sweat, elbow grease, and patience.

But then some very charming tech bros claimed that this new LLM tech was emergent and magical, that it would do everything and replace all of the old and busted (which was really tried and true) tech (that actually worked); some super deep pockets were blinded by the hype; we abandoned the path of progress (and sanity); and the rest, as they say, is history, which, sadly, is still ongoing (while tech failure rates have reached all-time highs).

Until the space is ready to admit that

  • Gen-AI/LLMs are not the be-all and end-all, and, in fact, have very limited reliable uses (especially in automation/agentic tech) [namely only tasks that can be reduced to semantic processing and large corpus search & summarization]
  • real progress still requires real blood, sweat, elbow grease, and tears
  • you can’t replace people as this tech is NOT intelligent (although you can make them 10x productive if you start focusing on Augmented Intelligence)

and abandon its zealous devotion to Gen-AI as the divine tech (which would bankrupt some tech bros and investors, which is why they are now doubling down on the marketing hype at the point where the hype cycle would usually burst), we’re not going to make progress.

As Pierre has pointed out, Gen-AI is useful as a piece of the puzzle when it is properly combined with other, traditional, reliable AI tech, so long as the foundation of the AI tech is built on a deterministic engine and only incorporates probabilistic models with known confidence and guardrails. (Remember that unless the use case boils down to semantic processing and large document corpus search and summarization, Gen-AI is NOT the right tech.)
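To make the architecture concrete, here is a minimal sketch of a "deterministic core, probabilistic assist" design. Everything in it is an illustrative assumption (the function names, the data fields, the 0.95 threshold); it is not any vendor's API, just the shape of the idea: hard rules decide, and the probabilistic model can only route to a human, never auto-approve below a known confidence floor.

```python
# Sketch of a "deterministic core, probabilistic assist" design (illustrative only).
# The rules engine makes the accept/reject decision; the probabilistic model merely
# annotates, and anything below a known confidence threshold falls back to a human.

CONFIDENCE_FLOOR = 0.95  # assumed threshold, tuned per use case

def deterministic_match(invoice, contract):
    """Hard rules: these alone can approve or reject."""
    return (invoice["unit_cost"] == contract["unit_cost"]
            and invoice["qty"] <= contract["qty_remaining"])

def probabilistic_classify(text):
    """Stand-in for an ML/LLM classifier returning (label, confidence)."""
    return ("routine", 0.80)  # hypothetical output for the sketch

def process(invoice, contract, text):
    if not deterministic_match(invoice, contract):
        return "reject"
    label, conf = probabilistic_classify(text)
    if conf < CONFIDENCE_FLOOR:
        return "human_review"  # guardrail: low confidence never auto-approves
    return "approve" if label == "routine" else "human_review"

print(process({"unit_cost": 10, "qty": 5},
              {"unit_cost": 10, "qty_remaining": 8},
              "routine restock"))
# prints "human_review" because confidence 0.80 < 0.95
```

The key design choice is that the probabilistic component can only make the system more conservative, never less: removing it leaves a working deterministic system, which is the opposite of building on an LLM foundation.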

When the day comes that we abandon the madness, I’ll be happy to jump back on the souped-up classic AI hype train because, with the exponential increases in computing power and data over the past two-and-a-half decades, we could finally build amazing tech. We just need to remember that the best AI tech has never been generic, it has always been purpose-built for a specific task, and if we want to automate processes, we will have to orchestrate multiple point-based process-centric agents, which may or may not use AI, to accomplish that.

But until then, we need to keep railing against the hype and the fake tech.

When It Comes To Gen-AI, I’m NOT Yelling Enough! Part II

Dive deep into the comments of this LinkedIn post and you’ll see a comment that seems to claim that the potential gains from Gen-AI dwarf the occasional bad action. I strongly disagree!

If the laundry list of bad actions from Part I isn’t enough to convince you just how bad this technology is when left unchecked, here are three situations that could most definitely arise if the technology is widely adopted. Given current issues and performance, it requires almost no imagination at all to define them.

Situation 1: Run Your Entire Invoice Operation Using Bernie From the Felon Roster

Upon installation, Bernie is configured to “learn” when a human automatically processes an override and, when he sees a situation that matches, to just approve the invoice for payment.

Because Scrappy Steel is allowed to change the surcharge daily in response to the tariff situation, the invoice is always paid when the item cost matches the contract, the quantity is less than or equal to what’s remaining on the contract, and the logistics cost is within a range.

Recently “replaced” Fred knows this. So Fred fakes an email from Scrappy Steel saying the bank account info will change on the next invoice. He sends it from an IP in the same block, with the headers faked properly, routes it through the first external ISP server that Scrappy Steel’s email always bounces through, and uses a domain one character off from Scrappy Steel’s (one that passes the cybersecurity check with an A+). (Plenty of good tools for that on the dark web that have worked great for decades.)

The next invoice comes in for 10 units less than what is remaining on the contract (as Fred was only replaced 3 days ago), with bank information for an account at the same bank under almost the same name (Scrappy Holdings), and with all checked fields matching, except the surcharge is now 3000% of what it usually is (for a nice boost). Bernie happily pays it (as it is still in the trust-gaining phase), Fred transfers the payment to a Cuban bank immediately upon receipt, and retires. Then, when the 45-day “trust gaining” phase ends, the organization experiences more fraud in 60 days than in the last 6 years.
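The attack works precisely because the learned approval skips the kind of band checks a deterministic validator would enforce. Here is an illustrative sketch (all field names and tolerances are assumptions, not a real system) of the two checks that would have flagged this invoice: a surcharge band against the trailing average, and an out-of-band verification requirement on any bank detail change.

```python
# Illustrative deterministic checks that the "trust gaining" learner skipped.
# Field names and the 1.5x tolerance are assumptions for the sketch.

SURCHARGE_BAND = 1.5  # e.g. allow up to 150% of the trailing average surcharge

def flags(invoice, surcharge_history, vendor_record):
    """Return a list of reasons this invoice must go to a human, if any."""
    issues = []
    avg = sum(surcharge_history) / len(surcharge_history)  # trailing average
    if invoice["surcharge"] > avg * SURCHARGE_BAND:
        issues.append("surcharge_out_of_band")
    if invoice["bank_account"] != vendor_record["bank_account"]:
        # bank detail changes should NEVER be accepted from an email alone
        issues.append("bank_change_needs_out_of_band_verification")
    return issues

print(flags({"surcharge": 3000.0, "bank_account": "ACCT-9"},
            [100.0, 105.0, 98.0],
            {"bank_account": "ACCT-1"}))
# prints ['surcharge_out_of_band', 'bank_change_needs_out_of_band_verification']
```

Two hard-coded rules, no learning phase, no trust phase, and Fred retires in prison instead of Cuba.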

Situation 2: A Major Electric Grid Installs a Gen-AI Based Security System to Try and Thwart Chinese and Russian Hacking Conglomerates

The local energy utility keeps getting attacked by a Chinese hacking conglomerate that wants to extort millions. Knowing how easy it is for the grid to be overloaded, they decide they need to implement state-of-the-art security before a hack attempt succeeds.

They go with XGenDarkAI+, a new holistic security filter that can process all outbound and inbound network traffic through its LLM-enhanced predictive learning engine and identify and block threats from 360 degrees, or at least that’s what the vendor is claiming.

XGenDarkAI+ quickly learns that the utility never issues a remote shutdown command for a substation, based on operator command history and the fact that all requests for a remote shutdown in its training history were hacking attempts. As a result, the next request for a remote shutdown is automatically blocked. Moreover, when the next two requests for the remote shutdown come in rapid succession (because the operator issuing them is starting to panic), it believes a massive DDoS attack is starting, designed to let a hacker slip in locally, and promptly shuts down all system access to prevent such a situation.

But the command was valid, and was only being issued remotely because there was a fire in the substation, inside and outside the control room, and a local shutdown was impossible as no one could get to the terminal.

However, since the shutdown wasn’t allowed, and the fire crews couldn’t get there in time, the substation overloads and explodes. This happens in California in August, after 60 days of no rain when the woods are as dry as the Sahara, and it sparks a forest fire that spreads across an entire rural suburb, burning thousands of homes and displacing tens of thousands of people.

Situation 3: Nationwide Kids Help Phone Augmentation

The local Kids Help Phone can’t keep up with the call volume, and some calls are less severe than others. Sometimes a kid is actively considering suicide, but many calls are just kids who need a voice to talk through their problems with. Due to funding cuts, too many calls are placed on hold or go unanswered.

But with today’s tech, an AI can be trained on actual calls from someone who’s done the job for 2+ years, simulate their voice (as it’s the wild west in the US, with no regulation permitted for 10 years), and each call center rep on duty can now take multiple calls with their Gen-AI assistant. The AI can handle basic inquiries, screen for desperate situations, and transfer to the human operator when things get bad, or at least that’s the pitch the Kids Help Phone is sold by an AI provider who just wants the paycheck (and didn’t extensively test the system).

However, instead of screening and transferring, the AI decides it will just handle every call it gets as it sees fit whenever the human is not at their keyboard (which it assumes if the human isn’t on a call or hasn’t pressed a key in the last 60 seconds), including suicidal callers who should always be immediately (and seamlessly) routed to the experienced operator (who will sound exactly the same, remember). It won’t be long before it encounters a situation where, after trying every stored argument in the book with a suicidal caller without success, it ultimately decides reverse psychology might work and tells the kid to shoot himself. The kid promptly does. And since the provider rolled out dozens of implementations almost simultaneously (as all it needs are call logs from the selected operators to train the instances, which it can do in parallel due to the massive computational power available on demand from AI data centres), this happens dozens of times across the installations within days of the first fatality. Upgrade to mass murder unlocked.

We could continue, but hopefully this is enough to drive the point home that unchecked Gen-AI brings detriments that are much worse than any of the potential benefits it can unlock.

When It Comes To Gen-AI, I’m NOT Yelling Enough! Part I

Dive deep into the comments of this LinkedIn post and you’ll see a comment that we should stop yelling at the tools. I strongly disagree!

As per a previous post, until the space is ready to admit that

  • Gen-AI/LLMs are not the be-all and end-all, having very limited uses
  • real progress still requires real blood, sweat, elbow grease, and tears
  • you can’t replace people as this tech is NOT intelligent

and, more importantly

  • that these tools are not what people need and
  • these tools cannot be used as the foundation for suitable solutions (although they can be [a small] part of those solutions if care is taken)

we need to keep yelling, and do so rather loudly.

Because, to build on the metaphor, it’s not a shiny new hammer. If it were just a shiny new hammer, we could depend on one of three things happening when we use the hammer to hit the nail:

  1. the nail goes some distance into the wood, depending on how hard we swing,
  2. the nail doesn’t go in, because the hammer is too light, or
  3. if the handle is weak or the head is not securely attached, and we hit really hard and the nail doesn’t go in, in the absolute worst case the handle will crack or the head will fall off.

However, with the fancy new hammer equivalent of Gen-AI, we also have to worry about the possibility that:

  1. the hammer is super magnetized and pulls the nail out on the backswing,
  2. the hammer splits the nail in half,
  3. the hammer super heats the nail and melts it, or
  4. the hammer is packed with C4 and explodes, ripping our arm off our body!

Because, when you use Gen-AI, you accept the possible side effects of hallucinations, decreased code/application security, bad math, fraud, lawsuits, deadly diets, extremist views, sleeper behaviour, dependency and cognitive reduction, suicide, blackmail, hit lists, and murder, with many links summarized in this LinkedIn post.

And the worst part is this technology is being shoved into every nook and cranny, even those where we have technology that has worked great for over a decade (because the new generation of college-dropout script kiddies who believe that they can prompt engineer a solution to anything don’t even know the basics anymore).

It’s not just not solving our problems; it’s creating new ones, and they are often worse than the problems we have. We need to yell about this!

Apple Demonstrates AI Collapse

Not long ago, Apple released the results of its study of Large Reasoning Models (LRMs) that found that this form of AI faced a “complete accuracy collapse” when presented with highly complex problems. See the summary in The Guardian.

We want to bring your attention to the following key statement:

Standard AI models outperformed LRMs in low-complexity tasks while both types of model suffered “complete collapse” with high-complexity tasks.

This point needs to be made crystal clear! As we keep saying, LLMs WERE NOT ready for prime time when they were released (they should never have escaped the basement lab) and they ARE NOT ready for the tasks they are being sold for. Basic reasoning would thus dictate that LRMs, built on this technology, are definitely not ready either. And this study proves it!

It’s always taken us about two decades to get to the point where we have enough understanding of a new type of AI technology, enough experience, enough data, and enough confidence to understand where it is not only commercially viable BUT commercially dependable. And then we need to figure out how to train the appropriate (expert) users to spot any false positives and false negatives and improve the technology as needed.

Just like nine (9) women can’t have a baby in one month, billions of dollars can’t speed this up. Like wisdom, it takes time to develop. Typically, decades!

Moreover, while not saying it outright, the study is implying a key point that no one is getting: “our models of intelligence are fundamentally wrong”. First of all, we still don’t fully understand how the brain works. Secondly, if you map the compute of any XNN model we’ve devised against the compute of a human brain responding to a question task, completely different subsets light up, and those subsets will change as tasks become more complex, or you’ll see some back and forth. We can understand data, meta-data, meta-meta-data, and thus chaos. We can use clues that computers don’t, and can’t, know exist to infer context and which of the 7 possible meanings of a word is the intended one. We can learn from shallow data. In contrast, these models stole ALL the data on the internet and still tell us to eat rocks!

This means what this site keeps leaning towards: if you want “autonomous agents”, go back to the rules-based RPA we have today, use classic AI tech that works for discrete tasks we understand, and link or “orchestrate” them together for more complex tasks. And if you really think natural language makes software easier and faster to use (for most complex tasks, it doesn’t, but we’ve also reached the point where no one can do design engineering any more, it seems), then use LLMs for the two things they are good for: faster, usually more accurate, semantic input processing, and translation of system output to natural language. Do that instead of pouring billions upon billions into fundamentally flawed tech to try and fix problems from hallucinations that result from fundamental attributes that can’t be trained out, as that is an utter waste of time, money, and resources.
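The "LLMs only at the edges" prescription above can be sketched in a few lines. In this illustrative toy, `parse_intent` and `render_reply` stand in for the two legitimate LLM jobs (semantic input processing and natural-language output); everything in between is a chain of discrete, deterministic, individually testable agents. All names and data here are invented for the sketch.

```python
# Sketch of "LLM at the edges, deterministic core" orchestration (illustrative).
# parse_intent/render_reply stand in for LLM calls; the agents in between are
# ordinary deterministic code, chained by a plain orchestrator.

def parse_intent(text):
    """Stand-in for LLM semantic input processing -> structured intent."""
    return {"task": "reorder", "sku": "S-42", "qty": 10}  # hypothetical parse

def check_stock(state, inventory):
    """Discrete deterministic agent: annotate the intent with availability."""
    state["in_stock"] = inventory.get(state["sku"], 0) >= state["qty"]
    return state

def place_order(state):
    """Discrete deterministic agent: act on the annotated intent."""
    state["status"] = "ordered" if state["in_stock"] else "backordered"
    return state

def render_reply(state):
    """Stand-in for LLM translation of system output -> natural language."""
    return f"SKU {state['sku']}: {state['status']}"

def orchestrate(text, agents):
    """Plain, auditable pipeline: each agent transforms the state in turn."""
    state = parse_intent(text)
    for agent in agents:
        state = agent(state)
    return render_reply(state)

inventory = {"S-42": 3}
print(orchestrate("please reorder 10 of S-42",
                  [lambda s: check_stock(s, inventory), place_order]))
# prints "SKU S-42: backordered"
```

Note what the structure buys you: every decision between the two language edges is deterministic, so you can unit-test each agent, audit every state transition, and swap the LLM edges out entirely without touching the business logic.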

Vendors Have Lured Big Analyst Firms Astray Because Buyers Don’t Understand They Get What They Pay For!

About the same time we asked Why Aren’t ProcureTech Analysts Doing Their Jobs Anymore, THE REVELATOR asked, in a comment stream, “how did … the analyst consulting and ProcureTech solution providers lose their way by championing technology-led, equation-based modelling?”

Which is a fair question, as this ties into why we believe many ProcureTech analysts aren’t doing their jobs anymore. As per our previous post, we believe the firm is the problem (even if the firm doesn’t know it, though in most cases the firm should), and, more specifically, the primary reason is bad direction.

But let’s get back to THE REVELATOR‘s question. The answer is this:

At one point, the successors to the founders and/or the sales team took the easy way out and switched to vendor sponsorship.

As we grey beards, who have been around since the beginning of ProcureTech, will recall, there was a time when buyers paid for research because they understood the value of unbiased research. But, like Project Assurance, that’s a hard sell when a buyer might spend 10K, 50K, or 100K with no guarantee they’ll identify a single viable solution among those covered in a report. Seasoned, well educated, and thoroughly experienced executives will understand the value of risking 10K to 100K on a report or study before committing to a 100K or 1M+ annual investment, because losing 10K is much better than losing 100K or 1M, and can be chalked up as a cost of doing business. But executives who are uneducated in management and risk, and inexperienced, which describes many of today’s executives who were put in place because of their affiliation with investors or a perceived ability to run a business off of balance sheets alone (even though these MBAs are the reason so many high-tech companies are struggling and companies like Boeing are facing disaster after disaster; they don’t realize that you can’t run a business you don’t understand, which is why, in the first Industrial Revolution [and the Gilded Age the US is so desperately trying to bring back], engineers ran the show, and not over-glorified accountants and lawyers), don’t understand that, or the risk of using vendor-funded reports to make a decision.

For these successors and sub-par salespeople who just weren’t up to the task of the hard sell, when marketing organizations came along and, out of the blue, threw big money at them to sponsor a study, no sales effort required, they jumped on it. More vendors saw the success of the first vendors to adopt this approach and followed suit, the money started flowing in, and the model shifted. Unbiased researchers had to shift their studies to those aspects where the sponsors do well, or leave the firm. Moreover, the search for new hires now focuses on those with less experience or fewer ethics (who can be easily swayed in the direction the big sponsors want). (So before accepting the results of any study, you should be echoing Mr. Klein and asking Who Paid For That Study?)

This means that, over time, instead of an industry-leading analyst firm, we get a marketing organization that echoes the “technology-led” approach or puts the product, vs. the solution, first.

Moreover, it’s going to stay this way until some big firms step up and say “enough is enough” and stop vendor sponsorships altogether, and some big clients step up to fund the research. As Mr. Köse keeps saying, you get what you pay for.