You CAN NOT Safely Use LLMs for Contracts or Legal Work!

Darlene Newman recently wrote a great article that makes it abundantly clear why you CAN NOT Safely use LLMs for Contracts or any other document with any Legal implications whatsoever!

Not only can you not train out hallucinations, because they are a fundamental function of the technology, but every time the LLM touches the document, it can (and likely will) corrupt something that was already correct (and reviewed) before.

In other words, you collect all your reference documents, ask it to generate a contract that contains all of your mandatory clauses, addresses all the risks, incorporates the schedule, specifies the requirements, etc. etc. etc. and get back a 50 page document where the section, paragraph, and sentence quality ranges from masterpiece to monkey on crack. You then spend hours (to days) fixing everything and ask the LLM to simply correct spelling, grammar, and ensure key requirements are met in the new/changed sections only (giving it the original document for comparison). The LLM spits out a cleaned up copy, you review all the sections you updated, it looks good, and you send it out.

Little do you know that, because you added an article in one section, shortened a sentence in another, and improved the grammar in a third, it decided to rewrite half of those sections for you because it judged that the specific requirements you called out for the new sections weren’t addressed enough. In the process, other key requirements are dropped, risk mitigations have been written out, and the contract now heavily favours the other side when something goes wrong. Not at all what you intended, but that’s what you got because you didn’t review all 50 pages with care.

Maybe not too bad if nothing goes wrong, and maybe devastating if it does.

But nothing goes wrong in the short term, so your Legal team decides to use it to try and defend a claim against your company. This is where it goes from bad to much, much worse. You upload the brief, you outline your counterpoints, you upload your supporting documents (including the relevant law and cases you know of), you ask it to find more law and cases relevant to your defense, and you ask it to create your first response. You let it chug, go to lunch, and come back to a 60 page, 220 point response with a dozen statutes and two dozen cited cases.

You go through all the law, realize that only 8 of the statutes are (somewhat) relevant, and remove the 3 that aren’t and the fake one the LLM found on the internet. Then you go through all the cases, realize only 14 are actually supporting, 7 are not relevant, and 3 were completely hallucinated, and make the corrections. You mark all the paragraphs that are okay, the ones that need updates, and what updates are needed. You get sign-off on what’s good and what needs updates, and push it through again. It comes back with a couple of new potential statutes, another 8 potential cases, and updates to multiple paragraphs, and you review again. You find one of the statutes potentially relevant, 4 of the cases real and usable, and half of the paragraphs look good. You mark all this, make the updated correction lists, get sign-off, and send it back to the LLM. You don’t notice it also changed 5 of the paragraphs you were completely happy with, changed some quotes to non-existent quotes, and replaced an approved reference with a hallucinated one. This goes on for a few more iterations, where key clauses/references are not rechecked, and you still end up with a 70 page document with a dozen hallucinations, 3 non-existent cases, and faulty logic despite review by multiple senior partners. No one rechecked what they were happy with in the last iteration, because they had explicitly told the LLM not to change it and expected it to comply.

Unlike an intern, who is intelligent (when he chooses to be), naturally lazy, tired of working 84 to 112 hour weeks for peanuts, and happy to ignore anything you tell him to ignore, the dumber-than-a-doornail LLM recomputes the meaning of inputs on every request, has the same chance of messing up on every request, has the same chance of understanding the request but predicting you were being facetious and actually want it to rewrite the paragraphs chock full of hallucinations, and so on. You don’t notice, submit the brief with $1,000/hour senior partner sign-off, and make a mockery of your firm with all the AI slop (as well as securing it a massive fine from a p!ssed off judge tired of AI slop).

And there’s no way to stop it. It doesn’t matter how detailed your instructions are. It doesn’t matter how much effort you go through to lock parts of the document down with automated input and output checks and re-dos when the LLM screws up. Every time the LLM touches the document, something will corrupt. The only unknown is how detrimental the corruption is.
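For readers wondering what an automated output check on locked-down paragraphs might even look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function names, the sample clauses, and the idea that paragraphs keep their positions between drafts. Note what it does and does not do: it only flags approved paragraphs whose text changed after a pass. It cannot stop the change from happening, and it tells you nothing about the quality of anything that was not locked.

```python
import hashlib


def fingerprint(paragraph: str) -> str:
    """Hash a paragraph after collapsing whitespace, so pure reflowing is ignored."""
    normalized = " ".join(paragraph.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_locked_paragraphs(before: list[str], after: list[str], locked: set[int]) -> list[int]:
    """Return indices of locked paragraphs whose text differs in the new draft.

    Assumes paragraph positions are stable between drafts (hypothetical
    simplification). A locked paragraph missing from the new draft counts
    as changed.
    """
    changed = []
    for i in sorted(locked):
        if i >= len(after) or fingerprint(before[i]) != fingerprint(after[i]):
            changed.append(i)
    return changed


# Made-up clauses: the approved draft and what came back from the LLM.
approved = [
    "The Supplier shall deliver within 30 days.",
    "Liability is capped at the contract value.",
    "Either party may terminate with 60 days notice.",
]
returned = [
    "The Supplier shall deliver within 30 days.",
    "Liability is capped at fees paid in the prior month.",
    "Either party may terminate with  60 days notice.",
]

flagged = changed_locked_paragraphs(approved, returned, locked={0, 1, 2})
print(flagged)  # [1]
```

The third clause only gained an extra space, so it is not flagged; the second was silently rewritten, so it is. A human still has to read every flagged paragraph, and everything that was never locked, on every single pass.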

As per Darlene’s post,

Microsoft Research tested 19 AI models across 310 professional documents. They gave each model a document editing task, then another, then another … for 20 interactions in total. Frontier models corrupted 25% of document content by the end.

25%! That’s a lot of corruption of good content. And enough to ensure you get AI slop every time!
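To put that number in perspective, here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that every interaction independently corrupts the same fraction of the content that is still intact; the quoted study summary does not say the damage is spread this evenly, so treat the per-interaction figure as a rough intuition and not a finding.

```python
def per_step_corruption(total_corrupted: float, steps: int) -> float:
    """Per-step corruption rate that compounds to total_corrupted after `steps` edits.

    Assumes (for illustration only) that every edit independently corrupts
    the same fraction of the content that is still intact.
    """
    surviving = 1.0 - total_corrupted
    return 1.0 - surviving ** (1.0 / steps)


rate = per_step_corruption(0.25, 20)
print(f"{rate:.2%} per interaction")  # 1.43% per interaction
```

Under that assumption, a loss of well under 2% per touch, small enough to slip past a skim review, is all it takes to reach 25% after 20 interactions.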

9 Signs You Were FORCED To Negotiate

Tom Mills, author of Procure Bites, recently gave us 9 signs you were born to negotiate. Now, as we said before, some of you are still in organizations where Purchasing is treated as an old-school function and run by old-school die-hards who still think it’s the (19)80s. If so, you might be wondering where his list came from, because that’s not the negotiating behaviour you’re used to seeing in your Procurement team, who act like wild west gunslingers who win or lose the deal at the poker table. (They are The Good, The Bad, and The Wealthy like their sales peers, after all.)

Tom’s profile might be the profile of a Procurement negotiation professional you want to see, but if your Procurement organization is still the Island of Misfit Toys, that’s not the profile you have. This post is for you, and describes the lead buyer in your Purchasing department who was put there because they didn’t belong (or want to be) anywhere else, and who, for one reason or another, the organization can’t (or won’t) get rid of just yet.

Enjoy!

Like Any Tool, AI Won’t Solve Leadership Problems!

Paul Martyn is right to cringe a little every time he hears a solution provider say:

AI and automation won’t replace employees. It will free them up for more strategic work
Because there are two fundamental problems with this statement.

1. As Paul points out in his recent article, if strategic work is not already happening, that’s not a technology problem. That’s a leadership problem!

2A. You can’t drop tech in and suddenly become more efficient unless you have all the data and processes in place to support it — and it’s a money back guarantee you don’t have all of the data and processes in place to support it.

2B. Unless AI stands for Augmented Intelligence, AI will actually consume MORE of your time as you deal with the hallucinations and errors it will create on a regular basis. (Remember, only 1 in 20 organizations are seeing a return on their AI investments, and I guarantee those are the ones that either got tricked into, or simply bought, old fashioned RPA (robotic process automation) that actually works.)

Don’t fall for the spin. If you want strategy:

1. Make sure it’s already happening.

Maybe it’s only 10% of categories going through strategic sourcing, but you have to start somewhere. Then you can increase that percentage as you automate more tactical work.

2. Allocate time to (old-school) automation.

One at a time, pick a very time consuming process ripe for automation. Map it end to end. Redesign it for automation. Automate it. As time frees up, you have more time for strategy and for automating more processes.

3. When the automation effort in time-consuming / painful processes that remain exceeds the expected time return over the next 12 months, look for outside help.

Not before. And that’s how you don’t fall for the spin!
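The rule in step 3 is a simple comparison, and can be sketched in Python as below. The function name, the hour-based units, and the sample numbers are all hypothetical; the only thing taken from the rule itself is the test "effort exceeds expected time return over the next 12 months".

```python
def should_seek_outside_help(automation_effort_hours: float,
                             hours_saved_per_month: float,
                             horizon_months: int = 12) -> bool:
    """True when the in-house effort to automate a process exceeds the
    time the automation is expected to give back over the horizon."""
    expected_return_hours = hours_saved_per_month * horizon_months
    return automation_effort_hours > expected_return_hours


# Made-up example: 400 hours to automate, 20 hours saved per month.
# Expected return over 12 months is 240 hours, so effort exceeds return.
print(should_seek_outside_help(400, 20))  # True

# Made-up example: 100 hours to automate the same kind of process.
print(should_seek_outside_help(100, 20))  # False
```

In the first case the remaining process costs more to automate in-house than it returns in a year, which is the point at which outside help starts to make sense.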

AI is NOT Failing Because of a Lack of Forward Positioned Data

Lack of forward positioned data is NOT the problem.

(It is a problem, but not the biggest one!)

An AI agent making 1000X the decisions IS!

Right now, while the big AI players have achieved 80% to 90% “accuracy” on their carefully designed synthetic benchmarks, when applied to real world problems, accuracy in many domains drops to 25% (or worse, as at most 20% of code generated by an AI survives into a production application once it gets reviewed by a senior developer who finds a plethora of security issues, boundary condition errors, and code that, frankly, just doesn’t solve the problem at all).

THIS MEANS THAT THE AI IS MAKING 750X MORE WRONG DECISIONS THAN THE HUMAN!

That’s a LOT of mistakes.

Meanwhile, give an expert human

a) always available forward positioned data and Augmented Intelligence applications to process it (so all the data the expert human needs to make the decision is at her fingertips)

b) A-RPA (Automation) software that is best-of-breed and capable of immediately executing any decision the human makes (possibly using the forward positioned data and appropriate augmented intelligence outputs)

And that human will make 100X the decisions she’s making now, and get 95% of them correct. So if you hire 10 humans, you will match the AI’s 1000X decision volume with 15X fewer errors (5% vs 75%).

When you consider that ten humans will cost considerably less than AI, given the rapidly rising token costs and the costs of dealing with the 15X increase in errors the AI will bring, Augmented Intelligence powered by Forward Positioned Data and a small team of humans will be a LOT more productive than you ever thought possible.
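Taking the figures in this post at face value (an AI agent making 1000X the decisions with a 75% error rate, versus ten humans each making 100X the decisions with a 5% error rate), the comparison can be worked through in a few lines of Python. The variable names are mine, and the "baseline" of one decision is just a unit for counting.

```python
def wrong_decisions(decisions: float, error_rate: float) -> float:
    """Expected number of wrong decisions at a given volume and error rate."""
    return decisions * error_rate


baseline = 1  # one human's current decision volume, used as the unit

# AI agent: 1000X the baseline volume, 25% accuracy (75% error).
ai_wrong = wrong_decisions(1000 * baseline, 0.75)

# Ten augmented humans: each at 100X the baseline volume, 95% accuracy (5% error).
team_wrong = wrong_decisions(10 * 100 * baseline, 0.05)

ratio = ai_wrong / team_wrong
print(f"AI wrong: {ai_wrong:.0f}, team wrong: {team_wrong:.0f}, ratio: {ratio:.0f}X")
```

At equal decision volume, the ratio of wrong decisions is just the ratio of the error rates: 75 / 5 = 15, or 750 wrong decisions versus 50 for every baseline decision.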

The world is not binary, flat, or stable!

It’s multi-state, curved, and chaotic.

You need fuzzy math, fractal geometry, and non-linear differential equations to describe it.

Similarly, the supply chain world we built is not a predictable single source flatland (as the work of Edwin Abbott Abbott in 1884 should have made clear to you).

You need multi-state logic, multiple (supply) chains and multiple methods for managing them.

And these DO NOT fit into a 2 x 2 grid! It’s this ongoing lie that ultimately leads to failure and organizations bringing in one consultancy* after another, and one platform after another, in an attempt to fix problems which never go away.

Every distinct dimension that needs to be considered in classification and decision making is a distinct dimension that needs to be taken into account in any methodology or “map” presented to you (and multiplies the number of “buckets” you need for classification). So if you have three dimensions, you need at least 2 * 2 * 2 = 8 buckets in your classification scheme (as you will have at least 2 values per dimension you differentiate on, and that’s assuming each dimension is a binary decision; if it were ternary, e.g. you were classifying each dimension on high, medium, low or red, yellow, green, then you would have 3 * 3 * 3 = 27 buckets).
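The bucket arithmetic is just a product over the number of values per dimension, and a short Python sketch makes the growth obvious. The function names and the placeholder dimension names are illustrative assumptions, not part of any published methodology.

```python
from itertools import product
from math import prod


def bucket_count(levels_per_dimension: list[int]) -> int:
    """Number of classification buckets: the product of the level counts."""
    return prod(levels_per_dimension)


def enumerate_buckets(dimensions: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """List every bucket as one combination of a level from each dimension."""
    return list(product(*dimensions.values()))


print(bucket_count([2, 2, 2]))  # 8  (three binary dimensions)
print(bucket_count([3, 3, 3]))  # 27 (three ternary dimensions)

# Placeholder dimension names, for illustration only.
cube = enumerate_buckets({
    "dimension_a": ["low", "high"],
    "dimension_b": ["low", "high"],
    "dimension_c": ["low", "high"],
})
print(len(cube))   # 8
print(cube[0])     # ('low', 'low', 'low')
```

A 2 x 2 grid has room for `bucket_count([2, 2])`, which is 4. Every dimension you add at least doubles the count, so fourteen binary dimensions already give `bucket_count([2] * 14)`, which is 16,384.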

That’s why every single analyst quadrant map that attempts to assess a vendor, product, or service on more than 2 dimensions is an ultimate failure. (That’s why SolutionMap works: it’s just tech vs customer sentiment, not innovation, service, tech, market fit, market strategy, product strategy, industry strategy, geographic strategy, product viability, pricing, track record, execution, operations, and customer experience randomly squished into two meaningless composite values using absurd average weightings that are equivalent to taking the average weight of an apple, a BMX bike, and a cruise ship.)

Mathematically, this would require a 14-D hypercube with 16,384 sub-cubes. And that’s why you don’t measure everything, only what counts! But try as you might, you’re usually going to end up with at least 3 independent dimensions that are critical to any problem you work on. But that’s not a bad thing! [Remember, the 3-sided triangle is the most stable shape with area in flatland (where analysts and consultants still love to live to this day), and the 4-sided tetrahedron (pyramid) you can make from 4 triangles in 3-D is one of the most fundamentally stable shapes there is (and atomic bonding proves this).]

Since, when it comes to Procurement, the 3 most critical dimensions are complexity, risk, and organizational impact of what you’re buying, proper Procurement is dictated by a pocket cube. The Busch-Lamoureux Exact Purchasing pocket cube to be precise.

So if anyone else claims their updated Kraljic matrix will work for you, just shut the door. Don’t bother arguing. If they won’t accept real-world reality, you won’t get a real-world solution. Find someone who understands the complexity and can build you a platform to address it, with as much automation as can be brought to bear. (And quite a bit can be brought to bear, as per our series on operationalizing the pocket cube.) That’s how you will succeed. The old fashioned way: define the problem, use Human Intelligence (HI) to address the problem, and design processes and systems to execute the solution as efficiently as possible. The fundamentals don’t change, and anyone who says otherwise is a scam artist trying to sell you (silicon) snake oil. Don’t buy it.

* Now big consultancies won’t tell you this because if you get it right the first time, they can’t continue to sell you consulting hours, which is their ultimate goal.