And I’m so happy I’m not the only one pushing this theory. Mr. Stephen Klein recently published a great post on The Age of Pretend.
In the post he notes that:
“Everyone assumes AI’s biggest bottleneck is compute. … That assumption is wrong. The real bottleneck … is architecture, specifically, a design decision made in 1945. … The real constraint: the von Neumann bottleneck. Modern computers separate memory and processing. Data has to move back and forth between them. For most software, that’s fine.
For AI, it’s catastrophic.
Some numbers the industry rarely highlights:
- Accessing off-chip memory consumes ~200× more energy than the computation itself
- Roughly 80% of Google TPU energy goes to electrical connections, not math
- A 70-billion-parameter model moves ~140 GB of data just to generate one token”
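That last number is easy to sanity-check with back-of-the-envelope arithmetic, assuming the weights are stored as 16-bit floats (2 bytes each) and every weight must stream through the processor once per generated token:

```python
# Sanity check of the ~140 GB figure quoted above, assuming 16-bit
# (2-byte) weights and one full pass over all weights per token.
params = 70e9          # 70 billion parameters
bytes_per_param = 2    # fp16/bf16 precision (assumption)

gb_moved_per_token = params * bytes_per_param / 1e9
print(f"{gb_moved_per_token:.0f} GB moved per token")  # → 140 GB moved per token
```

Nothing exotic hiding in there: 70 billion times 2 bytes is 140 GB, per token, every token.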
LET THAT SINK IN. Us old timers remember “640K ought to be enough for anyone”! The Apollo Guidance Computer — you know, the one installed in each Apollo Command Module and Lunar Module — had 2K words of erasable core memory and a 36K-word ROM. Even today, unless you have an iPhone 17, your phone probably only has 128 GB of storage. That means, even with the processing power of your phone (which dwarfs most computers us old timers have ever owned), the data moved to generate just ONE token of a 70-billion-parameter model wouldn’t even fit on it. (Now do you understand why the data-center [energy] demands of your Gen-AI chat-bots are destroying the planet? Anyway, we digress …)
This means that (Gen-)AI has hit a wall. Computer architecture supports massive compute at scale and massive storage at scale, but not massive transfers at scale.
So what does this mean?
Do you remember the days of RAM drives? Not only did they speed things up, they kept your machine cooler because, as Stephen notes, it takes far less energy to access data in RAM than on disk.
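You can still feel that gap today. Here’s an illustrative micro-benchmark, not a rigorous one (exact timings vary wildly by machine, and the OS page cache will flatter the disk pass): read the same payload once from an in-memory buffer and once from a temp file.

```python
import os
import tempfile
import time

# Illustrative only: compare touching a 64 MB payload held in memory
# vs. reading the same payload back from a temporary file on disk.
payload = os.urandom(64 * 1024 * 1024)

t0 = time.perf_counter()
in_memory_sum = sum(payload[::4096])          # touch one byte per 4 KB page
ram_time = time.perf_counter() - t0

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name
t0 = time.perf_counter()
with open(path, "rb") as f:
    disk_sum = sum(f.read()[::4096])           # full read from disk first
disk_time = time.perf_counter() - t0
os.remove(path)

print(f"RAM pass: {ram_time:.4f}s, disk pass: {disk_time:.4f}s")
```

Same data, same work, very different cost to fetch it — which was the whole point of a RAM drive.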
And do you remember the fun of Assembly? (Okay, that’s sarcasm!) Once you learned to maximize register usage (i.e., re-sequencing your processing to minimize reads from, and writes to, memory), your code got faster still (and machines stayed cooler longer, which was obvious from the lack of noisy fans spinning up).
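The same principle translates even to languages far above Assembly. Python has no registers, of course, but here’s a sketch of the idea: keep the hot running value in the fastest storage you have (a local variable) instead of round-tripping through a slower location on every iteration (a dict entry stands in for the memory round-trip):

```python
# The register-usage trick, translated: hoist the hot value out of
# slow storage and accumulate in a fast local instead.
data = list(range(1_000_000))

def slow_sum(values):
    state = {"acc": 0}
    for v in values:
        state["acc"] = state["acc"] + v   # read + write a dict entry per step
    return state["acc"]

def fast_sum(values):
    acc = 0                               # the "register": a plain local
    for v in values:
        acc += v
    return acc

assert slow_sum(data) == fast_sum(data) == 499_999_500_000
```

Same answer, less traffic per iteration — exactly the re-sequencing game we used to play by hand.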
We’ve known about this problem for decades. (Eight decades, to be exact!) It’s too bad today’s students don’t study the basics and understand that it’s not raw compute power that determines computational speed and energy requirements, it’s data scale — whether the data fits in memory or not, whether “significant” chunks fit in on-board GPU memory or not. (And specifically: can you scale the data down enough for the efficiency you require?)
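That “does it fit?” question is, again, simple arithmetic. A rough sketch, assuming the weights dominate the footprint (it ignores activations and per-token caches, and the 24 GB card is just an illustrative size):

```python
# Rough footprint check: does a model's weight data fit in a given
# amount of on-board GPU memory at a given precision?
def model_size_gb(params: float, bits_per_param: int) -> float:
    return params * bits_per_param / 8 / 1e9

params = 70e9           # 70 billion parameters
gpu_memory_gb = 24      # illustrative consumer-GPU size (assumption)

for bits in (32, 16, 8, 4):
    size = model_size_gb(params, bits)
    verdict = "fits" if size <= gpu_memory_gb else "does not fit"
    print(f"{bits:>2}-bit: {size:6.1f} GB -> {verdict} in {gpu_memory_gb} GB")
```

Even scaled down to 4 bits per parameter, a 70B model is 35 GB of weights — so the scaling-down question isn’t optional, it’s the whole ballgame.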
But this is still the key point in Stephen’s article:
“The next major improvements will likely come from smarter algorithms.”
We might need brute force to detect patterns we can’t (yet) see, but the only way to truly advance is to understand those patterns and code optimal, light-weight algorithms that exploit fundamental rules to allow us to process data quickly and efficiently.
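A toy illustration of that point: brute force works, but a pattern, once understood, collapses the cost entirely. Summing 1 through n by loop touches n values; Gauss’s closed form touches none of them:

```python
# Brute force vs. a lightweight algorithm that exploits a fundamental
# rule: the sum 1..n equals n*(n+1)/2, no iteration (or data) required.
def brute_force_sum(n: int) -> int:
    total = 0
    for i in range(1, n + 1):   # O(n) work, O(n) values touched
        total += i
    return total

def closed_form_sum(n: int) -> int:
    return n * (n + 1) // 2     # O(1): the pattern, exploited

n = 1_000_000
assert brute_force_sum(n) == closed_form_sum(n) == 500_000_500_000
```

The brute-force version isn’t wrong; it’s how you might first *find* the pattern. The closed form is what “truly advancing” looks like afterward.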
Until we figure that out, you’ll never have usable AI (and definitely never REAL AI: not only will it never be intelligent, it will never, ever, get anywhere close).
