Every decade or so, some idiots who never studied Huffman coding or Information Theory convince themselves they have cracked the problem of infinite compression, and the linked paper is just the latest example of this lunacy. I really hope this was a joke paper authored by AI, because it’s all bullcr@p!
On average, a text token in an LLM should require 20 bits or less (17 bits already cover a 129,000-entry vocabulary, since 2^17 = 131,072), while a vision token can run to 16,384 bits (a 1024-dimensional continuous vector at 16 bits per dimension), because it takes a lot of bits to represent a square patch of pixels in a 2-D image! That means you can store about 820 text tokens in the same space it takes to store one vision token. Or, put another way: the entire text (roughly 2,400 tokens at 20 bits each) can be stored losslessly in about 48 kilobits, versus the roughly 4 megabits (about 500 KB) it would take to store the 250 vision tokens (using very lossy compression) that are required in the paper. It looks like a LOT of people can’t do basic math if this is being praised as revolutionary!
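For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch using the same assumed figures as above (20 bits per text token, a 1024-dimensional fp16 vector per vision token, 250 vision tokens, and a document of roughly 2,400 text tokens); the exact counts are my assumptions for illustration, not numbers from the paper:

```python
# Back-of-the-envelope check of the storage claims above.
# Every number here is an assumption taken from this argument,
# not a measurement from the paper.

TEXT_TOKEN_BITS = 20              # 17 bits already cover a ~129K vocabulary; 20 is a round upper bound
VISION_DIM = 1024                 # assumed embedding dimension of one vision token
BITS_PER_DIM = 16                 # fp16 storage per dimension
VISION_TOKEN_BITS = VISION_DIM * BITS_PER_DIM   # 16,384 bits

text_tokens = 2_400               # assumed token count of the document (~48 kilobits at 20 bits each)
vision_tokens = 250               # vision tokens the paper reportedly uses for the same document

text_bits = text_tokens * TEXT_TOKEN_BITS
vision_bits = vision_tokens * VISION_TOKEN_BITS

print(f"one vision token takes the space of ~{VISION_TOKEN_BITS // TEXT_TOKEN_BITS} text tokens")
print(f"text, lossless:  {text_bits:,} bits (~{text_bits / 8 / 1024:.1f} KB)")
print(f"vision, lossy:   {vision_bits:,} bits (~{vision_bits / 8 / 1024:.0f} KB)")
print(f"vision / text blow-up: {vision_bits / text_bits:.0f}x")
```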
Moreover, the raw text, which keeps the full context as long as the tokens stay in order, is not only fully lossless but can also be compressed with a modified Lempel-Ziv algorithm down to an average of under 2 bits per character (up to an 80% compression rate). Given that the average word in ordinary text is 5 characters, plus one for the space, 2,500 words come to 15,000 characters, storable in 30,000 bits, i.e. under 4 KB! In other words, this paper is trying to pass off a MORE THAN HUNDRED-FOLD increase in space requirements (4,096,000 bits versus 30,000 bits, about 136x) as a space saving! Pure lunacy!
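And the same sanity check for the compressed-text comparison, again under the assumptions stated above (5-letter words plus a space, about 2 bits per character after LZ-style compression, 250 vision tokens of 16,384 bits each); the zlib call at the end is just an off-the-shelf stand-in for the “modified Lempel-Ziv algorithm” mentioned above, not the actual compressor:

```python
import zlib

# Rough check of the compressed-text comparison, using the assumptions from
# this post: 5-letter words plus a space, ~2 bits per character after
# LZ-style compression, and 250 vision tokens of 16,384 bits each.
WORDS = 2_500
CHARS_PER_WORD = 6                # 5 letters + 1 space, on average
BITS_PER_CHAR = 2                 # claimed rate for a strong LZ-family compressor on long English text
VISION_TOKEN_BITS = 1024 * 16     # same assumption as in the sketch above
VISION_TOKENS = 250

chars = WORDS * CHARS_PER_WORD                    # 15,000 characters
compressed_text_bits = chars * BITS_PER_CHAR      # 30,000 bits
vision_bits = VISION_TOKENS * VISION_TOKEN_BITS   # 4,096,000 bits

print(f"compressed text: {compressed_text_bits:,} bits (~{compressed_text_bits / 8 / 1024:.1f} KB)")
print(f"vision tokens:   {vision_bits:,} bits (~{vision_bits / 8 / 1024:.0f} KB)")
print(f"blow-up factor:  {vision_bits / compressed_text_bits:.1f}x")

# Sanity check with an off-the-shelf LZ compressor (zlib / DEFLATE). A short,
# repetitive toy sample compresses far better than real prose would; on an
# actual 2,500-word document expect something closer to 2-3 bits per character.
sample = ("the quick brown fox jumps over the lazy dog " * 400).encode()
bits_per_char = 8 * len(zlib.compress(sample, 9)) / len(sample)
print(f"zlib on a toy sample: {bits_per_char:.2f} bits per character")
```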
In other words, if someone is claiming something that sounds too good to be true, it is! Don’t fall for it, or for the claims, sure to follow, that DeepSeek OCR is revolutionary because of this. (And since every document is different, you can’t even predict how much will really be lost with a 90% vision-token reduction!)
