Here's the submission that won the Hutter Prize in 2021: https://github.com/amar...

willvarfar · on Dec 30, 2024

Yes, one way to think about arithmetic compression is encoding the difference between the prediction and reality.

This isn't normally what people mean by lossy compression, though. In lossy compression (e.g. mainstream media compression like JPEG) you work out what the user doesn't value and throw it away.

vlovich123 · on Dec 31, 2024

That’s a stretch to call it lossy. To my eye the purpose of the LSTM seems indistinguishable from a traditional compression dictionary.

And that still doesn’t show how lossless compression is tied to intelligence. The example I always like to give is, “What’s more intelligent? Reciting the US war of independence Wikipedia page verbatim every time or being able to synthesize a useful summary in your own words and provide relevant contextual information such as it’s role in the French Revolution?”

tshaddox · on Dec 31, 2024

These lossless compression algorithms don’t just recall a fixed string of text. Obviously that can be accomplished trivially by simply storing the text directly.

These lossless compression algorithms compress a large corpus of English text from an encyclopedia. The idea is that you can compress this text more if you know more about English grammar, the subject matter of the text, logic, etc.

I think you’re distracted by the lossless part. The only difference here between lossy and lossless compression is that the lossy algorithm also needs to generate the diff between its initial output and the real target text. Clearly a lossy algorithm with lower error needs to waste fewer bits representing that error.

vlovich123 · on Jan 2, 2025

There’s no immediately obvious reason that you should be able to come up with a diff correction to apply to recreate loseless that is more efficient than traditional compressors. In essence you’re still stuck with the idea that you have a compression dictionary and trying to build a more efficient dictionary. It’s not clear there’s a link between that and intelligence.

canjobear · on Dec 30, 2024

This is standard lossless compression. None of the concepts particular to lossy compression (like rate-distortion theory) are used.