If you are processing emails for security reasons, and want to find viruses even if they are in archive files, it's easy to write the code to "just keep unarchiving until we're out of things to unarchive", but not only can that lead to quite astonishing expansions, it can actually be a process that never terminates at all.
I remember when I first read about these, and "a small file that decompresses to a gigabyte" was also "a small file that decompresses to several multiples of your entire hard disk space" and even servers couldn't handle it. Now I read articles like this one talking about "oh yeah Evolution filled up 100GB of space" like that's no big deal.
If you have a recursive decompressor you can still make small files that uncompress to large amounts even by 2025 standards, because the symbols the compressor will use to represent "as many zeros as I can have" will themselves be redundant. The rule that you can't compress already-compressed content doesn't necessarily apply to these sorts of files.
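A quick way to see this (a sketch using Python's stdlib gzip; exact sizes will vary with the zlib version): compress a long run of zeros, then compress the result again. The first layer is mostly repeated run-length symbols, so it shrinks again on the second pass.

```python
import gzip

# 10 MB of zeros collapses to a few kilobytes...
layer1 = gzip.compress(b"\0" * 10_000_000, compresslevel=9)

# ...and that output is itself highly repetitive, so it compresses again.
layer2 = gzip.compress(layer1, compresslevel=9)

print(len(layer1), len(layer2))
```

This is exactly why nested archive bombs work: each layer of "already compressed" data still has structure the next layer can exploit.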
A few years ago David Fifield invented a technique that provides a million-to-one non-recursive expansion, by overlapping the file streams: https://www.bamsoftware.com/hacks/zipbomb/
I think someone posted a blog post about doing exactly this in the last couple of months? Any time they got hits on their site from misbehaving bots, I think they returned a gzip bomb in the HTTP response.
It's all just spray-and-pray crap. You're extremely unlikely to be their target, they're just looking for a convenient shell for a botnet. The most likely way they'll handle it if you do actually break them is just blacklist your address. You're not going to be worth the effort.
I've been sending a nice 10GB gzip bomb (12MB after compression, rate limited download speed) to people that send various malicious requests. I think I might update it tonight with this other approach.
I could, at the expense of a lot of bandwidth. /dev/urandom doesn't compress, so to send something that would consume 10GB of memory, I'd have to use up 10GB of bandwidth. The 10GB of /dev/zero that I return in response to requests takes up just 11MB of bandwidth. Much more efficient use of my bandwidth.
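The asymmetry is easy to demonstrate (a sketch with stdlib gzip, scaled down to 10 MB): zeros compress roughly a thousand to one, while /dev/urandom output actually grows slightly because gzip adds framing overhead.

```python
import gzip
import os

zeros = b"\0" * 10_000_000          # stands in for /dev/zero
random_bytes = os.urandom(10_000_000)  # stands in for /dev/urandom

z = gzip.compress(zeros, compresslevel=9)
r = gzip.compress(random_bytes, compresslevel=9)

# zeros: a few KB; random: slightly larger than the input
print(len(z), len(r))
```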
A more effective (while still relatively efficient) alternative would be to have a program that returns an infinite gzip compressed page. That'll catch anyone that doesn't set a timeout on their requests.
I don't imagine it would be too difficult to write a python app that dynamically creates the content, just have the returned content be the output of a generator. Not sure it's worth it though :)
I had a few minutes. This turns out to be really easy to do with FastAPI:
    from fastapi import FastAPI
    from starlette.responses import StreamingResponse
    from fastapi.middleware.gzip import GZipMiddleware

    app = FastAPI()
    app.add_middleware(GZipMiddleware, minimum_size=0, compresslevel=9)

    def lol_generator():
        while True:
            yield "LOL\n"

    @app.get("/")
    def stream_text():
        return StreamingResponse(lol_generator(), media_type="text/plain")
Away it goes, streaming GZIP compressed "LOL" to the receiver, and will continue for as long as they want it to. I guess either someone's hard disk is getting full, they OOM, or they are sensible and have timeouts set on their clients.
Probably needs some work to ensure only clients that accept GZIP get it.
Yikes, the gzip stdlib module is painfully slow in Python. Even by "I'm used to Python being slow" standards, and even under PyPy. Even if I drop it down to compresslevel=5, what I'm most likely to do is consume all my own CPU rather than the target's memory.
A quick port to rust with gemini's help has it running significantly faster for a lot less overhead.
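Another way to sidestep the per-request compression cost entirely, even in Python (a sketch, not what the poster did): compress one chunk of zeros once up front and stream the same bytes forever. The gzip format allows concatenated members, and many decompressors, including gzip(1) and Python's gzip module, treat them as a single stream; some HTTP clients do stop after the first member, though.

```python
import gzip

# Compressed once at startup: one gzip member containing 1 MiB of zeros.
# Repeating this member yields an endless, valid multi-member gzip
# stream at essentially zero CPU cost per request.
MEMBER = gzip.compress(b"\0" * (1 << 20), compresslevel=9)

def bomb_generator():
    while True:
        yield MEMBER
```

Each yielded chunk is only a few kilobytes on the wire but a mebibyte once decompressed, so the CPU bottleneck disappears.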
I'd be curious whether there's an LLM-prompt equivalent of a zip bomb that will explode the context window. I know there are deterministic limits on context windows, but future LLMs _are_ going to have strange loops and are going to be very susceptible to circular reasoning.
Before AGI, there will be an untenably gullible general intelligence.
I've seen LLMs get into loops because they forgot what they were trying to do. For instance, I asked an LLM to write some code to search for certain types of wordplay, and it started making a word list (rather than writing code to pull in a standard dictionary), and then it got distracted and just kept listing words until it ran out of time.
One of the things that will likely _characterize_ AGI are nondeterministic loops.
My bet is that if AGI is possible it will take a form that looks something like
x_(n+1) = A * x_n * (1 - x_n)
Where x is a billions-long vector and the parameters in A (sizeof(x)^2 ?) are trained and also tuned to have period 3 or nearly period 3, for a meta-stable, near-chaotic progression of x.
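The scalar version of that iteration is the classic logistic map, and the period-3 behavior is easy to observe (a sketch; A = 3.83 is a value I've chosen inside the map's well-known "period-3 window", roughly 3.828 to 3.841 — nudge A outside it and the orbit goes chaotic):

```python
# Logistic map: x_{n+1} = A * x_n * (1 - x_n)
A = 3.83
x = 0.5

# Let transients die out so the orbit settles onto its attractor.
for _ in range(10_000):
    x = A * x * (1 - x)

# Record one full cycle: at this A the attractor is a stable 3-cycle.
orbit = []
for _ in range(3):
    orbit.append(x)
    x = A * x * (1 - x)

print(orbit)  # three distinct values that then repeat
```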
What's confusing to me is the dual use of the word entropy in both the physical sciences and in communication theory. The local minima are somehow stable in a world of increasing entropy. How do these local minima ever form when there's such a large arrow of entropy?
Certainly intelligence is a reduction of entropy, but it's also certainly not stable. Just like cellular automata (https://record.umich.edu/articles/simple-rules-can-produce-c...), loops that are stable can't evolve, but loops that are unstable have too much entropy.
So, we're likely searching for a system that's meta-stable within a small range of input entropy (physical) and output entropy (information).
If you have any system that tries to gravitate to a local minimum, it is almost impossible not to produce Newton's fractal with it. Classical feed-forward network learning looks pretty much like Newton's method to me. Please take a look at https://en.m.wikipedia.org/wiki/Newton%27s_method
> Now I read articles like this one talking about "oh yeah Evolution filled up 100GB of space" like that's no big deal.
Is this actually a practical issue though? Windows, Mac, and Linux all support transparent compression at the filesystem level, so 100GB of /dev/zero isn't actually going to fill much space at all.