
Another fun one is the .zip or .tar.gz file that decompresses to itself: https://research.swtch.com/zip

If you are processing emails for security reasons, and want to find viruses even if they are in archive files, it's easy to write the code to "just keep unarchiving until we're out of things to unarchive", but not only can that lead to quite astonishing expansions, it can actually be a process that never terminates at all.
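A scanner's decompression loop therefore needs explicit limits on both nesting depth and total output. A minimal sketch for the gzip case (the `MAX_DEPTH` and `MAX_OUTPUT` values are illustrative policy choices, not from any particular scanner):

```python
import gzip
import io

MAX_DEPTH = 5                # assumed limits for illustration
MAX_OUTPUT = 100 * 2**20     # 100 MiB

def safe_gunzip(data, depth=0):
    """Repeatedly gunzip, but refuse to recurse or expand without bound."""
    while data[:2] == b"\x1f\x8b":          # gzip magic bytes
        if depth >= MAX_DEPTH:
            raise ValueError("too many nested archives")
        out = io.BytesIO()
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as g:
            while True:
                block = g.read(2**20)       # decompress in bounded chunks
                if not block:
                    break
                out.write(block)
                if out.tell() > MAX_OUTPUT:
                    raise ValueError("decompressed size limit exceeded")
        data = out.getvalue()
        depth += 1
    return data
```

Without both checks, a self-reproducing archive keeps the loop running forever, and a bomb exhausts memory or disk long before the loop notices.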

I remember when I first read about these, and "a small file that decompresses to a gigabyte" was also "a small file that decompresses to several multiples of your entire hard disk space" and even servers couldn't handle it. Now I read articles like this one talking about "oh yeah Evolution filled up 100GB of space" like that's no big deal.

If you have a recursive decompressor you can still make small files that decompress to sizes large even by 2025 standards, because the symbols the compressor uses to represent "as many zeros as I can have" are themselves redundant. The rule that you can't compress already-compressed content doesn't necessarily apply to these sorts of files.
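This is easy to check directly: the output of compressing a long run of zeros is itself highly repetitive, so a second pass still shrinks it (rough sizes in the comments are approximate):

```python
import gzip

zeros = b"\0" * (10 * 2**20)        # 10 MiB of zeros
once = gzip.compress(zeros, 9)      # ~10 KB: mostly repeated match codes
twice = gzip.compress(once, 9)      # the repetition compresses again
```

Nesting a few such layers is how the classic "42.zip" style bombs get their extreme ratios.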



A few years ago David Fifield invented a technique which provides a million-to-one non-recursive expansion, by overlapping the file streams: https://www.bamsoftware.com/hacks/zipbomb/


Might be fun to respond with one of these to malicious requests for /.env, /.git/config and /.aws/credentials instead of politely returning 404s.


I think someone posted a blog post about doing exactly that in the last couple of months. Any time their site got hits from misbehaving bots, I think they returned a gzip bomb in the HTTP response.


I remember that also.

edit - this? https://idiallo.com/blog/zipbomb-protection


Yes that's the one.


It’s definitely tempting, but I prefer not to piss off people who are already being actively malicious.


It's all just spray-and-pray crap. You're extremely unlikely to be their target, they're just looking for a convenient shell for a botnet. The most likely way they'll handle it if you do actually break them is just blacklist your address. You're not going to be worth the effort.


Isn’t this how a court system works?


I've been sending a nice 10GB gzip bomb (12MB after compression, rate limited download speed) to people that send various malicious requests. I think I might update it tonight with this other approach.


Can't you just serve /dev/urandom?


I could, at the expense of a lot of bandwidth. /dev/urandom doesn't compress, so to send something that would consume 10GB of memory, I'd have to use up 10GB of bandwidth. The 10GB of /dev/zero that I return in response to requests takes up just 11MB of bandwidth. Much more efficient use of my bandwidth.
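Producing such a file takes only a few lines with Python's gzip module (a sketch; the size and filename are arbitrary):

```python
import gzip

def make_bomb(path, size_mb):
    """Write size_mb megabytes of zeros through gzip at max compression."""
    chunk = b"\0" * (1024 * 1024)
    with gzip.open(path, "wb", compresslevel=9) as f:
        for _ in range(size_mb):
            f.write(chunk)

# e.g. make_bomb("bomb.gz", 10 * 1024) for ~10 GB that compresses to ~11 MB
```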

A more effective (while still relatively efficient) alternative would be to have a program that returns an infinite gzip compressed page. That'll catch anyone that doesn't set a timeout on their requests.

I don't imagine it would be too difficult to write a Python app that dynamically creates the content, just have the returned content be the output of a generator. Not sure it's worth it though :)


I had a few minutes. This turns out to be really easy to do with FastAPI:

    from fastapi import FastAPI
    from fastapi.middleware.gzip import GZipMiddleware
    from starlette.responses import StreamingResponse
    
    app = FastAPI()
    
    # compress every response, even tiny ones, at maximum level
    app.add_middleware(GZipMiddleware, minimum_size=0, compresslevel=9)
    
    def lol_generator():
        # infinite generator: the response body never ends
        while True:
            yield "LOL\n"
    
    @app.get("/")
    def stream_text():
        return StreamingResponse(lol_generator(), media_type="text/plain")

Away it goes, streaming GZIP compressed "LOL" to the receiver, and will continue for as long as they want it to. I guess either someone's hard disk is getting full, they OOM, or they are sensible and have timeouts set on their clients.

Probably needs some work to ensure only clients that accept GZIP get it.


Yikes, the gzip stdlib module is painfully slow in Python. Even by "I'm used to Python being slow" standards, and even under pypy. Even if I drop it down to compresslevel=5, I'm more likely to consume all of my own CPU than the target's memory.

A quick port to rust with gemini's help has it running significantly faster for a lot less overhead.
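For what it's worth, the per-request compression cost can be sidestepped in Python too, by compressing one chunk once and replaying it: zlib's Z_FULL_FLUSH resets the compressor's history, so the flushed bytes form a self-contained deflate block that decodes to the same plaintext every time it appears. A sketch (the chunk size is an arbitrary choice):

```python
import zlib

CHUNK = b"\0" * 65536

def gzip_bomb_stream():
    # wbits=31 selects gzip framing (10-byte header, no filename field)
    c = zlib.compressobj(9, zlib.DEFLATED, 31)
    first = c.compress(CHUNK) + c.flush(zlib.Z_FULL_FLUSH)
    yield first            # gzip header + first compressed chunk
    repeat = first[10:]    # drop the header; the rest replays verbatim
    while True:
        yield repeat       # ~100 bytes on the wire per 64 KiB of zeros
```

After the first yield, the server does no compression at all, just repeats the same small byte string forever.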


And eat up your bandwidth?


The goal is to DOS the abuser, so the cost to the server needs to be much lower than to the client.

/dev/urandom is not at all that.


I'd be curious if there's an LLM prompt equivalent of a zip bomb that will explode the context window. I know there are deterministic limits on context windows, but future LLMs _are_ going to have strange loops and are going to be very susceptible to circular reasoning.

Before AGI, there will be an untenably gullible general intelligence.


I've seen LLMs get into loops because they forgot what they were trying to do. For instance, I asked an LLM to write some code to search for certain types of wordplay, and it started making a word list (rather than writing code to pull in a standard dictionary), and then it got distracted and just kept listing words until it ran out of time.


One of the things that will likely _characterize_ AGI are nondeterministic loops.

My bet is that if AGI is possible it will take a form that looks something like

    x_(n+1) = A * x_n * (1 - x_n)
where x is a billions-long vector and the parameters in A (sizeof(x)^2?) are trained and tuned to have period 3, or nearly period 3, for a meta-stable, near-chaotic progression of x.

"Period three implies chaos" https://www.its.caltech.edu/~matilde/LiYorke.pdf
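In the scalar case the period-3 behaviour is easy to see directly; A = 3.83 sits inside the logistic map's well-known period-3 window (the starting point and iteration counts below are illustrative):

```python
def logistic(a, x):
    return a * x * (1 - x)

a, x = 3.83, 0.5            # A = 3.83 lies in the period-3 window
for _ in range(10_000):     # let the transient die out
    x = logistic(a, x)

# the orbit now repeats every three iterations
cycle = [x]
for _ in range(2):
    cycle.append(logistic(a, cycle[-1]))
```

Nudge A slightly outside that window and the same iteration wanders chaotically instead of settling into the cycle.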

That is if AGI is possible at all without wetware.


Chaos isn't intelligence. Chaos is unmanageable growth in your solution space, the opposite of what you want.


What's confusing to me is the dual use of the word entropy in both the physical sciences and in communication. The local minima are somehow stable in a world of increasing entropy. How do these local minima ever form when there's such a large arrow of entropy?

Certainly intelligence is a reduction of entropy, but it's also certainly not stable. Just like cellular automata (https://record.umich.edu/articles/simple-rules-can-produce-c...), loops that are stable can't evolve, but loops that are unstable have too much entropy.

So, we're likely searching for a system that's meta-stable within a small range of input entropy (physical) and output entropy (information).


There are theories and evidence that your brain operates hovering on the edge of the phase transition to chaos

https://en.m.wikipedia.org/wiki/Critical_brain_hypothesis


If you have any system that tries to gravitate to a local minimum, it is almost impossible not to produce Newton's fractal with it. Classical feed-forward network learning looks pretty much like Newton's method to me. Please take a look at https://en.m.wikipedia.org/wiki/Newton%27s_method


> Now I read articles like this one talking about "oh yeah Evolution filled up 100GB of space" like that's no big deal.

Is this actually a practical issue, though? Windows, Mac, and Linux all support transparent compression at the filesystem level, so 100GB of /dev/zero isn't actually going to fill much space at all.


That's not switched on by default unless you use a filesystem like ZFS.



