> Since the state is stored in a multi-level Bloom Filter style data structure, the memory needed is constant and does not scale with the number of clients.
Constant memory, but those hashes will take up CPU cycles. If you're running a workload that completes sub 20 milliseconds, these cycles spent hashing may not be worth it over, say, a constant-time admission control like token bucket.
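For comparison, a token bucket really is constant-time per decision: one refill computation and one comparison, no hashing. A minimal sketch (illustrative only, not from the project under discussion):

```python
import time

class TokenBucket:
    """Minimal token-bucket admission control: O(1) time and O(1) state per bucket."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The trade-off the comment describes is visible here: the constant work per request is tiny, but you hold one bucket's worth of state per client.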
Absolutely my thought as well. During my thesis I found token buckets to be better and more fair compared to other algorithms [1]. This was in libvirt. My finding was that it scaled well up to around 1k buckets, if memory serves right; after that the results were weird, to say the least. (Of course, the servers I tested on were quite old and didn't have much memory.) It would be nice to run my thesis tests again to see what happened in the last decade.
Not just CPU-bound, but IO-bound workloads can also complete within milliseconds.
> have they claimed that?
The mention of "token bucket" in the project readme is why I wrote the comment I did.
... FAIR [only throttles] when there's a genuine shortage of resources as opposed to the approaches like token bucket or leaky bucket which may reject requests ...
That's the typical design for a token-bucket rate limiter. A token-bucket limiter uses more memory but is a simpler design. I believe this bloom-filter-based implementation is designed to be more memory efficient at the cost of being less CPU efficient.
Bloom filters really shine when they avoid a network round trip or pulling data from disk. Those are so far away you can easily afford to spend 5% of the cost of the request to avoid 99% of the requests.
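That asymmetry is the whole point of the structure: a Bloom filter can answer "definitely not present" from local memory, so the expensive lookup only happens for (possible) hits. A toy single-level sketch, not the multi-level variant the project uses:

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: fixed memory, no false negatives, tunable false-positive rate."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive k positions by salting one hash function; this is where
        # the CPU cost per request comes from.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item: str) -> bool:
        # False means definitely absent -- safe to skip the expensive lookup.
        return all(self.bits >> p & 1 for p in self._positions(item))
```

A negative answer costs a few hashes; a disk read or network round trip costs orders of magnitude more, which is the 5%-to-avoid-99% math above.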
Agreed! In a typical token bucket implementation, bucket state must be shared across multiple servers, which means you need a shared state repository such as Redis or a database (SQL or NoSQL). This can add milliseconds to each request being handled.
My previous (very long) project was in such a state when I got there that it was only in my last year there that I could get measuring things in microseconds onto other people's radar. I wish I had started sooner, because I found a lot of microseconds in about a four-month period. That was the most intensive stretch of user-visible latency reduction I saw the entire time I was there, and second through fourth place took years of manpower to accomplish. Past me and future me are both still mad about that.
Rate limiters seem like the poster child for something you'd want to host in an in-memory KVS. Typical access times look more like ~100us, not milliseconds. Even if you have a complicated algorithm, you can still limit it to a single round trip by using a Redis Function or some moral equivalent.
You can also rate limit on the server side. Service A talks to Service B. Service B is 3 replicas. Each replica keeps track of how many times Service A talked to it. If that goes over the limit / 3, deny the request. Now there is no IPC required.
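Sketched out, that replica-local scheme is just a fixed-window counter in each replica's own memory, with the global limit divided by the replica count (names and numbers here are hypothetical, and it assumes the load balancer spreads a caller's requests roughly evenly):

```python
import time
from collections import defaultdict

REPLICA_COUNT = 3        # replicas of Service B, assumed known to each replica
GLOBAL_LIMIT = 300       # requests per window across the whole service
WINDOW_SECONDS = 60

local_limit = GLOBAL_LIMIT // REPLICA_COUNT  # each replica enforces its share

_counts: dict[str, int] = defaultdict(int)
_window_start = time.monotonic()

def allow(caller: str) -> bool:
    """Fixed-window counter kept entirely in this replica's memory: no IPC."""
    global _window_start
    now = time.monotonic()
    if now - _window_start >= WINDOW_SECONDS:
        _counts.clear()       # start a fresh window
        _window_start = now
    if _counts[caller] >= local_limit:
        return False
    _counts[caller] += 1
    return True
```

If the load balancer skews traffic, a caller can be denied before reaching the true global limit (or sneak past it), which is the accuracy trade-off the next comment mentions.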
This isn't as accurate, but it's often adequate. I have never liked making a network request to see if I can make a network request; too slow. (I do use this architecture to rate-limit my own website, mostly because I wanted to play with writing a service to do that. But I'd hesitate to make someone else use that.)
That would work if you can assign requests from a given tenant to a single instance, but there are many situations in which that's either impossible or unwise. What if a single server doesn't have enough capacity to handle all the traffic for a tenant? How do you preserve the state if that instance fails?
I'm sorry, I don't think I understand your question. Are you talking about the KVS? You shard the servers for extra capacity. Several KVS's have built-in clustering if you want to go that route. They're usually incredibly stable, but if one goes down for whatever reason (say the physical machine fails), you just spin another one up to take its place.
In terms of preserving state, the answer for rate limiting is that it is almost always far, far less dangerous to fail open than it is to deny requests during a failure. If you really, really, wanted to preserve state (something I'd suggest avoiding for a rate limiter), several KVS's have optional persistence you can turn on, for example, Redis' AOF.
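For reference, turning on Redis's AOF persistence is a two-line config change (standard `redis.conf` directives; whether it's worth the write overhead for rate-limiter state is the judgment call above):

```
# redis.conf -- enable append-only file (AOF) persistence
appendonly yes
appendfsync everysec   # fsync once per second: bounded data loss, low overhead
```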
The end services themselves should be designed with some sort of pushback mechanism, so they shouldn't be in any danger of overloading, regardless of what's going on with the rate limiter.
I think I misunderstood what you were saying. By "in-memory KVS" with "access times in the microseconds" I thought you were implying a KVS hosted on the server that handles the requests. Otherwise, even if the KVS can respond in 100us to a local query, network latency is going to add much more than that.
Ah ok, that makes sense. Over loopback, you can roundtrip Redis at least as fast as double-digit micros. Intra-DC, your network latency should be somewhere in the triple-digit micros. I'd say if you're not seeing that, something is probably wrong.
I don't recall all the details but we did end up with lua to talk to redis or memcached to do some traffic shaping at one point. One for bouncing to error pages before we switched to CDN (long story), and another one for doing something too clever by half about TTFB. It's still really cheap especially on a box that is just doing load balancing and nothing else.
If you wanted to throw another layer of load balancer in, there are consistent hashing-adjacent strategies in nginx+ that would allow you to go from 2 ingress routers to 3 shards with rate limiters to your services, using one KV store per box. But I highly suspect that the latency profile there will look remarkably similar to ingress routers doing rate limiting talking to a KV store cluster.