
To avoid unnecessary cache writes, there's also a plugin that implements a rudimentary LRU. Basically, an object has to see some amount of traffic before it's allowed to be written to the cache. This is typically used in scenarios where it's OK to hit the parent caches, or origins, an extra time or a few. It can also be a very useful way to avoid heavy disk write load on SSDs (which can be sensitive to excessive write wear, of course). See

https://github.com/apache/trafficserver/tree/master/plugins/...
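The idea can be sketched in a few lines. This is a hypothetical illustration (the class name, `threshold` parameter, and unbounded counter are mine, not the plugin's actual code): only admit an object to the cache once it has been requested some number of times, so one-hit wonders fall through to the parent cache or origin instead of churning the disk.

```python
from collections import Counter

class HitCountAdmission:
    """Sketch of a 'see N requests before caching' admission filter.

    A real filter would bound its memory (e.g. with a probabilistic
    counter); a plain Counter keeps the sketch simple.
    """
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.counts = Counter()

    def should_cache(self, key):
        """Record one request for `key`; return True once it has
        been seen `threshold` times and may be written to cache."""
        self.counts[key] += 1
        return self.counts[key] >= self.threshold

f = HitCountAdmission(threshold=2)
f.should_cache("/index.html")  # first request: skip the cache write
f.should_cache("/index.html")  # second request: admit to cache
```

The cost of rejecting a write is just an extra upstream fetch or two, which is exactly the trade-off described above.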



I believe the term you're looking for is "cache admission policy." This is an adjunct to cache eviction; both are needed for success. I'm very curious what a highly efficient insertion policy paired with a trivial "eviction" policy (FIFO) would look like in practice.
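The split between the two policies can be made concrete with a small sketch (all names here are hypothetical, not from any real cache library): eviction stays trivially FIFO, while an injected `admit(candidate, victim)` callback decides whether a new key is worth displacing the FIFO head at all.

```python
from collections import deque

class FifoCacheWithAdmission:
    """Trivial FIFO eviction gated by a pluggable admission policy."""
    def __init__(self, capacity, admit):
        self.capacity = capacity
        self.admit = admit          # admit(candidate_key, victim_key) -> bool
        self.order = deque()        # insertion order; leftmost is the victim
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            return
        if len(self.data) >= self.capacity:
            victim = self.order[0]
            if not self.admit(key, victim):
                return              # admission policy rejects the write
            self.order.popleft()
            del self.data[victim]
        self.order.append(key)
        self.data[key] = value
```

The interesting work all moves into `admit`; the eviction side needs no recency bookkeeping at all.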

Cache insertion research is generally focused on use cases like small associative hardware caches. There's very little applicable public research for larger software caching systems that I've found. Probably the best would be Gil Einziger. He appears to have arrived at it as an application of his work on extremely space/time-efficient counting of sets, http://www.graduate.technion.ac.il/Theses/Abstracts.asp?Id=2... and http://www.cs.technion.ac.il/~gilga/. Of notable mention is TinyLFU http://www.cs.technion.ac.il/~gilga/TinyLFU_PDP2014.pdf. Gil submitted it to Caffeine (a Java caching library) last summer, https://github.com/ben-manes/caffeine/pull/24. It got some traction over the winter and is now showing up in other places (https://issues.apache.org/jira/browse/CASSANDRA-10855). In fact, Ben Manes just had a guest post on High Scalability the other day: http://highscalability.com/blog/2016/1/25/design-of-a-modern....
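The core of TinyLFU's admission decision can be sketched briefly. This is a simplified illustration, not Caffeine's implementation: the real thing uses 4-bit saturating counters plus periodic aging ("reset"), while this version uses a plain count-min sketch with unbounded integer counters. The key idea survives: estimate frequencies probabilistically in tiny space, and admit a candidate only if it is estimated to be more popular than the eviction victim it would replace.

```python
import hashlib

class FrequencySketch:
    """Count-min sketch: depth rows of width counters, one hash per row.
    Estimates are upper bounds on the true count (collisions only inflate)."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        for i in range(self.depth):
            h = hashlib.blake2b(key.encode(), salt=bytes([i] * 4),
                                digest_size=8)
            yield i, int.from_bytes(h.digest(), "big") % self.width

    def increment(self, key):
        for i, j in self._indexes(key):
            self.table[i][j] += 1

    def estimate(self, key):
        return min(self.table[i][j] for i, j in self._indexes(key))

def admit(sketch, candidate, victim):
    """TinyLFU-style admission: let the candidate in only if its
    estimated frequency beats the would-be victim's."""
    return sketch.estimate(candidate) > sketch.estimate(victim)
```

A scan of cold, one-off keys never displaces a popular entry, which is what makes the policy scan-resistant.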

PS: If anyone is interested in these problems, We're Hiring.

edit: https://aws.amazon.com/careers/ or preferably drop me a line at my profile email or my username "at amazon.com" for a totally informal chat (I'm an IC, not a manager, recruiter, or salesperson)


TinyLFU+FIFO is in the simulator (https://github.com/ben-manes/caffeine/wiki/Simulator). However, you'd probably also want the window cache to correct the deficiency outlined in the updated paper (http://arxiv.org/pdf/1512.00727.pdf). A FIFO version with that isn't in the simulator, but would be easy to add.

multi1 trace:
- W-TinyLFU: 55.6%
- TinyLFU+Random: 53.8%
- TinyLFU+FIFO: 48.6%
- TinyLFU+LRU: 54.8%
- FIFO: 41.0%
- LRU: 46.7%

FIFO's poor choice of a victim has a big impact, so it doesn't seem promising. A random policy looks like an attractive fit, though.


Who's "we"?


Yeah, with SSDs I wonder how much that really helps performance vs. just having no cache. Most SSDs do a lot of caching internally, so a disk cache can often be self-defeating.


"It Depends." If youre doing "random" writes down to the block dev, like updating a filesystem, it can be very bad. You'll end up hitting the read/update/write cell issues and block other concurrent access. In general I'd worry (expect total throughput to go down, and tail latency way up) around a 10-20% write:read ratio. Conversely if youre doing sane sequential writes, say log structure merges with a 64-256KB chunk size, Id expect much less impact to your read latencies.


This is a read cache though. If you just don't have it, you do reads on the SSD, which are pretty darn quick...


Ah, I must have missed the thread intent. I thought you were responding to the utility of limiting/optimizing the cache write rate.


You got that right. It's a funky cache. ;-)



