
I love splay trees, skew heaps, and other data structures that get very little love.

But I’ve actually tried to use splay trees in production code. The performance benefits over a simple Patricia tree were totally absent in practice, even for workloads like very frequent adjacent-key queries. The structure is neat, but paying attention to memory layout and cache line size is far, far more important.



I use splay trees for their indirect utility. That is, the splay tree is a trivial way to calculate minimal, incremental updates to an append-only log. With most (all?) other B-tree-like structures, you would need to rewrite an unsustainable (or otherwise unpredictable) amount of tree data to the append-only log each time it is modified. The magic is that the splay operation redefines the root node every time you access the tree, so you just have to keep track of all the nodes you visited in each transaction batch and then write that sub-tree to the end of the log.
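A minimal sketch of how I read this (all names invented; a real splay tree uses zig-zig/zig-zag double rotations rather than the simplified rotate-to-root shown here): every access restructures exactly the search path up to the root, and those restructured nodes are the only ones that need to be appended to the log.

```python
# Toy sketch: splay tree whose touched nodes are appended to a log.
# Simplified rotate-to-root splay; names (Node, access, log) invented.

class Node:
    def __init__(self, key, value):
        self.key, self.value = key, value
        self.left = self.right = None

def splay(root, key, touched):
    # Recursively bring `key` (or the last node on its search path)
    # to the root, recording every node whose links we change.
    if root is None:
        return None
    touched.add(root)
    if key < root.key and root.left is not None:
        root.left = splay(root.left, key, touched)
        new_root, root.left = root.left, root.left.right  # right rotation
        new_root.right = root
        return new_root
    if key > root.key and root.right is not None:
        root.right = splay(root.right, key, touched)
        new_root, root.right = root.right, root.right.left  # left rotation
        new_root.left = root
        return new_root
    return root

log = []  # append-only log of serialized nodes

def access(root, key):
    touched = set()
    root = splay(root, key, touched)
    for n in touched:                  # only the restructured path...
        log.append((n.key, n.value))   # ...is appended to the log
    return root
```

In the real thing you would serialize each node with the log offsets of its children, so the last record written is always the current root and a reader can reconstruct the tree from the tail of the log.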

> paying attention to memory layout and cache line size is far, far more important.

Totally agree. For scenarios where I know I am going to have fewer than ~100k of something (assume an array of structs), I typically just do a linear scan over memory. If it's a bunch of objects in memory or a cold access to storage, the trees become far more practical.


Hey, I'm really curious about how that works. Could you go into a bit more detail on that?

From what I can gather from your comment, it seems that you are given a splay tree and a batch of transactions, e.g.

  GET 1
  SET 10 "hello"
  GET 15
You would write all the visited nodes, in order? But splaying involves a lot of tree operations and rotations too. I don't get how that works.


> But splaying involves a lot of tree operations and rotations too. I don't get how that works.

Correct, but the whole idea is that over time the same subset of hot nodes trickles to the top of the tree, so you are just rewriting node offsets in your temporary list as you process the batch. A node may be logically modified 100x per batch, and only written to storage once.
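The dedup across a batch can be sketched like this (toy code, invented names; each operation just reports which node ids it touched during its splay):

```python
# Sketch of per-batch deduplication: a node touched by many operations
# in the same batch is still appended to the log exactly once.

def process_batch(touched_per_op):
    dirty = set()
    for touched in touched_per_op:   # node ids splayed by each operation
        dirty.update(touched)
    return sorted(dirty)             # nodes to append to the log, once each

# 100 operations that all hit nodes 1 and 2, plus one that hits node 7:
batch = [[1, 2]] * 100 + [[2, 7]]
# process_batch(batch) collapses 202 logical touches into 3 log writes.
```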


One of the things that surprised me was that linear search on optimally laid out data was frequently faster than binary search with frequent pointer chasing.
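The contrast looks something like this (illustrative only; in Python the cache effects described above are masked by interpreter overhead, so the point really shows up when this is written in C/C++/Rust over a contiguous array):

```python
import bisect

def linear_search(arr, target):
    # Sequential scan over a sorted array: touches memory in order, so
    # the hardware prefetcher keeps the next cache line ready.
    for i, x in enumerate(arr):
        if x >= target:
            return i if x == target else -1
    return -1

def binary_search(arr, target):
    # O(log n) comparisons, but each probe jumps to a distant index;
    # with a pointer-based tree, each probe is also a pointer chase
    # and a likely cache miss.
    i = bisect.bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1
```

Both return the same answers; for small sorted arrays laid out contiguously, the "dumber" linear version often wins in wall-clock time despite doing more comparisons.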





