Is there anything written about your multi-way join algorithm? We've done some really interesting implementations of Generic Join and the like and are always interested in hearing about other potentially better join algorithms.
It's only described in my thesis, which is not yet publicly available. (It should be, but the university thesis publication process seems to have a latency measured in years...)
The case for triangle joins is simple to describe, though. Suppose we have attributes A, B and C and finite maps (relations) R : A x B -> X, S : A x C -> Y and T : B x C -> Z. Now build tries for each so we have
abx : Trie A (Trie B X)
acy : Trie A (Trie C Y)
bcz : Trie B (Trie C Z)
We can join two tries using join : Trie k v1 -> Trie k v2 -> (v1 -> v2 -> v) -> Trie k v. Putting this all together we have:
Compared to a database that doesn't know how to handle triangle joins the difference is absolutely stunning. The above one-liner managed to do in 5 seconds what took MySQL over an hour...
Is there any way I could get a preprint of your thesis?
I've been working on a project that sits somewhere between spark and riak and this actually might address a core issue I have both at the node and query dispatch level.