
That sounds really good.

I think reasons for slow adoption are probably a mix of:

1. Lack of developer awareness.

2. Security implications (or perceived implications) of exposing memory directly to a network without passing through CPU or application-level access control mechanisms.

The second point may be prohibitive for a lot of general purpose database systems which are intended to be run on shared infrastructure on virtualized instances.
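To make the security point concrete, here is a toy model (not a real verbs API; all names are illustrative) of the one-sided RDMA read path: the "NIC" checks only a memory-region key, and no application code runs on the hot path, which is exactly the access-control concern.

```python
# Toy model of one-sided RDMA (illustrative names, not the real verbs API).
# Access is gated only by a region key -- the host application never runs.
import secrets

class ToyNIC:
    def __init__(self):
        self.regions = {}  # rkey -> registered buffer

    def register_memory(self, buf):
        """Expose a buffer for remote access; returns its access key."""
        rkey = secrets.token_hex(4)
        self.regions[rkey] = buf
        return rkey

    def rdma_read(self, rkey, offset, length):
        # Only the rkey gates access -- no per-row ACLs, no auth layer,
        # no database code in the path.
        region = self.regions.get(rkey)
        if region is None:
            raise PermissionError("bad rkey")
        return bytes(region[offset:offset + length])

server_buf = bytearray(b"secret-row-1|secret-row-2")
nic = ToyNIC()
rkey = nic.register_memory(server_buf)
# A remote peer holding the rkey reads the bytes directly:
data = nic.rdma_read(rkey, 0, 12)
```

Anyone who obtains (or guesses) the rkey can read the whole registered region, which is why multi-tenant cloud deployments are wary of it.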

Another reason may be that a lot of production systems are CPU-bound, not memory-bound. RDMA seems ideal for systems which require a lot of memory. I'm thinking maybe with recent advancements in AI/LLMs, it could be an interesting technology as these do require a huge amount of memory relative to CPU.



There are several good ideas in distributed databases that are effectively not deployable in cloud environments because the requisite hardware features don’t exist. Since cloud deployability is a pervasive prerequisite for commercial viability, database designs tend to overfit for the limitations of cloud environments even though we know how to do much better.

Basically, we are stuck in a local minimum of making databases work well in cloud environments that are not designed to enable efficient distributed databases. It is wasteful and also provides an arbitrage opportunity for cloud companies.


What kind of hardware features are we missing on cloud?


- network packet timestamping via hardware

- this paper

- dedicated bandwidth (Azure gives you bandwidth based on instance size)

- XDP on network interface

Probably more, but those are what I know of from running into their non-existence.
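On the first item, a hedged sketch of requesting hardware packet timestamps on Linux via SO_TIMESTAMPING (the flag values below are the Linux ABI constants): on cloud VMs the setsockopt often succeeds but the virtual NIC never produces hardware stamps, which is the gap being pointed at.

```python
import socket

# Linux SO_TIMESTAMPING sketch. Constant values below are the Linux ABI;
# Python only exposes socket.SO_TIMESTAMPING on Linux 3.9+, so fall back
# to the raw value for illustration.
SO_TIMESTAMPING = getattr(socket, "SO_TIMESTAMPING", 37)  # 37 = Linux value
SOF_TIMESTAMPING_RX_HARDWARE = 1 << 2   # NIC stamps received packets
SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6  # report the raw hardware stamp

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING,
                    SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE)
    hw_ts_requested = True
except OSError:
    hw_ts_requested = False  # kernel or NIC doesn't support it
finally:
    sock.close()
```

Even when the request is accepted, actually receiving hardware stamps requires NIC and driver support that virtualized cloud instances typically don't pass through.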


The atomic clock is the key to enabling distributed transactions, and Google has a proprietary lock on its atomic clocks for Spanner.


You need this for *global* consistency, but for logically local consistency (like a single entity) this is unnecessary.


Yeah, you have a much better understanding of this topic than me for sure :)


Do you know what exactly is proprietary in Spanner? AFAIK most (all?) of the ideas existed before in the theoretical clock-synchronization literature.


Nothing in Spanner itself is proprietary, but I believe no other vendor has atomic clocks comparable to what Google has. Hence they are not able to implement the Paxos-based global transactions, which require that the clocks of all servers participating in a transaction be synchronized within a tightly bounded error. This is what one of the comments above refers to, I think.

So while the Spanner paper is open for any vendor to implement, they don't have the proprietary advantage that Google has: the atomic clock. That's why Yugabyte and CockroachDB don't rely on atomic clocks. I tried to get to the ground-level basics of this, but I haven't understood this matter completely yet.


I guess using worse clocks would mean a (slightly?) slower Spanner, but I'm not sure what the impact is. In any case, if a big vendor (e.g., Amazon, IBM, Oracle, Dell...) wanted something on par with Google's clock, they could probably achieve it (though I don't know much about these clocks).


The problem is that the worse the clock, the more the window for error grows, and the harder it is to recover. It also imposes a lower bound on latency: you cannot commit faster than your clock error without risk.

Note that even Spanner has had multiple outages due to clock and/or network failures. In those cases, all operational guarantees are lost. This makes it really dangerous.
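The latency bound can be sketched with a Spanner-style commit-wait toy (my simplification of the TrueTime idea, not Google's code): a transaction picks the latest possible current time as its timestamp, then must wait until even the earliest possible current time has passed it, which costs roughly twice the clock uncertainty on every commit.

```python
import time

class UncertainClock:
    """Toy TrueTime-style clock: 'now' is known only within +/- epsilon."""
    def __init__(self, epsilon_s):
        self.epsilon = epsilon_s

    def earliest(self):
        return time.monotonic() - self.epsilon

    def latest(self):
        return time.monotonic() + self.epsilon

def commit_wait(clock):
    """Pick a commit timestamp, then wait until it is guaranteed past.

    The wait is roughly 2 * epsilon, so a worse clock (bigger epsilon)
    directly adds latency to every commit.
    """
    ts = clock.latest()            # commit timestamp = now().latest
    while clock.earliest() <= ts:  # wait until after(ts) is certain
        time.sleep(clock.epsilon / 10)
    return ts

# With epsilon = 10 ms the commit wait is ~20 ms; atomic/GPS clocks keep
# epsilon in the low single-digit milliseconds or below.
start = time.monotonic()
commit_wait(UncertainClock(0.01))
elapsed = time.monotonic() - start
```

With a cheap NTP-grade clock, epsilon can be tens of milliseconds, and every commit pays that wait.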


Do you have any indication of why they're CPU-bound? Other than, for example, linear algebra or special algorithms, I have never seen systems max out a modern CPU. Almost all loads today are in one way or another memory-bound.


Bah, systems are either limited by CPU (processing) or by CPU (waiting for IO, especially memory).

Systems limited by memory as in "quantity of" are scarce.


Only scarce since it's an easy distributed problem to solve compared to IO.


Distributed transactions are invariably latency (usually storage I/O) bound, rather than CPU or memory bound. I think your #2 is a big part of the challenge with RDMA, as well as the "trickiness" of the programming paradigm.


  The experimental setup involves using a cluster of 56 machines connected by an InfiniBand FDR network
That bit might have something to do with it.


Amazon will rent you InfiniBand-class machines like the C6in.metal (https://instances.vantage.sh/aws/ec2/c6in.metal) with 200 Gb/s of bandwidth. With EFA (https://aws.amazon.com/hpc/efa/) you can use HPC features like RDMA.


Heh Heh Heh

A bulk purchase of ~60 FDR IB cards, cabling, and network switches to support them sounds pretty expensive.

That being said, IB FDR gear is "older tech" now, so the cards and switches can commonly be found reasonably cheaply on eBay. The switches tend to be bloody loud though, so they're not something you'd want nearby if you can help it.



