
That sounds really good.

I think reasons for slow adoption are probably a mix of:

1. Lack of developer awareness.

2. Security implications (or perceived implications) of exposing memory directly to a network without passing through CPU or application-level access control mechanisms.

The second point may be prohibitive for a lot of general purpose database systems which are intended to be run on shared infrastructure on virtualized instances.
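To make the security point concrete, here is a toy model (not a real verbs API; all names are illustrative) of the one-sided RDMA read path: the "NIC" checks only a memory-region key, and no application code runs on the hot path, which is exactly the access-control concern.

```python
# Toy model of one-sided RDMA (illustrative names, not the real verbs API).
# Access is gated only by a region key -- the host application never runs.
import secrets

class ToyNIC:
    def __init__(self):
        self.regions = {}  # rkey -> registered buffer

    def register_memory(self, buf):
        """Expose a buffer for remote access; returns its access key."""
        rkey = secrets.token_hex(4)
        self.regions[rkey] = buf
        return rkey

    def rdma_read(self, rkey, offset, length):
        # Only the rkey gates access -- no per-row ACLs, no auth layer,
        # no database code in the path.
        region = self.regions.get(rkey)
        if region is None:
            raise PermissionError("bad rkey")
        return bytes(region[offset:offset + length])

server_buf = bytearray(b"secret-row-1|secret-row-2")
nic = ToyNIC()
rkey = nic.register_memory(server_buf)
# A remote peer holding the rkey reads the bytes directly:
data = nic.rdma_read(rkey, 0, 12)
```

Anyone who obtains (or guesses) the rkey can read the whole registered region, which is why multi-tenant cloud deployments are wary of it.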

Another reason may be that a lot of production systems are CPU-bound, not memory-bound. RDMA seems ideal for systems which require a lot of memory. I'm thinking maybe with recent advancements in AI/LLMs, it could be an interesting technology as these do require a huge amount of memory relative to CPU.



There are several good ideas in distributed databases that are effectively not deployable in cloud environments because the requisite hardware features don’t exist. Since cloud deployability is a pervasive prerequisite for commercial viability, database designs tend to overfit for the limitations of cloud environments even though we know how to do much better.

Basically, we are stuck in a local minimum of making databases work well in cloud environments that are not designed to enable efficient distributed databases. It is wasteful and also provides an arbitrage opportunity for cloud companies.


What kind of hardware features are we missing on cloud?


- network packet timestamping via hardware

- this paper

- dedicated bandwidth (Azure gives you bandwidth based on instance size)

- XDP on network interface

Probably more, but those are what I know of from running into their non-existence.
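On the first item, a hedged sketch of requesting hardware packet timestamps on Linux via SO_TIMESTAMPING (the flag values below are the Linux ABI constants): on cloud VMs the setsockopt often succeeds but the virtual NIC never produces hardware stamps, which is the gap being pointed at.

```python
import socket

# Linux SO_TIMESTAMPING sketch. Constant values below are the Linux ABI;
# Python only exposes socket.SO_TIMESTAMPING on Linux 3.9+, so fall back
# to the raw value for illustration.
SO_TIMESTAMPING = getattr(socket, "SO_TIMESTAMPING", 37)  # 37 = Linux value
SOF_TIMESTAMPING_RX_HARDWARE = 1 << 2   # NIC stamps received packets
SOF_TIMESTAMPING_RAW_HARDWARE = 1 << 6  # report the raw hardware stamp

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    sock.setsockopt(socket.SOL_SOCKET, SO_TIMESTAMPING,
                    SOF_TIMESTAMPING_RX_HARDWARE | SOF_TIMESTAMPING_RAW_HARDWARE)
    hw_ts_requested = True
except OSError:
    hw_ts_requested = False  # kernel or NIC doesn't support it
finally:
    sock.close()
```

Even when the request is accepted, actually receiving hardware stamps requires NIC and driver support that virtualized cloud instances typically don't pass through.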


The atomic clock is the key to enabling distributed transactions, and Google has a proprietary lock on its atomic clocks for Spanner.


You need this for *global* consistency, but for logically local consistency (like a single entity) this is unnecessary.


Yeah, you have a much better understanding of this topic than me for sure :)


Do you know what exactly is proprietary in Spanner? AFAIK most (all?) of the ideas existed before in the theoretical clock-synchronization literature.


Nothing in Spanner itself is proprietary, but I believe no other vendor has atomic clocks comparable to what Google has. Hence they are not able to implement the Paxos-based global transactions, which require that the clocks of all servers participating in a transaction be synchronized within a tightly bounded error. This is what one of the comments above refers to, I think.

So while the Spanner paper is open for any vendor to implement, they don't have the proprietary advantage that Google has: the atomic clock. That's why Yugabyte and CockroachDB don't rely on atomic clocks. I tried to get to the ground-level basics of this, but I haven't understood this matter completely yet.


I guess using worse clocks would mean a (slightly?) slower Spanner, but I'm not sure what the impact is. In any case, if a big vendor (e.g., Amazon, IBM, Oracle, Dell...) wanted something on par with Google's clock, they could probably achieve it (though I don't know much about these clocks).


The problem is that the worse the clock, the more the window for error grows, and the harder it is to recover. It also imposes a lower bound on latency: you cannot commit faster than your clock error without risk.

Note that even Spanner has had multiple outages due to clock and/or network failures. In those cases, all operational guarantees are lost. This makes it really dangerous.
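The latency bound can be sketched with a Spanner-style commit-wait toy (my simplification of the TrueTime idea, not Google's code): a transaction picks the latest possible current time as its timestamp, then must wait until even the earliest possible current time has passed it, which costs roughly twice the clock uncertainty on every commit.

```python
import time

class UncertainClock:
    """Toy TrueTime-style clock: 'now' is known only within +/- epsilon."""
    def __init__(self, epsilon_s):
        self.epsilon = epsilon_s

    def earliest(self):
        return time.monotonic() - self.epsilon

    def latest(self):
        return time.monotonic() + self.epsilon

def commit_wait(clock):
    """Pick a commit timestamp, then wait until it is guaranteed past.

    The wait is roughly 2 * epsilon, so a worse clock (bigger epsilon)
    directly adds latency to every commit.
    """
    ts = clock.latest()            # commit timestamp = now().latest
    while clock.earliest() <= ts:  # wait until after(ts) is certain
        time.sleep(clock.epsilon / 10)
    return ts

# With epsilon = 10 ms the commit wait is ~20 ms; atomic/GPS clocks keep
# epsilon in the low single-digit milliseconds or below.
start = time.monotonic()
commit_wait(UncertainClock(0.01))
elapsed = time.monotonic() - start
```

With a cheap NTP-grade clock, epsilon can be tens of milliseconds, and every commit pays that wait.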


Do you have any indication of why they're CPU-bound? Other than, for example, linear algebra or special algorithms, I have never seen systems max out a modern CPU. Almost all loads today are in one way or another memory-bound.


Bah, systems are either limited by CPU (processing) or by CPU (waiting for IO, especially memory).

Systems limited by memory as in "quantity of" are scarce.


Only scarce since it's an easy distributed problem to solve compared to IO.


Distributed transactions are invariably latency (usually storage I/O) bound, rather than CPU or memory bound. I think your #2 is a big part of the challenge with RDMA, as well as the "trickiness" of the programming paradigm.


  The experimental setup involves using a cluster of 56 machines connected by an InfiniBand FDR network
That bit might have something to do with it.


Amazon will rent you InfiniBand-class machines like the C6in.metal (https://instances.vantage.sh/aws/ec2/c6in.metal) with 200 Gb/s of bandwidth. With EFA (https://aws.amazon.com/hpc/efa/) you can use HPC features like RDMA.


Heh Heh Heh

A bulk purchase of ~60 FDR IB cards, cabling, and network switches to support them sounds pretty expensive.

That being said, IB FDR gear is "older tech" now, so the cards and switches can commonly be found reasonably cheaply on eBay. The switches tend to be bloody loud though, so they're not something you'd want nearby if you can help it.



