
> The final aggregate read throughput reached approximately 6.6 TiB/s with background traffic from training jobs.

The Ceph team has been working on Crimson for years to get past performance bottlenecks inherent to the HDD-based design. I'm having trouble finding any Ceph benchmark results that come anywhere close to 100 GB/s.




3FS: 180 nodes, 2x200 Gbps InfiniBand and 16x 14 TiB NVMe SSDs per node, ~500 clients, 6.6 TiB/s of read throughput under a training-job workload

Ceph: 68 nodes, 2x100 Gbps Mellanox and 10x 14 TiB NVMe SSDs per node, 504 clients, 1 TiB/s under an FIO random-read workload


The comparison is a little pears to apples: similar nutrition, but different enough that you shouldn't draw conclusions from it. The hardware in the Ceph test is only capable of about 1.7 TiB/s of traffic at most (in the ideal case, with no overhead whatsoever).

I also assume the batch size (block size) differs enough between the two tests that this alone would make a big difference.
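To make the block-size point concrete, here is a back-of-envelope sketch in Python. The block sizes are my own assumptions for illustration; neither write-up states what was actually used:

    # IOPS required to sustain a given aggregate throughput at a given
    # block size. The block sizes below are illustrative assumptions only.
    TIB = 2**40

    def iops_needed(throughput_tib_s, block_bytes):
        return throughput_tib_s * TIB / block_bytes

    for bs in (4 * 2**10, 128 * 2**10, 2**20):  # 4 KiB, 128 KiB, 1 MiB
        print(f"{bs:>8} B blocks -> {iops_needed(6.6, bs):.3g} IOPS for 6.6 TiB/s")

At 4 KiB blocks you'd need on the order of 1.8 billion IOPS to sustain 6.6 TiB/s, versus roughly 7 million at 1 MiB, so a large-block training workload and a small-block FIO run are measuring very different things.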


Even if we take the different hardware into account, we can normalize by comparing measured against theoretical throughput.

Ceph cluster achieves 1 TiB/s / 1.7 TiB/s ≈ 59% of theoretical throughput.

3FS cluster achieves 6.6 TiB/s / 9 TiB/s ≈ 73% of theoretical throughput.
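A minimal sketch of that arithmetic, assuming the capacity figures are aggregate NIC line rates (decimal, Gbps / 8 / 1000, which matches the rounded 1.7 and 9 above) with all protocol overhead ignored:

    # Aggregate NIC line rate per cluster vs. measured throughput.
    # Decimal units to match the rounded capacity figures above.
    def line_rate_tb_s(nodes, nics_per_node, gbps_per_nic):
        return nodes * nics_per_node * gbps_per_nic / 8 / 1000

    ceph_cap = line_rate_tb_s(68, 2, 100)    # 1.7
    fs3_cap  = line_rate_tb_s(180, 2, 200)   # 9.0

    print(f"Ceph: {1.0 / ceph_cap:.0%} of line rate")   # ~59%
    print(f"3FS:  {6.6 / fs3_cap:.0%} of line rate")    # ~73%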


That difference is still pronounced, yes. But the workloads are very different: AI training is hardly a random-read pattern. It's still not a comparison that should lead you to any conclusions.


I'd argue that they don't need a filesystem or an object storage; they need a purpose-built data-serving layer optimized for their use case.



