
> The final aggregate read throughput reached approximately 6.6 TiB/s with background traffic from training jobs.

The Ceph team has been working on Crimson for years to get past performance bottlenecks inherent to the HDD-based design. I'm having trouble finding any Ceph benchmark results that come anywhere close to 100 GB/s.




3FS: 180 nodes, 2x200 Gbps InfiniBand and 16x 14 TiB NVMe SSDs per node, ~500 clients, 6.6 TiB/s of read throughput under a training-job workload

Ceph: 68 nodes, 2x100 Gbps Mellanox and 10x 14 TiB NVMe SSDs per node, 504 clients, 1 TiB/s under an FIO random-read workload


The comparison is a little pears to apples: similar nutrition, but different enough that you shouldn't draw conclusions from it. The hardware in the Ceph test is only capable of about 1.7 TiB/s of traffic at most (in the ideal case, with no overhead whatsoever).

I also assume the batch size (block size) differs enough between the two tests that this alone would make a big difference.
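To make the block-size point concrete, here is a back-of-envelope sketch in Python. The block sizes are my own assumptions for illustration; neither write-up states what was actually used:

    # IOPS required to sustain a given aggregate throughput at a given
    # block size. The block sizes below are illustrative assumptions only.
    TIB = 2**40

    def iops_needed(throughput_tib_s, block_bytes):
        return throughput_tib_s * TIB / block_bytes

    for bs in (4 * 2**10, 128 * 2**10, 2**20):  # 4 KiB, 128 KiB, 1 MiB
        print(f"{bs:>8} B blocks -> {iops_needed(6.6, bs):.3g} IOPS for 6.6 TiB/s")

At 4 KiB blocks you'd need on the order of 1.8 billion IOPS to sustain 6.6 TiB/s, versus roughly 7 million at 1 MiB, so a large-block training workload and a small-block FIO run are measuring very different things.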


Even if we take the different hardware into account, we can normalize by comparing measured against theoretical throughput.

Ceph cluster achieves 1 TiB/s / 1.7 TiB/s ≈ 59% of theoretical throughput.

3FS cluster achieves 6.6 TiB/s / 9 TiB/s ≈ 73% of theoretical throughput.
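A minimal sketch of that arithmetic, assuming the capacity figures are aggregate NIC line rates (decimal, Gbps / 8 / 1000, which matches the rounded 1.7 and 9 above) with all protocol overhead ignored:

    # Aggregate NIC line rate per cluster vs. measured throughput.
    # Decimal units to match the rounded capacity figures above.
    def line_rate_tb_s(nodes, nics_per_node, gbps_per_nic):
        return nodes * nics_per_node * gbps_per_nic / 8 / 1000

    ceph_cap = line_rate_tb_s(68, 2, 100)    # 1.7
    fs3_cap  = line_rate_tb_s(180, 2, 200)   # 9.0

    print(f"Ceph: {1.0 / ceph_cap:.0%} of line rate")   # ~59%
    print(f"3FS:  {6.6 / fs3_cap:.0%} of line rate")    # ~73%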


That difference is still pronounced, yes. But the workloads are very different: AI training is hardly a random-read pattern. It's still not a comparison that should lead you to any conclusions.


I'd argue that they don't need a filesystem or an object storage; they need a purpose-built data-serving layer optimized for their use case.



