Definitely not, but even with a comparison to 500 GPUs Groq will still come out ...

deepnotderp · on Feb 25, 2024

> GPUs Groq will still come out on top because you can never reduce latency by adding more parallel compute :)

You literally can; in fact, that’s the entire reason to use multiple chips.

See, e.g., the TPU group’s paper: https://arxiv.org/abs/2211.05102
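A toy illustration of the point that more parallel compute can cut the latency of a single request. This is not the linked paper's method. It assumes one decoding step is dominated by a matrix-vector product that can be split evenly by rows across chips, plus a fixed communication cost whenever more than one chip is involved. The helper names (`sharded_matvec`, `step_latency_s`) and all numbers are hypothetical.

```python
# Hypothetical toy model: does splitting ONE request's work across more chips
# reduce its latency? Assumes perfectly even splitting and a fixed per-step
# communication cost; real hardware has many more effects.

def matvec(rows, x):
    """Plain matrix-vector product: one output element per row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in rows]


def sharded_matvec(rows, x, n_chips):
    """Split the rows across n_chips "chips", compute each shard, concatenate.

    The shards are independent, so on real hardware they could run in
    parallel; here they run one after another just to check the math.
    """
    shard = -(-len(rows) // n_chips)  # ceiling division
    out = []
    for c in range(n_chips):
        out.extend(matvec(rows[c * shard:(c + 1) * shard], x))
    return out


def step_latency_s(flops, flops_per_chip_s, n_chips, comm_s):
    """Estimated latency of one step: compute shrinks as 1/n, comm does not."""
    compute = flops / (flops_per_chip_s * n_chips)
    return compute + (comm_s if n_chips > 1 else 0.0)


if __name__ == "__main__":
    W = [[1, 2], [3, 4], [5, 6], [7, 8]]
    x = [1, 1]
    assert sharded_matvec(W, x, 2) == matvec(W, x)  # same answer either way

    # Made-up numbers: 1e12 FLOPs per step, 1e14 FLOP/s per chip, 10 us comm.
    for n in (1, 8, 64):
        print(n, step_latency_s(1e12, 1e14, n, 10e-6))
```

Under these assumptions the compute term falls as 1/n while the communication term stays fixed, so latency keeps dropping until communication dominates.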

varunvummadi · on Feb 19, 2024

Please correct me if I'm wrong: are you running a batch size of 1 on 500 GPUs? If you are using batch size 1, why are the responses almost instant? Also, when can we expect something like bring-your-own fine-tuned models? Thanks!

tome · on Feb 19, 2024

We are not using 500 GPUs; we are using a large system built from many of our own custom ASICs. This allows us to run batch size 1 with no reduction in overall throughput. (We are doing pipelining, though, so many users are using the same system at once.)
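A generic sketch of why pipelining lets a system serve batch size 1 without giving up throughput. It is not a description of Groq's actual hardware. It assumes the system is split into `n_stages` stages (for example, groups of chips each holding some layers), each stage takes `stage_time` to process one request, and all requests are waiting at t = 0. The function `finish_times` is hypothetical, and real autoregressive decoding, where each token depends on the previous one, schedules work in more complicated ways.

```python
# Hypothetical toy model of pipelining with batch size 1.

def finish_times(n_requests, n_stages, stage_time, pipelined=True):
    """Return the time at which each request leaves the last stage.

    All requests are assumed to be waiting at t = 0 and are processed
    one at a time (batch size 1) in arrival order.
    """
    free_at = [0.0] * n_stages  # when each stage can next accept work
    done = []
    for _ in range(n_requests):
        t = 0.0
        for s in range(n_stages):
            start = max(t, free_at[s])  # wait for the request AND the stage
            t = start + stage_time
            free_at[s] = t
        if not pipelined:
            # Without pipelining, the whole system is busy until this request exits.
            free_at = [t] * n_stages
        done.append(t)
    return done


if __name__ == "__main__":
    print(finish_times(8, 4, 1.0, pipelined=True))
    # [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
    print(finish_times(8, 4, 1.0, pipelined=False))
    # [4.0, 8.0, 12.0, 16.0, 20.0, 24.0, 28.0, 32.0]
```

In both cases each request spends `n_stages * stage_time` inside the system, but with pipelining a new request completes every `stage_time` once the pipeline is full, because the stages work on different users' requests at the same time.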