
Definitely not, but even with a comparison to 500 GPUs Groq will still come out on top because you can never reduce latency by adding more parallel compute :)


> GPUs Groq will still come out on top because you can never reduce latency by adding more parallel compute :)

You literally can. In fact, that's the entire reason to use multiple chips.

See, e.g., the TPU group's paper: https://arxiv.org/abs/2211.05102
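
A rough way to see why more chips can lower latency: at batch size 1, generating a token is often limited by how fast the weights can be read from memory, and sharding the weights across chips divides that read time. The sketch below is a toy model under that assumption, not something taken from the paper; the function name, the 140 GB model size, the 2 TB/s bandwidth, and the 1 ms communication cost are all made-up illustrative numbers.

```python
def token_latency_ms(model_bytes, num_chips, mem_bw_bytes_per_s, comm_ms_per_token=0.0):
    """Toy per-token latency when weights are sharded evenly across chips.

    Assumes decoding is memory-bandwidth bound and that each chip reads
    only its own shard, all chips in parallel (hypothetical model).
    """
    # Time for each chip to stream its shard of the weights once.
    read_ms = model_bytes / (num_chips * mem_bw_bytes_per_s) * 1000
    # Cross-chip communication only applies when there is more than one chip.
    comm_ms = comm_ms_per_token if num_chips > 1 else 0.0
    return read_ms + comm_ms


one_chip = token_latency_ms(140e9, 1, 2e12)
eight_chips = token_latency_ms(140e9, 8, 2e12, comm_ms_per_token=1.0)
print(f"1 chip: {one_chip:.2f} ms/token, 8 chips: {eight_chips:.2f} ms/token")
```

In this model the latency keeps dropping as chips are added until the fixed communication term dominates, which is the usual limit on this kind of scaling.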


Please let me know if I am wrong: are you guys running a batch size of 1 on 500 GPUs? If so, why are the responses almost instant? Also, when can we expect something like bring-your-own fine-tuned models? Thanks!


We are not using 500 GPUs; we are using a large system built from many of our own custom ASICs. This allows us to do batch size 1 with no reduction in overall throughput. (We are doing pipelining though, so many users are using the same system at once.)
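
To illustrate how pipelining lets batch size 1 coexist with high throughput, here is a minimal sketch of an idealized pipeline. It is a toy model, not a description of the actual system: the function name, the 10 stages, and the 1 ms per stage are assumptions chosen for illustration.

```python
def pipeline_stats(num_stages, stage_time_ms, num_requests):
    """Toy model of a pipeline where each stage holds one request at a time."""
    # A single request has to pass through every stage in turn.
    latency_ms = num_stages * stage_time_ms
    # The first request takes num_stages steps; after that, one request
    # finishes per step because all stages work on different requests.
    total_ms = (num_stages + num_requests - 1) * stage_time_ms
    throughput_per_s = num_requests / total_ms * 1000
    return latency_ms, total_ms, throughput_per_s


latency, total, throughput = pipeline_stats(num_stages=10, stage_time_ms=1.0, num_requests=1000)
print(f"latency per request: {latency} ms, throughput: {throughput:.0f} requests/s")
```

Each request still sees the same latency as it would alone, but with many users in flight the stages stay busy, so throughput approaches one request per stage time rather than one per full pass.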



