There's a difference between token throughput and latency. Token throughput is t...

frozenport · on Feb 19, 2024

https://wow.groq.com/artificialanalysis-ai-llm-benchmark-dou...

Seems to have it. Looks cost-competitive but a lot faster.

nabakin · on Feb 19, 2024

People use "throughput" and "latency" to mean different things in different contexts. Here they are referring to per-user token throughput and time to first token/chunk. They don't mention the token throughput of the entire 576-chip system[0] that runs Llama 2 70B, which would be the number we're looking for.

[0] https://news.ycombinator.com/item?id=38742581
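To make the distinction concrete, here is a minimal sketch of the three numbers computed from per-token arrival timestamps. The `Request` record, the function names, and the timings are all hypothetical, and the metric definitions are one common convention, not necessarily the one the benchmark uses. Only the 576-chip figure comes from the linked comment.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One user's streamed completion (hypothetical measurement record)."""
    sent_at: float            # wall-clock time the request was sent, in seconds
    token_times: list[float]  # wall-clock arrival time of each output token, in seconds


def time_to_first_token(req: Request) -> float:
    """Latency in the sense used above: delay until the first token arrives."""
    return req.token_times[0] - req.sent_at


def per_user_tokens_per_s(req: Request) -> float:
    """Streaming rate one user sees after the first token (assumed definition)."""
    if len(req.token_times) < 2:
        raise ValueError("need at least two tokens to measure a rate")
    span = req.token_times[-1] - req.token_times[0]
    return (len(req.token_times) - 1) / span


def system_tokens_per_s(reqs: list[Request]) -> float:
    """Aggregate throughput: all tokens served to all users per wall-clock second."""
    total_tokens = sum(len(r.token_times) for r in reqs)
    window = max(r.token_times[-1] for r in reqs) - min(r.sent_at for r in reqs)
    return total_tokens / window


if __name__ == "__main__":
    # Two hypothetical users served concurrently; the timings are made up.
    a = Request(sent_at=0.0, token_times=[0.5, 1.0, 1.5, 2.0, 2.5])
    b = Request(sent_at=0.0, token_times=[0.5, 1.0, 1.5, 2.0, 2.5])
    print(time_to_first_token(a))       # 0.5
    print(per_user_tokens_per_s(a))     # 2.0
    print(system_tokens_per_s([a, b]))  # 4.0
    CHIPS = 576  # system size from the linked comment
    print(system_tokens_per_s([a, b]) / CHIPS)  # illustrative per-chip share
```

In this toy, adding a second concurrent user doubles the aggregate number while the per-user rate and time to first token stay the same. That is why the per-user figures alone say little about what the whole system delivers (or costs per token). On real hardware, batching more users usually changes the per-user rate too.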