There's a difference between token throughput and latency. Token throughput is the token throughput of the whole GPU/system and latency is the token throughput for an individual user. Groq offers extremely low latency (aka extremely high token throughput per user) but we still don't have numbers on the token throughput of their entire system. Nvidia's metrics here on the other hand, show us the token throughput of the whole GPU/system. So, in reality, while you might be able to get 1.5k t/s on an H100, the latency (token throughput per user) will be something much lower like 20 t/s.
The really important metric to look for is cost per token because even though Groq is able to run at low latency, that doesn't mean it's able to do it cheaply. Determining the cost per token can be done many ways but a useful way for us is approximately the cost of the system divided by the total token throughput of the system per second. We don't have the total token throughput per second of Groq's system so we can't really say how efficient it is. It could very well be that Groq is subsidizing the cost of their system to lower prices and gain PR and will increase their prices later on.
People are using throughput and latency differently in different locations/contexts. Here they are referring to token throughput per user and first token/chunk latency. They don't mention the token throughput of the entire 576-chip system[0] that runs Llama 2 70b which would be the number we're looking for.
The really important metric to look for is cost per token because even though Groq is able to run at low latency, that doesn't mean it's able to do it cheaply. Determining the cost per token can be done many ways but a useful way for us is approximately the cost of the system divided by the total token throughput of the system per second. We don't have the total token throughput per second of Groq's system so we can't really say how efficient it is. It could very well be that Groq is subsidizing the cost of their system to lower prices and gain PR and will increase their prices later on.