I recently built a small open-source tool to benchmark different LLM API endpoints, including OpenAI, Claude, and self-hosted models served via llama.cpp.
It runs a configurable number of test requests and reports two key metrics:
• First-token latency (ms): time from sending the request until the first token arrives
• Output speed (tokens/sec): sustained generation throughput once tokens start streaming
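
To make the metrics concrete, here's a minimal sketch of how both numbers can be measured against an OpenAI-compatible streaming endpoint. This is illustrative only, not the tool's actual internals; the base_url, api_key, and model below are placeholders.

    # Requires: pip install openai
    import time
    from openai import OpenAI

    client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

    def benchmark(model: str, prompt: str) -> tuple[float, float]:
        start = time.perf_counter()
        first_token_at = None
        chunks = 0
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        for chunk in stream:
            if not chunk.choices or not chunk.choices[0].delta.content:
                continue  # skip role/metadata-only chunks
            if first_token_at is None:
                first_token_at = time.perf_counter()  # first token arrived
            chunks += 1
        end = time.perf_counter()
        ttft_ms = (first_token_at - start) * 1000  # assumes >= 1 token arrived
        # Approximate tokens/sec by counting stream chunks (~1 token per
        # chunk for most providers); a tokenizer would be more precise.
        return ttft_ms, chunks / (end - first_token_at)

    print(benchmark("gpt-4o-mini", "Say hello."))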
Demo: https://llmapitest.com/
Code: https://github.com/qjr87/llm-api-test
The goal is to provide a simple, visual, and reproducible way to evaluate performance across different LLM providers, including the growing number of third-party “proxy” or “cheap LLM API” services.
It supports:
• OpenAI-compatible APIs (official + proxies)
• Claude (via Anthropic)
• Local endpoints (custom/self-hosted)
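
For illustration, a provider list might look something like this. This is a hypothetical shape, not the tool's actual config schema; check the repo for the real format.

    [
      { "name": "openai",      "type": "openai-compatible", "base_url": "https://api.openai.com/v1" },
      { "name": "cheap-proxy", "type": "openai-compatible", "base_url": "https://proxy.example.com/v1" },
      { "name": "claude",      "type": "anthropic",         "base_url": "https://api.anthropic.com" },
      { "name": "local",       "type": "custom",            "base_url": "http://localhost:8080/v1" }
    ]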
You can also self-host it with docker-compose.
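
A typical flow, assuming the compose file ships in the repo root (port and service names may differ; see the README):

    git clone https://github.com/qjr87/llm-api-test
    cd llm-api-test
    docker compose up -d   # or docker-compose up -d on older installs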
Config is clean; adding a new provider only requires a simple plugin-style addition.
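
For example, a new provider could be a small adapter along these lines. The class name and method here are hypothetical, not the repo's actual extension API; the harness only needs a way to stream chunks so it can time the first token and count output.

    from typing import Iterator

    class MyProvider:
        name = "my-provider"

        def __init__(self, base_url: str, api_key: str):
            self.base_url = base_url
            self.api_key = api_key

        def stream_completion(self, model: str, prompt: str) -> Iterator[str]:
            """Yield output chunks as they arrive; left as a stub here."""
            ...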
Would love feedback, PRs, or even test reports for the APIs you’re using. I’m especially interested in how some lesser-known services compare.