Benchmarks are already known to be a poor signal of quality for LLMs, but they are still the "best" standardized option we have. A few alternatives exist, like the food truck test and the SVG test. At the end of the day, there is only one reliable way: build your own benchmark for your own application.