the most cited is terminal bench 2.0, but it's also plagued by cheating accusatio... | Hacker News


nikcub, 18 days ago, on: Show HN: OSS Agent I built topped the TerminalBenc...

The most cited benchmark is Terminal-Bench 2.0, but it's also plagued by cheating accusations and benchmaxxing. Somewhat remarkably, Claude Code ranks last for Opus 4.6, which may say something about CC, or something about the benchmark [0].

[0] https://www.tbench.ai/leaderboard/terminal-bench/2.0
