There are still blatant failure modes in which models engage in clear sycophancy rather than, say, simply expressing enthusiasm.

My guess is that, in practice, a benchmark (like this vibesbench) that helps catch unhelpful and blatant sycophancy failures could be useful.