The simple fact that they did not list the current SOTA for the size class in their comparison table tells you all you need to know about their confidence. And listing Gemma-2B is like shooting fish in a barrel, might as well also put RedPajama on there.

It's good to see MoE attempted at smaller sizes, and given their results it may scale down well too. But regardless, 1.25T tokens is very little training data compared to the 6T that Mistral 7B received, and even with that much data Mistral 7B is barely usable and likely not yet saturated. Before it, the sub-13B size class was considered basically an academic exercise.
