
That's not universally true. E.g., the GPT-4 technical report pointed out that nearly all of their experiments were done on models 1000x smaller than the final model.


Fair point, though I’d argue there’s an inherent selection bias here: it favors improvements that fit a scaling-law curve in the small-model regime.



