Hacker News

That paper does show evidence of diminishing returns, for what it’s worth: you get less going from 64 to 540 billion parameters than you do from 8 to 64 billion. Combined with the increased cost of training gargantuan models, it’s not clear to me that models with trillions of parameters will really be worth it.
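A rough way to see the diminishing-returns point, assuming loss follows a power law in parameter count (the exponent below is purely illustrative, not a value from the paper):

```python
# Illustrative power-law scaling: loss ~ N^(-alpha), N in billions of params.
# alpha = 0.07 is a made-up exponent for demonstration, not from the paper.
alpha = 0.07

def loss(n_billion_params):
    """Hypothetical loss under a simple power-law scaling curve."""
    return n_billion_params ** -alpha

# Absolute improvement from each jump in scale.
gain_small_to_mid = loss(8) - loss(64)     # ~8x more parameters
gain_mid_to_large = loss(64) - loss(540)   # ~8.4x more parameters

print(gain_small_to_mid, gain_mid_to_large)
```

Under any such power law, each comparable multiplicative jump in scale buys a smaller absolute loss reduction than the last, while training cost grows roughly linearly (or worse) with parameter count, which is the trade-off the comment is pointing at.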

