Hacker News

That paper does show evidence of diminishing returns, for what it’s worth: you get less going from 64 to 540 billion parameters than you do from 8 to 64 billion. Combined with the increased cost of training gargantuan models, it’s not clear to me that models with trillions of parameters will really be worth it.
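A rough way to see the diminishing-returns point, assuming loss follows a power law in parameter count (the exponent below is purely illustrative, not a value from the paper):

```python
# Illustrative power-law scaling: loss ~ N^(-alpha), N in billions of params.
# alpha = 0.07 is a made-up exponent for demonstration, not from the paper.
alpha = 0.07

def loss(n_billion_params):
    """Hypothetical loss under a simple power-law scaling curve."""
    return n_billion_params ** -alpha

# Absolute improvement from each jump in scale.
gain_small_to_mid = loss(8) - loss(64)     # ~8x more parameters
gain_mid_to_large = loss(64) - loss(540)   # ~8.4x more parameters

print(gain_small_to_mid, gain_mid_to_large)
```

Under any such power law, each comparable multiplicative jump in scale buys a smaller absolute loss reduction than the last, while training cost grows roughly linearly (or worse) with parameter count, which is the trade-off the comment is pointing at.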

