I’m still learning about a lot of this stuff so apologies if this is wrong.
Isn’t part of the motivation for using deep learning that it can actually generalize despite having that many parameters? My understanding is that classical statistical models are prone to overfitting as you add more parameters. Deep learning tries to learn the underlying distribution of the data, enabling predictions not found in the dataset that still match the underlying positional statistics of the dataset.
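To make the overfitting point concrete, here’s a minimal toy sketch (my own illustration, not from any particular paper): fitting polynomials of increasing degree to a small noisy sample. The function, sample sizes, and noise level are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target: noisy samples of a smooth function.
def f(x):
    return np.sin(x)

x_train = np.linspace(0, 3, 20)
y_train = f(x_train) + rng.normal(0, 0.2, size=x_train.shape)
x_test = np.linspace(0.05, 2.95, 200)
y_test = f(x_test) + rng.normal(0, 0.2, size=x_test.shape)

def mse_for_degree(d):
    # Classical least-squares polynomial fit with d+1 parameters.
    coeffs = np.polyfit(x_train, y_train, d)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for d in (1, 3, 15):
    tr, te = mse_for_degree(d)
    print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

The training error keeps shrinking as the parameter count grows, but the held-out error doesn’t: the high-degree model memorizes the noise. The (oversimplified) contrast with deep learning is that huge networks, trained appropriately, can keep improving on held-out data even at parameter counts where a classical model like this one has long since overfit.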
In that context I’d say the comparison is apt. It’s not just throwing more compute at the problem - it’s doing so in a way that actually achieves scalable accuracy gains. With pre-training and self-supervised learning, you can (feasibly) train such a model without even needing to understand the statistics yourself.