Didn't thinking tokens resolve the most problematic part of autoregressive model...

blurbleblurble · 2026-02-20T16:29:27

The reason I mentioned "purely autoregressive" is that, realistically, I expect hybrid diffusion + autoregressive models to be the first popular diffusion models. I could be wrong, though. Diffusion models also have other tricks, like really easy integration with simple classifiers.

Check out this paper where they use diffusion during inference on the autoencoded prediction of an autoregressive model: https://openreview.net/forum?id=c05qIG1Z2B