
The semantic equivalence of possible outputs is already encoded in the model. While it is not necessarily recoverable from the logits of a particular sampling rollout, it exists throughout the prior layers.

So this is basically saying we shouldn't try to estimate entropy over logits; instead, we should be able to learn a function from activations earlier in the network to a degree of uncertainty that signals (i.e., is classifiable as) confabulation.
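
A minimal sketch of that idea, assuming a HuggingFace-style causal LM you can pull hidden states from; the layer choice, pooling, and the linear probe are all illustrative assumptions, and the confabulation labels would have to come from some external check (e.g. semantic-entropy clustering over multiple rollouts):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained(
        "gpt2", output_hidden_states=True
    )
    model.eval()

    def activation_features(prompt, layer=-4):
        # Mean-pool hidden states from an intermediate layer,
        # rather than reading uncertainty off the final logits.
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        h = out.hidden_states[layer]  # (1, seq_len, d_model)
        return h.mean(dim=1).squeeze(0).numpy()

    def train_probe(prompts, labels):
        # labels: 1 if the sampled answer for that prompt was a
        # confabulation, 0 otherwise (labeled externally).
        X = [activation_features(p) for p in prompts]
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X, labels)
        return probe

Whether a linear probe at a single layer is enough, or you need something richer pooled across layers, is exactly the empirical question.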


