
The semantic equivalence of possible outputs is already encoded in the model. While it is not necessarily recoverable from the logits of a particular sampling rollout, it exists throughout the prior layers.

So this is basically saying we shouldn't try to estimate entropy over logits; instead, we should be able to learn a function from activations earlier in the network to a degree of uncertainty that signals (i.e., is classifiable as) confabulation.
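
A minimal sketch of that idea, assuming a HuggingFace-style causal LM you can pull hidden states from; the layer choice, pooling, and the linear probe are all illustrative assumptions, and the confabulation labels would have to come from some external check (e.g. semantic-entropy clustering over multiple rollouts):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from sklearn.linear_model import LogisticRegression

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained(
        "gpt2", output_hidden_states=True
    )
    model.eval()

    def activation_features(prompt, layer=-4):
        # Mean-pool hidden states from an intermediate layer,
        # rather than reading uncertainty off the final logits.
        inputs = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            out = model(**inputs)
        h = out.hidden_states[layer]  # (1, seq_len, d_model)
        return h.mean(dim=1).squeeze(0).numpy()

    def train_probe(prompts, labels):
        # labels: 1 if the sampled answer for that prompt was a
        # confabulation, 0 otherwise (labeled externally).
        X = [activation_features(p) for p in prompts]
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X, labels)
        return probe

Whether a linear probe at a single layer is enough, or you need something richer pooled across layers, is exactly the empirical question.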


