
> However, sometimes there are several good answers and so all the good answers get a lower probability because there are 5 of them.

That's the result after softmax. If you want to act on the raw results, you can still do that.
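To make that concrete, here is a rough sketch in plain Python. The logit values are made up for illustration and `softmax` is a hand-written helper, not any particular library's function. It shows five equally good answers each landing near 0.2 after softmax, even though their raw scores are just as high as in the single-answer case.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits: one clearly good answer vs. five equally good ones.
one_good = [10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
five_good = [10.0, 10.0, 10.0, 10.0, 10.0, 0.0]

p_one = softmax(one_good)    # almost all mass on the first answer
p_five = softmax(five_good)  # mass split roughly evenly across five answers

print(p_one[0], p_five[0])
print(one_good[0], five_good[0])  # the raw score of the first answer is identical
```

So if you care about "is this answer good" rather than "is this the single best answer", the raw score still carries that information.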



The pre-softmax outputs don't sum to one, so they don't even behave like a probability distribution. And that's the point. Given the pre-softmax activations, there are infinitely many ways to convert them into something probability-like. You can normalize them after taking the square root, squaring, cubing, etc. Or you can exponentiate, and for some reason that does better. Either way it's not a 'real' probability distribution.
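A quick sketch of that in plain Python, with made-up activations. The `normalize` helper is just for illustration, and the inputs are all positive so the square root is defined. Every transform below yields numbers that sum to one, yet they disagree on how much weight the top answer gets.

```python
import math

def normalize(scores):
    # Divide by the sum so the scores add up to one.
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical pre-softmax activations (kept positive so sqrt works).
logits = [2.0, 1.0, 0.5]

by_sqrt = normalize([math.sqrt(x) for x in logits])
by_square = normalize([x ** 2 for x in logits])
by_cube = normalize([x ** 3 for x in logits])
by_exp = normalize([math.exp(x) for x in logits])  # this one is softmax

for name, dist in [("sqrt", by_sqrt), ("square", by_square),
                   ("cube", by_cube), ("exp", by_exp)]:
    print(name, [round(p, 3) for p in dist])
```

One practical difference: real activations can be negative, where the square root is undefined and odd powers go negative, while exponentiation always gives a positive number. That doesn't make the softmax output any more of a 'real' probability, though.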



