The reason for exp(x) is that its derivative is exp(x), which makes it possible ...

dkislyuk · 2026-05-01T18:50:10

I agree that "it has nice derivatives" is a great empirical reason to use a specific function in ML, but it doesn't prove that it's the best function to use. And even if a derivative term looks more complex on paper, that doesn't necessarily mean it is more expensive to compute, so that can't be the only criterion for selecting a function.
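
To make the "nice derivatives" point concrete, here is a minimal sketch in plain Python. It assumes a cross-entropy loss on top of softmax, which is not stated above, and the helper names (`softmax`, `cross_entropy`, `analytic_grad`, `numeric_grad`) are purely illustrative. Under that assumption the gradient with respect to the logits collapses to `p - y`, and the sketch checks that against finite differences.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target class.
    return -math.log(softmax(logits)[target])

def analytic_grad(logits, target):
    # Closed form for softmax + cross-entropy: p - one_hot(target).
    p = softmax(logits)
    return [p_i - (1.0 if i == target else 0.0) for i, p_i in enumerate(p)]

def numeric_grad(logits, target, eps=1e-6):
    # Central finite differences, used only as a sanity check.
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        down = list(logits)
        up[i] += eps
        down[i] -= eps
        diff = cross_entropy(up, target) - cross_entropy(down, target)
        grad.append(diff / (2 * eps))
    return grad

logits = [2.0, -1.0, 0.5]  # arbitrary example values
probs = softmax(logits)
g_analytic = analytic_grad(logits, target=0)
g_numeric = numeric_grad(logits, target=0)
```

The closed form is convenient, but as noted above, convenience alone doesn't show that exp is the best choice; it only shows that this particular pairing differentiates cleanly.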

Luckily, there are more axiomatic reasons why softmax is the preferred way to map inputs to a probability distribution.