softmax
The softmax of \(\mathbf{z} = (z_1, z_2, \ldots, z_n)\) is: \[\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}\]
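As a concrete illustration, here is a minimal sketch of this formula in Python with NumPy. The max-subtraction is an assumed numerical-stability step, not part of the definition; it leaves the result unchanged because the shared factor \(e^{-\max(\mathbf{z})}\) cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array.

    Subtracting max(z) before exponentiating avoids overflow
    without changing the result (the common factor cancels).
    """
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([1.0, 2.0, 3.0])))
# [0.09003057 0.24472847 0.66524096]
```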
1. Why is the softmax used in machine learning?
- It is differentiable, whereas the arg-max is not.
- In most cases the largest element receives exponentially more weight than all the others, so the output approximates a one-hot arg-max (see the sketch after this list).
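To see the arg-max behavior concretely, here is a short sketch (Python/NumPy; the scaling factors are chosen arbitrarily for illustration): as the gaps between entries grow, the softmax output approaches the one-hot encoding of the largest entry.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

z = np.array([1.0, 2.0, 3.0])
# Scaling z widens the gaps between entries; the output
# concentrates on the maximum and approaches one-hot.
for scale in (1, 5, 20):
    print(scale, softmax(scale * z).round(4))
# 1  [0.09   0.2447 0.6652]
# 5  [0.     0.0067 0.9933]
# 20 [0.     0.     1.    ]
```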
2. Comments
Really it should be called the soft-arg-max, because it returns something close to a one-hot encoding of the arg-max. See Tim Vieira.