softmax
The softmax of \(\mathbf{z} = (z_1, z_2, \ldots, z_n)\) is: \[\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}\]
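As a concrete illustration, here is a minimal sketch of this formula in Python with NumPy. The max-subtraction is an assumed numerical-stability step, not part of the definition; it leaves the result unchanged because the shared factor \(e^{-\max(\mathbf{z})}\) cancels between numerator and denominator.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D array.

    Subtracting max(z) before exponentiating avoids overflow
    without changing the result (the common factor cancels).
    """
    shifted = z - np.max(z)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

print(softmax(np.array([1.0, 2.0, 3.0])))
# [0.09003057 0.24472847 0.66524096]
```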
1. Why is the softmax used in machine learning?
- It is differentiable, whereas the arg-max is not.
- In most cases the largest element receives exponentially more weight than all the others, so the output approximates a one-hot arg-max (see the sketch after this list).
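To see the arg-max behavior concretely, here is a short sketch (Python/NumPy; the scaling factors are chosen arbitrarily for illustration): as the gaps between entries grow, the softmax output approaches the one-hot encoding of the largest entry.

```python
import numpy as np

def softmax(z):
    exps = np.exp(z - np.max(z))
    return exps / exps.sum()

z = np.array([1.0, 2.0, 3.0])
# Scaling z widens the gaps between entries; the output
# concentrates on the maximum and approaches one-hot.
for scale in (1, 5, 20):
    print(scale, softmax(scale * z).round(4))
# 1  [0.09   0.2447 0.6652]
# 5  [0.     0.0067 0.9933]
# 20 [0.     0.     1.    ]
```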
2. Comments
Really it should be called the soft-arg-max, because it returns something close to a one-hot encoding of the arg-max. See Tim Vieira.