
ReLU

1. why does ReLU work if the gradient is 0 on the left hand side?

  • ReLU's gradient is zero for negative pre-activations (the left-hand side), so no learning signal flows through them. The hope is that, across the training inputs, some inputs push the neuron's pre-activation onto the positive (right-hand) side, so a gradient can still flow through those examples. If no input does, the neuron is "dead": its weights stop updating and it stays dead unless the inputs feeding it change. To avoid completely dead neurons, you can use a leaky ReLU, which keeps a small nonzero slope on the negative side (see the sketch below the list).
  • see this discussion

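A minimal sketch of the point above (assuming PyTorch; the values are illustrative): for a negative pre-activation, plain ReLU passes back a zero gradient, while leaky ReLU keeps a small nonzero slope so the weights can still update.

  # Compare gradients of ReLU and leaky ReLU on one negative and one
  # positive pre-activation.
  import torch
  import torch.nn.functional as F

  x = torch.tensor([-2.0, 3.0], requires_grad=True)

  # Plain ReLU: gradient is 0 for the negative entry, 1 for the positive one.
  torch.relu(x).sum().backward()
  print(x.grad)   # tensor([0., 1.])

  x.grad = None   # reset before the second backward pass

  # Leaky ReLU: a small slope (0.01 here) on the negative side keeps a
  # nonzero gradient flowing even when the pre-activation is negative.
  F.leaky_relu(x, negative_slope=0.01).sum().backward()
  print(x.grad)   # tensor([0.0100, 1.0000])
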
Created: 2025-11-02 Sun 18:55