Forward-Forward
Instead of backpropagating errors through the whole network, train each layer with its own local objective (see the sketch after this list):
- maximize a "goodness" measure for the positive examples (in Hinton's paper, goodness is the sum of squared activations)
- minimize that same goodness for the negative examples
Note that gradient descent is still used within each layer; what is avoided is backpropagation through the network as a whole.
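A minimal PyTorch sketch of this idea, assuming the paper's goodness definition (sum of squared activations) and a logistic loss that pushes positive goodness above a threshold and negative goodness below it. The layer sizes, threshold, and learning rate here are illustrative, not taken from the source.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One layer trained with a local Forward-Forward objective."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.act = nn.ReLU()
        self.threshold = threshold  # positives should exceed this goodness
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only its direction (not the previous
        # layer's goodness magnitude) is passed forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return self.act(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness of positives
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness of negatives
        # Logistic loss: high positive goodness and low negative goodness.
        loss = torch.log1p(torch.exp(torch.cat([
            self.threshold - g_pos,  # large when positives fall short
            g_neg - self.threshold,  # large when negatives are too good
        ]))).mean()
        self.opt.zero_grad()
        loss.backward()  # gradients stay inside this layer
        self.opt.step()
        # Detach outputs so no gradient crosses the layer boundary,
        # i.e. no backprop through the whole network.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

# Usage: feed positive and negative data through the stack, training
# each layer locally on the (detached) outputs of the previous one.
layers = [FFLayer(784, 500), FFLayer(500, 500)]
x_pos, x_neg = torch.rand(32, 784), torch.rand(32, 784)
for layer in layers:
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```

Because each `train_step` detaches its outputs, `loss.backward()` only ever touches one layer's parameters, which is exactly the per-layer gradient descent described above.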