
LSTM

At each time-step \(t\), the LSTM sees the current input \(x_t\), the previous output (hidden state) \(h_{t-1}\), and the previous cell state \(C_{t-1}\).

1. Updating the cell

First, we need to determine what part of the old cell state we want to keep. We compute a forget vector: \[f_t = \sigma(W_f \cdot [h_{t-1}, x_{t}] + b_f)\] where \(W_f\) is a weight matrix and \(b_f\) is a vector of bias weights. Since \(\sigma\) is the sigmoid function, \(f_t\) is a vector of weights between 0 and 1, one per element of the cell state.
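As a concrete illustration, here is a minimal numpy sketch of the forget gate. The names (`forget_gate`, `h_prev`, `x_t`) and the shapes are my own assumptions, not anything fixed by the math above:

```python
import numpy as np

def sigmoid(z):
    # elementwise logistic function; squashes each entry into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forget_gate(W_f, b_f, h_prev, x_t):
    # [h_{t-1}, x_t]: stack the previous output and current input into one vector
    z = np.concatenate([h_prev, x_t])
    # f_t: one keep-weight in (0, 1) per element of the cell state
    return sigmoid(W_f @ z + b_f)
```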

Then, we need to determine what updates we are adding to the new cell state. Let \(\tilde{C}_t\) be the candidate cell state and \(i_t\) be the coefficient that determines how much of the candidate gets added: \[ \tilde{C}_t = \tanh(W_C[h_{t-1},x_t] + b_C) \] and \[ i_t = \sigma(W_i[h_{t-1},x_t] + b_i) \]

Then, we have our updated cell state! \[ C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \]
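Continuing the same hedged sketch (reusing the hypothetical `sigmoid` helper above), the candidate state, input gate, and cell update could look like:

```python
def update_cell(W_i, b_i, W_C, b_C, f_t, C_prev, h_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    C_tilde = np.tanh(W_C @ z + b_C)   # candidate cell state, entries in (-1, 1)
    i_t = sigmoid(W_i @ z + b_i)       # how much of each candidate entry to add
    # keep a fraction of the old state, add a fraction of the candidate
    return f_t * C_prev + i_t * C_tilde
```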

2. Creating output

First, we need to decide which pieces of the cell state to output, so we compute an output gate: \[ o_t = \sigma(W_o[h_{t-1},x_{t}] + b_o) \]

Then, the output is: \[ h_t = o_t * \tanh(C_t) \]
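The output step in the same illustrative sketch, again assuming the `sigmoid` helper from above:

```python
def output_step(W_o, b_o, C_t, h_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)   # which pieces of the cell state to expose
    return o_t * np.tanh(C_t)      # h_t: the cell's output at this time-step
```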

Note that all the matrices \(W_f\), \(W_i\), \(W_C\), and \(W_o\) only deal with the concatenation of the previous output and the current input.
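Putting the pieces together, a single self-contained time-step might look like the following; the parameter shapes and the `params` tuple layout are illustrative assumptions, not a standard API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(params, h_prev, C_prev, x_t):
    # params = (W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o); layout is an assumption
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])   # every gate sees the same [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate
    i_t = sigmoid(W_i @ z + b_i)        # input gate
    C_tilde = np.tanh(W_C @ z + b_C)    # candidate cell state
    C_t = f_t * C_prev + i_t * C_tilde  # updated cell state
    o_t = sigmoid(W_o @ z + b_o)        # output gate
    h_t = o_t * np.tanh(C_t)            # new output / hidden state
    return h_t, C_t

# Toy usage: random weights, zero initial state.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = tuple(
    a for _ in range(4)
    for a in (rng.normal(size=(n_hid, n_hid + n_in)), np.zeros(n_hid))
)
h, C = np.zeros(n_hid), np.zeros(n_hid)
h, C = lstm_step(params, h, C, rng.normal(size=n_in))
```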

3. Helpful links

Created: 2024-07-15 Mon 01:27