Understanding LSTM

An LSTM cell uses three major gates:

  1. Forget Gate
  2. Input Gate
  3. Output Gate

The equations used in each of these gates are:

1. Forget Gate

$$ f\_t = \sigma(W\_f \cdot [h\_\{t-1\}, x\_t] + b\_f) $$

where,

$\sigma$ is the sigmoid function ($\sigma(x) = \frac{1}{1 + e^{-x}}$)

$W\_f$ is the weight matrix for the forget gate

$[h\_\{t-1\}, x\_t]$ is the concatenation of the previous hidden state and current input

$b\_f$ is the bias vector for the forget gate

$f\_t$ is the forget gate's activation vector at the current time step $t$

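Note: every element of $f\_t$ lies between 0 and 1 because it is a sigmoid output. A value near 0 discards the corresponding component of the previous cell state; a value near 1 keeps it.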
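
The forget gate equation above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `forget_gate`, the list-of-rows layout for `W_f`, and all numeric values are assumptions chosen for the example, and a real model would use a tensor library instead of nested lists.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: maps a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forget_gate(W_f, b_f, h_prev, x_t):
    """Compute f_t = sigmoid(W_f . [h_prev, x_t] + b_f).

    W_f is a list of rows (hypothetical layout); each row has
    len(h_prev) + len(x_t) entries, one per concatenated element.
    """
    concat = h_prev + x_t  # [h_{t-1}, x_t] as one flat list
    return [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_f, b_f)
    ]

# Illustrative values only: hidden size 2, input size 1.
h_prev = [0.5, -0.5]
x_t = [1.0]
W_f = [[0.0, 0.0, 0.0],
       [2.0, 2.0, 3.0]]
b_f = [0.0, 0.0]

f_t = forget_gate(W_f, b_f, h_prev, x_t)
print(f_t)
```

With these values the first pre-activation is 0, so the first gate element is exactly 0.5, while the second pre-activation is 3, giving a gate element close to 1 (mostly "keep").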

2. Input Gate

$$ i\_t = \sigma ( W\_i \cdot [h\_\{t-1\}, x\_t] + b\_i ) \newline \sim C\_t = \tanh ( W\_c \cdot [h\_\{t-1\}, x\_t] + b\_c ) $$

where,