LSTM (Long Short-Term Memory)

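A model introduced to address the long-term dependency problem of RNNs, where information from distant time steps is hard to retain.

It gains the ability to remember selectively by introducing gates that selectively pass information.

How far each gate opens or closes is expressed by weights, and those weights are learned during training.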
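
To make the "degree of opening" concrete, here is a minimal sketch that assumes a gate is a sigmoid applied to a weighted signal. The scalar setup and the names `gate`, `w`, and `b` are illustrative only and do not come from the notes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function; squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def gate(signal, w, b):
    """Hypothetical scalar gate: how far it opens depends on weight w and bias b."""
    return sigmoid(w * signal + b)

value = 5.0  # information that wants to pass through the gate

nearly_closed = gate(1.0, w=-6.0, b=0.0)  # sigmoid(-6), close to 0
half_open = gate(1.0, w=0.0, b=0.0)       # sigmoid(0) = 0.5
nearly_open = gate(1.0, w=6.0, b=0.0)     # sigmoid(6), close to 1

for name, g in [("nearly closed", nearly_closed),
                ("half open", half_open),
                ("nearly open", nearly_open)]:
    # The gate value scales the information multiplicatively.
    print(f"{name}: gate={g:.3f}, passed value={g * value:.3f}")
```

Because the gate value is a smooth function of `w` and `b`, gradient-based training can adjust how much information passes.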

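Weights

Four matrices are added to the $\{U, V, W\}$ of a recurrent neural network, giving $\{U, U_i , U_o , W, W_i , W_o , V\}$

$i$ : input gate
$o$ : output gate
Various architectural designs are possible (for example, adding a forget gate).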

Model Concept

Cell State

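The core of the LSTM
Corresponds to the horizontal line running along the top of the module diagram
A kind of conveyor belt

Only minor linear interactions are applied to it, so the flow of data is preserved.

If no operation is added, the information flows through completely unchanged.

Gates add information to the cell state or remove information from it.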

Gate

Forget Gate

$$ f_t=\sigma\left(W_f \cdot\left[h_{t-1}, x_t\right]+b_f\right) $$

Input Gate

$$ \begin{aligned}i_t & =\sigma\left(W_i \cdot\left[h_{t-1}, x_t\right]+b_i\right) \\\tilde{C}_t & =\tanh \left(W_C \cdot\left[h_{t-1}, x_t\right]+b_C\right)\end{aligned} $$

Cell State Update

$$ C_t=f_t * C_{t-1}+i_t * \tilde{C}_t $$
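
The conveyor-belt behaviour can be checked directly on this update rule. The sketch below plugs hand-picked gate values into $C_t=f_t * C_{t-1}+i_t * \tilde{C}_t$. The helper `update_cell_state` and the numbers are illustrative assumptions, not part of the notes.

```python
import numpy as np

def update_cell_state(c_prev, f, i, c_tilde):
    """Element-wise update C_t = f_t * C_{t-1} + i_t * C~_t."""
    return f * c_prev + i * c_tilde

c_prev = np.array([0.8, -0.3, 1.5])   # previous cell state C_{t-1}
c_tilde = np.array([0.5, 0.5, 0.5])   # candidate values C~_t

# Forget gate fully open, input gate fully closed: the state passes through unchanged.
kept = update_cell_state(c_prev, f=np.ones(3), i=np.zeros(3), c_tilde=c_tilde)

# Forget gate fully closed, input gate fully open: the state is replaced by the candidate.
replaced = update_cell_state(c_prev, f=np.zeros(3), i=np.ones(3), c_tilde=c_tilde)

# Mixed gates: each component blends old state and candidate independently.
mixed = update_cell_state(c_prev,
                          f=np.array([1.0, 0.5, 0.0]),
                          i=np.array([0.0, 0.5, 1.0]),
                          c_tilde=c_tilde)

print(kept, replaced, mixed)
```

In a trained network the gates are rarely exactly 0 or 1, so the mixed case is the typical one.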

Output Gate

$$ \begin{aligned}o_t & =\sigma\left(W_o \cdot\left[h_{t-1}, x_t\right]+b_o\right) \\h_t & =o_t * \tanh \left(C_t\right)\end{aligned} $$
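
Putting the four gate equations together, one time step in the concatenated form $W \cdot [h_{t-1}, x_t] + b$ might be sketched as follows. The function name `lstm_step_concat`, the `params` dictionary layout, the sizes, and the random initialization are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_concat(x_t, h_prev, c_prev, params):
    """One LSTM step using the concatenated form W . [h_{t-1}, x_t] + b."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                     # cell state update
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # new hidden state
    return h_t, c_t

# Illustrative sizes and random weights (not taken from the notes).
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
params = {}
for name in ("f", "i", "C", "o"):
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size + input_size))
    params[f"b_{name}"] = np.zeros(hidden_size)

h_t, c_t = lstm_step_concat(rng.normal(size=input_size),
                            np.zeros(hidden_size),
                            np.zeros(hidden_size),
                            params)
print(h_t.shape, c_t.shape)
```

Each weight matrix has `hidden_size + input_size` columns because it multiplies the concatenated vector.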

Equation Summary

$$ \begin{aligned}f_t & =\sigma_g\left(W_f x_t+U_f h_{t-1}+b_f\right) \\i_t & =\sigma_g\left(W_i x_t+U_i h_{t-1}+b_i\right) \\o_t & =\sigma_g\left(W_o x_t+U_o h_{t-1}+b_o\right) \\\tilde{c}_t & =\sigma_c\left(W_c x_t+U_c h_{t-1}+b_c\right) \\c_t & =f_t \odot c_{t-1}+i_t \odot \tilde{c}_t \\h_t & =o_t \odot \sigma_h\left(c_t\right)\end{aligned} $$

Here $\sigma_g$ is the sigmoid, $\sigma_c$ and $\sigma_h$ are $\tanh$, and $\odot$ is the element-wise product (written $*$ in the gate equations).
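
The summary writes each gate with two matrices ($W x_t + U h_{t-1}$), while the gate sections use a single matrix on the concatenated vector $[h_{t-1}, x_t]$. The two forms are the same computation if the single matrix is the block $[U \mid W]$, which the following sketch checks numerically for the forget-gate pre-activation. The sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
input_size, hidden_size = 3, 4
x_t = rng.normal(size=input_size)
h_prev = rng.normal(size=hidden_size)

W_f = rng.normal(size=(hidden_size, input_size))   # acts on x_t
U_f = rng.normal(size=(hidden_size, hidden_size))  # acts on h_{t-1}
b_f = rng.normal(size=hidden_size)

# Separate form from the summary: W_f x_t + U_f h_{t-1} + b_f
separate = W_f @ x_t + U_f @ h_prev + b_f

# Concatenated form: one matrix applied to [h_{t-1}, x_t].
# The column blocks are ordered to match the concatenation order.
W_concat = np.hstack([U_f, W_f])
concatenated = W_concat @ np.concatenate([h_prev, x_t]) + b_f

print(np.allclose(separate, concatenated))
```

Note that the symbol $W_f$ therefore means different things in the two notations: the full concatenated matrix in the gate sections, and only the input part in the summary.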

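The same step in NumPy, written with row vectors (so products appear as x W rather than W x); xt, ht_1, Ct_1 and all weights and biases are assumed to be defined already:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

ft = sigmoid(np.dot(xt, Wf) + np.dot(ht_1, Uf) + bf)  # forget gate
it = sigmoid(np.dot(xt, Wi) + np.dot(ht_1, Ui) + bi)  # input gate
ot = sigmoid(np.dot(xt, Wo) + np.dot(ht_1, Uo) + bo)  # output gate
Ct = ft * Ct_1 + it * np.tanh(np.dot(xt, Wc) + np.dot(ht_1, Uc) + bc)  # cell state update
ht = ot * np.tanh(Ct)  # hidden state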
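
To see the step applied across a whole sequence, the sketch below wraps the same five lines in a function and loops over time, carrying `ht` and `Ct` forward. The wrapper `lstm_step`, the parameter dictionary `p`, the sizes, and the random initialization are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(xt, ht_1, Ct_1, p):
    """One step in the row-vector convention (x @ W rather than W @ x)."""
    ft = sigmoid(xt @ p["Wf"] + ht_1 @ p["Uf"] + p["bf"])  # forget gate
    it = sigmoid(xt @ p["Wi"] + ht_1 @ p["Ui"] + p["bi"])  # input gate
    ot = sigmoid(xt @ p["Wo"] + ht_1 @ p["Uo"] + p["bo"])  # output gate
    Ct = ft * Ct_1 + it * np.tanh(xt @ p["Wc"] + ht_1 @ p["Uc"] + p["bc"])
    ht = ot * np.tanh(Ct)
    return ht, Ct

# Illustrative sizes and random weights (not taken from the notes).
rng = np.random.default_rng(42)
input_size, hidden_size, seq_len = 3, 4, 5
p = {}
for g in "fioc":
    p[f"W{g}"] = rng.normal(scale=0.1, size=(input_size, hidden_size))
    p[f"U{g}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    p[f"b{g}"] = np.zeros(hidden_size)

xs = rng.normal(size=(seq_len, input_size))  # one input vector per time step
ht, Ct = np.zeros(hidden_size), np.zeros(hidden_size)
hidden_states = []
for xt in xs:
    # The hidden state and cell state from step t-1 feed into step t.
    ht, Ct = lstm_step(xt, ht, Ct, p)
    hidden_states.append(ht)

hidden_states = np.stack(hidden_states)
print(hidden_states.shape)  # (5, 4)
```

The same weights are reused at every time step; only `ht` and `Ct` change as the sequence is consumed.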

Model Summary