Problem with seq2seq?
paper - NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE - Bahdanau attention, 2015
paper - Effective Approaches to Attention-based Neural Machine Translation - Luong attention, 2015
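The key technical difference between the two papers is the score function used to compare the decoder state with each encoder state: Bahdanau uses an additive (concat) score, Luong's simplest variant uses a dot product. A minimal NumPy sketch of both, with random weights standing in for trained parameters (all dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                # hypothetical hidden size
s = rng.normal(size=d)               # current decoder hidden state
H = rng.normal(size=(5, d))          # encoder hidden states, one per source token

# Bahdanau (additive) score: v^T tanh(W1 s + W2 h_i)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)
additive_scores = np.tanh(s @ W1 + H @ W2) @ v   # one score per source token

# Luong (multiplicative, "dot" variant) score: s^T h_i
dot_scores = H @ s                               # one score per source token

print(additive_scores.shape, dot_scores.shape)   # both (5,)
```

Either score vector is then softmax-normalized into attention weights over the source tokens.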
Before attention-based mechanisms, RNN-based encoder-decoder architectures were the popular choice -

How the RNN-based encoder-decoder works (for machine translation) - the encoder reads the source sentence token by token and compresses it into a single fixed-length context vector; the decoder then generates the target sentence one token at a time from that vector.
Problem with RNN-based encoder-decoder - the single fixed-length context vector is an information bottleneck: the whole source sentence, short or long, must be squeezed into one vector, so translation quality degrades on long sentences and the decoder cannot look back at individual source tokens.
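The bottleneck above can be seen in a tiny vanilla-RNN encoder sketch (random weights as stand-ins for trained parameters; all sizes are made-up toy values): no matter how long the input, the decoder would only receive the final hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration
vocab_size, embed_dim, hidden_dim = 10, 8, 16

E = rng.normal(size=(vocab_size, embed_dim))          # embedding table
W_xh = rng.normal(size=(embed_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1

def rnn_encode(token_ids):
    """Run a vanilla RNN over the source tokens and return ONLY the final
    hidden state: the single fixed-length context vector the decoder must
    translate from. This is the bottleneck attention was invented to remove."""
    h = np.zeros(hidden_dim)
    for t in token_ids:
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h

short_ctx = rnn_encode([1, 2, 3])
long_ctx = rnn_encode([1, 2, 3, 4, 5, 6, 7, 8, 9])

# A 3-token and a 9-token sentence collapse to the same-sized vector.
print(short_ctx.shape, long_ctx.shape)  # (16,) (16,)
```

Attention fixes this by letting the decoder take a weighted combination of all encoder hidden states at every output step, instead of relying on this one vector.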
Three years later, researchers found that recurrence is not required for building deep neural networks for natural language processing and proposed the original transformer architecture ("Attention Is All You Need", Vaswani et al., 2017), inspired by the Bahdanau attention mechanism.
Self-Attention - a mechanism that allows each position in the input sequence to consider the relevancy of, or "attend to", all other positions in the same sequence.
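A minimal NumPy sketch of (scaled dot-product) self-attention as described above; the projection matrices would be learned in a real model and are random here, and all sizes are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

seq_len, d_model = 4, 8                  # hypothetical toy sizes
X = rng.normal(size=(seq_len, d_model))  # one embedding vector per position

# Query/key/value projections (learned in a real model, random here)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Row i of `scores` holds position i's relevancy to every position,
    # including itself -- this is the "attend to all other positions" step.
    scores = Q @ K.T / np.sqrt(d_model)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

out, weights = self_attention(X)
print(out.shape)                         # (4, 8): one updated vector per position
print(weights.sum(axis=-1))              # each row sums to ~1.0
```

Each output row is a weighted mix of all positions' value vectors, so every position's new representation depends on the whole sequence.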