Problem with seq2seq?
paper - NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE - Bahdanau attention, 2015
paper - Effective Approaches to Attention-based Neural Machine Translation - Luong attention, 2015
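The key technical difference between the two papers is the score function used to compare the decoder state with each encoder state: Bahdanau uses an additive (concat) score, Luong's simplest variant uses a dot product. A minimal NumPy sketch of both, with random weights standing in for trained parameters (all dimensions here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8                                # hypothetical hidden size
s = rng.normal(size=d)               # current decoder hidden state
H = rng.normal(size=(5, d))          # encoder hidden states, one per source token

# Bahdanau (additive) score: v^T tanh(W1 s + W2 h_i)
W1 = rng.normal(size=(d, d))
W2 = rng.normal(size=(d, d))
v = rng.normal(size=d)
additive_scores = np.tanh(s @ W1 + H @ W2) @ v   # one score per source token

# Luong (multiplicative, "dot" variant) score: s^T h_i
dot_scores = H @ s                               # one score per source token

print(additive_scores.shape, dot_scores.shape)   # both (5,)
```

Either score vector is then softmax-normalized into attention weights over the source tokens.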
Before attention-based mechanisms, RNN-based encoder-decoder architectures were the popular choice -

How the RNN-based encoder-decoder works (for machine translation) - the encoder reads the source sentence token by token and compresses it into a single fixed-length context vector; the decoder then generates the target sentence one token at a time from that vector.
Problem with RNN-based encoder-decoder - the single fixed-length context vector is an information bottleneck: the whole source sentence, short or long, must be squeezed into one vector, so translation quality degrades on long sentences and the decoder cannot look back at individual source tokens.
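The bottleneck above can be seen in a tiny vanilla-RNN encoder sketch (random weights as stand-ins for trained parameters; all sizes are made-up toy values): no matter how long the input, the decoder would only receive the final hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions, chosen only for illustration
vocab_size, embed_dim, hidden_dim = 10, 8, 16

E = rng.normal(size=(vocab_size, embed_dim))          # embedding table
W_xh = rng.normal(size=(embed_dim, hidden_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1

def rnn_encode(token_ids):
    """Run a vanilla RNN over the source tokens and return ONLY the final
    hidden state: the single fixed-length context vector the decoder must
    translate from. This is the bottleneck attention was invented to remove."""
    h = np.zeros(hidden_dim)
    for t in token_ids:
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h

short_ctx = rnn_encode([1, 2, 3])
long_ctx = rnn_encode([1, 2, 3, 4, 5, 6, 7, 8, 9])

# A 3-token and a 9-token sentence collapse to the same-sized vector.
print(short_ctx.shape, long_ctx.shape)  # (16,) (16,)
```

Attention fixes this by letting the decoder take a weighted combination of all encoder hidden states at every output step, instead of relying on this one vector.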
Three years later, researchers found that recurrence is not required for building deep neural networks for natural language processing and proposed the original transformer architecture ("Attention Is All You Need", Vaswani et al., 2017), inspired by the Bahdanau attention mechanism.
Self-Attention - a mechanism that allows each position in the input sequence to consider the relevancy of, or "attend to", all other positions in the same sequence.
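A minimal NumPy sketch of (scaled dot-product) self-attention as described above; the projection matrices would be learned in a real model and are random here, and all sizes are toy values for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

seq_len, d_model = 4, 8                  # hypothetical toy sizes
X = rng.normal(size=(seq_len, d_model))  # one embedding vector per position

# Query/key/value projections (learned in a real model, random here)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Row i of `scores` holds position i's relevancy to every position,
    # including itself -- this is the "attend to all other positions" step.
    scores = Q @ K.T / np.sqrt(d_model)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

out, weights = self_attention(X)
print(out.shape)                         # (4, 8): one updated vector per position
print(weights.sum(axis=-1))              # each row sums to ~1.0
```

Each output row is a weighted mix of all positions' value vectors, so every position's new representation depends on the whole sequence.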