Self-attention in Transformers

Self Attention works by computing attention scores for each word in a sequence based on its relationship with every other word. These scores determine how much focus each word receives during processing, allowing the model to prioritize relevant information and capture complex dependencies across the sequence.
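The computation above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product self-attention, not the source's own code; the projection matrices `Wq`, `Wk`, `Wv` and the dimensions are made-up examples.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention for one sequence.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections (illustrative)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len): every word vs. every other word
    weights = softmax(scores, axis=-1)   # each row sums to 1 -> how much focus each word gets
    return weights @ V, weights          # output is a weighted sum of the value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # toy sequence: 4 tokens, model dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
```

Note that `weights` is a full seq_len × seq_len matrix, so every token attends to every other token in a single step.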

The Transformer is a neural network architecture for seq2seq tasks; instead of an LSTM, it uses self-attention mechanisms, which allow parallel processing.

Self-attention allows the model to directly compute relationships between all elements in a sequence, regardless of their distance from each other.

ANN - tabular data

CNN - image data

RNN - sequential data

Transformer - seq2seq tasks (machine translation, question answering, text summarization)

Then came the Bahdanau attention and Luong attention architectures; after these, the Transformer was introduced.

Paper: "Attention Is All You Need" (Vaswani et al., 2017)