LLMs

https://blog.eleuther.ai/transformer-math/

https://kipp.ly/transformer-inference-arithmetic/

https://gwern.net

Positional Embeddings

Skip Connections

Why we take the log of probabilities in the forward pass and when calculating the loss

Difference between Gradient Descent and Gradient Ascent

Why large weights are bad for a model and make it highly sensitive

Numerical Stability with Smaller Weights

AdamW and Weight Decay

26/01/26

Speculative Decoding