https://blog.eleuther.ai/transformer-math/

https://kipp.ly/transformer-inference-arithmetic/

Positional Embeddings

Skip Connections

Why we take the log of probabilities in the forward pass and when calculating the loss

Difference between Gradient Descent and Gradient Ascent

Why large weights are bad for a model and make it highly sensitive

Numerical Stability with Smaller Weights

Adam Optimizer

AdamW and Weight Decay

26/01/26

Speculative Decoding

Decoding Strategies

Evolution of Attention Mechanisms

04/02/26

LLM Inference Engines and Optimization

06/02/26

KL Divergence