<aside>
💡
Vibhor Sharma - June 20th, 2025
</aside>
- An LLM is an AI model trained to understand and generate language
- Based on the transformer architecture
- Learns from massive text datasets
- OpenAI GPT-4.5: 128k-token context
- Anthropic Claude: Sonnet and Opus variants
- Gemini 2.5: 1M+ token context, multimodal (the huge context window can pull in a lot of extra context, which can be annoying)
- Meta LLaMA 3
Key Concepts
- Parameters (the total count of weights and biases)
- Billions to trillions (larger models pick up subtler patterns; smaller models are good at one thing)
- Zero-shot (no examples in the prompt), few-shot (a handful of examples), and then fine-tuning
- Chain-of-thought reasoning - the model works through intermediate steps, modeled on how humans think
- Mixture of experts - only the relevant experts are activated per token (e.g. only 32 billion out of 100 billion parameters working at once), which is more efficient
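The mixture-of-experts idea above can be sketched as a routing step: a gate scores every expert, but only the top few are kept and their weights renormalized. This is a minimal toy in plain Python (expert count, scores, and `k=2` are made up for illustration, not from any real model):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_scores, k=2):
    """Pick the top-k experts by gate score and renormalize their
    weights, so only those k experts run on this token."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return list(zip(chosen, weights))

# 8 experts, but only 2 are activated for this token
scores = [0.1, 2.3, -0.5, 1.7, 0.0, 0.4, -1.2, 0.9]
active = route_top_k(scores, k=2)
```

Only the chosen experts' computations run, which is why a trillion-parameter MoE model can do roughly the work of a much smaller dense one per token.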
How LLMs Work
- Predict the next token based on context
- Self-attention and positional encoding (each token is compared against the tokens around it to understand context)
- Embedding: turns text into vectors of numbers so the model can work with it
- Large-scale training, refined with human feedback
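The self-attention and embedding bullets above can be combined into one toy example: each token's embedding (its query) is dot-producted against every other token's embedding (the keys), the scores are softmaxed, and the result is a weighted mix of the values. This single-head sketch uses plain Python with Q = K = V = the raw embeddings (real models learn separate projection matrices; the tiny 4-dimensional vectors here are invented for illustration):

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Toy single-head self-attention. For each token, compare its
    query against every token's key (scaled dot product), softmax the
    scores into weights, and return the weighted sum of the values."""
    d = len(embeddings[0])
    out = []
    for q in embeddings:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in embeddings]
        weights = softmax(scores)  # weights sum to 1 across the tokens
        out.append([sum(w * v[j] for w, v in zip(weights, embeddings))
                    for j in range(d)])
    return out

# Three tokens, each embedded as a 4-dimensional vector
tokens = [[1.0, 0.0, 1.0, 0.0],
          [0.0, 1.0, 0.0, 1.0],
          [1.0, 1.0, 0.0, 0.0]]
mixed = self_attention(tokens)
```

Each output row is a context-aware blend of all the input embeddings, which is the "comparing a token to the tokens around it" step from the notes; the prediction head then scores the next token from these mixed vectors.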