my notes, polished with claude
Fine-tuning means training a pre-trained network on new data to improve its performance on a specific task.
Two types: full fine-tuning and PEFT (parameter-efficient fine-tuning). PEFT covers various methods, such as LoRA and QLoRA.
As models get larger, full fine-tuning becomes infeasible on consumer hardware. Storing and deploying independently fine-tuned models also gets expensive fast - each one is the same size as the original pretrained model. PEFT addresses both.
Full fine-tuning also risks catastrophic forgetting: the model loses general capabilities as it overfits to the new data. PEFT largely avoids this because most of the pretrained weights stay frozen.
With PEFT, the small trained weights sit on top of the frozen pretrained LLM. Same base model, multiple tasks, just swap the small weights.
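A minimal numpy sketch of the swap-the-small-weights idea, using LoRA-style low-rank adapter pairs. The task names and tiny sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2  # tiny illustrative sizes

# Frozen pretrained weight matrix -- shared by every task, never retrained.
W = rng.standard_normal((d, k))

# One small (A, B) adapter pair per task (hypothetical task names).
adapters = {
    "summarize": (rng.standard_normal((d, r)), rng.standard_normal((r, k))),
    "classify":  (rng.standard_normal((d, r)), rng.standard_normal((r, k))),
}

def forward(x, task):
    A, B = adapters[task]      # swapping tasks = swapping the small matrices
    return x @ W + x @ A @ B   # base output plus low-rank correction

x = rng.standard_normal((1, d))
y1 = forward(x, "summarize")
y2 = forward(x, "classify")
```

Same base `W`, two different behaviors; only the small `(A, B)` pairs are stored per task.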
Less to train and store. If the original weight matrix is d × k with d = 1000 and k = 5000, it has 5,000,000 parameters. With rank r = 5, the two adaptation matrices have (1000×5) + (5×5000) = 30,000 parameters, less than 1% of the original.
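Checking the arithmetic above directly:

```python
d, k, r = 1000, 5000, 5

full_params = d * k             # parameters in the original weight matrix
adapter_params = d * r + r * k  # parameters in the two low-rank matrices
ratio = adapter_params / full_params

print(full_params, adapter_params, ratio)  # 5000000 30000 0.006
```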