CUDA out of memory when using Trainer with compute_metrics
Calculating GPU memory for serving LLMs
Fine-tuning LLM for RAG