a useful reference for understanding a model's capabilities.
Both types of benchmark matter, but they serve different purposes.
When should you run performance benchmarks?
General load-testing tools -
Locust and k6 simulate real-world traffic and focus on load testing: generating large numbers of concurrent requests to see how your LLM deployment performs under pressure.
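The core idea behind these tools can be sketched in a few lines of plain Python. Here the HTTP call is stubbed out with a sleep (a real load test would call your LLM endpoint via an HTTP client); in practice Locust or k6 handle the scheduling, ramp-up, and reporting for you:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> float:
    """Placeholder for a real HTTP call to an LLM endpoint.
    Returns the observed latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # stand-in for network + inference time
    return time.perf_counter() - start

def run_load_test(num_requests: int, concurrency: int) -> list[float]:
    """Fire num_requests prompts with at most `concurrency` in flight."""
    prompts = [f"prompt-{i}" for i in range(num_requests)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send_request, prompts))

latencies = run_load_test(num_requests=20, concurrency=5)
print(f"completed {len(latencies)} requests, max latency {max(latencies):.3f}s")
```

Varying `concurrency` while watching latency is the essence of a load test: the point where latency starts climbing tells you how much traffic the deployment can absorb.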
Specialized benchmarking tools-
NVIDIA GenAI-Perf and LLMPerf target LLM performance benchmarking specifically, focusing on inference-level metrics such as throughput and latency.
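To make these inference-level metrics concrete, here is a small sketch of how throughput and latency percentiles are typically derived from per-request measurements. The numbers below are hypothetical sample data, not output from any particular tool:

```python
import statistics

def percentile(values, p):
    """Nearest-rank percentile of a sample."""
    s = sorted(values)
    idx = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[idx]

# Hypothetical per-request measurements from one benchmark run:
# (end-to-end latency in seconds, tokens generated)
results = [(0.8, 120), (1.1, 150), (0.9, 130), (2.0, 140), (1.0, 125)]

latencies = [r[0] for r in results]
total_tokens = sum(r[1] for r in results)
wall_time = 3.5  # seconds the whole run took (requests overlap)

print(f"p50 latency: {statistics.median(latencies):.2f}s")   # 1.00s
print(f"p95 latency: {percentile(latencies, 95):.2f}s")      # 2.00s
print(f"throughput: {total_tokens / wall_time:.1f} tokens/s") # 190.0
```

Note that throughput is computed against wall-clock time for the whole run, not the sum of individual latencies, because concurrent requests overlap.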
Framework-specific tools -
vLLM and SGLang offer their own benchmarking scripts, commands, and usage guidelines, which are helpful for quick experiments.
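As one concrete illustration, vLLM ships a serving benchmark script in its repository. The model name below is an example, and flag names may vary between vLLM versions, so check the repo's benchmarking docs before running:

```shell
# Start an OpenAI-compatible vLLM server (model name is an example)
vllm serve meta-llama/Llama-3.1-8B-Instruct

# In another terminal, run vLLM's bundled serving benchmark
# (the script lives in the vLLM repo's benchmarks/ directory)
python benchmarks/benchmark_serving.py \
  --backend vllm \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --num-prompts 200 \
  --request-rate 4
```

The script reports metrics such as request throughput and latency, which makes it convenient for quick before/after comparisons when you change server settings.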
End-to-end benchmarking with llm-optimizer -
llm-optimizer is an open-source tool for benchmarking and optimizing LLM inference performance. It evaluates how an LLM behaves across different server parameters.