A useful reference for understanding a model's capabilities.

Types:

Both benchmarks matter, but they serve different purposes.

When should you run a performance benchmark?

General load-testing tools -

Locust and k6 are used to simulate real-world traffic. They focus on load testing: generating large numbers of concurrent requests to see how your LLM deployment performs under pressure.
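At their core, these tools spawn many concurrent clients and record per-request latency. A toy sketch of that idea in plain Python, where the `send_request` stub stands in for a real HTTP call to your LLM endpoint (names and timings here are illustrative, not any tool's actual API):

```python
import time
import random
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt: str) -> float:
    """Stub for a real HTTP call to an LLM endpoint; returns latency in seconds."""
    start = time.perf_counter()
    time.sleep(random.uniform(0.01, 0.03))  # simulate network + inference time
    return time.perf_counter() - start

def run_load_test(num_requests: int, concurrency: int) -> list[float]:
    """Fire num_requests at a fixed concurrency level and collect latencies."""
    prompts = [f"prompt-{i}" for i in range(num_requests)]
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(send_request, prompts))

latencies = run_load_test(num_requests=20, concurrency=5)
print(f"{len(latencies)} requests, max latency {max(latencies):.3f}s")
```

Real tools add ramp-up schedules, user behavior scripting, and reporting on top of this loop; the point is that concurrency is what exposes queuing and batching behavior in your deployment.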

Specialized benchmarking tools -

NVIDIA GenAI-Perf and LLMPerf are built specifically for LLM performance benchmarking. They focus on inference-level metrics such as throughput and latency.
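These inference-level metrics are straightforward to derive from raw per-request samples. A minimal sketch using only the standard library (the sample numbers are made up for illustration):

```python
import statistics

# Made-up per-request samples: (latency_seconds, output_tokens)
samples = [(1.2, 128), (0.9, 96), (1.5, 160), (1.1, 120), (0.8, 90)]

latencies = [lat for lat, _ in samples]
total_tokens = sum(tok for _, tok in samples)

# Token throughput across the request stream (tokens per second of request time).
throughput = total_tokens / sum(latencies)

# Latency percentiles: p50 is the median; p95 comes from the percentile grid.
p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=100)[94]

print(f"throughput ~ {throughput:.1f} tok/s, p50 = {p50:.2f}s, p95 = {p95:.2f}s")
```

Dedicated tools also report streaming-specific metrics like time-to-first-token and inter-token latency, which matter for interactive workloads but need per-token timestamps to compute.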

Framework-specific tools -

vLLM and SGLang offer their own benchmarking scripts, commands, and usage guidelines, which are helpful for quick experiments.

End-to-end benchmarking with llm-optimizer -

llm-optimizer is an open-source tool for benchmarking and optimizing LLM inference performance. It evaluates how an LLM behaves across different server parameters.
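Conceptually, this kind of end-to-end optimization is a parameter sweep: run a benchmark for each server configuration, record the metrics, and pick the best trade-off against a latency target. A hypothetical sketch of that loop (the `benchmark` function, parameter names, and numbers are stand-ins for illustration, not llm-optimizer's actual API):

```python
from itertools import product

def benchmark(tp: int, max_batch: int) -> dict:
    """Stand-in for benchmarking a server launched with the given
    tensor-parallel degree and max batch size (fake numbers)."""
    # Toy model: bigger batches raise throughput but also tail latency.
    return {"throughput": tp * max_batch * 10.0, "p95_latency": 0.05 * max_batch}

# Grid of server parameters to evaluate.
grid = product([1, 2], [8, 16, 32])

results = []
for tp, max_batch in grid:
    metrics = benchmark(tp, max_batch)
    results.append({"tp": tp, "max_batch": max_batch, **metrics})

# Pick the highest-throughput config that still meets the latency SLO.
SLO = 1.0  # seconds, p95
feasible = [r for r in results if r["p95_latency"] <= SLO]
best = max(feasible, key=lambda r: r["throughput"])
print(best)
```

The value of an automated tool is that it handles the tedious part: starting and tearing down servers per configuration, running the load, and aggregating results so the trade-off curve is visible.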