4 posts tagged with "vllm"

Understanding What Makes vLLM Fast
vLLM serves 10x more requests than naive PyTorch. PagedAttention, continuous batching, and memory management make the difference.
vLLM, SGLang, TensorRT-LLM—each optimizes for different things. Here's how to pick without running a 6-month bake-off.
vLLM doesn't use a faster model; it manages memory more intelligently. PagedAttention treats the KV cache like virtual memory, and the results are dramatic.
Static batching wastes GPU cycles waiting for the slowest request. Continuous batching fills those gaps. The difference is 3-5x throughput.