Understanding What Makes vLLM Fast
vLLM can serve roughly 10x more requests than a naive PyTorch serving loop on the same hardware. PagedAttention, continuous batching, and careful memory management make the difference. Continuous batching is sketched right below; PagedAttention gets its own sketch further down.
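To make continuous batching concrete, here is a toy sketch of the idea: requests join and leave the batch at token granularity instead of waiting for a whole batch to drain. `ToyModel`, `Request`, and `serve` are hypothetical stand-ins for illustration, not vLLM's actual scheduler or API.

```python
# Continuous (iteration-level) batching, minimal sketch.
# ToyModel/Request are hypothetical, not vLLM's API.
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    generated: int = 0

class ToyModel:
    def step(self, batch: list[Request]) -> None:
        # One decode step advances every running request by one token.
        for req in batch:
            req.generated += 1

def serve(model: ToyModel, waiting: deque, max_batch: int = 8) -> None:
    running: list[Request] = []
    while waiting or running:
        # Admit waiting requests as soon as slots free up; static
        # batching would block here until the whole batch finished.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        model.step(running)
        done = [r for r in running if r.generated >= r.max_new_tokens]
        for r in done:
            running.remove(r)
            print(f"finished {r.prompt!r} after {r.generated} tokens")

serve(ToyModel(), deque([Request("hi", 3), Request("bye", 5)]))
```

The payoff is that a short request never waits behind a long one: the moment any sequence finishes, its slot is handed to the next waiting request, keeping the GPU batch full at every decoding step.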
vLLM doesn't use a faster model. It uses memory smarter: PagedAttention treats the KV cache like virtual memory, allocating it in fixed-size blocks instead of one contiguous slab per request, and the results are dramatic.
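Here is a minimal sketch of that virtual-memory analogy, assuming a block-table design like the one described in the PagedAttention paper. `BLOCK_SIZE`, `BlockAllocator`, and `Sequence` are made-up names for illustration, not vLLM internals.

```python
# PagedAttention-style KV cache bookkeeping, minimal sketch.
# All names here are hypothetical, not vLLM's actual classes.

BLOCK_SIZE = 16  # tokens of KV cache per physical block

class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool,
    the way an OS hands out page frames."""
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache pool exhausted")
        return self.free_blocks.pop()

    def free(self, block_id: int) -> None:
        self.free_blocks.append(block_id)

class Sequence:
    """Each sequence keeps a block table mapping its logical token
    positions to physical blocks, so its KV cache no longer has to
    be contiguous in GPU memory."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Grab a new physical block only when the current one fills,
        # so at most BLOCK_SIZE - 1 slots are wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def release(self) -> None:
        for block_id in self.block_table:
            self.allocator.free(block_id)
        self.block_table.clear()

# Usage: blocks come from one shared pool and are reclaimed on completion.
pool = BlockAllocator(num_blocks=1024)
seq = Sequence(pool)
for _ in range(40):      # 40 tokens -> ceil(40/16) = 3 blocks
    seq.append_token()
seq.release()            # all 3 blocks return to the pool immediately
```

The design choice mirrors OS paging: instead of reserving a worst-case contiguous region per request up front, memory is committed one small block at a time, so fragmentation stays bounded and freed blocks are instantly reusable by other requests.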