#memory

17 posts tagged with "memory"

When to Move Data Off the GPU

GPU memory is precious. CPU memory is cheap. Moving the right data at the right time can double your concurrent request capacity.

Understanding What Makes vLLM Fast

vLLM serves 10x more requests than a naive PyTorch implementation. PagedAttention, continuous batching, and careful memory management make the difference.

How vLLM Serves 10x More Requests

vLLM doesn't use a faster model. It uses memory more intelligently. PagedAttention treats the KV cache like virtual memory, and the results are dramatic.