Reducing KV Cache Size Without Quality Loss
The KV cache can account for 40% of GPU memory at long context lengths. Compression techniques trade compute for memory without significant quality loss. Know when to use them.
6 posts tagged with "kv-cache"
Everyone quantizes model weights. Few quantize the KV cache. But the cache is often the bigger memory consumer.
Where does memory go in a 70B model deployment? How do you know if KV cache is your bottleneck? Here's the diagnostic playbook.
Without the KV cache, generating 100 tokens would mean recomputing attention over 5,050 token positions instead of 100. Here's how it works.
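The 5,050 figure above is just the triangular sum: without a cache, step *i* has to re-run attention over all *i* tokens so far. A minimal sketch of that arithmetic (the function name is illustrative, not from the post):

```python
# Quadratic cost of generating without a KV cache: step i reprocesses
# all i tokens, so total positions processed = 1 + 2 + ... + n.
def positions_processed(n_tokens: int) -> int:
    return sum(range(1, n_tokens + 1))  # equals n * (n + 1) // 2

print(positions_processed(100))  # 5050 positions without a cache
print(100)                       # 100 with a cache: only the new token each step
```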
OOM at 32K context when your GPU 'should' handle it? Here's what's actually happening in GPU memory during long conversations.
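A back-of-envelope estimate shows why 32K contexts blow past expectations. This sketch assumes a Llama-2-70B-like shape (80 layers, 8 grouped-query KV heads, head dim 128, fp16); the numbers are illustrative assumptions, not figures from the post:

```python
# Rough KV cache size: K and V tensors (the leading 2x), one pair per
# layer, per KV head, per token position, at dtype width in bytes.
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, dtype_bytes: int = 2) -> int:
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * dtype_bytes

# Assumed 70B-class shape with grouped-query attention, 32K context, fp16.
gib = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                     seq_len=32_768, batch=1) / 2**30
print(f"{gib:.1f} GiB")  # 10.0 GiB for a single 32K sequence
```

That is per sequence: a batch of a few concurrent 32K conversations can claim tens of gigabytes on top of the weights, which is where the surprise OOM comes from.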
A 2,000 token system prompt processed 10 million times a month. Without caching, you're paying to process the same tokens over and over.