Reducing KV Cache Size Without Quality Loss
The KV cache accounts for about 40% of memory in long-context inference. Compression techniques trade extra compute for a smaller cache without significant quality loss; the skill is knowing when to use them.
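The memory side of that trade is simple arithmetic. The sketch below assumes a standard decoder in which every layer caches one key vector and one value vector per KV head per token. It estimates cache size and then applies symmetric per-vector int8 quantization, one common compression technique chosen here for illustration rather than taken from the post. The model shape and the helper names (`kv_cache_bytes`, `quantize_int8`, `dequantize_int8`) are hypothetical.

```python
from typing import List, Tuple


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int, bytes_per_elem: int) -> int:
    """Size of the K and V caches; the leading 2 counts keys plus values."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric absmax quantization of one vector to int8 with a single scale."""
    absmax = max((abs(v) for v in values), default=0.0)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    # Clamp defensively; with an absmax scale every value already fits in [-127, 127].
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate floats; this is the extra compute paid on every read."""
    return [x * scale for x in q]


# Hypothetical model shape (not from the original post): 32 layers,
# 8 KV heads of dimension 128, a 32k-token context, batch size 1.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32_768, batch_size=1)
fp16 = kv_cache_bytes(**shape, bytes_per_elem=2)
int8 = kv_cache_bytes(**shape, bytes_per_elem=1)
print(f"fp16 KV cache: {fp16 / 2**30:.1f} GiB")  # 4.0 GiB
print(f"int8 KV cache: {int8 / 2**30:.1f} GiB")  # 2.0 GiB (scale overhead ignored)

# Quantize one illustrative key vector and measure the round-trip error.
key = [0.12, -0.98, 0.45, 0.003, -0.51]
q, scale = quantize_int8(key)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(key, restored))
print(f"max reconstruction error: {max_err:.5f} (bound: {scale / 2:.5f})")
```

Halving the bytes per element halves the cache, while the per-vector error stays below half a quantization step. The estimate ignores the stored scales, which add one extra value per quantized vector. The compute cost is the quantize step on each cache write and the dequantize step on each read.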