Memory Planning for Multi-GPU Deployments
Four GPUs don't give you 4x the KV cache memory. Communication overhead, activation memory, and synchronization eat into the gains. Plan accordingly.
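To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes model weights are sharded evenly across GPUs and that every GPU pays its own fixed cost for activations and communication buffers. The function names, the 0.9 usable fraction, and the 6 GiB per-GPU overhead are hypothetical placeholders, not measurements from any particular serving stack.

```python
def naive_kv_budget_gib(num_gpus: int, gpu_mem_gib: float, weights_gib: float) -> float:
    """Pooled estimate: add up all GPU memory, subtract the weights once."""
    return num_gpus * gpu_mem_gib - weights_gib


def planned_kv_budget_gib(
    num_gpus: int,
    gpu_mem_gib: float,
    weights_gib: float,
    per_gpu_overhead_gib: float,   # assumed: activations + communication buffers
    usable_fraction: float = 0.9,  # assumed: headroom left for the runtime
) -> float:
    """Estimate with overheads that are paid once per GPU, not once overall."""
    # Each GPU holds an even shard of the weights (assumption).
    weights_per_gpu = weights_gib / num_gpus
    # Whatever is left on a single GPU after headroom, weights, and overhead.
    free_per_gpu = gpu_mem_gib * usable_fraction - weights_per_gpu - per_gpu_overhead_gib
    # A negative result means the model does not fit; report zero cache space.
    return max(free_per_gpu, 0.0) * num_gpus


if __name__ == "__main__":
    naive = naive_kv_budget_gib(4, 80.0, 40.0)
    planned = planned_kv_budget_gib(4, 80.0, 40.0, per_gpu_overhead_gib=6.0)
    print(f"naive: {naive:.0f} GiB, planned: {planned:.0f} GiB")
    print(f"lost to per-GPU overhead: {naive - planned:.0f} GiB")
```

The point of the sketch is that the headroom and overhead terms scale with the number of GPUs, so the gap between the pooled estimate and the planned one widens as you add devices. Swap in measured values for your own hardware before relying on the result.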