6 posts tagged with "gpu"

The Formula for Offloading Decisions
Transfer cost vs recompute cost. If moving data off GPU costs less than recomputing it, offload. If not, keep it. The math is straightforward.
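A minimal sketch of that comparison, with illustrative numbers (the link bandwidth, data size, and FLOP counts below are assumptions, not measurements):

```python
def should_offload(bytes_to_move: float,
                   link_bw_bytes_per_s: float,
                   recompute_flops: float,
                   gpu_flops_per_s: float) -> bool:
    """Offload iff the round-trip transfer is cheaper than recomputing.

    Factor of 2: the data has to come back over the same link before reuse.
    """
    transfer_s = 2 * bytes_to_move / link_bw_bytes_per_s
    recompute_s = recompute_flops / gpu_flops_per_s
    return transfer_s < recompute_s

# Illustrative: 2 GB of activations over ~25 GB/s effective PCIe Gen4,
# vs. 50 TFLOPs of recompute on a GPU sustaining 200 TFLOP/s.
print(should_offload(2e9, 25e9, 50e12, 200e12))
# transfer ~0.16 s < recompute ~0.25 s -> True: offloading wins here
```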
GPU memory is precious. CPU memory is cheap. Moving the right data at the right time can 2x your concurrent requests.
OOM at 32K context when your GPU 'should' handle it? Here's what's actually happening in GPU memory during long conversations.
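Most of that growth is typically the KV cache, which scales linearly with context length. A back-of-envelope sizing, assuming a 7B-class model shape (32 layers, 32 KV heads, head dim 128, fp16):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    # 2 tensors per layer (K and V), each [n_kv_heads, seq_len, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16
cache = kv_cache_bytes(seq_len=32_768, n_layers=32, n_kv_heads=32, head_dim=128)
print(f"{cache / 2**30:.0f} GiB per request")  # 16 GiB -- on top of the weights
```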
nvidia-smi says 90% utilization. Actual compute is 30%. Here's what GPU utilization really means and what to measure instead.
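For context on the gap: nvidia-smi's "GPU-Util" counts the fraction of the sample window in which any kernel was resident, not how busy the SMs were. A short sketch reading the same counter via pynvml (the nvidia-ml-py package); for actual compute efficiency you'd profile SM throughput with a tool like Nsight Compute instead:

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
# .gpu is the % of the sample window with at least one kernel resident --
# a single-block kernel at 1% occupancy still counts as "utilized".
print(f"kernel-resident time: {util.gpu}%  (what nvidia-smi shows)")
pynvml.nvmlShutdown()
```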
H100 costs 2x more than A100 but delivers 2x memory bandwidth. For decode-bound inference, that math matters.
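To see why bandwidth dominates: in decode, every generated token streams the full weight set from HBM, so bandwidth sets a hard ceiling on tokens per second. A rough sketch using public spec-sheet bandwidths (treat the exact figures as assumptions):

```python
# Decode is weight-streaming bound: each new token reads all weights from HBM,
# so tokens/s <= HBM bandwidth / model bytes. Rough spec-sheet bandwidths:
SPECS = {"A100 40GB": 1.6e12, "H100 SXM": 3.35e12}  # bytes/s
model_bytes = 14e9  # 7B params, fp16

for name, bw in SPECS.items():
    print(f"{name}: ~{bw / model_bytes:.0f} tok/s ceiling per replica")
# A100 ~114, H100 ~239: ~2x the bandwidth, ~2x the decode ceiling
```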
GPUs dominate LLM inference. TPUs offer interesting economics. Here's how to think about the choice.