6 posts tagged with "gpu"

The Formula for Offloading Decisions
Transfer cost vs recompute cost. If moving data off GPU costs less than recomputing it, offload. If not, keep it. The math is straightforward.
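A minimal sketch of that comparison, with illustrative numbers (the link bandwidth, data size, and FLOP counts below are assumptions, not measurements):

```python
def should_offload(bytes_to_move: float,
                   link_bw_bytes_per_s: float,
                   recompute_flops: float,
                   gpu_flops_per_s: float) -> bool:
    """Offload iff the round-trip transfer is cheaper than recomputing.

    Factor of 2: the data has to come back over the same link before reuse.
    """
    transfer_s = 2 * bytes_to_move / link_bw_bytes_per_s
    recompute_s = recompute_flops / gpu_flops_per_s
    return transfer_s < recompute_s

# Illustrative: 2 GB of activations over ~25 GB/s effective PCIe Gen4,
# vs. 50 TFLOPs of recompute on a GPU sustaining 200 TFLOP/s.
print(should_offload(2e9, 25e9, 50e12, 200e12))
# transfer ~0.16 s < recompute ~0.25 s -> True: offloading wins here
```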
GPU memory is precious. CPU memory is cheap. Moving the right data at the right time can 2x your concurrent requests.
OOM at 32K context when your GPU 'should' handle it? Here's what's actually happening in GPU memory during long conversations.
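Most of that growth is typically the KV cache, which scales linearly with context length. A back-of-envelope sizing, assuming a 7B-class model shape (32 layers, 32 KV heads, head dim 128, fp16):

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    # 2 tensors per layer (K and V), each [n_kv_heads, seq_len, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Assumed shape: 32 layers, 32 KV heads, head_dim 128, fp16
cache = kv_cache_bytes(seq_len=32_768, n_layers=32, n_kv_heads=32, head_dim=128)
print(f"{cache / 2**30:.0f} GiB per request")  # 16 GiB -- on top of the weights
```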
nvidia-smi says 90% utilization. Actual compute is 30%. Here's what GPU utilization really means and what to measure instead.
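For context on the gap: nvidia-smi's "GPU-Util" counts the fraction of the sample window in which any kernel was resident, not how busy the SMs were. A short sketch reading the same counter via pynvml (the nvidia-ml-py package); for actual compute efficiency you'd profile SM throughput with a tool like Nsight Compute instead:

```python
import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
# .gpu is the % of the sample window with at least one kernel resident --
# a single-block kernel at 1% occupancy still counts as "utilized".
print(f"kernel-resident time: {util.gpu}%  (what nvidia-smi shows)")
pynvml.nvmlShutdown()
```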
H100 costs 2x more than A100 but delivers 2x memory bandwidth. For decode-bound inference, that math matters.
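To see why bandwidth dominates: in decode, every generated token streams the full weight set from HBM, so bandwidth sets a hard ceiling on tokens per second. A rough sketch using public spec-sheet bandwidths (treat the exact figures as assumptions):

```python
# Decode is weight-streaming bound: each new token reads all weights from HBM,
# so tokens/s <= HBM bandwidth / model bytes. Rough spec-sheet bandwidths:
SPECS = {"A100 40GB": 1.6e12, "H100 SXM": 3.35e12}  # bytes/s
model_bytes = 14e9  # 7B params, fp16

for name, bw in SPECS.items():
    print(f"{name}: ~{bw / model_bytes:.0f} tok/s ceiling per replica")
# A100 ~114, H100 ~239: ~2x the bandwidth, ~2x the decode ceiling
```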
GPUs dominate LLM inference. TPUs offer interesting economics. Here's how to think about the choice.