Tensor vs Pipeline Parallelism: When Each Wins
Tensor parallelism cuts latency by splitting layers across GPUs. Pipeline parallelism increases throughput by splitting the model into stages. Choose based on your constraint.
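A back-of-envelope cost model makes the trade-off concrete. This is a sketch with made-up coefficients (the latencies, layer count, and bubble fraction below are all hypothetical, not measurements from any particular setup):

```python
# Hypothetical cost model contrasting the two strategies.
# All numbers are illustrative assumptions, not benchmarks.

def tp_latency_ms(base_latency_ms, n_gpus, allreduce_ms_per_layer, n_layers):
    """Tensor parallelism: each layer's matmuls shrink roughly n_gpus-fold,
    but every layer pays for an all-reduce across the GPUs."""
    return base_latency_ms / n_gpus + allreduce_ms_per_layer * n_layers

def pp_throughput_tps(base_throughput_tps, n_stages, bubble_fraction):
    """Pipeline parallelism: per-token latency stays roughly flat, but
    n_stages micro-batches are in flight at once, minus pipeline-bubble waste."""
    return base_throughput_tps * n_stages * (1.0 - bubble_fraction)

# Latency-bound: 80-layer model, 50 ms single-GPU step, 0.05 ms all-reduce/layer.
print(tp_latency_ms(50.0, 4, 0.05, 80))    # TP cuts latency despite comm cost
# Throughput-bound: 4 stages, 100 tok/s per GPU, 10% pipeline bubble.
print(pp_throughput_tps(100.0, 4, 0.10))   # PP multiplies throughput
```

If the all-reduce term grows (slow interconnect, many layers), the tensor-parallel latency win shrinks fast, which is exactly why the constraint should drive the choice.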
Four GPUs don't give you 4x throughput. Communication overhead, load imbalance, and synchronization eat into gains. Know the scaling curve before you buy.
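One way to sanity-check expectations before buying is a crude strong-scaling model. The coefficients here (5% serialized communication, 3% load imbalance) are illustrative assumptions, not measured values:

```python
def scaling_efficiency(n_gpus, comm_fraction=0.05, imbalance=0.03):
    """Crude Amdahl-style strong-scaling model: a comm_fraction of each
    step is serialized communication, and load imbalance idles the
    faster GPUs. Coefficients are illustrative, not measured."""
    speedup = 1.0 / (comm_fraction + (1.0 - comm_fraction) / n_gpus)
    speedup *= (1.0 - imbalance)
    return speedup / n_gpus  # fraction of ideal linear scaling achieved

for n in (1, 2, 4, 8):
    print(n, round(scaling_efficiency(n), 2))
```

Even with these optimistic coefficients, efficiency drops well below 1.0 by 4 GPUs and keeps falling, so a 4-GPU box delivers meaningfully less than 4x.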
Four GPUs don't give you 4x the usable KV cache memory. Communication buffers, activation memory, and fragmentation eat into the gains. Plan accordingly.
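The sharded part of the budget is easy to compute; the overheads are what spoil the 4x. A minimal accounting sketch, assuming a hypothetical 32-layer fp16 model with 32 KV heads of dimension 128 (none of these numbers come from a real deployment):

```python
def kv_cache_gb(layers, kv_heads, head_dim, seq_len, batch,
                tp_degree, dtype_bytes=2):
    """KV cache size for a decoder model: 2 tensors (K and V) per layer.
    Tensor parallelism shards the heads, so the cache divides by
    tp_degree -- but per-GPU activations and comm buffers do not."""
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes
    return total_bytes / tp_degree / 1e9

# Hypothetical model: 32 layers, 32 KV heads x 128 dims, 4096-token
# context, batch 8, fp16 (2 bytes per element).
print(kv_cache_gb(32, 32, 128, 4096, 8, tp_degree=1))  # whole cache on 1 GPU
print(kv_cache_gb(32, 32, 128, 4096, 8, tp_degree=4))  # per-GPU share at TP=4
```

The cache itself divides cleanly by four; the fixed per-GPU costs (activations for in-flight requests, communication workspace, allocator fragmentation) are replicated on every GPU, which is why the usable headroom grows by less than 4x.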