#scaling

7 posts tagged with "scaling"

Adding GPUs Without Linear Speedup

Four GPUs don't give you 4x throughput. Communication overhead, load imbalance, and synchronization eat into gains. Know the scaling curve before you buy.
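One quick way to see the shape of that curve is Amdahl's law. The sketch below is an illustration, not the post's own model. It assumes a fixed fraction of each step parallelizes perfectly and the rest (communication, synchronization, serial work) does not. The function names and the 90% figure are hypothetical.

```python
def amdahl_speedup(n_gpus: int, parallel_fraction: float) -> float:
    """Speedup predicted by Amdahl's law when the work is spread over n_gpus.

    parallel_fraction is the share of each step that splits perfectly across
    GPUs; the remainder is treated as serial and does not shrink with n_gpus.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_gpus)


def scaling_efficiency(n_gpus: int, parallel_fraction: float) -> float:
    """Speedup divided by GPU count: 1.0 means perfectly linear scaling."""
    return amdahl_speedup(n_gpus, parallel_fraction) / n_gpus


if __name__ == "__main__":
    # Hypothetical workload: 90% of each step parallelizes.
    for n in (1, 2, 4, 8):
        s = amdahl_speedup(n, 0.9)
        print(f"{n} GPUs: {s:.2f}x speedup, {scaling_efficiency(n, 0.9):.0%} efficiency")
```

With 90% parallel work, four GPUs give about 3.08x, roughly 77% efficiency. Amdahl's law holds the serial share constant, so communication costs that grow as you add GPUs can make a real curve flatter than this.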

Designing Queues That Don't Explode

An unbounded queue is a memory leak waiting to happen. A too-small queue drops requests unnecessarily. Here's how to size and manage LLM request queues.
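A common back-of-the-envelope sizing rule, offered here as a sketch rather than the post's exact method: a FIFO queue drained at μ requests per second makes the newest of L queued requests wait about L/μ seconds, so capping L near μ × W keeps queueing delay near a latency budget W. This is the same relationship as Little's law, L = λW. The names `queue_capacity` and `BoundedRequestQueue` are hypothetical.

```python
import math
from collections import deque


def queue_capacity(throughput_rps: float, max_wait_s: float) -> int:
    """Largest queue length whose last entry waits about max_wait_s.

    Assumes throughput_rps is the measured steady-state drain rate.
    """
    if throughput_rps <= 0 or max_wait_s <= 0:
        raise ValueError("throughput_rps and max_wait_s must be positive")
    return max(1, math.floor(throughput_rps * max_wait_s))


class BoundedRequestQueue:
    """FIFO queue that refuses new requests once it reaches capacity."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be >= 1")
        self.capacity = capacity
        self.rejected = 0  # count of requests turned away while full
        self._items = deque()

    def try_enqueue(self, request) -> bool:
        """Append the request and return True, or return False if full."""
        if len(self._items) >= self.capacity:
            self.rejected += 1
            return False
        self._items.append(request)
        return True

    def dequeue(self):
        """Pop the oldest request, or return None when the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

With a measured 20 requests/s and a 5-second budget, `queue_capacity(20, 5)` comes out to 100. The explicit length check is deliberate: `collections.deque(maxlen=...)` would silently discard the oldest request instead of refusing the new one.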

Managing Load Without Dropping Requests

Traffic spikes 10x. Do you queue requests until OOM, drop them randomly, or gracefully degrade? The answer shapes your system's behavior under pressure.
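Graceful degradation can be sketched as a tiered admission policy keyed on queue fullness. Everything here is an assumption for illustration: the `admit` helper, the 70% threshold, and the choice to degrade by capping `max_tokens`. Rejection is reserved for a full queue and would typically surface as an HTTP 429 or 503 with a `Retry-After` header, so clients back off instead of losing requests silently.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"    # serve the request as asked
    DEGRADE = "degrade"  # serve a cheaper version of the request
    REJECT = "reject"    # ask the client to retry later


@dataclass
class Decision:
    action: Action
    max_tokens: int
    retry_after_s: float = 0.0


def admit(queue_depth: int, capacity: int, requested_max_tokens: int,
          degrade_at: float = 0.7, degraded_max_tokens: int = 256,
          retry_after_s: float = 2.0) -> Decision:
    """Decide how to handle a new request given current queue utilization."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    utilization = queue_depth / capacity
    if utilization >= 1.0:
        # Queue is full: refuse now and tell the client when to come back.
        return Decision(Action.REJECT, 0, retry_after_s)
    if utilization >= degrade_at:
        # Under pressure: cap generation length so each request costs less.
        return Decision(Action.DEGRADE, min(requested_max_tokens, degraded_max_tokens))
    return Decision(Action.ACCEPT, requested_max_tokens)
```

The thresholds are placeholders; tune `degrade_at` and `degraded_max_tokens` against your own latency and memory measurements.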