Tuning Batch Size for Your Workload
Batch size 1 wastes GPU compute. Batch size 64 kills latency. Somewhere in between is your sweet spot. Here's how to find it.
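One way to find it is an offline sweep: run the same prompts at several batch sizes, record per-batch latency and throughput, and take the largest batch that still meets your latency target. A minimal sketch, assuming you supply your own `run_batch` inference function; the sizes, trial count, and `best_under_slo` helper are illustrative:

```python
import statistics
import time

def sweep_batch_sizes(run_batch, prompts, sizes=(1, 2, 4, 8, 16, 32, 64), trials=5):
    """Measure median per-batch latency and throughput at each batch size.

    run_batch: your inference function, taking a list of prompts.
    Assumes len(prompts) >= max(sizes).
    """
    results = {}
    for size in sizes:
        batch = prompts[:size]
        latencies = []
        for _ in range(trials):
            start = time.perf_counter()
            run_batch(batch)
            latencies.append(time.perf_counter() - start)
        p50 = statistics.median(latencies)
        results[size] = {
            "p50_latency_s": p50,
            "throughput_rps": size / p50,  # requests completed per second
        }
    return results

def best_under_slo(results, slo_s):
    """Largest batch size whose median latency still meets the SLO."""
    ok = [size for size, r in results.items() if r["p50_latency_s"] <= slo_s]
    return max(ok) if ok else None
```

Plot the two curves and you'll usually see throughput flatten while latency keeps climbing; the knee of that curve is the sweet spot.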
100 requests sounds like 100 requests. But one 50k-token request can consume more memory and compute than the other 99 short ones combined. Batch by tokens, not requests.
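A minimal sketch of a token-budget batcher, assuming each request arrives with a known (or estimated) token count; the 8,192 budget and the request names are illustrative, not recommendations:

```python
def batch_by_tokens(requests, max_tokens_per_batch=8_192):
    """Group requests so no batch exceeds a total-token budget.

    requests: iterable of (request_id, token_count) pairs.
    A request larger than the budget gets a batch to itself.
    """
    batch, batch_tokens = [], 0
    for request_id, token_count in requests:
        if batch and batch_tokens + token_count > max_tokens_per_batch:
            yield batch
            batch, batch_tokens = [], 0
        batch.append(request_id)
        batch_tokens += token_count
    if batch:
        yield batch

reqs = [("a", 50_000), ("b", 200), ("c", 300), ("d", 7_000)]
print(list(batch_by_tokens(reqs)))
# [['a'], ['b', 'c', 'd']] -- the 50k-token request is isolated
```

In production you'd likely also cap request count and budget for expected output tokens, but the admission unit stays the same: tokens, not requests.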
Static batching wastes GPU cycles waiting for the slowest request in the batch to finish. Continuous batching fills those gaps by admitting new requests the moment a slot frees up. The difference is 3-5x throughput.
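Here's a toy simulation of that scheduling difference, where each request needs a fixed number of decode steps and the engine has a fixed number of batch slots. The numbers are made up to show the gap, not to reproduce the 3-5x figure:

```python
from collections import deque

def continuous_batching_steps(waiting: deque, max_slots: int = 8) -> int:
    """Toy decode loop: a finished request frees its slot immediately,
    and a waiting request is admitted on the very next step. Static
    batching would instead hold every slot until the whole batch ends."""
    active = {}  # request_id -> remaining decode steps
    steps = 0
    while waiting or active:
        # Refill any free slots before the next decode step.
        while waiting and len(active) < max_slots:
            request_id, length = waiting.popleft()
            active[request_id] = length
        # One decode step for every active request.
        for request_id in list(active):
            active[request_id] -= 1
            if active[request_id] == 0:
                del active[request_id]  # slot frees up right here
        steps += 1
    return steps

jobs = deque([("long", 100), ("s1", 10), ("s2", 10), ("s3", 10)])
print(continuous_batching_steps(jobs, max_slots=2))
# 100 steps; static batching on the same jobs takes 110
# (batch {long, s1} costs 100 steps, then batch {s2, s3} costs 10).
```

The short requests ride along in the slot the long one never touches; with static batching they'd sit in the queue until the long request drained the whole batch.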