#decode

2 posts tagged with "decode"

Feb 12, 2025

Every LLM request has two distinct phases, prefill and decode, with different performance characteristics. Understanding them is the key to targeted optimization.

Jan 8, 2025

Input tokens are cheap. Output tokens are expensive. The physics of transformer inference explains why, and what you can do about it.