Prefill vs Decode: The Two Phases That Shape Latency
Every LLM request has two distinct phases with different performance characteristics. Understanding them is the key to targeted optimization.
2 posts tagged with "decode"
Every LLM request has two distinct phases with different performance characteristics. Understanding them is the key to targeted optimization.
Input tokens are cheap. Output tokens are expensive. The physics of transformer inference explains why, and what you can do about it.