Prefill vs Decode: The Two Phases That Shape Latency
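Every LLM request runs in two phases: prefill, which processes the whole prompt before the first output token appears, and decode, which generates the remaining output tokens one at a time. The two phases have different performance characteristics, so targeted optimization starts with knowing which one dominates your latency.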
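To make the split concrete, here is a minimal sketch of how the two phases add up to end-to-end latency. It assumes a deliberately simple linear model: prefill cost proportional to prompt length, and a fixed cost per decoded token. Real systems deviate from this, and the function name, parameters, and rates below are hypothetical, chosen only for illustration.

```python
def estimate_latency_ms(prompt_tokens, output_tokens,
                        prefill_ms_per_token, decode_ms_per_token):
    """Return (time_to_first_token_ms, total_ms) under a simple linear model.

    Assumption: prefill time scales linearly with prompt length and each
    decode step costs the same. Both rates are illustrative, not measured.
    """
    # Prefill: the whole prompt is processed before the first token appears.
    time_to_first_token = prompt_tokens * prefill_ms_per_token
    # Decode: the first output token comes out of prefill, so only the
    # remaining tokens each pay one decode step.
    decode_time = max(output_tokens - 1, 0) * decode_ms_per_token
    return time_to_first_token, time_to_first_token + decode_time


# Hypothetical rates: 0.5 ms per prompt token, 20 ms per generated token.
ttft, total = estimate_latency_ms(1000, 101, 0.5, 20.0)
print(f"time to first token: {ttft} ms, total: {total} ms")
```

Under these made-up rates, a long prompt mostly affects the time to first token, while a long response mostly affects total latency. That is why the two phases call for different optimizations.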