Separating Real Speedups from Benchmarketing
FlashAttention claims 2-4x speedup. CUDA graphs claim 10x. What actually helps in production, and what's just good marketing?
38 posts tagged with "optimization"
Every LLM request has two distinct phases with different performance characteristics. Understanding them is the key to targeted optimization.
Users don't perceive throughput. They perceive the silence before the first token appears. TTFT is the metric that determines whether your app feels fast.
Your LLM bill is one number. Your product has twenty features. Without cost attribution, you're optimizing in the dark.
A 128K context window doesn't mean you should use 128K tokens. Context is a budget with diminishing returns and escalating costs.
E2EL = TTFT + generation time sounds simple. But where does that time actually go? Understanding the equation reveals where to optimize.
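A minimal sketch of that decomposition, assuming a per-token decode latency (here called TPOT, a name not taken from the post itself): the first token arrives at TTFT, and every subsequent token adds one decode step.

```python
# Hedged sketch: one common decomposition of end-to-end latency (E2EL).
# "tpot" (time per output token) is an assumed name for decode latency.

def e2el_seconds(ttft: float, tpot: float, output_tokens: int) -> float:
    """E2EL = TTFT + TPOT * (output_tokens - 1): the first token lands
    at TTFT, and each remaining token adds one decode step."""
    return ttft + tpot * (output_tokens - 1)

# Illustrative numbers: 200 ms to first token, 25 ms/token, 500-token answer.
latency = e2el_seconds(ttft=0.2, tpot=0.025, output_tokens=500)
print(round(latency, 3))  # 12.675 -> decode time dwarfs TTFT for long outputs
```

The example makes the optimization trade-off concrete: for long outputs, per-token decode speed dominates; for short outputs, TTFT does.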
A 2,000 token system prompt processed 10 million times a month. Without caching, you're paying to process the same tokens over and over.
Double your context window, quadruple your compute. The O(n²) attention cost catches teams off guard when they scale.
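A tiny sketch of why doubling context quadruples compute, assuming full (non-sparse) self-attention where every token attends to every token:

```python
# Hedged sketch of the O(n^2) claim: full self-attention scores
# every token against every token, so pairs grow as n * n.

def attention_pairs(n_tokens: int) -> int:
    # Number of query-key score computations for a sequence of n tokens.
    return n_tokens * n_tokens

base = attention_pairs(4_096)
doubled = attention_pairs(8_192)
print(doubled // base)  # 4 -> 2x the context window, 4x the attention compute
```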