Why Tokens at Position 50K Get Ignored
Attention scores decay with distance. By position 50K, tokens may have near-zero influence. Positional encodings have practical limits, regardless of window size.
5 posts tagged with "context"
Most queries don't need the full context. Selecting the right 12% often preserves 95% of quality at a fraction of the cost and latency.
OOM at 32K context when your GPU 'should' handle it? Here's what's actually happening in GPU memory during long conversations.
A 128K context window doesn't mean you should use 128K tokens. Context is a budget with diminishing returns and escalating costs.
Double your context window, quadruple your compute. The O(n²) attention cost catches teams off guard when they scale.
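The quadratic claim in that last teaser is simple arithmetic: attention dots every token's query against every token's key, so the score matrix grows with the square of the sequence length. A minimal sketch of that scaling (the function name and head dimension are illustrative assumptions, and the estimate counts only the QKᵀ score matrix, ignoring softmax, the value projection, and everything else in the layer):

```python
def attention_score_flops(n_tokens: int, head_dim: int = 128) -> int:
    # QK^T costs roughly 2 * n^2 * d multiply-adds per head:
    # every one of n queries is dotted (d mults + d adds)
    # against every one of n keys.
    return 2 * n_tokens * n_tokens * head_dim

base = attention_score_flops(32_000)
doubled = attention_score_flops(64_000)
print(doubled / base)  # 4.0 — doubling context quadruples the score-matrix cost
```

The ratio is 4.0 regardless of the head dimension, which is the point: the constant factors vary by model, but the n² term is what bites when teams scale context.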