Why Tokens at Position 50K Get Ignored
Attention scores tend to decay with distance, so by position 50K a token may have near-zero influence on the output. Positional encodings have practical limits, regardless of the advertised window size.
Models advertise 128K context windows. But attention quality degrades with distance. The last 10% of context often contributes less than the first 10%.
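To make the decay claim concrete, here is a minimal sketch of one way distance can suppress attention: a linear distance penalty on the attention logits, in the style of ALiBi. It is an illustration under stated assumptions, not a description of any particular model. Every key is given the same content score, the `slope` value is hypothetical, and `distance_biased_weights` is a name invented for this example.

```python
import math

def distance_biased_weights(seq_len, slope):
    """Softmax attention weights for a single query, indexed by distance.

    Assumptions (illustrative only): every key has the same content score,
    and the only difference between keys is a linear penalty of
    slope * distance subtracted from the logit.
    """
    # Distance 0 is the query's own position; larger distances are older tokens.
    logits = [-slope * d for d in range(seq_len)]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical slope; real models use different values per attention head.
weights = distance_biased_weights(seq_len=60_000, slope=0.001)

near_share = sum(weights[:1_000])   # attention mass on the 1,000 closest tokens
far_share = sum(weights[50_000:])   # attention mass on tokens 50K or more away

print(f"closest 1,000 tokens: {near_share:.3f}")
print(f"tokens 50K+ away:     {far_share:.3e}")
```

Under these assumptions, the closest 1,000 tokens take roughly 0.63 of the attention mass, while everything 50K or more tokens away shares something on the order of 1e-22. A strong content match can offset part of the penalty in a real model, so this shows the mechanism rather than measured behavior.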