Extending Context Beyond Training Length
Models trained on 4K context can work at 32K with position interpolation, which rescales position indices so that extended positions map back into the range the model saw during training. Quality degrades, but predictably. Know the tradeoffs before extending.
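A minimal sketch of the idea, assuming a RoPE-based model; the function names and the 4K/32K lengths are illustrative. Positions are scaled by the ratio of training length to target length before the rotation angles are computed, so no angle ever exceeds what the model was trained on:

```python
import torch

def rope_frequencies(head_dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE inverse frequencies, one per pair of dimensions."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def interpolated_angles(seq_len: int, head_dim: int,
                        train_len: int = 4096,
                        target_len: int = 32768) -> torch.Tensor:
    """RoPE rotation angles with position interpolation: position indices
    are compressed by train_len / target_len so they stay in-distribution."""
    scale = train_len / target_len  # e.g. 4096 / 32768 = 1/8
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, rope_frequencies(head_dim))

# On a 32K sequence, positions 0..32767 are squeezed into 0..4095.875,
# the range the model was trained on.
angles = interpolated_angles(seq_len=32768, head_dim=128)
cos, sin = angles.cos(), angles.sin()  # feed into the usual RoPE rotation
```

The compression is also the source of the quality loss: nearby tokens end up with fractional position gaps the model never saw, which is why extended models usually need a short fine-tune at the new length.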
Models advertise 128K context windows. But attention quality degrades with distance. The last 10% of context often contributes less than the first 10%.