Extending Context Beyond Training Length
Models trained on 4K context can work at 32K with position interpolation. Quality degrades, but predictably. Know the tradeoffs before extending.
1 post tagged with "rope"
Models trained on 4K context can work at 32K with position interpolation. Quality degrades, but predictably. Know the tradeoffs before extending.