Checklist
Describe the bug
One question: I notice that most draft models are trained with a context length of around 2K. When dealing with ultra-long contexts (such as 16K, 32K, or even 128K), can the draft model still maintain reasonable accuracy?
Reproduction
.
Environment
.