Thank you for your very interesting work. I would like to ask about something mentioned in your paper: "Considering the auto-regressive inference pipeline of LLMs, we store these prefix tokens in the KV cache to prevent generating new outlier tokens during inference." I don't understand why storing outlier tokens in the prefix cache prevents the generation of new outlier tokens during inference. Could you please explain this further?