Accepted by ICLR 2026
This repository contains the code used to evaluate the KV-cache–based methods proposed in our ICLR 2026 paper, including MTEB evaluation, KV-CoE inference, and KV-based classification.
We follow the official MTEB evaluation protocol without modification.
Set up the environment according to the official MTEB instructions.
After the environment is ready, run `custom_model.py`; the other baselines follow the same naming pattern with different suffixes.
The script evaluates our KV-cache–based setup by exposing KV representations as embeddings within the MTEB framework.
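As a rough illustration of how KV representations can be exposed as embeddings, the sketch below wraps a toy "KV encoder" in the minimal model interface MTEB expects (an `encode(sentences, ...)` method returning one vector per sentence). The `KVEmbedder` class, the deterministic stand-in for real KV states, and the mean pooling are assumptions for illustration, not the repository's actual implementation:

```python
import numpy as np

class KVEmbedder:
    """Illustrative stand-in: exposes pooled KV-cache states as embeddings.

    A real implementation would run an LLM forward pass and pool the
    key/value states of a chosen layer; here a deterministic random
    projection stands in for the model so the sketch is self-contained.
    """

    def __init__(self, dim: int = 64):
        self.dim = dim

    def _fake_kv_states(self, sentence: str) -> np.ndarray:
        # Pretend (seq_len, dim) KV states; seeded per sentence for determinism.
        seed = sum(ord(c) for c in sentence)
        rng = np.random.default_rng(seed)
        seq_len = max(1, len(sentence.split()))
        return rng.standard_normal((seq_len, self.dim))

    def encode(self, sentences, **kwargs) -> np.ndarray:
        # MTEB-style model interface: list of strings -> (n, dim) array.
        return np.stack(
            [self._fake_kv_states(s).mean(axis=0) for s in sentences]
        )

model = KVEmbedder()
embs = model.encode(["hello world", "kv cache as embedding"])
print(embs.shape)  # (2, 64)
```

With such a wrapper, the standard MTEB runner can score the KV-based embeddings exactly like any other embedding model.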
Create the environment using `requirements.txt`.
Run `Scripts/llm_infer.sh` to perform LLM inference based on KV-based representations.
Run `Scripts/llm_eval.sh` to evaluate the performance.
The KVClassifier pipeline consists of the following steps:
- Set up the environment using `requirements.txt`.
- Run `prep_fast_slow_thinking_results.py` to generate baseline fast- and slow-thinking results.
- Run `prep_kv_classfier_training_data.py` to construct the training dataset for KVClassifier. The distribution of difficulty labels can be inspected using `count_difficulty.py`.
- Run `train_kv_classifier.py` to train the KVClassifier.
- Run `eval_kv_classifier_classification.py`, followed by `parse_kv_classifier_results_classification.py`, to evaluate the KVClassifier in the classification setting.
- Run `eval_kv_classifier_generative.py`, followed by `parse_kv_classifier_results_generative.py`, to evaluate the KVClassifier in the generative setting.
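The pipeline above can be sketched in miniature as follows. Everything here is an illustrative assumption rather than the repository's code: the pooled KV features, the synthetic "easy vs. hard" data, the tiny logistic-regression classifier, and the `route` helper that mimics the generative setting by picking fast or slow thinking per prompt:

```python
import numpy as np

rng = np.random.default_rng(0)

def kv_features(kv_states: np.ndarray) -> np.ndarray:
    """Pool per-token KV states (seq_len, dim) into one feature vector.
    Mean + max pooling is an illustrative choice."""
    return np.concatenate([kv_states.mean(axis=0), kv_states.max(axis=0)])

def make_example(hard: bool):
    # Synthetic stand-in data: "hard" prompts get larger-magnitude KV states.
    scale = 2.0 if hard else 0.5
    kv = scale * rng.standard_normal((int(rng.integers(5, 20)), 8))
    return kv_features(kv), float(hard)

pairs = [make_example(i % 2 == 0) for i in range(200)]
X = np.array([p[0] for p in pairs])
y = np.array([p[1] for p in pairs])

# Tiny logistic-regression "KVClassifier" trained by gradient descent.
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y                      # dL/dlogits for log loss
    w -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean()

def route(kv_states: np.ndarray) -> str:
    """Generative-setting use: choose fast or slow thinking per prompt."""
    p = 1.0 / (1.0 + np.exp(-(kv_features(kv_states) @ w + b)))
    return "slow_thinking" if p > 0.5 else "fast_thinking"

acc = ((1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

The classification setting corresponds to measuring the classifier's accuracy directly; the generative setting corresponds to using `route` to dispatch each prompt and scoring the resulting generations.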
The core implementation of KVClassifier is provided in `kv_classfier.py`.
@inproceedings{
xing2026beyond,
title={Beyond Speedup - Utilizing {KV} Cache for Sampling and Reasoning},
author={Xing, Zeyu and Li, Xing and Zhen, Hui-Ling and Yuan, Mingxuan and Pan, Sinno Jialin},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=GUhmiJaAzv}
}