
# Likelihood Variance as Text Importance for Resampling Texts to Map Language Models (EMNLP 2025 Findings)

## 📄 Paper

**Likelihood Variance as Text Importance for Resampling Texts to Map Language Models**
Momose Oyama, Ryo Kishino, Hiroaki Yamagiwa, Hidetoshi Shimodaira
arXiv:2505.15428 | accepted to EMNLP 2025 Findings

## 🔑 Key Results

### Performance of LS and KL Sampling (Figure 2)

With approximately half the number of unique texts, both LS and KL sampling achieve model map errors comparable to those of uniform sampling.

- 💨 Code (generate data): `fig2_resampling_error.py`
- 🥒 Data (plot-ready): `data/fig2_resampling_error.pkl`
- 📙 Notebook (visualize): `figure2.ipynb`

fig2a fig2b
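The exact LS and KL weighting schemes are defined in the paper and implemented in `fig2_resampling_error.py`; as a rough illustration only (not the repository's implementation), the core idea can be sketched as taking each text's importance to be the variance of its log-likelihood across models and resampling texts in proportion to that weight:

```python
import numpy as np

def resample_by_likelihood_variance(loglik, n_samples, seed=None):
    """Resample text indices with probability proportional to the
    variance of each text's log-likelihood across models.

    loglik: (n_models, n_texts) array of per-model, per-text log-likelihoods.
    Returns (resampled text indices, sampling probabilities).
    """
    rng = np.random.default_rng(seed)
    importance = loglik.var(axis=0)        # one importance weight per text
    probs = importance / importance.sum()  # normalize into a distribution
    idx = rng.choice(loglik.shape[1], size=n_samples, replace=True, p=probs)
    return idx, probs

# Toy example with made-up numbers: 5 models scored on 200 texts.
rng = np.random.default_rng(0)
loglik = rng.normal(size=(5, 200))
idx, probs = resample_by_likelihood_variance(loglik, n_samples=100, seed=0)
```

High-variance texts are the ones models disagree on, so concentrating the sample on them preserves the distances that shape the model map with fewer unique texts.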

### Model Map with Resampled Texts (Figure 3)

LS sampling is as robust as uniform sampling while requiring fewer texts.
Using only the texts selected by LS sampling, new models can be added to the map efficiently.

- 💨 Code (generate data): `fig3a_mapvariance.py` | `fig3b_addnew.py`
- 🥒 Data (plot-ready): `data/fig3a_mapvariance.pkl` | `data/fig3b_addnew.pkl`
- 📙 Notebook (visualize): `figure3.ipynb`

fig3

### Prediction of Model Performance (Figure 4)

Using model coordinates computed from the unique texts, we predict the average performance across six downstream tasks with ridge regression. See `code_for_prediction/` for details.

fig4
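The authoritative setup lives in `code_for_prediction/`; as a self-contained illustration with entirely synthetic data (the coordinates, dimensions, and scores below are made up), closed-form ridge regression from map coordinates to an average benchmark score looks like this:

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Synthetic stand-in: 40 models with 10-dimensional map coordinates and a
# made-up linear relationship to an average downstream-task score.
rng = np.random.default_rng(0)
coords = rng.normal(size=(40, 10))     # model coordinates on the map
w_true = rng.normal(size=10)
scores = coords @ w_true + rng.normal(scale=0.05, size=40)

w_hat = ridge_fit(coords, scores, alpha=0.1)
pred = coords @ w_hat
r2 = 1 - np.sum((scores - pred) ** 2) / np.sum((scores - scores.mean()) ** 2)
```

The ridge penalty keeps the fit stable when the coordinate dimension is large relative to the number of models.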

## 🦉 Misc.

## 📚 Citation

```bibtex
@inproceedings{oyama-etal-2025-likelihood,
    author = {Momose Oyama and Ryo Kishino and Hiroaki Yamagiwa and Hidetoshi Shimodaira},
    title = {Likelihood Variance as Text Importance for Resampling Texts to Map Language Models},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025},
    year = {2025}
}
```