DataFlex is a data-centric training framework that enhances model performance by selecting the most influential samples, optimizing sample weights, or adjusting data mixing ratios.
Updated Apr 17, 2026 - Python
(ACL 2026 Main) LLMSurgeon recovers the pretraining data mixture of any LLM from its generated text alone, without access to weights or training data. A calibrated domain classifier plus label-shift correction de-blurs biased predictions. Ships with LLMScan, a benchmark on 8 open-source LLMs.
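To illustrate what "calibrated domain classifier plus label-shift correction" can mean in practice, here is a minimal sketch of one well-known label-shift correction: EM re-estimation of class priors from calibrated classifier posteriors. This is a generic technique chosen for illustration, not necessarily the procedure LLMSurgeon uses; the function name `estimate_mixture`, its parameters, and the toy numbers are all hypothetical.

```python
def estimate_mixture(posteriors, source_prior, max_iter=1000, tol=1e-10):
    """Estimate the domain mixture of a sample under label shift via EM.

    posteriors: list of per-sample probability vectors from a classifier
        that is assumed calibrated with respect to source_prior.
    source_prior: domain frequencies in the classifier's training data.
    Hypothetical helper for illustration, not LLMSurgeon's API.
    """
    k = len(source_prior)
    target = list(source_prior)  # start from the source prior
    for _ in range(max_iter):
        totals = [0.0] * k
        for post in posteriors:
            # E-step: re-weight each posterior by the current prior ratio.
            weighted = [post[j] * target[j] / source_prior[j] for j in range(k)]
            norm = sum(weighted)
            for j in range(k):
                totals[j] += weighted[j] / norm
        # M-step: the new prior is the mean of the adjusted posteriors.
        updated = [t / len(posteriors) for t in totals]
        change = max(abs(u - t) for u, t in zip(updated, target))
        target = updated
        if change < tol:
            break
    return target


# Toy data: a two-domain classifier trained on a 50/50 prior, applied to
# 100 generated texts. 70 look like domain 0, 30 look like domain 1.
posteriors = [[0.9, 0.1]] * 70 + [[0.1, 0.9]] * 30

# Naive estimate: average the raw posteriors (biased toward the source prior).
naive = [sum(p[j] for p in posteriors) / len(posteriors) for j in range(2)]

# Corrected estimate: EM moves the mixture away from the 50/50 source prior.
corrected = estimate_mixture(posteriors, [0.5, 0.5])
print(naive, corrected)
```

In this toy setup the naive average gives roughly 0.66 for domain 0, while the corrected estimate converges to about 0.75, which is the mixture consistent with a classifier that is right 90% of the time on each domain. The correction depends on the calibration assumption: if the posteriors are miscalibrated, the re-estimated mixture inherits that error.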