DynamiCS is an efficient and long-tail-aware data sampling method for vision-language model (VLM) pre-training. This repository contains the code used to build dynamic cluster-based sampling probabilities and to plug them into an OpenCLIP-style training pipeline.
Zero-shot top-1 classification accuracy (%) on ImageNet-1K and Let It Wag!, compared with fully trained CLIP baselines. All models use a ViT-B/16 image encoder.
| Models | Dataset (Data Size) | Samples Seen @ Resolution | Tokens | ImageNet-1K | Let It Wag! | GPU-hours |
|---|---|---|---|---|---|---|
| OpenAI-WIT | --- (400M) | 12.8B@224 | 274 | 68.3 | 37.9 | 10700 |
| MetaCLIP-400M | --- (400M) | 12.8B@224 | 274 | 70.8 | 46.5 | ~10700 |
| OpenCLIP | LAION-400M (400M) | 12.8B@224 | 274 | 67.1 | 39.1 | 10736 |
| DynamiCS (Ours) | LAION-400M (298M) | 2.56B@112 + 128M@224 | 81 | 67.5 | 45.5 | 299 |
| DynamiCS (Ours) [ckpt] | DataComp-DFN (130M) | 1.28B@112 + 128M@224 | 81 | 71.3 | 50.2 | 163 |
| DynamiCS (Ours) [ckpt] | DataComp-DFN (130M) | 2.56B@112 + 128M@224 | 81 | 72.6 | 52.0 | 299 |
Key takeaways from the paper:
- On LAION-400M, DynamiCS reaches 67.5 on ImageNet-1K and 45.5 on Let It Wag! using 299 GPU-hours, compared with 10736 GPU-hours for full-training OpenCLIP.
- On DataComp-DFN, DynamiCS reaches 72.6 on ImageNet-1K and 52.0 on Let It Wag! with only 81 tokens and 299 GPU-hours.
- DynamiCS is especially strong on long-tail recognition, outperforming OpenCLIP, MetaCLIP-400M, and OpenAI-WIT on Let It Wag! while using substantially less compute.
The DataComp-DFN checkpoints and released SHA256-keyed sampling file are hosted on Hugging Face:
MingliangLiang3/DynamiCS-ViT-B-16-DataComp-DFN
This repository follows the standard OpenCLIP installation flow.
```bash
python3 -m venv .venv
source .venv/bin/activate
make install
make install-training
```

For the DynamiCS preprocessing pipeline and sampling-aware training, you will also need:

- `orjson` for loading sampling-probability JSON files in `src/open_clip_train/data.py`
- `pyarrow` for parquet metadata written by `tests/DynamiCS/embedding_dinov2.py`
- a FAISS build such as `faiss-cpu` or `faiss-gpu` for clustering and nearest-neighbor search
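As a minimal sketch of why `orjson` is listed: it is a faster drop-in for the stdlib `json` module when parsing large sampling-probability files. The file name and key/value format below are illustrative assumptions, not the exact schema used in `src/open_clip_train/data.py`.

```python
import json

# orjson is a faster drop-in for stdlib json; fall back if it is not installed.
try:
    import orjson
    loads = orjson.loads
except ImportError:
    loads = json.loads

def load_sampling_probs(path):
    """Load a sampling-probability JSON file (sample key -> probability)."""
    with open(path, "rb") as f:
        return loads(f.read())

# Hypothetical example file: keys and values are illustrative only.
with open("sampling_probs.json", "w") as f:
    json.dump({"000001.jpg": 0.8, "000002.jpg": 0.2}, f)

probs = load_sampling_probs("sampling_probs.json")
print(probs["000001.jpg"])  # 0.8
```

Reading the file as bytes and decoding with `orjson.loads` avoids an extra str copy on large files; the stdlib fallback accepts the same bytes input.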
For the exact environment used in our experiments, see myenv.yml.
We use LAION-400M and a DataComp subset filtered by DFN. Choose one of the following options:
Option A — Download LAION-400M and DataComp directly:
- Follow the instructions at DataComp.
- Follow the instructions at img2dataset to download LAION-400M.
Option B — Build the DFN-filtered subset from scratch:
- Download the DFN filter index from apf1/datafilteringnetworks_2b.
- Match the index with DataComp using adams-story/dfn-200m.
- Download the matched subset using img2dataset.
The full DynamiCS workflow, including DINOv2 embedding extraction, FAISS clustering, sampling-probability generation, OpenCLIP training examples, and Slurm script references, is documented on a separate page.
That guide also explains the difference between the filename-keyed sampling JSON used during training and the SHA256-keyed companion file released for open-source distribution.
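To make the two key schemes concrete, here is a hedged sketch of remapping a filename-keyed probability dict to SHA256 keys. The function name is hypothetical, and hashing the raw image bytes is an assumption: the released file may key on a different quantity, so defer to the workflow guide for the exact scheme.

```python
import hashlib

def to_sha256_keys(filename_probs, read_bytes):
    """Remap a filename-keyed probability dict to SHA256 hex-digest keys.

    `read_bytes` returns the raw image bytes for a filename. Hashing the
    image bytes is an illustrative assumption, not the documented scheme.
    """
    return {
        hashlib.sha256(read_bytes(name)).hexdigest(): p
        for name, p in filename_probs.items()
    }

# Toy example with in-memory "images" standing in for files on disk.
images = {"000001.jpg": b"\xff\xd8fake-jpeg-1", "000002.jpg": b"\xff\xd8fake-jpeg-2"}
probs = {"000001.jpg": 0.8, "000002.jpg": 0.2}
sha_probs = to_sha256_keys(probs, lambda name: images[name])
print(len(sha_probs))  # 2
```

Content-derived keys survive renames and re-sharding of the dataset, which is why a hash-keyed companion file is convenient for open-source distribution.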
We use CLIP Benchmark to evaluate on a standard suite of 38 datasets in zero-shot classification and retrieval settings.
DynamiCS is implemented on top of OpenCLIP. Please also cite and acknowledge the OpenCLIP project if you use this repository in your work.
This work used the Dutch national e-infrastructure with the support of the SURF Cooperative. The computations were carried out on the Snellius supercomputer.
This repository is released under the MIT License.
If you use DynamiCS in your research, please cite the paper:
```bibtex
@article{liang2026dynamics,
  title={Dynamic Cluster Data Sampling for Efficient and Long-Tail-Aware Vision-Language Pre-training},
  author={Mingliang Liang and Zhuoran Liu and Arjen P. de Vries and Martha Larson},
  journal={arXiv preprint arXiv:2604.27932},
  year={2026}
}
```

The repository metadata is also available in CITATION.cff.
For questions about DynamiCS, checkpoint access, or potential collaboration, please:
- open an issue at MingliangLiang3/DynamiCS
- contact Mingliang Liang at mliang@cs.ru.nl
