This repository contains the code and configuration for creating image retrieval benchmarks for Sage Image Search using the imsearch_benchmaker framework. It also contains other datasets that we use to benchmark text-to-image retrieval systems in various scientific domains.
This repository provides tools and pipelines to create standardized benchmark datasets for evaluating text-to-image retrieval systems in various scientific domains. Each benchmark follows a consistent pipeline architecture that automates the entire dataset creation process, from raw image collection to publication on Hugging Face.
| Dataset | Domain | Description | Final Dataset | Code |
|---|---|---|---|---|
| FireBench | Fire Science 🔥 | A benchmark dataset for evaluating text-to-image retrieval systems in the domain of fire science. | FireBench on Hugging Face | FireBenchMaker |
| CommonObjectsBench | General Objects & Scenes 🌍 | A benchmark dataset for evaluating text-to-image retrieval systems on general objects and common scenes. | CommonObjectsBench on Hugging Face | CommonObjectsBenchMaker |
| CloudBench | Nephology (Atmospheric Science) 🌥 | A benchmark dataset for evaluating text-to-image retrieval systems in atmospheric science, with a specific focus on clouds. | CloudBench on Hugging Face | CloudBenchMaker |
| Inquire | Biology 🌿 | A benchmark dataset for evaluating text-to-image retrieval systems in the domain of biology. | INQUIRE-Benchmark-small on Hugging Face | Inquire |
| SageBench | Sage Continuum 🌲 | A benchmark dataset for evaluating text-to-image retrieval systems on Sage Continuum sensor images when queries reference Sage metadata (vsn, zone, host, job, plugin, camera, project, address). | SageBench on Hugging Face | SageBenchMaker |
All benchmarks in this repository use the imsearch_benchmaker framework, which provides:
- Automated pipeline execution (preprocessing → annotation → query planning → judging → postprocessing)
- Integration with adapters for vision annotation and query generation (OpenAI, Google, etc.)
- Adapters for similarity scoring (apple/DFN5B-CLIP-ViT-H-14-378)
- Hugging Face dataset preparation and upload
- Exploratory data analysis tools
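The stage order listed above can be sketched as a simple chain of functions. This is only an illustration of the sequencing, not the imsearch_benchmaker API: `run_pipeline`, `make_stage`, and the `state` dictionary are hypothetical names introduced here, and the real framework is driven by each benchmark's config.toml.

```python
from typing import Callable, Dict

# Stage names come from the list above; everything else is a hypothetical stand-in.
STAGE_ORDER = [
    "preprocessing",
    "annotation",
    "query planning",
    "judging",
    "postprocessing",
]

Stage = Callable[[dict], dict]


def run_pipeline(stages: Dict[str, Stage], state: dict) -> dict:
    """Run every stage in STAGE_ORDER, feeding each stage's output to the next."""
    for name in STAGE_ORDER:
        state = stages[name](state)
    return state


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def stage(state: dict) -> dict:
        return {**state, "completed": state.get("completed", []) + [name]}
    return stage


stages = {name: make_stage(name) for name in STAGE_ORDER}
result = run_pipeline(stages, {"images": []})
print(result["completed"])
```

Each placeholder stage returns a new dictionary rather than mutating its input, so the output of one stage is the only thing the next stage depends on.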
For detailed instructions, see the individual benchmark README files.
imsearch_benchmarks/
├── README.md # This file
├── docker/ # Docker config to run the pipeline in a container
├── FireBenchMaker/ # FireBench benchmark
│ ├── README.md # FireBench documentation
│ ├── config.toml # Pipeline configuration
│ ├── dataset_card.md # Dataset card for Hugging Face
│ ├── requirements.txt # Python dependencies
│ ├── tools/ # Data collection scripts
│ │ ├── get_figlib.py
│ │ ├── get_sage.py
│ │ └── get_wildfire.py
│ └── ...
└── ...
To add a new benchmark:
- Create a new directory for your benchmark
- Set up config.toml following the imsearch_benchmaker configuration format
- Add data collection tools if needed
- Create a README.md documenting your benchmark
- If needed, add a new adapter for your benchmark
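The first steps above can be scripted. The sketch below creates an empty skeleton that mirrors the FireBenchMaker layout shown earlier; `scaffold_benchmark` and the `ExampleBenchMaker` name are hypothetical, and the files are created empty because the contents of config.toml must follow the imsearch_benchmaker configuration format, which is not reproduced here.

```python
import tempfile
from pathlib import Path

# File names taken from the FireBenchMaker layout; contents are left empty on purpose.
SKELETON_FILES = ("README.md", "config.toml", "dataset_card.md", "requirements.txt")


def scaffold_benchmark(root: Path, name: str) -> Path:
    """Create an empty benchmark directory skeleton under root and return its path."""
    bench_dir = root / name
    # tools/ holds data collection scripts, as in FireBenchMaker.
    (bench_dir / "tools").mkdir(parents=True, exist_ok=True)
    for filename in SKELETON_FILES:
        path = bench_dir / filename
        if not path.exists():  # never overwrite files that already exist
            path.touch()
    return bench_dir


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is written to the repository.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_benchmark(Path(tmp), "ExampleBenchMaker")
        for entry in sorted(created.iterdir()):
            print(entry.name)
```

To scaffold inside the repository itself, pass the repository root as `root`. Existing files are left untouched, so the function is safe to run again.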
You can use imsearch_benchmarks together with imsearch_eval: this repository supplies the benchmarks against which imsearch_eval measures the performance of an image search system. To create a new benchmark, use the imsearch_benchmaker framework.