How was our dataset curated? How are the benchmark instances created?
To obtain testable real-world repositories from GitHub, we propose a fully automated curation pipeline that utilizes GitHub Actions CI and LLM assistance, eliminating the need for human involvement in benchmark construction.
```
python -m dibench.curate.crawling --help
```
- Searches GitHub for repositories in `star_range` for each `language` (in 10-star batches).
- Checks each repository for CI workflow files; if any are found, dumps the repository instance into a JSONL file.
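The 10-star batching above exists because GitHub's search API caps the number of results per query, so the star range is swept in small slices. A minimal sketch of that slicing logic follows; the helper names `star_batches` and `build_query` are illustrative, not part of dibench's actual API:

```python
# Hypothetical sketch of the crawling step's batching logic.
# GitHub search returns a bounded result set per query, so the star range
# is split into fixed-width batches, each issued as its own search query.

def star_batches(lo: int, hi: int, step: int = 10):
    """Yield (start, end) star ranges covering [lo, hi] in `step`-star batches."""
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        yield start, end
        start = end + 1

def build_query(language: str, stars: tuple[int, int]) -> str:
    """Build one GitHub search query string for a single star batch."""
    return f"language:{language} stars:{stars[0]}..{stars[1]}"

queries = [build_query("python", b) for b in star_batches(100, 129)]
# queries == ['language:python stars:100..109',
#             'language:python stars:110..119',
#             'language:python stars:120..129']
```

Each query string would then be sent to the search endpoint, and every hit is checked for a `.github/workflows/` directory before being serialized to JSONL.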
```
python -m dibench.curate.curate --help
```
- Locates the test CI file in the repository
- Locates the test job within that CI file
- Extracts the ACT command for running the job
- Sanitizes the instance & masks its declared dependencies
- Extracts the gold patch
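To make the "Sanitize & Mask" step concrete, here is a minimal sketch assuming a Python repository that declares its dependencies in `pyproject.toml`; the function `mask_dependencies` and the regex approach are illustrative assumptions, not dibench's actual implementation:

```python
import re

# Illustrative sketch: blank out the `dependencies` array in a pyproject.toml
# so that the model under evaluation must reconstruct it. The original
# (unmasked) section then serves as the basis for the gold patch.

def mask_dependencies(pyproject_text: str) -> str:
    """Replace the contents of the `dependencies = [...]` array with an empty list."""
    return re.sub(
        r"(dependencies\s*=\s*\[)[^\]]*(\])",
        r"\1\2",
        pyproject_text,
    )

original = 'dependencies = [\n  "requests>=2.0",\n  "numpy",\n]\n'
masked = mask_dependencies(original)
# masked == 'dependencies = []\n'
```

The diff between `original` and `masked` is exactly the information a gold patch must restore, which is what the verification step below exercises.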
```
python -m dibench.curate.verify --help
```
Expected behavior:
- Tests pass when dependencies are unmasked
- Tests fail when dependencies are masked
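The two conditions above can be sketched as a simple acceptance check: an instance only enters the benchmark if masking the dependencies actually breaks its test suite. The names `run_tests` and `is_valid_instance` below are hypothetical, and the ACT invocation is a placeholder, not dibench's real command:

```python
import subprocess

def run_tests(act_command: list[str]) -> bool:
    """Run the CI test job locally (e.g. via ACT); True if it exits 0."""
    return subprocess.run(act_command, capture_output=True).returncode == 0

def is_valid_instance(passed_unmasked: bool, passed_masked: bool) -> bool:
    """Keep the instance only if masking the dependencies breaks the tests."""
    return passed_unmasked and not passed_masked

# is_valid_instance(True, False) -> True   (masking breaks tests: keep)
# is_valid_instance(True, True)  -> False  (masking had no effect: discard)
# is_valid_instance(False, _)    -> False  (tests are broken anyway: discard)
```

This check filters out repositories whose tests never pass, as well as those whose tests do not depend on the masked dependency declarations, leaving only instances where correct dependency inference is both necessary and sufficient.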