Pre-computed zero-shot and agentic baseline results for all packaged benchmarks are hosted on the Hugging Face Hub:
https://huggingface.co/datasets/text2sql-eval-toolkit/text2sql-eval-results
# All benchmarks and pipelines (~7 GB)
text2sql-eval-toolkit results fetch
# Single benchmark only
text2sql-eval-toolkit results fetch --benchmarks bird_mini_dev_sqlite
# Specific toolkit version tag
text2sql-eval-toolkit results fetch --revision v1.1.0
Results are placed in ${TEXT2SQL_DATA_ROOT:-./data}/results/, which is
the directory you are currently reading from.
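
If you need to locate the fetched results from your own Python code, the path rule above can be mirrored as in the following minimal sketch. The helper name `resolve_results_dir` is hypothetical and not part of the toolkit; the only details taken from this file are the `TEXT2SQL_DATA_ROOT` variable, the `./data` fallback, and the `results/` subdirectory.

```python
import os
from pathlib import Path


def resolve_results_dir(env=None):
    """Return the results directory, mirroring ${TEXT2SQL_DATA_ROOT:-./data}/results/.

    Hypothetical helper for illustration; the toolkit may resolve paths differently.
    """
    env = os.environ if env is None else env
    # The shell form `:-` falls back when the variable is unset OR empty,
    # so `or` is used here instead of a plain dict default.
    data_root = env.get("TEXT2SQL_DATA_ROOT") or "./data"
    return Path(data_root) / "results"


if __name__ == "__main__":
    results_dir = resolve_results_dir()
    status = "exists" if results_dir.is_dir() else "missing"
    print(f"{results_dir} ({status})")
```

Passing a plain dictionary as `env` makes the fallback behavior easy to check without touching the real environment.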
You can also produce results yourself by running the evaluation pipeline:
python scripts/evaluation/run_evaluation.py --help
Note: This file is regenerated by scripts/analysis/make_summary_report.py after a full evaluation run and will contain the actual benchmark summary table at that point.