Evaluation Results

Pre-computed zero-shot and agentic baseline results for all packaged benchmarks are hosted on the Hugging Face Hub:

https://huggingface.co/datasets/text2sql-eval-toolkit/text2sql-eval-results

Downloading

# All benchmarks and pipelines (~7 GB)
text2sql-eval-toolkit results fetch

# Single benchmark only
text2sql-eval-toolkit results fetch --benchmarks bird_mini_dev_sqlite

# Specific toolkit version tag
text2sql-eval-toolkit results fetch --revision v1.1.0

Results are placed in ${TEXT2SQL_DATA_ROOT:-./data}/results/, which is the directory that contains this file.
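If you need to locate that directory from your own scripts, the shell expansion above can be mirrored in Python. This is a minimal sketch, not part of the toolkit: `results_dir` is a hypothetical helper name, and the only things taken from this README are the `TEXT2SQL_DATA_ROOT` variable, the `./data` default, and the `results` subdirectory.

```python
import os
from pathlib import Path


def results_dir(env=None):
    """Mirror the shell expansion ${TEXT2SQL_DATA_ROOT:-./data}/results/.

    Hypothetical helper for illustration; the toolkit may resolve the
    path differently internally.
    """
    env = os.environ if env is None else env
    # In the shell, ':-' falls back to the default when the variable is
    # unset OR empty, so `or` is the matching idiom here (a plain
    # .get(key, default) would keep an empty string).
    root = env.get("TEXT2SQL_DATA_ROOT") or "./data"
    return Path(root) / "results"


if __name__ == "__main__":
    print(results_dir())
```

Passing a mapping as `env` makes the function easy to check without touching the real environment; called with no arguments, it reads `os.environ`.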

Generating locally

You can also produce results yourself by running the evaluation pipeline:

python scripts/evaluation/run_evaluation.py --help

Note: This file is regenerated by scripts/analysis/make_summary_report.py after a full evaluation run; once regenerated, it contains the actual benchmark summary table.