Evaluation Results

Pre-computed zero-shot and agentic baseline results for all packaged benchmarks are hosted on the Hugging Face Hub:

https://huggingface.co/datasets/text2sql-eval-toolkit/text2sql-eval-results

Downloading

# All benchmarks and pipelines (~7 GB)
text2sql-eval-toolkit results fetch

# Single benchmark only
text2sql-eval-toolkit results fetch --benchmarks bird_mini_dev_sqlite

# Specific toolkit version tag
text2sql-eval-toolkit results fetch --revision v1.1.0

Results are placed in ${TEXT2SQL_DATA_ROOT:-./data}/results/ (under $TEXT2SQL_DATA_ROOT when that variable is set, otherwise under ./data). This is the directory that contains this README.
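The path above uses standard shell parameter expansion, so the same expression can be reused in your own scripts to locate the fetched results. A minimal sketch follows; the helper name results_dir is hypothetical and not part of the toolkit, and only the TEXT2SQL_DATA_ROOT variable and the ./data default come from the path above.

```shell
# Hypothetical helper (not part of the toolkit): print the results directory.
# ${VAR:-default} falls back to ./data when TEXT2SQL_DATA_ROOT is unset or empty.
results_dir() {
    printf '%s\n' "${TEXT2SQL_DATA_ROOT:-./data}/results"
}

dir="$(results_dir)"
echo "Results directory: ${dir}"

# List what has been fetched so far, if the directory exists yet.
if [ -d "${dir}" ]; then
    ls "${dir}"
else
    echo "No results found in ${dir}; run the fetch command first."
fi
```

Because the fallback also applies when the variable is set to an empty string, exporting TEXT2SQL_DATA_ROOT="" behaves the same as leaving it unset.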

Generating locally

You can also produce results yourself by running the evaluation pipeline:

python scripts/evaluation/run_evaluation.py --help

Note: scripts/analysis/make_summary_report.py regenerates this file after a full evaluation run. From that point on, it contains the actual benchmark summary table.
