Template repo and skill to build CodaBench bundles with agents.
.venv ❯ python src/build.py --srcdir examples/matey_tutorial/ --outdir /tmp/test_output
Automation CodaBench
─────────────────────────────────────────────
Source: /Users/rzamora/Desktop/LBNL/AMSC/repos/Automation_CodaBench/examples/matey_tutorial
Output: /private/tmp/test_output
─────────────────────────────────────────────
▶ copy-template
done copy-template (0.0s)
▶ read-source
done read-source (0.0s)
▶ define-task
→ read-file /private/tmp/test_output/bundle/.source_dump
← --- docs/data.html --- <h1>MATEY: Data</h1> <h2>Spatiotemporal Prediction Task</h2> <p>This comp...
→ write-file /private/tmp/test_output/bundle/.task_lock.yaml
← ok
done define-task (5.8s)
▶ prepare-data
done prepare-data (0.2s)
▶ write-ingestion
→ read-file /private/tmp/test_output/bundle/.task_lock.yaml
← title: "MATEY: Spatiotemporal Prediction" track: dev_and_test data_format: npy file_names: trai...
→ read-file /private/tmp/test_output/bundle/ingestion_program/ingestion.py
← #!/usr/bin/env python3 """Ingestion program for CodaBench. Loads data, imports the participant's...
→ write-file /private/tmp/test_output/bundle/ingestion_program/ingestion.py
← ok
done write-ingestion (12.8s)
▶ write-scoring
→ read-file /private/tmp/test_output/bundle/scoring_program/score.py
← #!/usr/bin/env python3 """Scoring program for CodaBench. Compares participant predictions agains...
→ read-file /private/tmp/test_output/bundle/scoring_program/metrics.py
← """Evaluation metrics for the competition. TODO: Implement your metrics here using pure numpy. ...
→ write-file /private/tmp/test_output/bundle/scoring_program/metrics.py
← ok
→ write-file /private/tmp/test_output/bundle/scoring_program/score.py
← ok
done write-scoring (20.4s)
▶ write-starting-kit
→ read-file /private/tmp/test_output/bundle/starting_kit/sample_code_submission/model.py
← """Baseline model for the competition. Participants must implement a `model` class with: - fit...
→ write-file /private/tmp/test_output/bundle/starting_kit/sample_code_submission/model.py
← ok
done write-starting-kit (24.7s)
▶ configure-yaml
→ read-file /private/tmp/test_output/bundle/competition.yaml
← # CodaBench Competition Bundle # Reference: https://docs.codabench.org/latest/Organizers/Benchmar...
→ read-file /private/tmp/test_output/bundle/pages/overview.html
← <h1>Overview</h1> <!-- TODO: Describe the scientific problem and motivation --> <p>This competiti...
→ read-file /private/tmp/test_output/bundle/pages/data.html
← <h1>Data</h1> <!-- TODO: Describe the data format, shapes, and how to load it --> <h2>Format</h2...
→ read-file /private/tmp/test_output/bundle/pages/evaluation.html
← <h1>Evaluation</h1> <!-- TODO: Describe your metrics and how ranking works --> <h2>Metrics</h2> ...
→ read-file /private/tmp/test_output/bundle/pages/terms.html
← <h1>Terms and Conditions</h1> <!-- TODO: Update with your competition's terms --> <p>By participa...
→ write-file /private/tmp/test_output/bundle/competition.yaml
← ok
→ write-file /private/tmp/test_output/bundle/pages/overview.html
← ok
→ write-file /private/tmp/test_output/bundle/pages/data.html
← ok
→ write-file /private/tmp/test_output/bundle/pages/evaluation.html
← ok
→ write-file /private/tmp/test_output/bundle/pages/terms.html
← ok
done configure-yaml (190.8s)
▶ validate
done validate (0.3s)
▶ zip-bundle
done zip-bundle (0.0s)
done /private/tmp/test_output/bundle.zip
────────────────────────────────────────────────────────────
Run Scorecard
────────────────────────────────────────────────────────────
LLM Configuration
Model: claude-haiku-4-5-20251001
Max tokens: 4096
Input pricing: $1.00 / 1M tokens
Output pricing: $5.00 / 1M tokens
Skill               Status        Time
──────────────────────────────────────
copy-template       done          0.0s
read-source         done          0.0s
define-task         done          5.8s
prepare-data        done          0.2s
write-ingestion     done         12.8s
write-scoring       done         20.4s
write-starting-kit  done         24.7s
configure-yaml      done        190.8s
validate            done          0.3s
zip-bundle          done          0.0s
──────────────────────────────────────
Total               10          254.9s
Cost
LLM calls: 26
Input tokens: 270467
Output tokens: 13974
Total cost: $0.3403
Skill                Calls       In    Out     Cost
───────────────────────────────────────────────────
configure-yaml          11   162720   8203  $0.2037
define-task              3     9888    471  $0.0122
write-ingestion          4    22049   1718  $0.0306
write-scoring            5    43265   2767  $0.0571
write-starting-kit       3    32545    815  $0.0366
Agentic Loss
Loss: 0.00 (perfect)
Grade: A
────────────────────────────────────────────────────────────
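
The cost figures in the scorecard can be checked by hand from the token counts and the listed pricing. The sketch below is one way to do that check. It assumes cost is simply input tokens times the input price plus output tokens times the output price, which matches the numbers printed above. The helper name `llm_cost` and the dictionary layout are hypothetical and are not taken from `src/build.py`.

```python
# Pricing copied from the "LLM Configuration" block of the scorecard.
INPUT_PRICE_PER_MTOK = 1.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 5.00  # USD per 1M output tokens


def llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: USD cost assuming flat per-token pricing."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# (calls, input tokens, output tokens) per skill, copied from the scorecard.
per_skill = {
    "configure-yaml": (11, 162720, 8203),
    "define-task": (3, 9888, 471),
    "write-ingestion": (4, 22049, 1718),
    "write-scoring": (5, 43265, 2767),
    "write-starting-kit": (3, 32545, 815),
}

total_calls = sum(calls for calls, _, _ in per_skill.values())
total_in = sum(tokens_in for _, tokens_in, _ in per_skill.values())
total_out = sum(tokens_out for _, _, tokens_out in per_skill.values())

for name, (_, tokens_in, tokens_out) in per_skill.items():
    print(f"{name:<20} ${llm_cost(tokens_in, tokens_out):.4f}")

print(f"LLM calls: {total_calls}")                          # 26
print(f"Total cost: ${llm_cost(total_in, total_out):.4f}")  # $0.3403
```

The per-skill rows sum to the reported totals (26 calls, 270467 input tokens, 13974 output tokens), and configure-yaml accounts for about 60% of the total cost.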