Automation_CodaBench

Template repository and skill for building CodaBench competition bundles with LLM agents.

Example Output:

.venv ❯ python src/build.py --srcdir examples/matey_tutorial/ --outdir /tmp/test_output

Automation CodaBench
─────────────────────────────────────────────
  Source:  /Users/rzamora/Desktop/LBNL/AMSC/repos/Automation_CodaBench/examples/matey_tutorial
  Output:  /private/tmp/test_output
─────────────────────────────────────────────

   ▶ copy-template
  done copy-template (0.0s)
   ▶ read-source
  done read-source (0.0s)
   ▶ define-task
    → read-file /private/tmp/test_output/bundle/.source_dump
    ← --- docs/data.html --- <h1>MATEY: Data</h1>  <h2>Spatiotemporal Prediction Task</h2> <p>This comp...
    → write-file /private/tmp/test_output/bundle/.task_lock.yaml
    ← ok
  done define-task (5.8s)
   ▶ prepare-data
  done prepare-data (0.2s)
   ▶ write-ingestion
    → read-file /private/tmp/test_output/bundle/.task_lock.yaml
    ← title: "MATEY: Spatiotemporal Prediction" track: dev_and_test data_format: npy file_names:   trai...
    → read-file /private/tmp/test_output/bundle/ingestion_program/ingestion.py
    ← #!/usr/bin/env python3 """Ingestion program for CodaBench.  Loads data, imports the participant's...
    → write-file /private/tmp/test_output/bundle/ingestion_program/ingestion.py
    ← ok
  done write-ingestion (12.8s)
   ▶ write-scoring
    → read-file /private/tmp/test_output/bundle/scoring_program/score.py
    ← #!/usr/bin/env python3 """Scoring program for CodaBench.  Compares participant predictions agains...
    → read-file /private/tmp/test_output/bundle/scoring_program/metrics.py
    ← """Evaluation metrics for the competition.  TODO: Implement your metrics here using pure numpy.  ...
    → write-file /private/tmp/test_output/bundle/scoring_program/metrics.py
    ← ok
    → write-file /private/tmp/test_output/bundle/scoring_program/score.py
    ← ok
  done write-scoring (20.4s)
   ▶ write-starting-kit
    → read-file /private/tmp/test_output/bundle/starting_kit/sample_code_submission/model.py
    ← """Baseline model for the competition.  Participants must implement a `model` class with:   - fit...
    → write-file /private/tmp/test_output/bundle/starting_kit/sample_code_submission/model.py
    ← ok
  done write-starting-kit (24.7s)
   ▶ configure-yaml
    → read-file /private/tmp/test_output/bundle/competition.yaml
    ← # CodaBench Competition Bundle # Reference: https://docs.codabench.org/latest/Organizers/Benchmar...
    → read-file /private/tmp/test_output/bundle/pages/overview.html
    ← <h1>Overview</h1> <!-- TODO: Describe the scientific problem and motivation --> <p>This competiti...
    → read-file /private/tmp/test_output/bundle/pages/data.html
    ← <h1>Data</h1> <!-- TODO: Describe the data format, shapes, and how to load it -->  <h2>Format</h2...
    → read-file /private/tmp/test_output/bundle/pages/evaluation.html
    ← <h1>Evaluation</h1> <!-- TODO: Describe your metrics and how ranking works -->  <h2>Metrics</h2> ...
    → read-file /private/tmp/test_output/bundle/pages/terms.html
    ← <h1>Terms and Conditions</h1> <!-- TODO: Update with your competition's terms --> <p>By participa...
    → write-file /private/tmp/test_output/bundle/competition.yaml
    ← ok
    → write-file /private/tmp/test_output/bundle/pages/overview.html
    ← ok
    → write-file /private/tmp/test_output/bundle/pages/data.html
    ← ok
    → write-file /private/tmp/test_output/bundle/pages/evaluation.html
    ← ok
    → write-file /private/tmp/test_output/bundle/pages/terms.html
    ← ok
  done configure-yaml (190.8s)
   ▶ validate
  done validate (0.3s)
   ▶ zip-bundle
  done zip-bundle (0.0s)

done /private/tmp/test_output/bundle.zip

────────────────────────────────────────────────────────────
Run Scorecard
────────────────────────────────────────────────────────────

LLM Configuration
  Model:           claude-haiku-4-5-20251001
  Max tokens:      4096
  Input pricing:   $1.00 / 1M tokens
  Output pricing:  $5.00 / 1M tokens

Skill                  Status     Time
──────────────────────────────────────
  copy-template          done     0.0s
  read-source            done     0.0s
  define-task            done     5.8s
  prepare-data           done     0.2s
  write-ingestion        done    12.8s
  write-scoring          done    20.4s
  write-starting-kit     done    24.7s
  configure-yaml         done   190.8s
  validate               done     0.3s
  zip-bundle             done     0.0s
──────────────────────────────────────
  Total                    10   254.9s

Cost
  LLM calls:          26
  Input tokens:    270467
  Output tokens:    13974
  Total cost:     $0.3403

  Skill                  Calls      In     Out     Cost
  ───────────────────────────────────────────────────
  configure-yaml            11  162720    8203 $0.2037
  define-task                3    9888     471 $0.0122
  write-ingestion            4   22049    1718 $0.0306
  write-scoring              5   43265    2767 $0.0571
  write-starting-kit         3   32545     815 $0.0366

Agentic Loss
  Loss:  0.00  (perfect)
  Grade: A
────────────────────────────────────────────────────────────
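The Cost section of the scorecard can be reproduced from the token counts and the per-million-token prices listed under LLM Configuration. The sketch below is illustrative only: `llm_cost` and `USAGE` are hypothetical names, not taken from `src/build.py`, and it assumes cost is simply tokens times price with no caching discounts or other adjustments.

```python
# Prices per one million tokens, copied from the "LLM Configuration" block.
INPUT_PRICE_PER_M = 1.00
OUTPUT_PRICE_PER_M = 5.00

# (input tokens, output tokens) per skill, copied from the per-skill cost table.
USAGE = {
    "configure-yaml": (162720, 8203),
    "define-task": (9888, 471),
    "write-ingestion": (22049, 1718),
    "write-scoring": (43265, 2767),
    "write-starting-kit": (32545, 815),
}


def llm_cost(input_tokens, output_tokens,
             input_price=INPUT_PRICE_PER_M, output_price=OUTPUT_PRICE_PER_M):
    """Return the dollar cost for a given number of input and output tokens.

    Hypothetical helper: assumes a flat price per million tokens.
    """
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def summarize(usage):
    """Return (total_input_tokens, total_output_tokens, total_cost)."""
    total_in = sum(tokens_in for tokens_in, _ in usage.values())
    total_out = sum(tokens_out for _, tokens_out in usage.values())
    return total_in, total_out, llm_cost(total_in, total_out)


if __name__ == "__main__":
    for skill, (tokens_in, tokens_out) in USAGE.items():
        print(f"{skill:<20} {tokens_in:>7} {tokens_out:>6} "
              f"${llm_cost(tokens_in, tokens_out):.4f}")
    total_in, total_out, total_cost = summarize(USAGE)
    print(f"{'Total':<20} {total_in:>7} {total_out:>6} ${total_cost:.4f}")
```

Under these assumptions the per-skill token counts sum to the reported totals (270467 input, 13974 output), and the computed total rounds to the reported $0.3403, with configure-yaml accounting for most of the spend.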
