feat: Improved metrics in ART #609

Merged
vivekkalyan merged 57 commits into main from feat/improved-metrics
Mar 12, 2026
@vivekkalyan commented Mar 10, 2026

Summary

This PR implements the W&B metrics taxonomy RFC in ART so run metrics land in stable top-level namespaces that are easier to panel and compare in W&B.

The main behavior change is hierarchical cost logging. Users can log leaf metrics like costs/train/llm_judge/correctness or costs/train/tinker_train, and ART will automatically emit parent rollups like costs/train and costs/all, plus cumulative costs/cum/* metrics across steps.
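
To make the rollup behavior concrete, here is a minimal standalone sketch of hierarchical cost aggregation. It is not ART's implementation: the function name `rollup_costs`, its signature, and the rule that `costs/all` is the sum of every leaf are assumptions made for illustration only.

```python
from collections import defaultdict


def rollup_costs(leaf_costs, cumulative=None):
    """Hypothetical helper: roll leaf cost metrics up into parent totals.

    leaf_costs maps keys such as "costs/train/llm_judge/correctness" to a
    cost for the current step. Returns (step_metrics, cumulative_metrics).
    """
    step = defaultdict(float)
    for key, value in leaf_costs.items():
        parts = key.split("/")
        if parts[0] != "costs" or len(parts) < 2:
            raise ValueError(f"expected a costs/... key, got {key!r}")
        # Add the value to the leaf itself and to every ancestor below "costs".
        for depth in range(2, len(parts) + 1):
            step["/".join(parts[:depth])] += value
        # Assumption: costs/all is the total over all leaves in the step.
        step["costs/all"] += value

    # Carry running totals across steps under costs/cum/*.
    cumulative = dict(cumulative or {})
    for key, value in step.items():
        cum_key = "costs/cum/" + key[len("costs/"):]
        cumulative[cum_key] = cumulative.get(cum_key, 0.0) + value
    return dict(step), cumulative


step_metrics, running = rollup_costs(
    {
        "costs/train/llm_judge/correctness": 0.25,
        "costs/train/tinker_train": 0.5,
    }
)
```

Passing `running` back in as `cumulative` on the next step keeps the `costs/cum/*` totals growing, which is roughly what persisted resume state would need to restore after a restart.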

This PR also adds a MetricsBuilder API and a @track_api_cost(...) decorator so judge / external API spend can be logged into the same taxonomy with model-aware pricing.

What changed

  • Canonicalized ART/W&B metric sections to loss/*, throughput/*, costs/*, time/*, data/*, train/*, val/*, test/*, and discarded/*.
  • Routed backend-emitted metrics into those namespaces across local, serverless, unsloth, and tinker paths.
  • Added MetricsBuilder for user-owned metrics, hierarchical cost rollups, costs/cum/*, cumulative time/* and data/*, derived throughput metrics, exact data/cum/num_unique_scenarios, and persisted resume state.
  • Added automatic ART-owned metrics where available, including time/step_wall_s, time/step_trainer_s, time/step_actor_s, time/step_eval_s, data/step_*, train/*, and automatic local costs/gpu when pricing is known.
  • Added @track_api_cost(...) for OpenAI/Anthropic responses with explicit provider and model_name, cache-aware pricing, and hooks for custom extractors / registered pricing. Pricing is inferred when using litellm.
  • Updated W&B metric definitions and docs for the new taxonomy.
  • Removed support for legacy costs_* keys in favor of hierarchical costs/... paths.
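
The cost-tracking decorator idea can be sketched as follows. This is an illustration only, not the real `@track_api_cost(...)` signature: the name `track_cost_sketch`, the `PRICING` table and its prices, the `sink` dict, and the response shape are all hypothetical, and raising at decoration time when pricing is missing is an assumption.

```python
import functools

# Hypothetical prices in USD per one million tokens (not real pricing).
PRICING = {
    ("openai", "example-model"): {
        "input": 1.0,
        "cached_input": 0.5,
        "output": 4.0,
    },
}


def track_cost_sketch(sink, metric, provider, model_name):
    """Hypothetical decorator: add the cost of each call to sink[metric]."""
    try:
        price = PRICING[(provider, model_name)]
    except KeyError:
        # Fail loudly instead of silently logging a zero cost.
        raise ValueError(f"no pricing for {provider}/{model_name}") from None

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)
            # Assumed response shape: a dict with a "usage" sub-dict.
            usage = response["usage"]
            cached = usage.get("cached_tokens", 0)
            uncached = usage["input_tokens"] - cached
            # Cache-aware pricing: cached input tokens use their own rate.
            cost = (
                uncached * price["input"]
                + cached * price["cached_input"]
                + usage["output_tokens"] * price["output"]
            ) / 1_000_000
            sink[metric] = sink.get(metric, 0.0) + cost
            return response

        return wrapper

    return decorator
```

Because the decorated function logs into a hierarchical key such as costs/train/llm_judge/correctness, judge spend would land in the same taxonomy as the other cost metrics.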

Notes

  • track_api_cost(...) requires explicit provider and model_name and raises if pricing is missing.
  • ART still uses training_step as the x-axis.
  • However, the x-axis can now be changed to metrics other than training_step (e.g. costs).

@vivekkalyan vivekkalyan marked this pull request as ready for review March 10, 2026 23:25
@vivekkalyan vivekkalyan merged commit f04c673 into main Mar 12, 2026
3 of 5 checks passed