feat(evaluators): add built-in budget evaluator for per-agent cost tracking #130

@amabito

Summary

Built-in budget evaluator for per-agent cumulative cost tracking.

Motivation

Retries and recursive tool chains pile up fast: a 3-layer retry loop can turn one user request into 64 API calls (e.g. 4 attempts per layer gives 4^3 = 64). Current evaluators are stateless, so there's no way to express "deny after $X total spend" without maintaining a separate counter service outside the control plane.

Current behavior

Controls evaluate step content (regex, list, JSON, SQL) but can't track cumulative state across evaluations. Cost enforcement requires a custom evaluator with external state management.

Expected behavior

A built-in budget evaluator that:

  1. Tracks cumulative cost per agent (in-memory, or in PostgreSQL for persistence)
  2. Takes three config values: max_cost_usd, cost_per_1k_input_tokens, cost_per_1k_output_tokens
  3. On post-stage evaluation, reads token counts from step.output or step.context, converts them to a cost, and adds it to the agent's running total
  4. Returns matched=True once the running total reaches max_cost_usd
  5. Pairs with existing actions: deny for a hard stop, or steer with steering_context: {fallback_model: "..."} to degrade to a fallback model
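
To make the steps above concrete, here is a minimal in-memory sketch. The config field names come from the list above; everything else (the BudgetConfig and BudgetEvaluator class names, the evaluate signature, and the returned dict) is hypothetical and is not the project's actual evaluator interface. Token counts are passed in directly rather than read from step.output or step.context, since the shape of those objects is not specified here.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass(frozen=True)
class BudgetConfig:
    # Field names follow the config keys proposed in this issue.
    max_cost_usd: float
    cost_per_1k_input_tokens: float
    cost_per_1k_output_tokens: float


@dataclass
class BudgetEvaluator:
    """Hypothetical in-memory budget evaluator (illustrative only)."""

    config: BudgetConfig
    # agent_id -> cumulative spend in USD; lost on restart (no persistence here).
    _spent: dict = field(default_factory=dict)
    # Guards the read-modify-write on _spent when steps are evaluated concurrently.
    _lock: Lock = field(default_factory=Lock)

    def step_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Convert one step's token counts into USD."""
        c = self.config
        return (
            input_tokens / 1000 * c.cost_per_1k_input_tokens
            + output_tokens / 1000 * c.cost_per_1k_output_tokens
        )

    def evaluate(self, agent_id: str, input_tokens: int, output_tokens: int) -> dict:
        """Accumulate this step's cost and report whether the ceiling is hit."""
        cost = self.step_cost(input_tokens, output_tokens)
        with self._lock:
            total = self._spent.get(agent_id, 0.0) + cost
            self._spent[agent_id] = total
        return {"matched": total >= self.config.max_cost_usd, "spent_usd": total}


if __name__ == "__main__":
    evaluator = BudgetEvaluator(BudgetConfig(1.0, 0.5, 0.25))
    print(evaluator.evaluate("agent-a", 1000, 1000))
    print(evaluator.evaluate("agent-a", 1000, 1000))
```

A PostgreSQL-backed variant would presumably swap the dict for an atomic increment on a per-agent row, which is left out of this sketch.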

Proposed solution

This should work as a regular evaluator. The confidence value could just be spent / limit, i.e. a 0-1 utilization ratio.
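
As a rough illustration of the utilization ratio, the helper below computes spent / limit and clamps it to the 0-1 range, since cumulative spend can overshoot the ceiling on the step that crosses it. The function name utilization and the handling of a non-positive limit are assumptions made for this sketch, not part of the proposal.

```python
def utilization(spent_usd: float, max_cost_usd: float) -> float:
    """Return spent / limit, clamped to the 0-1 range."""
    if max_cost_usd <= 0:
        # Assumption: a non-positive ceiling is treated as already exhausted.
        return 1.0
    return max(0.0, min(1.0, spent_usd / max_cost_usd))


if __name__ == "__main__":
    print(utilization(2.5, 10.0))   # a quarter of the budget used
    print(utilization(15.0, 10.0))  # overshoot is clamped
```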

The stateful part is new -- current evaluators are stateless -- but the SQL evaluator already caches query analysis across calls, so there's precedent for evaluator-level state.

Additional context

LMK if a PR for this makes sense.
