Description
Summary
Built-in budget evaluator for per-agent cumulative cost tracking.
Motivation
Retries and recursive tool chains pile up fast -- a 3-layer retry loop can turn one user request into 64 API calls (e.g. 4 attempts per layer, 4^3). Current evaluators are stateless, so there's no way to express "deny after $X total spend" without maintaining a separate counter service outside the control plane.
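To make the fan-out concrete, here is a small sketch of the worst-case arithmetic. It assumes every layer exhausts the same number of attempts; the 4-attempts-per-layer figure is an assumption chosen because it reproduces the 64 above, and `worst_case_calls` is a hypothetical helper name.

```python
def worst_case_calls(attempts_per_layer: int, layers: int) -> int:
    """Leaf API calls when every nested layer uses all of its attempts."""
    return attempts_per_layer ** layers


# Assumed: 4 attempts (1 try + 3 retries) at each of 3 nested layers.
print(worst_case_calls(4, 3))  # 64
```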
Current behavior
Controls evaluate step content (regex, list, JSON, SQL) but can't track cumulative state across evaluations. Cost enforcement requires a custom evaluator with external state management.
Expected behavior
A built-in budget evaluator that:
- Tracks cumulative cost per agent (in-memory, or via PostgreSQL for persistence)
- Config: `max_cost_usd`, `cost_per_1k_input_tokens`, `cost_per_1k_output_tokens`
- On post-stage evaluation, reads token counts from `step.output` or `step.context`, accumulates
- Returns `matched=True` when ceiling is hit
- Pairs with existing actions -- `deny` for hard stop, `steer` with `steering_context: {fallback_model: "..."}` for degradation
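
A rough sketch of the in-memory variant, to show the accumulation logic only. The class and method names (`BudgetConfig`, `InMemoryBudgetEvaluator`, `evaluate`) are hypothetical and do not reflect the project's actual evaluator interface. Token counts are passed in directly here because the exact layout of `step.output` / `step.context` isn't specified above, and "ceiling is hit" is assumed to mean spend at or above `max_cost_usd`.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetConfig:
    max_cost_usd: float
    cost_per_1k_input_tokens: float
    cost_per_1k_output_tokens: float


class InMemoryBudgetEvaluator:
    """Hypothetical sketch: per-agent cumulative cost, held in process memory."""

    def __init__(self, config: BudgetConfig) -> None:
        self.config = config
        self._spent = defaultdict(float)  # agent_id -> cumulative USD

    def step_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost of a single step, priced per 1k tokens."""
        return (
            input_tokens / 1000 * self.config.cost_per_1k_input_tokens
            + output_tokens / 1000 * self.config.cost_per_1k_output_tokens
        )

    def evaluate(self, agent_id: str, input_tokens: int, output_tokens: int) -> bool:
        """Accumulate this step's cost; return matched=True once the ceiling is hit."""
        self._spent[agent_id] += self.step_cost(input_tokens, output_tokens)
        return self._spent[agent_id] >= self.config.max_cost_usd


evaluator = InMemoryBudgetEvaluator(
    BudgetConfig(
        max_cost_usd=0.04,
        cost_per_1k_input_tokens=0.01,
        cost_per_1k_output_tokens=0.03,
    )
)
# Each step below costs about $0.025 (1000 input + 500 output tokens).
print(evaluator.evaluate("agent-a", 1000, 500))  # False: ~$0.025 spent
print(evaluator.evaluate("agent-a", 1000, 500))  # True: ~$0.05 spent, over $0.04
print(evaluator.evaluate("agent-b", 1000, 500))  # False: tracked separately
```

A PostgreSQL-backed version would swap the dictionary for a persistent counter; that part is left out of the sketch.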
Proposed solution
Should work as a regular evaluator. `confidence` could just be `spent / limit` (a 0-1 utilization ratio).
The stateful part is new -- current evaluators are stateless -- but the SQL evaluator already caches query analysis across calls, so there's precedent for evaluator-level state.
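
One way the evaluator-level state and the utilization ratio might fit together, as a sketch under stated assumptions: `BudgetState` and `add` are hypothetical names, the lock assumes evaluators may be invoked concurrently (not confirmed above), and the ratio is capped at 1.0 so it stays in the 0-1 range after an overrun.

```python
import threading


class BudgetState:
    """Hypothetical per-agent spend tracker shared across evaluations."""

    def __init__(self, limit_usd: float) -> None:
        if limit_usd <= 0:
            raise ValueError("limit_usd must be positive")
        self.limit_usd = limit_usd
        self._spent = {}  # agent_id -> cumulative USD
        self._lock = threading.Lock()  # assumed: concurrent evaluations

    def add(self, agent_id: str, cost_usd: float) -> float:
        """Record a cost and return spent / limit, capped at 1.0."""
        with self._lock:
            total = self._spent.get(agent_id, 0.0) + cost_usd
            self._spent[agent_id] = total
        return min(total / self.limit_usd, 1.0)


state = BudgetState(limit_usd=2.0)
print(state.add("agent-a", 0.5))  # 0.25
print(state.add("agent-a", 0.5))  # 0.5
print(state.add("agent-a", 5.0))  # 1.0 (capped)
```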
Additional context
LMK if a PR for this makes sense.