Skip to content

fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI#82

Open
mt-alarcon wants to merge 1 commit into
evolution-foundation:mainfrom
mt-alarcon:fix/heartbeat-runner-capture-cost-tokens
Open

fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI#82
mt-alarcon wants to merge 1 commit into
evolution-foundation:mainfrom
mt-alarcon:fix/heartbeat-runner-capture-cost-tokens

Conversation

@mt-alarcon
Copy link
Copy Markdown

@mt-alarcon mt-alarcon commented May 14, 2026

Summary

  • heartbeat_runner.step7_invoke_claude was hardcoding tokens_in / tokens_out / cost_usd to None, with a comment claiming the Claude CLI "doesn't expose token counts easily".
  • That claim is outdated. The runner already invokes the CLI with --output-format json (line ~249), and the CLI emits a single-line JSON object on success with total_cost_usd at top level plus usage.{input,output}_tokens nested.
  • Result: the heartbeat_runs.cost_usd column was always NULL, making cost tracking and budget analysis impossible.
  • Fix: best-effort parse of the CLI JSON output. Failure paths (status != \"success\" or malformed output) keep all three fields None — no behavior change for the existing error contract.

Why this matters now

Anthropic just announced (2026-05-14) that starting 2026-06-15, Max 5x plan subscribers get a dedicated $100/month credit for Agent SDK / claude -p usage, separate from the subscription's interactive limit. Each subscriber will be asked to claim the credit and optionally enable "extra usage" billing for overage.

To make that decision rationally, an operator needs to know:

  1. Total programmatic cost per month per heartbeat (do the heartbeats fit in $100?)
  2. Which heartbeats are the biggest spenders (where to optimize first if they don't fit)
  3. The cache_creation overhead per cold run (which compounds heartbeat cost surprisingly fast — empirically ~$0.50 per run just to load Claude Code's system context)

None of this analysis is possible while cost_usd is NULL. With this fix, the data starts flowing immediately on the next dispatch cycle, and operators have 30 days of real data before the credit launches.

Output format reference

Validated against Claude CLI 2.1.141. Sample top-level JSON response:

{
  \"type\": \"result\",
  \"subtype\": \"success\",
  \"result\": \"ok\",
  \"total_cost_usd\": 0.5241925,
  \"usage\": {
    \"input_tokens\": 6,
    \"output_tokens\": 6,
    \"cache_creation_input_tokens\": 83842,
    \"cache_read_input_tokens\": 0,
    \"service_tier\": \"standard\"
  },
  \"modelUsage\": {
    \"claude-opus-4-7\": {
      \"inputTokens\": 6,
      \"outputTokens\": 6,
      \"costUSD\": 0.5241925
    }
  },
  \"...\": \"...\"
}

total_cost_usd aggregates across all turns in the run, including cache creation. usage.input_tokens / output_tokens are the visible Claude-side counts; cache fields are separate (intentionally not surfaced here — operators who want cache analysis can read the output column from the JSONL log, which already preserves the full response).

Test plan

  • CLI shape verified — ran claude --print --output-format json --max-turns 1 \"Responda apenas: ok\" against CLI 2.1.141; the response is a single-line JSON with all expected keys.
  • Inline parse — sample payload parses correctly: cost_usd=0.524, tokens_in=6, tokens_out=6.
  • Failure path unchanged — when status != \"success\", the try: block is skipped and all three fields remain None (existing behavior preserved).
  • Malformed outputoutput = \"not json\" falls into the except Exception: pass branch and returns None values (no crash).
  • Local deploy — restarted the dashboard with the fix applied; subsequent heartbeat invocations populate heartbeat_runs.cost_usd correctly (full validation pending the next ~1h dispatch cycle).

Notes for reviewers

  • The import json as _json is intentional — the function already imports json at module level in some forks but not all; the local-scoped import keeps the diff self-contained and avoids touching the import block.
  • output.strip().splitlines()[-1] defensively handles the case where the CLI prepends a stray log line before the JSON (rare but seen with verbose logging on some CLI builds). Last line is canonical.
  • No DB migration needed — the cost_usd column already exists in the heartbeat_runs schema.

Summary by Sourcery

Capture cost and token usage metrics from Claude CLI heartbeat invocations and store them in heartbeat run results.

New Features:

  • Record tokens_in, tokens_out, and cost_usd for successful Claude CLI heartbeat runs by parsing the CLI JSON output.

Enhancements:

  • Add a helper to parse Claude CLI JSON output defensively, including logging on parse failures without exposing raw content.
  • Introduce a module-level logger for heartbeat_runner to report non-fatal parsing issues.

@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 14, 2026

Reviewer's Guide

Update heartbeat runner to parse Claude CLI JSON output on successful runs and populate heartbeat usage metrics (tokens_in, tokens_out, cost_usd) instead of always returning None, with robust error handling and non-sensitive logging on parse failures.

Sequence diagram for updated Claude CLI heartbeat invocation and cost parsing

sequenceDiagram
    participant HeartbeatRunner as step7_invoke_claude
    participant ClaudeCLI as Claude_CLI
    participant Parser as _parse_claude_cli_usage
    participant Logger as log

    HeartbeatRunner->>ClaudeCLI: invoke (claude --output-format json)
    ClaudeCLI-->>HeartbeatRunner: status, output, error

    alt status == success
        HeartbeatRunner->>Parser: _parse_claude_cli_usage(output)
        alt JSON parse succeeds
            Parser-->>HeartbeatRunner: tokens_in, tokens_out, cost_usd
        else JSON parse fails
            Parser->>Logger: log.warning(...)
            Parser-->>HeartbeatRunner: None, None, None
        end
    else status != success
        Note over HeartbeatRunner: tokens_in/tokens_out/cost_usd remain None
    end

    HeartbeatRunner-->>HeartbeatRunner: return result dict with tokens_in, tokens_out, cost_usd
Loading

File-Level Changes

Change Details Files
Populate heartbeat result tokens and cost from Claude CLI JSON output instead of hardcoding them to None.
  • Introduce _parse_claude_cli_usage helper to extract input/output tokens and total_cost_usd from Claude CLI --output-format json response, including defensive handling of empty or malformed output.
  • Wire step7_invoke_claude to call the parser only on successful CLI invocations and return parsed tokens_in, tokens_out, and cost_usd values in the result dict.
  • Round total_cost_usd to 6 decimal places to limit floating-point drift while avoiding a schema migration from FLOAT to a higher-precision type.
  • On parse failures, catch JSON/shape-related exceptions, log a metadata-only warning that omits raw output content for security/PII safety, and fall back to returning None values so existing failure behavior is preserved.
  • Add a module-level logger for heartbeat_runner to support the new warning log message.
dashboard/backend/heartbeat_runner.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • Catching a blanket Exception on the JSON parse makes debugging harder; consider narrowing this to json.JSONDecodeError (and maybe KeyError/TypeError) or at least logging a warning when parsing fails so silent data loss is visible.
  • Since total_cost_usd is monetary data, consider normalizing it (e.g., via Decimal or explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Catching a blanket `Exception` on the JSON parse makes debugging harder; consider narrowing this to `json.JSONDecodeError` (and maybe `KeyError`/`TypeError`) or at least logging a warning when parsing fails so silent data loss is visible.
- Since `total_cost_usd` is monetary data, consider normalizing it (e.g., via `Decimal` or explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@mt-alarcon mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from d3f6dca to 19e714a Compare May 14, 2026 13:52
@mt-alarcon
Copy link
Copy Markdown
Author

@sourcery-ai both points addressed in 19e714a:

  1. Narrowed the exception catch to (json.JSONDecodeError, IndexError, TypeError, AttributeError) — the four failure modes that can actually arise from json.loads(output.strip().splitlines()[-1]) + .get() chains. Plus a print(..., file=sys.stderr, flush=True) line so parse failures are visible (the whole point of this PR was undoing silent data loss, so swallowing a parse error would defeat the purpose). Stderr matches the pattern used elsewhere in the module — no new logging dependency.

  2. Rounded total_cost_usd to 6 decimal places (micro-dollar) before returning, plus an isinstance(raw_cost, (int, float)) guard for defensive handling. Full Decimal would require migrating the FLOAT column on heartbeat_runs — out of scope for a single-file fix. Six-place rounding is enough precision for aggregated SUM queries to stay stable.

Test plan re-run for the amended commit:

  • Round-trip 0.5241925 → 0.524192 ✓
  • Malformed output → JSONDecodeError caught, stderr line fires ✓
  • Empty output → IndexError on [-1] caught ✓
  • Non-numeric total_cost_usd → isinstance returns False, cost_usd stays None ✓
  • Original success path unchanged ✓

@mt-alarcon
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • Consider reusing the module’s existing logging mechanism (or a shared helper) instead of print(..., file=sys.stderr) in the parse failure path so error reporting is consistent and easier to control.
  • The JSON parsing logic in step7_invoke_claude is starting to get fairly involved; factoring it into a small helper (e.g., _parse_claude_cli_usage(output)) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider reusing the module’s existing logging mechanism (or a shared helper) instead of `print(..., file=sys.stderr)` in the parse failure path so error reporting is consistent and easier to control.
- The JSON parsing logic in `step7_invoke_claude` is starting to get fairly involved; factoring it into a small helper (e.g., `_parse_claude_cli_usage(output)`) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.

## Individual Comments

### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="314" />
<code_context>
+            print(
+                f"[heartbeat_runner] step7 cost parse failed "
+                f"({parse_exc.__class__.__name__}); cost/tokens=None for this run. "
+                f"Output preview: {output[:200]!r}",
+                file=sys.stderr,
+                flush=True,
</code_context>
<issue_to_address>
**🚨 issue (security):** Logging a preview of the raw model output may leak sensitive data into stderr.

Since this is a failure path, logging up to 200 chars of the raw model output can still expose sensitive user content to stderr and downstream log collectors. Consider removing the preview, or logging only minimally sensitive metadata (e.g., length, a short fixed prefix, or a hash) so you preserve failure signal without exposing content.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread dashboard/backend/heartbeat_runner.py Outdated
@mt-alarcon mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from 19e714a to 12e3201 Compare May 14, 2026 13:57
@mt-alarcon
Copy link
Copy Markdown
Author

@sourcery-ai review

Round 2 addressed in 12e3201:

  1. Switched to loggingimport logging + log = logging.getLogger(__name__) at module level, parse failure now uses log.warning(...). Plays with whatever logging config the host already has — no new infra.

  2. Extracted _parse_claude_cli_usage(output) helper — pure function, input str → tuple[int|None, int|None, float|None]. step7_invoke_claude becomes a one-liner: tokens_in, tokens_out, cost_usd = _parse_claude_cli_usage(output) if status == 'success' else (None, None, None). Trivially unit-testable in isolation.

  3. 🚨 Security — removed raw output preview from the log line. The CLI output can contain PII or model-generated content from the user's prompt (emails, customer data, credentials in some flows). The log now contains only the exception class name and the output length — enough to triage, nothing to leak. Validated:

>>> _parse_claude_cli_usage('not json')
WARNINGstep7_invoke_claude: failed to parse Claude CLI JSON output (JSONDecodeError, output_len=8); cost/tokens will be NULL for this run.
(None, None, None)

All 4 sanity scenarios pass: happy path, malformed output, empty output, non-numeric cost (isinstance guard catches it without crashing).

@mt-alarcon
Copy link
Copy Markdown
Author

@sourcery-ai review

Copy link
Copy Markdown

@SourceryAI SourceryAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've reviewed your changes and they look great!


Hi @mt-alarcon! 👋

Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀

Install the sourcery-ai bot to get automatic code reviews on every pull request ✨

Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • The new _parse_claude_cli_usage helper uses json.loads but there is no json import in this file’s diff (and the description mentions import json as _json), so double-check that json is actually imported and that the name used in code matches the import to avoid a NameError.
  • Given that step7_invoke_claude already returns status, you might consider passing status into _parse_claude_cli_usage and logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `_parse_claude_cli_usage` helper uses `json.loads` but there is no `json` import in this file’s diff (and the description mentions `import json as _json`), so double-check that `json` is actually imported and that the name used in code matches the import to avoid a `NameError`.
- Given that `step7_invoke_claude` already returns `status`, you might consider passing `status` into `_parse_claude_cli_usage` and logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.

## Individual Comments

### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="322-325" />
<code_context>
+        return None, None, None
+    try:
+        payload = json.loads(output.strip().splitlines()[-1])
+        raw_cost = payload.get("total_cost_usd")
+        cost_usd = round(raw_cost, 6) if isinstance(raw_cost, (int, float)) else None
+        usage = payload.get("usage") or {}
+        return usage.get("input_tokens"), usage.get("output_tokens"), cost_usd
</code_context>
<issue_to_address>
**suggestion:** Consider handling string-valued `total_cost_usd` from the CLI more gracefully.

If `total_cost_usd` is ever sent as a string (e.g. "0.000123"), `cost_usd` will end up as `None`. To keep cost tracking working if the upstream format changes, consider parsing string values with `float(raw_cost)` (catching `ValueError` and falling back to `None`).

```suggestion
        payload = json.loads(output.strip().splitlines()[-1])
        raw_cost = payload.get("total_cost_usd")

        if isinstance(raw_cost, (int, float)):
            cost_value = raw_cost
        elif isinstance(raw_cost, str):
            try:
                cost_value = float(raw_cost)
            except ValueError:
                cost_value = None
        else:
            cost_value = None

        cost_usd = round(cost_value, 6) if cost_value is not None else None
        usage = payload.get("usage") or {}
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread dashboard/backend/heartbeat_runner.py
step7_invoke_claude was hardcoding tokens_in/tokens_out/cost_usd to None
with a comment claiming "Claude CLI doesn't expose token counts easily".
That comment is outdated — the runner already passes `--output-format json`
to the CLI, which emits a single-line JSON object containing:

  - `total_cost_usd` (top level)
  - `usage.input_tokens`, `usage.output_tokens`
  - `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens`
  - `modelUsage.<model>.{costUSD,inputTokens,outputTokens,...}`

Result: the heartbeat_runs.cost_usd column was always NULL despite the
schema, making cost tracking and budget analysis impossible.

Sourcery review rounds 1, 2, and 3 incorporated (2026-05-14):

Round 1:
  - Narrowed the catch from blanket Exception to specific failure modes
    (json.JSONDecodeError + IndexError + TypeError + AttributeError).
  - Added warning logging on parse failure (silent data loss was the
    original problem; swallowing the new parse failure would be
    self-defeating).
  - Rounded total_cost_usd to 6 decimal places (micro-dollar) before
    returning so aggregated SUM queries don't accumulate float drift.

Round 2:
  - Switched from `print(file=sys.stderr)` to `logging.getLogger(__name__)`
    + `log.warning(...)` — consistent control plane, no new dependency.
  - Extracted `_parse_claude_cli_usage(output)` helper so step7 stays
    focused on subprocess orchestration. Helper is pure and unit-testable.
  - SECURITY: removed the 200-char raw output preview from the failure
    log. CLI output can contain PII or sensitive model-generated content
    from the user's prompt. Log now contains ONLY the exception class
    and the output length.

Round 3:
  - Accept stringified `total_cost_usd` defensively (forward-compat with
    a possible future Decimal-serializer wrap on the CLI side). Today
    CLI 2.1.141 always emits float, but the cost is one isinstance check
    and a try/float() — cheap forward-compat for monetary data.
  - Non-numeric / dict / list / None all still fall back to NULL safely.

Round 3 also raised two points we didn't change:
  - "json.loads with no import in the diff" → false positive. `json` is
    already imported at module top (line 21 of upstream/main, unchanged
    by this PR, hence not in the diff). No NameError.
  - "Pass status into the helper" → the helper is only called when
    status == 'success' (one-line ternary in the caller). Logging
    status inside the helper would be redundant.

## Why this matters now

Anthropic announced (2026-05-14) that starting 2026-06-15, Max 5x plan
subscribers get a dedicated $100/month credit for Agent SDK / claude -p
usage, separate from interactive subscription limits. To decide whether
the credit covers the workload (and how much extra-usage cap to set),
operators need per-run cost data — which is exactly what this fix
restores. With the fix deployed today, 30 days of real data accumulate
before the credit launches.

## Test plan

- [x] CLI shape verified against Claude CLI 2.1.141 (single-line JSON,
  `total_cost_usd` top-level, `usage.input_tokens`, `usage.output_tokens`).
- [x] Helper sanity scenarios:
  - Float cost (today's reality): `(tokens, tokens, 0.524192)` ✓
  - String cost (forward-compat): `(1, 2, 1.23e-4)` ✓
  - Garbage string cost: `(1, 2, None)` — ValueError caught ✓
  - Dict cost (defensive): `(3, 4, None)` — falls to None ✓
  - Missing `usage`: `(None, None, 1.0)` ✓
  - Malformed JSON: `(None, None, None)`, log.warning fires ✓
  - Empty output: `(None, None, None)`, no log ✓
- [x] Failure path unchanged: status != "success" → helper not invoked,
  all three fields remain None (existing contract preserved).
- [x] SECURITY: failure log contains only exception class + output
  length, never the raw output content.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@mt-alarcon mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from 12e3201 to b2064b7 Compare May 14, 2026 14:06
@mt-alarcon
Copy link
Copy Markdown
Author

@sourcery-ai review

Round 3 addressed in b2064b7:

Applied — string-valued total_cost_usd (your inline suggestion): isinstance + float(raw_cost) with ValueError catch + fall-through to None for everything else (dict/list/None/etc). 7 sanity scenarios pass — float (today's reality), stringified, garbage string, dict, missing usage, malformed JSON, empty.

Not changed — json import: false positive in your review. json is already imported at line 21 of upstream/main (import argparse, import json, …); it's not in the diff because the import line itself is unchanged by this PR. No NameError at runtime — verified by the sanity scenarios above all executing successfully.

Not changed — passing status into the helper: the helper is only called when status == 'success' (one-line ternary on the caller: _parse_claude_cli_usage(output) if status == 'success' else (None, None, None)). Logging status inside the helper would always print "success" — no signal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants