fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI by mt-alarcon · Pull Request #82 · evolution-foundation/evo-nexus

mt-alarcon · 2026-05-14T13:48:15Z

Summary

heartbeat_runner.step7_invoke_claude was hardcoding tokens_in / tokens_out / cost_usd to None, with a comment claiming the Claude CLI "doesn't expose token counts easily".
That claim is outdated. The runner already invokes the CLI with --output-format json (line ~249), and the CLI emits a single-line JSON object on success with total_cost_usd at top level plus usage.{input,output}_tokens nested.
Result: the heartbeat_runs.cost_usd column was always NULL, making cost tracking and budget analysis impossible.
Fix: best-effort parse of the CLI JSON output. Failure paths (status != \"success\" or malformed output) keep all three fields None — no behavior change for the existing error contract.

Why this matters now

Anthropic just announced (2026-05-14) that starting 2026-06-15, Max 5x plan subscribers get a dedicated $100/month credit for Agent SDK / claude -p usage, separate from the subscription's interactive limit. Each subscriber will be asked to claim the credit and optionally enable "extra usage" billing for overage.

To make that decision rationally, an operator needs to know:

Total programmatic cost per month per heartbeat (do the heartbeats fit in $100?)
Which heartbeats are the biggest spenders (where to optimize first if they don't fit)
The cache_creation overhead per cold run (which compounds heartbeat cost surprisingly fast — empirically ~$0.50 per run just to load Claude Code's system context)

None of this analysis is possible while cost_usd is NULL. With this fix, the data starts flowing immediately on the next dispatch cycle, and operators have 30 days of real data before the credit launches.

Output format reference

Validated against Claude CLI 2.1.141. Sample top-level JSON response:

{
  \"type\": \"result\",
  \"subtype\": \"success\",
  \"result\": \"ok\",
  \"total_cost_usd\": 0.5241925,
  \"usage\": {
    \"input_tokens\": 6,
    \"output_tokens\": 6,
    \"cache_creation_input_tokens\": 83842,
    \"cache_read_input_tokens\": 0,
    \"service_tier\": \"standard\"
  },
  \"modelUsage\": {
    \"claude-opus-4-7\": {
      \"inputTokens\": 6,
      \"outputTokens\": 6,
      \"costUSD\": 0.5241925
    }
  },
  \"...\": \"...\"
}

total_cost_usd aggregates across all turns in the run, including cache creation. usage.input_tokens / output_tokens are the visible Claude-side counts; cache fields are separate (intentionally not surfaced here — operators who want cache analysis can read the output column from the JSONL log, which already preserves the full response).

Test plan

CLI shape verified — ran claude --print --output-format json --max-turns 1 \"Responda apenas: ok\" against CLI 2.1.141; the response is a single-line JSON with all expected keys.
Inline parse — sample payload parses correctly: cost_usd=0.524, tokens_in=6, tokens_out=6.
Failure path unchanged — when status != \"success\", the try: block is skipped and all three fields remain None (existing behavior preserved).
Malformed output — output = \"not json\" falls into the except Exception: pass branch and returns None values (no crash).
Local deploy — restarted the dashboard with the fix applied; subsequent heartbeat invocations populate heartbeat_runs.cost_usd correctly (full validation pending the next ~1h dispatch cycle).

Notes for reviewers

The import json as _json is intentional — the function already imports json at module level in some forks but not all; the local-scoped import keeps the diff self-contained and avoids touching the import block.
output.strip().splitlines()[-1] defensively handles the case where the CLI prepends a stray log line before the JSON (rare but seen with verbose logging on some CLI builds). Last line is canonical.
No DB migration needed — the cost_usd column already exists in the heartbeat_runs schema.

Summary by Sourcery

Capture cost and token usage metrics from Claude CLI heartbeat invocations and store them in heartbeat run results.

New Features:

Record tokens_in, tokens_out, and cost_usd for successful Claude CLI heartbeat runs by parsing the CLI JSON output.

Enhancements:

Add a helper to parse Claude CLI JSON output defensively, including logging on parse failures without exposing raw content.
Introduce a module-level logger for heartbeat_runner to report non-fatal parsing issues.

sourcery-ai · 2026-05-14T13:48:22Z

Reviewer's Guide

Update heartbeat runner to parse Claude CLI JSON output on successful runs and populate heartbeat usage metrics (tokens_in, tokens_out, cost_usd) instead of always returning None, with robust error handling and non-sensitive logging on parse failures.

Sequence diagram for updated Claude CLI heartbeat invocation and cost parsing

sequenceDiagram
    participant HeartbeatRunner as step7_invoke_claude
    participant ClaudeCLI as Claude_CLI
    participant Parser as _parse_claude_cli_usage
    participant Logger as log

    HeartbeatRunner->>ClaudeCLI: invoke (claude --output-format json)
    ClaudeCLI-->>HeartbeatRunner: status, output, error

    alt status == success
        HeartbeatRunner->>Parser: _parse_claude_cli_usage(output)
        alt JSON parse succeeds
            Parser-->>HeartbeatRunner: tokens_in, tokens_out, cost_usd
        else JSON parse fails
            Parser->>Logger: log.warning(...)
            Parser-->>HeartbeatRunner: None, None, None
        end
    else status != success
        Note over HeartbeatRunner: tokens_in/tokens_out/cost_usd remain None
    end

    HeartbeatRunner-->>HeartbeatRunner: return result dict with tokens_in, tokens_out, cost_usd

File-Level Changes

Change	Details	Files
Populate heartbeat result tokens and cost from Claude CLI JSON output instead of hardcoding them to None.	Introduce _parse_claude_cli_usage helper to extract input/output tokens and total_cost_usd from Claude CLI --output-format json response, including defensive handling of empty or malformed output. Wire step7_invoke_claude to call the parser only on successful CLI invocations and return parsed tokens_in, tokens_out, and cost_usd values in the result dict. Round total_cost_usd to 6 decimal places to limit floating-point drift while avoiding a schema migration from FLOAT to a higher-precision type. On parse failures, catch JSON/shape-related exceptions, log a metadata-only warning that omits raw output content for security/PII safety, and fall back to returning None values so existing failure behavior is preserved. Add a module-level logger for heartbeat_runner to support the new warning log message.	`dashboard/backend/heartbeat_runner.py`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've left some high level feedback:

Catching a blanket Exception on the JSON parse makes debugging harder; consider narrowing this to json.JSONDecodeError (and maybe KeyError/TypeError) or at least logging a warning when parsing fails so silent data loss is visible.
Since total_cost_usd is monetary data, consider normalizing it (e.g., via Decimal or explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Catching a blanket `Exception` on the JSON parse makes debugging harder; consider narrowing this to `json.JSONDecodeError` (and maybe `KeyError`/`TypeError`) or at least logging a warning when parsing fails so silent data loss is visible.
- Since `total_cost_usd` is monetary data, consider normalizing it (e.g., via `Decimal` or explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

mt-alarcon · 2026-05-14T13:52:23Z

@sourcery-ai both points addressed in 19e714a:

Narrowed the exception catch to (json.JSONDecodeError, IndexError, TypeError, AttributeError) — the four failure modes that can actually arise from json.loads(output.strip().splitlines()[-1]) + .get() chains. Plus a print(..., file=sys.stderr, flush=True) line so parse failures are visible (the whole point of this PR was undoing silent data loss, so swallowing a parse error would defeat the purpose). Stderr matches the pattern used elsewhere in the module — no new logging dependency.
Rounded total_cost_usd to 6 decimal places (micro-dollar) before returning, plus an isinstance(raw_cost, (int, float)) guard for defensive handling. Full Decimal would require migrating the FLOAT column on heartbeat_runs — out of scope for a single-file fix. Six-place rounding is enough precision for aggregated SUM queries to stay stable.

Test plan re-run for the amended commit:

Round-trip 0.5241925 → 0.524192 ✓
Malformed output → JSONDecodeError caught, stderr line fires ✓
Empty output → IndexError on [-1] caught ✓
Non-numeric total_cost_usd → isinstance returns False, cost_usd stays None ✓
Original success path unchanged ✓

mt-alarcon · 2026-05-14T13:54:38Z

@sourcery-ai review

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

Consider reusing the module’s existing logging mechanism (or a shared helper) instead of print(..., file=sys.stderr) in the parse failure path so error reporting is consistent and easier to control.
The JSON parsing logic in step7_invoke_claude is starting to get fairly involved; factoring it into a small helper (e.g., _parse_claude_cli_usage(output)) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- Consider reusing the module’s existing logging mechanism (or a shared helper) instead of `print(..., file=sys.stderr)` in the parse failure path so error reporting is consistent and easier to control.
- The JSON parsing logic in `step7_invoke_claude` is starting to get fairly involved; factoring it into a small helper (e.g., `_parse_claude_cli_usage(output)`) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.

## Individual Comments

### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="314" />
<code_context>
+            print(
+                f"[heartbeat_runner] step7 cost parse failed "
+                f"({parse_exc.__class__.__name__}); cost/tokens=None for this run. "
+                f"Output preview: {output[:200]!r}",
+                file=sys.stderr,
+                flush=True,
</code_context>
<issue_to_address>
**🚨 issue (security):** Logging a preview of the raw model output may leak sensitive data into stderr.

Since this is a failure path, logging up to 200 chars of the raw model output can still expose sensitive user content to stderr and downstream log collectors. Consider removing the preview, or logging only minimally sensitive metadata (e.g., length, a short fixed prefix, or a hash) so you preserve failure signal without exposing content.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

mt-alarcon · 2026-05-14T13:57:39Z

@sourcery-ai review

Round 2 addressed in 12e3201:

Switched to logging — import logging + log = logging.getLogger(__name__) at module level, parse failure now uses log.warning(...). Plays with whatever logging config the host already has — no new infra.
Extracted _parse_claude_cli_usage(output) helper — pure function, input str → tuple[int|None, int|None, float|None]. step7_invoke_claude becomes a one-liner: tokens_in, tokens_out, cost_usd = _parse_claude_cli_usage(output) if status == 'success' else (None, None, None). Trivially unit-testable in isolation.
🚨 Security — removed raw output preview from the log line. The CLI output can contain PII or model-generated content from the user's prompt (emails, customer data, credentials in some flows). The log now contains only the exception class name and the output length — enough to triage, nothing to leak. Validated:

>>> _parse_claude_cli_usage('not json')
WARNING — step7_invoke_claude: failed to parse Claude CLI JSON output (JSONDecodeError, output_len=8); cost/tokens will be NULL for this run.
(None, None, None)

All 4 sanity scenarios pass: happy path, malformed output, empty output, non-numeric cost (isinstance guard catches it without crashing).

mt-alarcon · 2026-05-14T14:03:02Z

@sourcery-ai review

SourceryAI

Hey - I've reviewed your changes and they look great!

Hi @mt-alarcon! 👋

Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀

Install the sourcery-ai bot to get automatic code reviews on every pull request ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

sourcery-ai

Hey - I've found 1 issue, and left some high level feedback:

The new _parse_claude_cli_usage helper uses json.loads but there is no json import in this file’s diff (and the description mentions import json as _json), so double-check that json is actually imported and that the name used in code matches the import to avoid a NameError.
Given that step7_invoke_claude already returns status, you might consider passing status into _parse_claude_cli_usage and logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The new `_parse_claude_cli_usage` helper uses `json.loads` but there is no `json` import in this file’s diff (and the description mentions `import json as _json`), so double-check that `json` is actually imported and that the name used in code matches the import to avoid a `NameError`.
- Given that `step7_invoke_claude` already returns `status`, you might consider passing `status` into `_parse_claude_cli_usage` and logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.

## Individual Comments

### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="322-325" />
<code_context>
+        return None, None, None
+    try:
+        payload = json.loads(output.strip().splitlines()[-1])
+        raw_cost = payload.get("total_cost_usd")
+        cost_usd = round(raw_cost, 6) if isinstance(raw_cost, (int, float)) else None
+        usage = payload.get("usage") or {}
+        return usage.get("input_tokens"), usage.get("output_tokens"), cost_usd
</code_context>
<issue_to_address>
**suggestion:** Consider handling string-valued `total_cost_usd` from the CLI more gracefully.

If `total_cost_usd` is ever sent as a string (e.g. "0.000123"), `cost_usd` will end up as `None`. To keep cost tracking working if the upstream format changes, consider parsing string values with `float(raw_cost)` (catching `ValueError` and falling back to `None`).

```suggestion
        payload = json.loads(output.strip().splitlines()[-1])
        raw_cost = payload.get("total_cost_usd")

        if isinstance(raw_cost, (int, float)):
            cost_value = raw_cost
        elif isinstance(raw_cost, str):
            try:
                cost_value = float(raw_cost)
            except ValueError:
                cost_value = None
        else:
            cost_value = None

        cost_usd = round(cost_value, 6) if cost_value is not None else None
        usage = payload.get("usage") or {}
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

step7_invoke_claude was hardcoding tokens_in/tokens_out/cost_usd to None with a comment claiming "Claude CLI doesn't expose token counts easily". That comment is outdated — the runner already passes `--output-format json` to the CLI, which emits a single-line JSON object containing: - `total_cost_usd` (top level) - `usage.input_tokens`, `usage.output_tokens` - `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens` - `modelUsage.<model>.{costUSD,inputTokens,outputTokens,...}` Result: the heartbeat_runs.cost_usd column was always NULL despite the schema, making cost tracking and budget analysis impossible. Sourcery review rounds 1, 2, and 3 incorporated (2026-05-14): Round 1: - Narrowed the catch from blanket Exception to specific failure modes (json.JSONDecodeError + IndexError + TypeError + AttributeError). - Added warning logging on parse failure (silent data loss was the original problem; swallowing the new parse failure would be self-defeating). - Rounded total_cost_usd to 6 decimal places (micro-dollar) before returning so aggregated SUM queries don't accumulate float drift. Round 2: - Switched from `print(file=sys.stderr)` to `logging.getLogger(__name__)` + `log.warning(...)` — consistent control plane, no new dependency. - Extracted `_parse_claude_cli_usage(output)` helper so step7 stays focused on subprocess orchestration. Helper is pure and unit-testable. - SECURITY: removed the 200-char raw output preview from the failure log. CLI output can contain PII or sensitive model-generated content from the user's prompt. Log now contains ONLY the exception class and the output length. Round 3: - Accept stringified `total_cost_usd` defensively (forward-compat with a possible future Decimal-serializer wrap on the CLI side). Today CLI 2.1.141 always emits float, but the cost is one isinstance check and a try/float() — cheap forward-compat for monetary data. - Non-numeric / dict / list / None all still fall back to NULL safely. Round 3 also raised two points we didn't change: - "json.loads with no import in the diff" → false positive. `json` is already imported at module top (line 21 of upstream/main, unchanged by this PR, hence not in the diff). No NameError. - "Pass status into the helper" → the helper is only called when status == 'success' (one-line ternary in the caller). Logging status inside the helper would be redundant. ## Why this matters now Anthropic announced (2026-05-14) that starting 2026-06-15, Max 5x plan subscribers get a dedicated $100/month credit for Agent SDK / claude -p usage, separate from interactive subscription limits. To decide whether the credit covers the workload (and how much extra-usage cap to set), operators need per-run cost data — which is exactly what this fix restores. With the fix deployed today, 30 days of real data accumulate before the credit launches. ## Test plan - [x] CLI shape verified against Claude CLI 2.1.141 (single-line JSON, `total_cost_usd` top-level, `usage.input_tokens`, `usage.output_tokens`). - [x] Helper sanity scenarios: - Float cost (today's reality): `(tokens, tokens, 0.524192)` ✓ - String cost (forward-compat): `(1, 2, 1.23e-4)` ✓ - Garbage string cost: `(1, 2, None)` — ValueError caught ✓ - Dict cost (defensive): `(3, 4, None)` — falls to None ✓ - Missing `usage`: `(None, None, 1.0)` ✓ - Malformed JSON: `(None, None, None)`, log.warning fires ✓ - Empty output: `(None, None, None)`, no log ✓ - [x] Failure path unchanged: status != "success" → helper not invoked, all three fields remain None (existing contract preserved). - [x] SECURITY: failure log contains only exception class + output length, never the raw output content. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

mt-alarcon · 2026-05-14T14:06:30Z

@sourcery-ai review

Round 3 addressed in b2064b7:

Applied — string-valued total_cost_usd (your inline suggestion): isinstance + float(raw_cost) with ValueError catch + fall-through to None for everything else (dict/list/None/etc). 7 sanity scenarios pass — float (today's reality), stringified, garbage string, dict, missing usage, malformed JSON, empty.

Not changed — json import: false positive in your review. json is already imported at line 21 of upstream/main (import argparse, import json, …); it's not in the diff because the import line itself is unchanged by this PR. No NameError at runtime — verified by the sanity scenarios above all executing successfully.

Not changed — passing status into the helper: the helper is only called when status == 'success' (one-line ternary on the caller: _parse_claude_cli_usage(output) if status == 'success' else (None, None, None)). Logging status inside the helper would always print "success" — no signal.

sourcery-ai Bot reviewed May 14, 2026

View reviewed changes

mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from d3f6dca to 19e714a Compare May 14, 2026 13:52

sourcery-ai Bot reviewed May 14, 2026

View reviewed changes

Comment thread dashboard/backend/heartbeat_runner.py Outdated

mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from 19e714a to 12e3201 Compare May 14, 2026 13:57

SourceryAI reviewed May 14, 2026

View reviewed changes

sourcery-ai Bot reviewed May 14, 2026

View reviewed changes

Comment thread dashboard/backend/heartbeat_runner.py

mt-alarcon force-pushed the fix/heartbeat-runner-capture-cost-tokens branch from 12e3201 to b2064b7 Compare May 14, 2026 14:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI#82

fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI#82
mt-alarcon wants to merge 1 commit into
evolution-foundation:mainfrom
mt-alarcon:fix/heartbeat-runner-capture-cost-tokens

mt-alarcon commented May 14, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot commented May 14, 2026 •

edited

Loading

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

SourceryAI left a comment

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mt-alarcon commented May 14, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this matters now

Output format reference

Test plan

Notes for reviewers

Summary by Sourcery

Uh oh!

sourcery-ai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for updated Claude CLI heartbeat invocation and cost parsing

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

SourceryAI left a comment

Choose a reason for hiding this comment

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mt-alarcon commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mt-alarcon commented May 14, 2026 •

edited by sourcery-ai Bot

Loading

sourcery-ai Bot commented May 14, 2026 •

edited

Loading