fix(heartbeat-runner): capture cost_usd and token counts from Claude CLI#82
Conversation
Reviewer's GuideUpdate heartbeat runner to parse Claude CLI JSON output on successful runs and populate heartbeat usage metrics (tokens_in, tokens_out, cost_usd) instead of always returning None, with robust error handling and non-sensitive logging on parse failures. Sequence diagram for updated Claude CLI heartbeat invocation and cost parsingsequenceDiagram
participant HeartbeatRunner as step7_invoke_claude
participant ClaudeCLI as Claude_CLI
participant Parser as _parse_claude_cli_usage
participant Logger as log
HeartbeatRunner->>ClaudeCLI: invoke (claude --output-format json)
ClaudeCLI-->>HeartbeatRunner: status, output, error
alt status == success
HeartbeatRunner->>Parser: _parse_claude_cli_usage(output)
alt JSON parse succeeds
Parser-->>HeartbeatRunner: tokens_in, tokens_out, cost_usd
else JSON parse fails
Parser->>Logger: log.warning(...)
Parser-->>HeartbeatRunner: None, None, None
end
else status != success
Note over HeartbeatRunner: tokens_in/tokens_out/cost_usd remain None
end
HeartbeatRunner-->>HeartbeatRunner: return result dict with tokens_in, tokens_out, cost_usd
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- Catching a blanket
Exceptionon the JSON parse makes debugging harder; consider narrowing this tojson.JSONDecodeError(and maybeKeyError/TypeError) or at least logging a warning when parsing fails so silent data loss is visible. - Since
total_cost_usdis monetary data, consider normalizing it (e.g., viaDecimalor explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Catching a blanket `Exception` on the JSON parse makes debugging harder; consider narrowing this to `json.JSONDecodeError` (and maybe `KeyError`/`TypeError`) or at least logging a warning when parsing fails so silent data loss is visible.
- Since `total_cost_usd` is monetary data, consider normalizing it (e.g., via `Decimal` or explicit rounding) before returning so that downstream consumers don’t depend on raw binary float values from the CLI.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
d3f6dca to
19e714a
Compare
|
@sourcery-ai both points addressed in 19e714a:
Test plan re-run for the amended commit:
|
|
@sourcery-ai review |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- Consider reusing the module’s existing logging mechanism (or a shared helper) instead of
print(..., file=sys.stderr)in the parse failure path so error reporting is consistent and easier to control. - The JSON parsing logic in
step7_invoke_claudeis starting to get fairly involved; factoring it into a small helper (e.g.,_parse_claude_cli_usage(output)) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider reusing the module’s existing logging mechanism (or a shared helper) instead of `print(..., file=sys.stderr)` in the parse failure path so error reporting is consistent and easier to control.
- The JSON parsing logic in `step7_invoke_claude` is starting to get fairly involved; factoring it into a small helper (e.g., `_parse_claude_cli_usage(output)`) would keep the main function focused on orchestration and make the parsing behavior easier to unit-test and evolve if the CLI format changes.
## Individual Comments
### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="314" />
<code_context>
+ print(
+ f"[heartbeat_runner] step7 cost parse failed "
+ f"({parse_exc.__class__.__name__}); cost/tokens=None for this run. "
+ f"Output preview: {output[:200]!r}",
+ file=sys.stderr,
+ flush=True,
</code_context>
<issue_to_address>
**🚨 issue (security):** Logging a preview of the raw model output may leak sensitive data into stderr.
Since this is a failure path, logging up to 200 chars of the raw model output can still expose sensitive user content to stderr and downstream log collectors. Consider removing the preview, or logging only minimally sensitive metadata (e.g., length, a short fixed prefix, or a hash) so you preserve failure signal without exposing content.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
19e714a to
12e3201
Compare
|
@sourcery-ai review Round 2 addressed in 12e3201:
>>> _parse_claude_cli_usage('not json')
WARNING — step7_invoke_claude: failed to parse Claude CLI JSON output (JSONDecodeError, output_len=8); cost/tokens will be NULL for this run.
(None, None, None)All 4 sanity scenarios pass: happy path, malformed output, empty output, non-numeric cost (isinstance guard catches it without crashing). |
|
@sourcery-ai review |
SourceryAI
left a comment
There was a problem hiding this comment.
Hey - I've reviewed your changes and they look great!
Hi @mt-alarcon! 👋
Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀
Install the sourcery-ai bot to get automatic code reviews on every pull request ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The new
_parse_claude_cli_usagehelper usesjson.loadsbut there is nojsonimport in this file’s diff (and the description mentionsimport json as _json), so double-check thatjsonis actually imported and that the name used in code matches the import to avoid aNameError. - Given that
step7_invoke_claudealready returnsstatus, you might consider passingstatusinto_parse_claude_cli_usageand logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `_parse_claude_cli_usage` helper uses `json.loads` but there is no `json` import in this file’s diff (and the description mentions `import json as _json`), so double-check that `json` is actually imported and that the name used in code matches the import to avoid a `NameError`.
- Given that `step7_invoke_claude` already returns `status`, you might consider passing `status` into `_parse_claude_cli_usage` and logging it on parse failures as well, which can make it easier to correlate malformed output with specific failure modes without exposing any prompt content.
## Individual Comments
### Comment 1
<location path="dashboard/backend/heartbeat_runner.py" line_range="322-325" />
<code_context>
+ return None, None, None
+ try:
+ payload = json.loads(output.strip().splitlines()[-1])
+ raw_cost = payload.get("total_cost_usd")
+ cost_usd = round(raw_cost, 6) if isinstance(raw_cost, (int, float)) else None
+ usage = payload.get("usage") or {}
+ return usage.get("input_tokens"), usage.get("output_tokens"), cost_usd
</code_context>
<issue_to_address>
**suggestion:** Consider handling string-valued `total_cost_usd` from the CLI more gracefully.
If `total_cost_usd` is ever sent as a string (e.g. "0.000123"), `cost_usd` will end up as `None`. To keep cost tracking working if the upstream format changes, consider parsing string values with `float(raw_cost)` (catching `ValueError` and falling back to `None`).
```suggestion
payload = json.loads(output.strip().splitlines()[-1])
raw_cost = payload.get("total_cost_usd")
if isinstance(raw_cost, (int, float)):
cost_value = raw_cost
elif isinstance(raw_cost, str):
try:
cost_value = float(raw_cost)
except ValueError:
cost_value = None
else:
cost_value = None
cost_usd = round(cost_value, 6) if cost_value is not None else None
usage = payload.get("usage") or {}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
step7_invoke_claude was hardcoding tokens_in/tokens_out/cost_usd to None
with a comment claiming "Claude CLI doesn't expose token counts easily".
That comment is outdated — the runner already passes `--output-format json`
to the CLI, which emits a single-line JSON object containing:
- `total_cost_usd` (top level)
- `usage.input_tokens`, `usage.output_tokens`
- `usage.cache_creation_input_tokens`, `usage.cache_read_input_tokens`
- `modelUsage.<model>.{costUSD,inputTokens,outputTokens,...}`
Result: the heartbeat_runs.cost_usd column was always NULL despite the
schema, making cost tracking and budget analysis impossible.
Sourcery review rounds 1, 2, and 3 incorporated (2026-05-14):
Round 1:
- Narrowed the catch from blanket Exception to specific failure modes
(json.JSONDecodeError + IndexError + TypeError + AttributeError).
- Added warning logging on parse failure (silent data loss was the
original problem; swallowing the new parse failure would be
self-defeating).
- Rounded total_cost_usd to 6 decimal places (micro-dollar) before
returning so aggregated SUM queries don't accumulate float drift.
Round 2:
- Switched from `print(file=sys.stderr)` to `logging.getLogger(__name__)`
+ `log.warning(...)` — consistent control plane, no new dependency.
- Extracted `_parse_claude_cli_usage(output)` helper so step7 stays
focused on subprocess orchestration. Helper is pure and unit-testable.
- SECURITY: removed the 200-char raw output preview from the failure
log. CLI output can contain PII or sensitive model-generated content
from the user's prompt. Log now contains ONLY the exception class
and the output length.
Round 3:
- Accept stringified `total_cost_usd` defensively (forward-compat with
a possible future Decimal-serializer wrap on the CLI side). Today
CLI 2.1.141 always emits float, but the cost is one isinstance check
and a try/float() — cheap forward-compat for monetary data.
- Non-numeric / dict / list / None all still fall back to NULL safely.
Round 3 also raised two points we didn't change:
- "json.loads with no import in the diff" → false positive. `json` is
already imported at module top (line 21 of upstream/main, unchanged
by this PR, hence not in the diff). No NameError.
- "Pass status into the helper" → the helper is only called when
status == 'success' (one-line ternary in the caller). Logging
status inside the helper would be redundant.
## Why this matters now
Anthropic announced (2026-05-14) that starting 2026-06-15, Max 5x plan
subscribers get a dedicated $100/month credit for Agent SDK / claude -p
usage, separate from interactive subscription limits. To decide whether
the credit covers the workload (and how much extra-usage cap to set),
operators need per-run cost data — which is exactly what this fix
restores. With the fix deployed today, 30 days of real data accumulate
before the credit launches.
## Test plan
- [x] CLI shape verified against Claude CLI 2.1.141 (single-line JSON,
`total_cost_usd` top-level, `usage.input_tokens`, `usage.output_tokens`).
- [x] Helper sanity scenarios:
- Float cost (today's reality): `(tokens, tokens, 0.524192)` ✓
- String cost (forward-compat): `(1, 2, 1.23e-4)` ✓
- Garbage string cost: `(1, 2, None)` — ValueError caught ✓
- Dict cost (defensive): `(3, 4, None)` — falls to None ✓
- Missing `usage`: `(None, None, 1.0)` ✓
- Malformed JSON: `(None, None, None)`, log.warning fires ✓
- Empty output: `(None, None, None)`, no log ✓
- [x] Failure path unchanged: status != "success" → helper not invoked,
all three fields remain None (existing contract preserved).
- [x] SECURITY: failure log contains only exception class + output
length, never the raw output content.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
12e3201 to
b2064b7
Compare
|
@sourcery-ai review Round 3 addressed in b2064b7: Applied — string-valued Not changed — Not changed — passing |
Summary
heartbeat_runner.step7_invoke_claudewas hardcodingtokens_in/tokens_out/cost_usdtoNone, with a comment claiming the Claude CLI "doesn't expose token counts easily".--output-format json(line ~249), and the CLI emits a single-line JSON object on success withtotal_cost_usdat top level plususage.{input,output}_tokensnested.heartbeat_runs.cost_usdcolumn was alwaysNULL, making cost tracking and budget analysis impossible.status != \"success\"or malformed output) keep all three fieldsNone— no behavior change for the existing error contract.Why this matters now
Anthropic just announced (2026-05-14) that starting 2026-06-15, Max 5x plan subscribers get a dedicated
$100/monthcredit for Agent SDK /claude -pusage, separate from the subscription's interactive limit. Each subscriber will be asked to claim the credit and optionally enable "extra usage" billing for overage.To make that decision rationally, an operator needs to know:
$100?)~$0.50per run just to load Claude Code's system context)None of this analysis is possible while
cost_usdisNULL. With this fix, the data starts flowing immediately on the next dispatch cycle, and operators have 30 days of real data before the credit launches.Output format reference
Validated against Claude CLI
2.1.141. Sample top-level JSON response:{ \"type\": \"result\", \"subtype\": \"success\", \"result\": \"ok\", \"total_cost_usd\": 0.5241925, \"usage\": { \"input_tokens\": 6, \"output_tokens\": 6, \"cache_creation_input_tokens\": 83842, \"cache_read_input_tokens\": 0, \"service_tier\": \"standard\" }, \"modelUsage\": { \"claude-opus-4-7\": { \"inputTokens\": 6, \"outputTokens\": 6, \"costUSD\": 0.5241925 } }, \"...\": \"...\" }total_cost_usdaggregates across all turns in the run, including cache creation.usage.input_tokens/output_tokensare the visible Claude-side counts; cache fields are separate (intentionally not surfaced here — operators who want cache analysis can read theoutputcolumn from the JSONL log, which already preserves the full response).Test plan
claude --print --output-format json --max-turns 1 \"Responda apenas: ok\"against CLI2.1.141; the response is a single-line JSON with all expected keys.cost_usd=0.524,tokens_in=6,tokens_out=6.status != \"success\", thetry:block is skipped and all three fields remainNone(existing behavior preserved).output = \"not json\"falls into theexcept Exception: passbranch and returnsNonevalues (no crash).heartbeat_runs.cost_usdcorrectly (full validation pending the next ~1h dispatch cycle).Notes for reviewers
import json as _jsonis intentional — the function already importsjsonat module level in some forks but not all; the local-scoped import keeps the diff self-contained and avoids touching the import block.output.strip().splitlines()[-1]defensively handles the case where the CLI prepends a stray log line before the JSON (rare but seen with verbose logging on some CLI builds). Last line is canonical.cost_usdcolumn already exists in theheartbeat_runsschema.Summary by Sourcery
Capture cost and token usage metrics from Claude CLI heartbeat invocations and store them in heartbeat run results.
New Features:
Enhancements: