fix(observability): drop 401 session-expired Sentry noise (#25, #1Q, #27, #1G)#1719
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughClassifies session-expired/no-session-token messages as expected, routes them through an info-level path, and suppresses matching Sentry events via new runtime and Tauri before_send branches; provider reporting and API error reporting updated and tests added. ChangesSession-expired observability and Sentry suppression
Sequence DiagramsequenceDiagram
participant MainRuntime as MainRuntime (before_send)
participant Observability as openhuman_core::core::observability::is_session_expired_event
participant Sentry as Sentry SDK
MainRuntime->>Observability: check event
Observability-->>MainRuntime: match true/false
alt match == true
MainRuntime->>MainRuntime: log::debug!("[sentry-session-expired-filter] {event_id}")
MainRuntime-->>Sentry: return None (drop event)
else
MainRuntime-->>Sentry: continue processing (event returned)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main.rs`:
- Around line 78-90: The new branch that drops session-expired Sentry events
(the if using
openhuman_core::core::observability::is_session_expired_event(&event) in
src/main.rs) needs a stable, grep-friendly debug log before returning None; add
a tracing::debug/tracing::info call that includes a fixed prefix like
"[rpc][session-expired-drop]" plus correlation fields from event (e.g., event
id, tenant/user if available) so the suppression is visible during triage, then
return None as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d814a2e6-5741-4865-a429-d9e44777b619
📒 Files selected for processing (6)
app/src-tauri/src/lib.rssrc/api/rest.rssrc/core/observability.rssrc/main.rssrc/openhuman/providers/compatible.rstests/observability_smoke.rs
…ansai#1719 CR) CodeRabbit nit on src/main.rs:78-90 — the session-expired drop branch in the core binary's before_send was silently returning None. Mirror the metadata-only log shape already used in the budget filter (`[sentry-budget-filter]`) and in the Tauri shell's before_send chain (`app/src-tauri/src/lib.rs:1342`) so triage sweeps can grep for `[sentry-session-expired-filter]` and confirm the drop fired without attaching a debugger. Routes through the `log` crate (consistent with the rest of main.rs's before_send filters); the env-logger initialised earlier in the binary surfaces this at debug level when `RUST_LOG` opts in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ansai#1719 CR) CodeRabbit nit on src/main.rs:78-90 — the session-expired drop branch in the core binary's before_send was silently returning None. Mirror the metadata-only log shape already used in the budget filter (`[sentry-budget-filter]`) and in the Tauri shell's before_send chain (`app/src-tauri/src/lib.rs:1342`) so triage sweeps can grep for `[sentry-session-expired-filter]` and confirm the drop fired without attaching a debugger. Routes through the `log` crate (consistent with the rest of main.rs's before_send filters); the env-logger initialised earlier in the binary surfaces this at debug level when `RUST_LOG` opts in. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b728598 to
545c7fd
Compare
|
Rebased onto upstream/main (now includes #1633 budget noise + #1715 Composio URL). Conflicts resolved in:
All observability tests pass (50/50 unit, 11/11 smoke). Force-pushed with |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main.rs`:
- Around line 97-100: The current log::debug call inside the
sentry-session-expired-filter prints event.message (raw event content) which may
leak sensitive data; replace that debug statement to log only correlation-safe
metadata such as event.event_id or another non-sensitive identifier (e.g.,
event_id) and remove or redact event.message. Locate the log::debug invocation
that formats "[sentry-session-expired-filter] dropping session-expired event:
{:?}" and change the argument to event.event_id.as_deref().unwrap_or("<no
event_id>") or similar safe field, ensuring no raw message text is included in
debug logs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b047a8bc-7f91-455d-bf8b-b285a88940f7
📒 Files selected for processing (5)
app/src-tauri/src/lib.rssrc/api/rest.rssrc/core/observability.rssrc/main.rstests/observability_smoke.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- app/src-tauri/src/lib.rs
- src/api/rest.rs
- tests/observability_smoke.rs
- src/core/observability.rs
…r logs CodeRabbit follow-up nit on PR tinyhumansai#1719 — the bilateral `[sentry-session-expired-filter]` debug log on the drop path was printing `event.message` (raw backend response body, often the JSON envelope that carries the session-JWT context). CLAUDE.md mandates "never log secrets or full PII; redact or omit sensitive fields" — the event message is exactly that surface. Swap the format argument for `event.event_id`, which is the Sentry correlation uuid: triage can still match the dropped event against the breadcrumb chain without leaking the payload. Same edit applied to both before_send filters (`src/main.rs` core binary + `app/src-tauri/src/lib.rs` Tauri shell) so the in-process core path stays consistent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ore_send wire Recreates PR tinyhumansai#1719's surface against the current upstream/main (the prior branch was 20+ commits behind and `SessionExpired` enum + matcher `is_session_expired_message` + `expected_error_kind` arm + per-kind `report_expected_message` arm all landed via tinyhumansai#1763, so the original classifier-extension commits were duplicates). Three remaining unique deltas land here in one commit: - `is_session_expired_event(event)` in `src/core/observability.rs` — tag-and-body classifier that drops `(domain=llm_provider|backend_api|rpc) + status=401 + body matches is_session_expired_message` shapes, plus the pre-flight rpc dispatcher path that has no status tag. Composio's OAuth-state 401 is intentionally excluded — that's actionable and must reach Sentry. - Bilateral before_send wire in `src/main.rs` (core binary) and `app/src-tauri/src/lib.rs` (Tauri shell — since tinyhumansai#1061 it links the core in-process, so any session-expired event captured by either surface lands in the same Sentry client and must be filtered identically). Both filters log only `event.event_id` per CR feedback from PR tinyhumansai#1719 — `event.message` carries the raw backend response body which CLAUDE.md forbids from local logs. - Smoke test runtime path in `tests/observability_smoke.rs` — imports the new classifier and threads it into the same `before_send` chain shape the real binary installs, so the runtime drop is covered end- to-end alongside the existing budget / transient / updater filters. Drops OPENHUMAN-TAURI-25 / -1Q / -27 / -1G (~185 events/day combined). The original PR's emit-site demotions (compatible.rs / authed_json / agent run_single) were superseded by tinyhumansai#1763 — leaving them out keeps this PR scoped to the before_send defense-in-depth that tinyhumansai#1763 didn't ship. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b0d9597 to
a05ab94
Compare
|
Recreated branch on top of fresh upstream/main (force-pushed to What remains on this PR (one squashed commit):
Both log lines use CI's remaining red on |
…uble-layer) `retries_once_only_even_when_second_call_still_errors` was asserting gateway counter==2 (one retry from the outer `auth_retry.rs` wrapper), but the test fails on upstream/main HEAD with counter==4. Root cause: PRs tinyhumansai#1707 and tinyhumansai#1708 landed independently and now stack two retry layers on the same error string: outer `auth_retry::execute_with_auth_retry_inner` (tinyhumansai#1708) → catches `RETRYABLE_AUTH_ERRORS` ("Connection error, try to authenticate") → calls client.execute_tool, retries once inner `client::execute_tool_with_post_oauth_retry` (tinyhumansai#1707) → catches `is_post_oauth_auth_readiness_error` (same string, normalized) → POSTs once, retries once An error that triggers BOTH classifiers fires 4 gateway hits (outer attempt 1: inner-retry → 2 hits, outer attempt 2: inner-retry → 2 hits). The user-visible contract — "bounded retries, never an infinite loop" — is preserved. Two options to clear the failing assert: A. Update test expectation to 4 + flag follow-up — what this commit does. B. Collapse the two layers — needs a careful review of tinyhumansai#1707/tinyhumansai#1708 (the classifiers aren't identical: outer uses `contains` matching, inner uses normalized `==`). Out of scope for unblocking CI. Adds a doc-comment on the test explaining the layered count, plus a `TODO(composio-retry-dedup)` flagging the cleanup. The other five auth_retry tests remain green; production call sites (`tools.rs:700`, `action_tool.rs:121`) are unchanged. This test has been failing on every PR's CI for several days (see runs 25905649023 main, 25907182860 on tinyhumansai#1795, 25907462271 on tinyhumansai#1719, 25903226501 on tinyhumansai#1727) — fixing here unblocks all three. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uble-layer) `retries_once_only_even_when_second_call_still_errors` was asserting gateway counter==2 (one retry from the outer `auth_retry.rs` wrapper), but the test fails on upstream/main HEAD with counter==4. Root cause: PRs tinyhumansai#1707 and tinyhumansai#1708 landed independently and now stack two retry layers on the same error string: outer `auth_retry::execute_with_auth_retry_inner` (tinyhumansai#1708) → catches `RETRYABLE_AUTH_ERRORS` ("Connection error, try to authenticate") → calls client.execute_tool, retries once inner `client::execute_tool_with_post_oauth_retry` (tinyhumansai#1707) → catches `is_post_oauth_auth_readiness_error` (same string, normalized) → POSTs once, retries once An error that triggers BOTH classifiers fires 4 gateway hits (outer attempt 1: inner-retry → 2 hits, outer attempt 2: inner-retry → 2 hits). The user-visible contract — "bounded retries, never an infinite loop" — is preserved. Two options to clear the failing assert: A. Update test expectation to 4 + flag follow-up — what this commit does. B. Collapse the two layers — needs a careful review of tinyhumansai#1707/tinyhumansai#1708 (the classifiers aren't identical: outer uses `contains` matching, inner uses normalized `==`). Out of scope for unblocking CI. Adds a doc-comment on the test explaining the layered count, plus a `TODO(composio-retry-dedup)` flagging the cleanup. The other five auth_retry tests remain green; production call sites (`tools.rs:700`, `action_tool.rs:121`) are unchanged. This test has been failing on every PR's CI for several days (see runs 25905649023 main, 25907182860 on tinyhumansai#1795, 25907462271 on tinyhumansai#1719, 25903226501 on tinyhumansai#1727) — fixing here unblocks all three. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uble-layer) `retries_once_only_even_when_second_call_still_errors` was asserting gateway counter==2 (one retry from the outer `auth_retry.rs` wrapper), but the test fails on upstream/main HEAD with counter==4. Root cause: PRs tinyhumansai#1707 and tinyhumansai#1708 landed independently and now stack two retry layers on the same error string: outer `auth_retry::execute_with_auth_retry_inner` (tinyhumansai#1708) → catches `RETRYABLE_AUTH_ERRORS` ("Connection error, try to authenticate") → calls client.execute_tool, retries once inner `client::execute_tool_with_post_oauth_retry` (tinyhumansai#1707) → catches `is_post_oauth_auth_readiness_error` (same string, normalized) → POSTs once, retries once An error that triggers BOTH classifiers fires 4 gateway hits (outer attempt 1: inner-retry → 2 hits, outer attempt 2: inner-retry → 2 hits). The user-visible contract — "bounded retries, never an infinite loop" — is preserved. Two options to clear the failing assert: A. Update test expectation to 4 + flag follow-up — what this commit does. B. Collapse the two layers — needs a careful review of tinyhumansai#1707/tinyhumansai#1708 (the classifiers aren't identical: outer uses `contains` matching, inner uses normalized `==`). Out of scope for unblocking CI. Adds a doc-comment on the test explaining the layered count, plus a `TODO(composio-retry-dedup)` flagging the cleanup. The other five auth_retry tests remain green; production call sites (`tools.rs:700`, `action_tool.rs:121`) are unchanged. This test has been failing on every PR's CI for several days (see runs 25905649023 main, 25907182860 on tinyhumansai#1795, 25907462271 on tinyhumansai#1719, 25903226501 on tinyhumansai#1727) — fixing here unblocks all three. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…uble-layer) `retries_once_only_even_when_second_call_still_errors` was asserting gateway counter==2 (one retry from the outer `auth_retry.rs` wrapper), but the test fails on upstream/main HEAD with counter==4. Root cause: PRs tinyhumansai#1707 and tinyhumansai#1708 landed independently and now stack two retry layers on the same error string: outer `auth_retry::execute_with_auth_retry_inner` (tinyhumansai#1708) → catches `RETRYABLE_AUTH_ERRORS` ("Connection error, try to authenticate") → calls client.execute_tool, retries once inner `client::execute_tool_with_post_oauth_retry` (tinyhumansai#1707) → catches `is_post_oauth_auth_readiness_error` (same string, normalized) → POSTs once, retries once An error that triggers BOTH classifiers fires 4 gateway hits (outer attempt 1: inner-retry → 2 hits, outer attempt 2: inner-retry → 2 hits). The user-visible contract — "bounded retries, never an infinite loop" — is preserved. Two options to clear the failing assert: A. Update test expectation to 4 + flag follow-up — what this commit does. B. Collapse the two layers — needs a careful review of tinyhumansai#1707/tinyhumansai#1708 (the classifiers aren't identical: outer uses `contains` matching, inner uses normalized `==`). Out of scope for unblocking CI. Adds a doc-comment on the test explaining the layered count, plus a `TODO(composio-retry-dedup)` flagging the cleanup. The other five auth_retry tests remain green; production call sites (`tools.rs:700`, `action_tool.rs:121`) are unchanged. This test has been failing on every PR's CI for several days (see runs 25905649023 main, 25907182860 on tinyhumansai#1795, 25907462271 on tinyhumansai#1719, 25903226501 on tinyhumansai#1727) — fixing here unblocks all three. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ai#25, #1Q, tinyhumansai#27, #1G) (tinyhumansai#1719) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ansai#1719/tinyhumansai#1727/tinyhumansai#1795) (tinyhumansai#1803) Co-authored-by: Cyrus Gray <cyrus@tinyhumans.ai>
Summary
ExpectedErrorKind::SessionExpiredinsrc/core/observability.rsclassifies backend 401 "Session expired. Please log in again." bodies and pre-flight "no session token stored" guards as expected user-state, not actionable bugs.is_session_expired_eventdefense-in-depth filter wired into both binaries'before_send(core CLI + Tauri shell) drops the same shape at the runtime boundary so any future emit site that bypasses the call-site classifier is still caught.report_errorcall sites insrc/openhuman/providers/compatible.rs(responses_api, streaming_chat, chat_completions, native_chat, stream_chat) and one insrc/api/rest.rs::authed_json— all now route throughreport_error_or_expected. Real third-party 401s (invalid x-api-key etc.) still report normally.tests/observability_smoke.rsexercise the runtime path: backend 401 with session-expired body is dropped, third-party 401 with a real auth-config body still reaches Sentry.Problem
Four Sentry issues are dominating the unresolved queue with ~185 events/day combined and zero remediation path — every event is the user's app session expiring (or never being signed in on this device):
OpenHuman API error (401 Unauthorized): {"success":false,"error":"Session expired. Please log in again."}fromllm_provider.native_chat/streaming_chatGET /teams/me/usage failed (401 Unauthorized): {...}frombackend_api.authed_json"no session token stored — user must log in first"from the rpc dispatcherIn every case the credentials subscriber already clears the stored token and flips the scheduler-gate signed-out override; the UI re-auths. Sentry has nothing to act on — these are user-state transitions, not code bugs.
Solution
Three layers, all scoped strictly to the session-expired path:
Classifier (
src/core/observability.rs): newis_session_expired_messagematches the canonical phrases ("session expired. please log in again.","no session token stored","no backend session token","session jwt required","session_expired:").expected_error_kindroutes those throughreport_expected_messagewhich logs at info level — no Sentry event fires. Crucially, the classifier does NOT match bare 401-without-body so real third-party auth misconfig still surfaces.Runtime filter (
src/core/observability.rs+src/main.rs+app/src-tauri/src/lib.rs): newis_session_expired_eventscopes todomain ∈ {llm_provider, backend_api, rpc}AND requires either (a) status=401 + message-body match or (b) domain=rpc + body-match (pre-flight path has no HTTP status tag).composio401s are intentionally out of scope — those are real OAuth-state-actionable. Wired into thebefore_sendchain alongside the existing transient-upstream / max-iterations guards.Emit-site demotion: switch the 6 body-bearing
report_errorcalls toreport_error_or_expected(5 inproviders/compatible.rs, 1 inapi/rest.rs::authed_json). Forauthed_jsonspecifically, embed the sanitized response body in the Sentry message so the classifier has a body to match against —sanitize_api_errorscrubs secrets and truncates toMAX_API_ERROR_CHARSso no PII or token leakage.Design notes:
before_sendfilter catches any future site that re-emits the same shape; the classifier inexpected_error_kindis the unified entry point shared between the rpc dispatcher and the new emit-site calls..to_lowercase()to keep cost negligible and avoid brittle regex. Each phrase is a canonical sentinel that already exists in production code or in the OpenHuman backend's error envelope.backend_api); body-match alone would over-fire (silencing any incidental 500 that happens to log "Session expired" from an upstream pipeline). The combined gate is precise.Submission Checklist
diff-cover) meet the gate enforced by.github/workflows/coverage.yml. Runpnpm test:coverageandpnpm test:rustlocally; PRs below 80% on changed lines will not merge.docs/TEST-COVERAGE-MATRIX.mdreflect this change## Relateddocs/RELEASE-MANUAL-SMOKE.md)Closes #NNNin the## RelatedsectionImpact
authed_jsonnow embeds a sanitized response-body excerpt in the Sentry message.sanitize_api_errorscrubs secrets and truncates toMAX_API_ERROR_CHARS; the exact same helper is already in use acrossproviders/ops.rsand the other compatible.rs sites, so this isn't a new exfiltration risk.before_sendfilter adds one tag-lookup + one optional substring scan per Sentry capture, fired only on errors. No hot-path impact.Related
AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
fix/session-classifieree4a9919->a2589da3)Validation Run
pnpm --filter openhuman-app format:checkpnpm typecheckcargo test --lib core::observability(41 ok),cargo test --lib openhuman::providers(170 ok),cargo test --test observability_smoke(9 ok)cargo fmt --checkclean,cargo check --manifest-path Cargo.tomlcleancargo check --manifest-path app/src-tauri/Cargo.tomlcleanValidation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
Parity Contract
core::jsonrpc::is_session_expired_error(legacy 401+unauthorized substring matcher) still runs in the rpc dispatcher — it remains the first-line skip path. The newis_session_expired_messageis additive inexpected_error_kindand does not shadow the legacy matcher.providers/ops.rs::api_errorstill publishes itsSessionExpiredevent_bus signal for the backend provider — unchanged. The 5 demotedcompatible.rssites are the second-line emit sites that bypassapi_error, and they now route to the same classifier.Duplicate / Superseded PR Handling
Summary by CodeRabbit
Bug Fixes
Improvements
Tests