fix(observability): Wave 4 classifier — socket transport + custom-provider config-rejection (~366 events, 13 IDs) by oxoxDev · Pull Request #2309 · tinyhumansai/openhuman

oxoxDev · 2026-05-20T08:22:26Z

Summary

Extends expected_error_kind matcher ladder with 11 new substring arms across two existing buckets — no new variants, no new emit sites, no behavior change beyond reclassifying known wire shapes that were leaking past Wave 1-3 anchors.
Closes 13 Sentry IDs (~366 events) — all deterministic user-environment / user-config errors with no remediation path (offline, DNS fail, captive portal, bad API key, out of credits, wrong model id, missing OAuth scope, region block).
Pure classifier — no UI, retry, or fallback logic touched. The reliable-provider stack still falls back to OpenHuman's hosted tier for the Lane O bodies; users keep their chat. Lane N socket failures already gated through report_error_or_expected at ws_loop.rs:191, so the threshold-escalation event simply demotes to a warn breadcrumb instead of paging.
Two micro-commits — Lane O (is_provider_config_rejection_message PHRASES const) and Lane N (is_network_unreachable_message substring arms) — each independently revertible.

Problem

After the Wave 1-3 sweep landed, fresh Sentry triage surfaced 13 unresolved IDs whose bodies match the spirit of an existing classifier bucket but use wire-shape variants the current substring anchors miss:

Lane O — custom-provider config rejection (~250 events)

The ProviderConfigRejection variant + is_provider_config_rejection_message exist precisely for "user pointed OpenHuman at a custom_openai endpoint with a model / temperature / region / credential that provider doesn't accept." But the 7 phrases shipped in Wave 1-3 (#2079 / #2076 / #2202) only cover the DeepSeek / OpenRouter abstract-tier-leak and Moonshot temperature shapes. New surfaces:

Sentry ID	Events	Real wire body
`R1`	44	`{"error":{"message":"This model is not available in your region.","code":403}}`
`R4`	14	`{"code":403,"reason":"ModelNotAllowed","message":"模型不允许访问"…}` (Doubao/ChatGLM)
`YC`	16	`{"error":{"type":"invalid_authentication_error"…}}`
`S5`	14	`{"error":{"message":"This request requires more credits, or fewer max_tokens…"}}` (OpenRouter 402)
`Y0`	13	`{'error': '/chat/completions: Invalid model name passed in model=reasoning-v1…'}`
`JN`	14	`{"error":{"message":"No active credentials for provider: openai"…}}`
`KB`	16	Same `No active credentials for provider` shape from OpenHuman backend re-emit
`JK`	17	`litellm.BadRequestError: Github_copilotException - Bad Request…`
`J2` / `J5` / `J4`	62	`{"error":{"message":"model 'llama3.3' not found","type":"not_found_error"…}}`

Lane N — socket WebSocket-connect transport (~116 events)

is_network_unreachable_message already catches connection refused / dns error / network is unreachable but two real shapes escape:

Sentry ID	Events	Real wire body
`44`	50	`[socket] Connection failed: WebSocket connect: IO error: failed to lookup address information: nodename nor servname provided, or not known`
`4P`	66	`[socket] Connection failed: WebSocket connect: HTTP error: 200 OK` (captive portal / corporate proxy intercepting the WS upgrade handshake)

Every one of these is deterministic user-environment / user-configuration state — the maintainers have no remediation. Sentry has no signal to act on. Every event was pure noise.

Solution

Lane O — `src/openhuman/inference/provider/config_rejection.rs:55-79`

Append 8 new phrases to the PHRASES const (case-insensitive substring match, same precedence as existing 7):

"not available in your region",        // R1
"modelnotallowed",                     // R4
"invalid_authentication_error",        // YC
"requires more credits",               // S5
"invalid model name passed in model=", // Y0
"no active credentials for provider",  // JN + KB
"litellm.badrequesterror",             // JK
"not_found_error",                     // J2 + J5 + J4

Each anchor is intentionally narrow (e.g. passed in model= not bare invalid model name; litellm.badrequesterror not bare litellm) so a stray log line elsewhere can't accidentally demote a real provider/backend bug. The HTTP-layer wrapper (is_provider_config_rejection_http) still guards on provider != openhuman_backend::PROVIDER_LABEL, so a model rejection from our own backend (which would be a real regression) still reaches Sentry.

Lane N — `src/core/observability.rs:299-319`

Three new substring arms appended to is_network_unreachable_message:

failed to lookup address + nodename nor servname — libc getaddrinfo() failure renderings on macOS / BSD / POSIX resolvers when tungstenite wraps as IO error without the reqwest dns error prefix.
http error: 200 ok — tungstenite-only render. Reqwest renders HTTP 200 as "HTTP status server error (200)", so no collision with the regular HTTP call path. A negative precedence test (http_200_classifier_does_not_silence_unrelated_log_lines) pins this against benign HTTP/1.1 200 OK / status: 200 OK prose so a future broadening cannot silence success traces.

Design trade-off — classifier vs. root-fix validation layer

The Lane O bugs do have a more durable root fix: a pre-save provider validation layer (test the API key, validate the model id against /v1/models, surface region-blocks at config-save time). That's a real product initiative requiring UX design, per-provider model-list infrastructure, and meaningful spec work — out of scope for a triage sweep. This PR follows the Wave 1-3 maintainer precedent (classifier-first noise suppression) so the Sentry signal/noise ratio improves immediately; the validation layer remains tracked as a separate follow-up. If/when it lands, these classifier arms become redundant belt-and-suspenders and can be deleted without conflict.

Lane N has no root-fix alternative — offline / firewall / captive-portal / DNS failures are pure user-environment state. Classifier-demote is the correct disposition.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. 11 positive tests pinned to real Sentry bodies + 1 negative precedence test cover every new substring arm; only the rustdoc comments and the negative unrelated_* test body are non-executable lines.
N/A: behaviour-only change — Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change
All affected feature IDs from the matrix are listed in the PR description under ## Related
No new external network dependencies introduced (mock backend used per Testing Strategy)
N/A: classifier-only change with no UI / release-cut surface touched — Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md)
Linked issue closed via Closes #NNN in the ## Related section

Impact

Runtime: no behavior change for users — chat / agent / web-channel / socket paths continue to fall back / retry exactly as before. The only delta is that report_error_or_expected now classifies these 13 wire shapes as ExpectedErrorKind::* instead of escalating as tracing::error, so Sentry stops receiving the events.
Performance: negligible — is_provider_config_rejection_message lowercases once and runs 15 substring scans (was 7); is_network_unreachable_message runs 11 substring scans (was 8). Both already on the error-path, never on the hot path.
Security: no new attack surface. Substring matchers run on already-rendered error messages — no parsing, no allocation beyond the existing to_ascii_lowercase().
Migration: none.
Compatibility: forward-compatible. New phrases are append-only — no existing matcher behavior changes. If a future maintainer ships the root-fix validation layer, deleting these arms is a clean local edit.

Closes OPENHUMAN-TAURI-R1
Closes OPENHUMAN-TAURI-R4
Closes OPENHUMAN-TAURI-YC
Closes OPENHUMAN-TAURI-S5
Closes OPENHUMAN-TAURI-Y0
Closes OPENHUMAN-TAURI-JN
Closes OPENHUMAN-TAURI-KB
Closes OPENHUMAN-TAURI-JK
Closes OPENHUMAN-TAURI-J2
Closes OPENHUMAN-TAURI-J5
Closes OPENHUMAN-TAURI-J4
Closes OPENHUMAN-TAURI-44
Closes OPENHUMAN-TAURI-4P
Follow-up PR(s)/TODOs: pre-save provider-validation layer (validate API key + model id + region at config-save time) tracked as separate product initiative; would supersede the Lane O arms.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Key: N/A (Sentry-driven triage; no Linear ticket)
URL: N/A

Commit & Branch

Branch: fix/sentry-wave4-classifier-no
Commit SHA: e16414a (tip)

Agent

Claude Code (Sonnet 4.6) running locally

Summary by CodeRabbit

Bug Fixes
- Enhanced detection of network connectivity failures and provider configuration rejection scenarios for more accurate error classification.
Tests
- Added comprehensive test coverage for network-related transport failures and provider configuration rejection cases.

… (Wave 4 Lane O) Extend `is_provider_config_rejection_message` PHRASES with 8 new substrings covering wire shapes that the Wave 1-3 phrases miss: - `not available in your region` — R1 (region block) - `modelnotallowed` — R4 (Doubao/ChatGLM allowlist) - `invalid_authentication_error` — YC (user key rejected by upstream) - `requires more credits` — S5 (OpenRouter 402 out-of-credits) - `invalid model name passed in model=` — Y0 (litellm proxy pre-routing reject) - `no active credentials for provider` — JN + KB (upstream API key gap) - `litellm.badrequesterror` — JK (github_copilot OAuth gap) - `not_found_error` — J2 + J5 + J4 (litellm envelope `type`) Each is a deterministic user-state error (wrong model, wrong region, bad key, out of credits, missing OAuth scope) — the reliable-provider stack already falls back to OpenHuman's hosted tier, so the UX is intact; only the Sentry spam was leaking. Closes ~250 events across 11 issue IDs. Pinned tests against the literal Sentry event bodies from each ID so a future provider rename doesn't silently un-classify them. Closes OPENHUMAN-TAURI-R1 Closes OPENHUMAN-TAURI-R4 Closes OPENHUMAN-TAURI-YC Closes OPENHUMAN-TAURI-S5 Closes OPENHUMAN-TAURI-Y0 Closes OPENHUMAN-TAURI-JN Closes OPENHUMAN-TAURI-KB Closes OPENHUMAN-TAURI-JK Closes OPENHUMAN-TAURI-J2 Closes OPENHUMAN-TAURI-J5 Closes OPENHUMAN-TAURI-J4

…ve 4 Lane N) Extend `is_network_unreachable_message` with three substring arms for wire shapes the existing `dns error` / status-bearing matchers miss: - `failed to lookup address` — libc `getaddrinfo()` rendering when tungstenite wraps the resolver fail as an `IO error` without the `dns error` prefix (OPENHUMAN-TAURI-44 ~50 events). - `nodename nor servname` — companion phrase from the macOS/BSD libc resolver — same OPENHUMAN-TAURI-44 wire shape, second anchor. - `http error: 200 ok` — tungstenite's `WsError::Http(200)` rendering when a captive portal / corporate proxy intercepts the WS upgrade handshake and returns a plain HTML 200 page (OPENHUMAN-TAURI-4P ~66 events). Tungstenite-only — reqwest renders HTTP 200 as `HTTP status server error (200)` so there is no collision with the regular HTTP path. A precedence test (`http_200_classifier_does_not_silence_unrelated_log_lines`) pins the substring against benign `HTTP/1.1 200 OK` / `status: 200 OK` prose so a future broadening does not silence success traces. Sentry has no remediation path for any of these — the user must change their network (firewall / proxy / DNS). Closes ~116 additional events. Closes OPENHUMAN-TAURI-44 Closes OPENHUMAN-TAURI-4P

coderabbitai · 2026-05-20T08:22:44Z

📝 Walkthrough

Walkthrough

This PR extends two error classification matchers to recognize additional Wave 4 Sentry integration patterns: network transport failures (POSIX resolver and WebSocket captive-portal variants) and provider configuration rejections (region blocking, invalid auth, credit limits, and litellm error formats), with corresponding regression tests for each.

Changes

Wave 4 Error Pattern Classification

Layer / File(s)	Summary
Network unreachable transport patterns `src/core/observability.rs`	`is_network_unreachable_message` adds substring matching for POSIX resolver failures (`failed to lookup address`, `nodename nor servname`) and WebSocket proxy errors (`http error: 200 ok`); tests verify the new matchers and assert the `http error: 200 ok` classifier does not misclassify unrelated HTTP success strings.
Provider config rejection patterns `src/openhuman/inference/provider/config_rejection.rs`	Module documentation enumerates additional Wave 4 provider-rejection cases; the `PHRASES` matcher list extends with region-block, invalid-authentication, litellm, and credential-missing tokens; regression tests assert classification of concrete custom-openai and litellm error bodies.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

tinyhumansai/openhuman#2063: Both modify src/core/observability.rs expected-error classification; this PR expands is_network_unreachable_message for Wave 4 variants while #2063 adjusted matcher precedence and LoopbackUnavailable handling.
tinyhumansai/openhuman#2239: Introduced the provider config-rejection classification and demotion logic; this PR extends its is_provider_config_rejection_message matcher with additional Wave 4 phrase patterns.

Suggested labels

working

Suggested reviewers

graycyrus

Poem

🐰 Four Waves bring errors, patterns in the fold,
From POSIX whispers to captive-portal gold.
We catch the rejections that users have made—
Substring by substring, a classification parade!
Wave four shapes recognized, tests light the way. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and specifically summarizes the main changes: extending Wave 4 classifiers for socket transport and custom-provider config-rejection with specific metrics (~366 events, 13 IDs).
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

oxoxDev added 2 commits May 20, 2026 13:50

oxoxDev requested a review from a team May 20, 2026 08:22

coderabbitai Bot added the working A PR that is being worked on by the team. label May 20, 2026

coderabbitai Bot approved these changes May 20, 2026

View reviewed changes

senamakel approved these changes May 20, 2026

View reviewed changes

senamakel merged commit f7dfa94 into tinyhumansai:main May 20, 2026
31 of 36 checks passed

This was referenced May 21, 2026

fix(chat): reset web session after rate limits #2412

Closed

fix(agent): handle config rejection in streaming_chat path #2346

Merged

fix(observability): demote OpenHuman backend unknown-model 400 (TAURI-RUST-2Z1) #2464

Merged

oxoxDev mentioned this pull request May 22, 2026

fix(observability,composio): demote direct-mode Composio 401/Invalid API key noise (Sentry TAURI-RUST-X9) #2481

Merged

12 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(observability): Wave 4 classifier — socket transport + custom-provider config-rejection (~366 events, 13 IDs)#2309

fix(observability): Wave 4 classifier — socket transport + custom-provider config-rejection (~366 events, 13 IDs)#2309
senamakel merged 2 commits into
tinyhumansai:mainfrom
oxoxDev:fix/sentry-wave4-classifier-no

oxoxDev commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

oxoxDev commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Lane O — custom-provider config rejection (~250 events)

Lane N — socket WebSocket-connect transport (~116 events)

Solution

Lane O — src/openhuman/inference/provider/config_rejection.rs:55-79

Lane N — src/core/observability.rs:299-319

Design trade-off — classifier vs. root-fix validation layer

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Agent

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

oxoxDev commented May 20, 2026 •

edited

Loading

Lane O — `src/openhuman/inference/provider/config_rejection.rs:55-79`

Lane N — `src/core/observability.rs:299-319`

coderabbitai Bot commented May 20, 2026 •

edited

Loading