fix(reliable): fail fast on SESSION_EXPIRED in provider retry loop by YellowSnnowmann · Pull Request #2200 · tinyhumansai/openhuman

YellowSnnowmann · 2026-05-19T10:57:23Z

Summary

- Treat SESSION_EXPIRED errors as non-retryable in ReliableProvider classification.
- Stop wasting retry budget on auth-state failures that can only be resolved by re-auth/sign-in.
- Reduce noisy aggregate failures, such as repeated "attempt 1/3 ... attempt 3/3" entries for the same expired-session condition.
- Add regression coverage in reliable_tests.rs to verify that session-expired errors short-circuit retries.
- Keep existing retry behavior unchanged for transient upstream failures (429/5xx/timeouts).

Problem

The reliable provider layer retried SESSION_EXPIRED as if it were a transient provider/network failure.
That caused repeated failed attempts with no chance of recovery, slower user feedback, and noisy Sentry events.
The expected behavior for an expired backend session is immediate failure, so the app can prompt sign-in/re-auth.

Solution

Updated is_non_retryable in src/openhuman/inference/provider/reliable.rs to classify messages matching is_session_expired_message(...) as non-retryable.
This ensures the retry loop exits after the first failed attempt for expired-session boundaries.
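
The classification order described above can be sketched as follows. This is a minimal illustration, not the code from reliable.rs: the `is_session_expired_message` body is a hypothetical stand-in for the centralized observability helper, and the status-code heuristic is a simplified assumption about what "existing retry behavior" looks like.

```rust
use std::fmt::Display;

/// Hypothetical stand-in for the centralized observability helper.
/// The real matching patterns live elsewhere in the project and may differ.
fn is_session_expired_message(msg: &str) -> bool {
    msg.contains("SESSION_EXPIRED")
}

/// Sketch of a non-retryable classifier: the session-expired check runs
/// first, then a simplified status-code heuristic (an assumption here).
fn is_non_retryable<E: Display + ?Sized>(err: &E) -> bool {
    let msg = err.to_string();

    // Auth-state failure: retrying cannot help, only re-auth can.
    if is_session_expired_message(&msg) {
        return true;
    }

    // Simplified heuristic: any 4xx code in the message is treated as
    // non-retryable, except 408 (timeout) and 429 (rate limit).
    msg.split(|c: char| !c.is_ascii_digit())
        .filter_map(|token| token.parse::<u16>().ok())
        .any(|code| (400..500).contains(&code) && code != 408 && code != 429)
}
```

Checking the marker before the status heuristics means a session-expired message is classified the same way regardless of which HTTP status, if any, appears in it.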

Added tests in src/openhuman/inference/provider/reliable_tests.rs:

- classification test for SESSION_EXPIRED
- end-to-end retry-loop test asserting that only one call occurs and that the aggregate marks the error as non_retryable.
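
The fail-fast behavior that the end-to-end test locks in can be illustrated with a small synchronous retry loop. This is a sketch under stated assumptions: the actual tests are Tokio tests against ReliableProvider with a mock provider, whereas `call_with_retries`, `AggregateError`, and the marker check below are hypothetical names used only for illustration.

```rust
/// Hypothetical marker check; the real helper is centralized elsewhere.
fn is_session_expired_message(msg: &str) -> bool {
    msg.contains("SESSION_EXPIRED")
}

/// Result of a failed retry loop: the per-attempt log and whether the
/// loop stopped early because the error was non-retryable.
#[derive(Debug)]
struct AggregateError {
    message: String,
    non_retryable: bool,
}

/// Minimal synchronous retry loop that exits on the first
/// non-retryable error instead of spending the remaining attempts.
fn call_with_retries<F>(max_attempts: usize, mut call: F) -> Result<String, AggregateError>
where
    F: FnMut() -> Result<String, String>,
{
    let mut lines = Vec::new();
    for attempt in 1..=max_attempts {
        match call() {
            Ok(text) => return Ok(text),
            Err(err) => {
                let non_retryable = is_session_expired_message(&err);
                lines.push(format!("attempt {attempt}/{max_attempts}: {err}"));
                if non_retryable {
                    // Fail fast: no further attempts are made.
                    return Err(AggregateError {
                        message: lines.join("\n"),
                        non_retryable: true,
                    });
                }
            }
        }
    }
    // Retry budget exhausted on retryable errors.
    Err(AggregateError {
        message: lines.join("\n"),
        non_retryable: false,
    })
}
```

A regression test in this style would count invocations of the failing closure and assert that the count is exactly one, that `non_retryable` is set, and that the aggregate message contains no later-attempt text.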

Tradeoff: this relies on canonical session-expired message patterns, but those are already centralized in observability and used across the core.

Submission Checklist

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change)
All affected feature IDs from the matrix are listed in the PR description under ## Related
No new external network dependencies introduced (mock backend used per Testing Strategy)
Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md)
Linked issue closed via Closes #NNN in the ## Related section

Impact

Runtime/platform impact: Rust core provider reliability path (desktop app core behavior) only.
User impact: faster, clearer failure on expired session; fewer redundant retries before sign-in flow is needed.
Observability impact: reduced noise for this auth-state class; errors are treated as expected non-retryable flow.
Performance: avoids unnecessary retry delays/work for unrecoverable auth-state failures.
Security/compatibility: no new permissions, migrations, or external dependencies.

Summary by CodeRabbit

Bug Fixes
- "Session expired" errors are now treated as non-retryable, causing failed requests (both standard and streaming) to abort immediately instead of entering retry/poll loops, improving responsiveness and error clarity.
Tests
- Added unit and streaming tests to verify immediate abort behavior and proper failure aggregation for session-expired errors.


coderabbitai · 2026-05-19T10:57:30Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2e3f6c07-4902-418e-b2e6-fb273fde6e34

📥 Commits

Reviewing files that changed from the base of the PR and between 6e57893 and a0bdee0.

📒 Files selected for processing (2)

src/openhuman/inference/provider/reliable.rs
src/openhuman/inference/provider/reliable_tests.rs

📝 Walkthrough

Walkthrough

Adds an upfront "session expired" detection to ReliableProvider's non-retryable classifiers (sync and streaming), and adds unit and integration tests that assert ReliableProvider aborts retries/polling immediately on the SESSION_EXPIRED marker.

Changes

Session-expired boundary detection in ReliableProvider

| Layer / File(s) | Summary |
| --- | --- |
| Error classification: session-expired shortcut `src/openhuman/inference/provider/reliable.rs` | `is_non_retryable` converts the error to a string, checks for the session-expired marker and returns true immediately if present, then falls back to the existing `reqwest::Error` status and message-digit heuristics. `is_stream_error_non_retryable` mirrors this for `StreamError::Provider(msg)` by returning non-retryable when the marker is found. |
| Unit test: exact SESSION_EXPIRED assertion `src/openhuman/inference/provider/reliable_tests.rs` (lines 219–221) | Updated the `is_non_retryable` test to assert against the exact SESSION_EXPIRED string used by the classifier. |
| Integration tests: abort-on-session-expired (non-streaming) `src/openhuman/inference/provider/reliable_tests.rs` (lines 259–295) | New Tokio test `session_expired_aborts_retries` verifies ReliableProvider fails fast on SESSION_EXPIRED: single provider call, aggregated error marked `non_retryable`, and no later-attempt text present. |
| Integration tests: abort-on-session-expired (streaming) `src/openhuman/inference/provider/reliable_tests.rs` (lines 297–419) | Added `StreamingErrorMock` and `session_expired_aborts_retries_streaming` to confirm streaming retry/polling is short-circuited on SESSION_EXPIRED (one stream created, one poll, terminal aggregated streaming error returned). |
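
The streaming-side classifier summarized in the table might look roughly like the sketch below. The `StreamError` enum here is a simplified, hypothetical version (only the `Provider(msg)` variant is named in the summary; `Io` is invented for contrast), and the marker check is a stand-in for the shared observability helper. Any additional heuristics in the real function are omitted.

```rust
/// Simplified, hypothetical stream error type. Only `Provider` is
/// mentioned in the review summary; `Io` exists here for contrast.
#[derive(Debug)]
enum StreamError {
    Provider(String),
    Io(String),
}

/// Hypothetical stand-in for the shared session-expired helper.
fn is_session_expired_message(msg: &str) -> bool {
    msg.contains("SESSION_EXPIRED")
}

/// Sketch: a provider error carrying the session-expired marker is
/// non-retryable; everything else stays retryable in this example.
fn is_stream_error_non_retryable(err: &StreamError) -> bool {
    match err {
        StreamError::Provider(msg) => is_session_expired_message(msg),
        StreamError::Io(_) => false,
    }
}
```

Reusing the same marker check on both paths keeps streaming and non-streaming requests consistent, which is the parity the review asked for.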

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

tinyhumansai/openhuman#1763: Introduced/implemented the is_session_expired_message observability helper used by this PR.
tinyhumansai/openhuman#1719: Uses the same session-expired detection; relates to how session-expired is classified across observability and retry logic.
tinyhumansai/openhuman#2022: Related changes to streaming retry/failover logic that interact with is_stream_error_non_retryable behavior.

Suggested reviewers

senamakel

Poem

🐰 A session expired, so we stop the chase,
One swift fail, no retries to trace.
The provider sighs, the loop is through,
One clear error — honest and true.
Hoppity-hop, the rabbit says "phew!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 72.73% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(reliable): fail fast on SESSION_EXPIRED in provider retry loop' accurately describes the main change: treating SESSION_EXPIRED errors as non-retryable to exit the retry loop immediately. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/openhuman/inference/provider/reliable_tests.rs (1)

259-295: ⚡ Quick win

Add a streaming fail-fast regression test for SESSION_EXPIRED.

This new test is good for simple_chat, but the streaming retry classifier is separate. A focused streaming test would lock in the expected fail-fast auth behavior across both execution paths.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/inference/provider/reliable_tests.rs` around lines 259 - 295,
Add a new tokio::test (e.g., session_expired_aborts_retries_streaming) that
mirrors session_expired_aborts_retries but exercises the provider's streaming
path: construct a ReliableProvider with the same MockProvider (calls Arc,
fail_until_attempt = usize::MAX, error containing "SESSION_EXPIRED"), invoke the
streaming API on ReliableProvider (the streaming equivalent of simple_chat),
await the error and assert that only one call was made to MockProvider, the
error is classified as non_retryable, and the aggregate message does not include
further attempts; reuse the same assertions as session_expired_aborts_retries
but target the streaming method to lock in fail-fast auth behavior for the
streaming retry classifier.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/inference/provider/reliable.rs`:
- Around line 16-22: The streaming retry logic misses the SESSION_EXPIRED check:
update the streaming-specific non-retryable path by adding the same
session-expired classification used in is_non_retryable to
is_stream_error_non_retryable (or the function handling streaming retry
decisions) so that
crate::core::observability::is_session_expired_message(&err.to_string()) returns
true and causes an immediate non-retryable result for streaming requests; locate
the streaming retry branch that currently calls is_stream_error_non_retryable
and add the session-expired check there (or delegate to is_non_retryable) to
ensure parity with non-streaming behavior.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e5299b6a-46f0-4818-9ade-8e13589e99e6

📥 Commits

Reviewing files that changed from the base of the PR and between 4384cd1 and 6e57893.

📒 Files selected for processing (2)

src/openhuman/inference/provider/reliable.rs
src/openhuman/inference/provider/reliable_tests.rs


YellowSnnowmann added 2 commits May 19, 2026 16:18

fix(inference): handle session expiration as a non-retryable error in reliable provider

e54b1c0

test(inference): add session expired test to verify non-retryable behavior

6e57893

YellowSnnowmann marked this pull request as ready for review May 19, 2026 11:20

YellowSnnowmann requested a review from a team May 19, 2026 11:20

coderabbitai Bot added the "working" label (a PR that is being worked on by the team) May 19, 2026

coderabbitai Bot requested changes May 19, 2026


Comment thread src/openhuman/inference/provider/reliable.rs

YellowSnnowmann added 2 commits May 19, 2026 17:40

fix(inference): treat session expiration as a non-retryable error in stream error handling

1b1e390

test(inference): add StreamingErrorMock to validate session expiration handling in streaming

a0bdee0

coderabbitai Bot approved these changes May 19, 2026


senamakel merged commit d6a99fc into tinyhumansai:main May 19, 2026
27 checks passed

This was referenced May 20, 2026

fix(jsonrpc): keep scoped 401s from expiring session #2292

Merged

fix(jsonrpc): narrow SessionExpired to backend-boundary signal (#2286) #2302

Closed

CodeGhost21 pushed a commit to CodeGhost21/openhuman that referenced this pull request May 22, 2026

fix(reliable): fail fast on SESSION_EXPIRED in provider retry loop (tinyhumansai#2200)

665c0f3

AusAgentSmith pushed a commit to AusAgentSmith/openhuman that referenced this pull request May 23, 2026

fix(reliable): fail fast on SESSION_EXPIRED in provider retry loop (tinyhumansai#2200)

9e87336

senamakel merged 4 commits into tinyhumansai:main from YellowSnnowmann:fix/reliable-session-expired-non-retryable
