fix(reliable): fail fast on SESSION_EXPIRED in provider retry loop#2200
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughAdds an upfront "session expired" detection to ReliableProvider's non-retryable classifiers (sync and streaming), and adds unit and integration tests that assert ReliableProvider aborts retries/polling immediately on the SESSION_EXPIRED marker. ChangesSession-expired boundary detection in ReliableProvider
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/openhuman/inference/provider/reliable_tests.rs (1)
259-295: ⚡ Quick winAdd a streaming fail-fast regression test for
SESSION_EXPIRED.This new test is good for
simple_chat, but the streaming retry classifier is separate. A focused streaming test would lock in the expected fail-fast auth behavior across both execution paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/openhuman/inference/provider/reliable_tests.rs` around lines 259 - 295, Add a new tokio::test (e.g., session_expired_aborts_retries_streaming) that mirrors session_expired_aborts_retries but exercises the provider's streaming path: construct a ReliableProvider with the same MockProvider (calls Arc, fail_until_attempt = usize::MAX, error containing "SESSION_EXPIRED"), invoke the streaming API on ReliableProvider (the streaming equivalent of simple_chat), await the error and assert that only one call was made to MockProvider, the error is classified as non_retryable, and the aggregate message does not include further attempts; reuse the same assertions as session_expired_aborts_retries but target the streaming method to lock in fail-fast auth behavior for the streaming retry classifier.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/inference/provider/reliable.rs`:
- Around line 16-22: The streaming retry logic misses the SESSION_EXPIRED check:
update the streaming-specific non-retryable path by adding the same
session-expired classification used in is_non_retryable to
is_stream_error_non_retryable (or the function handling streaming retry
decisions) so that
crate::core::observability::is_session_expired_message(&err.to_string()) returns
true and causes an immediate non-retryable result for streaming requests; locate
the streaming retry branch that currently calls is_stream_error_non_retryable
and add the session-expired check there (or delegate to is_non_retryable) to
ensure parity with non-streaming behavior.
---
Nitpick comments:
In `@src/openhuman/inference/provider/reliable_tests.rs`:
- Around line 259-295: Add a new tokio::test (e.g.,
session_expired_aborts_retries_streaming) that mirrors
session_expired_aborts_retries but exercises the provider's streaming path:
construct a ReliableProvider with the same MockProvider (calls Arc,
fail_until_attempt = usize::MAX, error containing "SESSION_EXPIRED"), invoke the
streaming API on ReliableProvider (the streaming equivalent of simple_chat),
await the error and assert that only one call was made to MockProvider, the
error is classified as non_retryable, and the aggregate message does not include
further attempts; reuse the same assertions as session_expired_aborts_retries
but target the streaming method to lock in fail-fast auth behavior for the
streaming retry classifier.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e5299b6a-46f0-4818-9ade-8e13589e99e6
📒 Files selected for processing (2)
src/openhuman/inference/provider/reliable.rssrc/openhuman/inference/provider/reliable_tests.rs
…stream error handling
…n handling in streaming
Summary
Problem
Solution
Added tests in src/openhuman/inference/provider/reliable_tests.rs:
Tradeoff: this relies on canonical session-expired message patterns, but those are already centralized in observability and used across the core.
Submission Checklist
.github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.docs/TEST-COVERAGE-MATRIX.mdreflect this change (or N/A: behaviour-only change)docs/RELEASE-MANUAL-SMOKE.md)Impact
Related
Summary by CodeRabbit
Bug Fixes
Tests