fix: pre-CEF single-instance mutex guard on Windows + provider retry for 502s #1723
No actionable comments were generated in the recent review. 🎉

Files selected for processing: 1. Files skipped from review as similar to previous changes: 1.

Walkthrough: Adds a Windows pre-CEF named mutex to enforce single-instance behavior before CEF initializes, and wraps the OpenHuman backend provider with a `ReliableProvider` configured from runtime reliability settings before routing.

Changes: Infrastructure and Resilience Improvements

Estimated code review effort: 3 (Moderate), ~20 minutes

Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/openhuman/providers/ops.rs`:
- Around lines 368-380: Add verbose debug diagnostics around the ReliableProvider
wrapper initialization: log a grep-friendly prefix (e.g. "reliable:init") and
include INFERENCE_BACKEND_ID, the chosen retries and backoff
(config.reliability.provider_retries, provider_backoff_ms), and the
model_fallbacks value when constructing reliable::ReliableProvider in the block
that creates reliable_remote (after calling create_backend_inference_provider
and before/after ReliableProvider::new().with_model_fallbacks). Use the
project's tracing/log facility (tracing::debug! or log::debug!) at debug/trace
level so retry/backoff configuration and the fact that the provider was wrapped
are recorded for diagnostics.
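A minimal sketch of the diagnostic line the comment asks for, assuming a small formatting helper. `reliable_init_line` and its parameter types are hypothetical (the real types of the backend ID and `model_fallbacks` in `ops.rs` are not shown here); in the project the resulting string, or equivalent structured fields, would be passed to `tracing::debug!` rather than returned.

```rust
use std::fmt::Debug;

/// Hypothetical helper: builds a grep-friendly `reliable:init` line recording
/// that the backend provider was wrapped, plus its retry/backoff settings.
/// `model_fallbacks` is taken as `impl Debug` because its real type is unknown.
pub fn reliable_init_line(
    backend_id: &str,
    retries: u32,
    backoff_ms: u64,
    model_fallbacks: &impl Debug,
) -> String {
    // The trailing `\` joins the two literal lines; the space before it is kept.
    format!(
        "reliable:init wrapped=true backend={backend_id} retries={retries} \
         backoff_ms={backoff_ms} model_fallbacks={model_fallbacks:?}"
    )
}

// Assumed call site, next to `ReliableProvider::new(...)` in ops.rs:
// tracing::debug!("{}", reliable_init_line(INFERENCE_BACKEND_ID, retries, backoff_ms, &fallbacks));
```

Keeping a fixed `reliable:init` prefix makes the line easy to find with a plain text search across logs, regardless of log level formatting.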
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d00c1aa-4e6c-47bd-a29e-fa7121e2a6a1
📒 Files selected for processing (3)
app/src-tauri/Cargo.toml
app/src-tauri/src/lib.rs
src/openhuman/providers/ops.rs
…for 502s

Two independent production fixes:

1. Windows CEF init race (Sentry OPENHUMAN-TAURI-A, 598 events): `tauri_plugin_single_instance` detects duplicate launches inside `.setup()`, which runs AFTER `Builder::build()` triggers `CefRuntime::init` → `cef::initialize()`. On a second launch, `cef::initialize()` returns 0 (the primary holds the CEF cache lock) and the vendored runtime asserts `result == 1`, panicking with `assertion left == right failed left: 0 right: 1` (fatal, Windows-only). Added a `#[cfg(windows)]` pre-build named Win32 mutex guard (`com.openhuman.app-cef-init`) at the top of `run()`, mirroring the macOS `cef_preflight::check_default_cache()` pattern. Secondary instances now exit cleanly before touching CEF. Added the `Win32_System_Threading` feature to `windows-sys` accordingly.

2. Agent 502 surfacing as fatal (Sentry `agent.run_single failed`): `create_intelligent_routing_provider` wrapped the backend in a raw `OpenAiCompatibleProvider` with no retry logic. A single transient 502 from the backend bypassed `ReliableProvider` entirely and propagated as a fatal error to `run_single`. It now wraps the raw provider in `ReliableProvider` (same `reliability.provider_retries` / `provider_backoff_ms` config as all other provider paths).
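To illustrate what fix 2 changes, here is a minimal synchronous sketch of a retry-with-backoff wrapper around a provider. `Provider`, `ProviderError`, `RetryingProvider`, the doubling backoff schedule and the 502/503/504 "transient" set are all hypothetical stand-ins; the project's actual `ReliableProvider` API, its async signatures and its exact retry policy are not shown in this PR.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error type: an HTTP status from the backend, or anything else.
#[derive(Debug, Clone, PartialEq)]
pub enum ProviderError {
    Http(u16),
    Other(String),
}

impl ProviderError {
    /// Gateway-style 5xx responses are treated as transient; everything else is final.
    fn is_transient(&self) -> bool {
        matches!(self, ProviderError::Http(502 | 503 | 504))
    }
}

/// Hypothetical provider trait (the real one is project-specific and likely async).
pub trait Provider {
    fn chat(&self, prompt: &str) -> Result<String, ProviderError>;
}

/// Wraps any provider and retries transient failures with doubling backoff.
pub struct RetryingProvider<P> {
    inner: P,
    retries: u32,    // plays the role of `reliability.provider_retries`
    backoff_ms: u64, // plays the role of `reliability.provider_backoff_ms`
}

impl<P: Provider> RetryingProvider<P> {
    pub fn new(inner: P, retries: u32, backoff_ms: u64) -> Self {
        Self { inner, retries, backoff_ms }
    }
}

impl<P: Provider> Provider for RetryingProvider<P> {
    fn chat(&self, prompt: &str) -> Result<String, ProviderError> {
        let mut attempt: u32 = 0;
        loop {
            match self.inner.chat(prompt) {
                Err(e) if e.is_transient() && attempt < self.retries => {
                    // Sleep backoff_ms, then 2x, 4x, ... (shift capped to avoid overflow).
                    let delay = self.backoff_ms.saturating_mul(1u64 << attempt.min(16));
                    sleep(Duration::from_millis(delay));
                    attempt += 1;
                }
                // Success, a non-transient error, or retries exhausted.
                other => return other,
            }
        }
    }
}
```

With the raw provider as the remote arm, a single `Http(502)` reaches the caller immediately; wrapped like this, it is only surfaced once the configured retries are used up.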
771aac7 to 5c68f15
…sys 0.59

`CreateMutexW`'s `SECURITY_ATTRIBUTES` parameter is individually gated behind the `Win32_Security` feature in windows-sys 0.59, in addition to the module-level `Win32_System_Threading` gate. Without it the Windows E2E build fails with "no `CreateMutexW` in `Win32::System::Threading`".
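A minimal sketch of the pre-build guard these commits describe, assuming `windows-sys` 0.59 with the `Win32_Foundation`, `Win32_Security` and `Win32_System_Threading` features enabled. Only the mutex name `com.openhuman.app-cef-init` comes from the commit message; the function name `acquire_cef_init_guard`, the fail-open behaviour when the mutex cannot be created, and the exit code are illustrative assumptions, not the PR's exact code.

```rust
/// Returns `true` if this process should continue to CEF init (it is the first
/// instance, or the mutex could not be created), `false` if another instance
/// already owns the named mutex.
#[cfg(windows)]
pub fn acquire_cef_init_guard(name: &str) -> bool {
    use windows_sys::Win32::Foundation::{GetLastError, ERROR_ALREADY_EXISTS};
    use windows_sys::Win32::System::Threading::CreateMutexW;

    // Win32 wide-string APIs expect a NUL-terminated UTF-16 buffer.
    let wide: Vec<u16> = name.encode_utf16().chain(std::iter::once(0)).collect();

    // SAFETY: `wide` is NUL-terminated and outlives the call; a null
    // SECURITY_ATTRIBUTES pointer requests the default security descriptor.
    // The handle is intentionally never closed, so the mutex lives as long
    // as the primary process does (the OS releases it at exit).
    let _handle = unsafe { CreateMutexW(std::ptr::null(), 0, wide.as_ptr()) };

    // CreateMutexW still returns a handle when the named mutex already exists,
    // but sets the last error to ERROR_ALREADY_EXISTS in that case.
    unsafe { GetLastError() != ERROR_ALREADY_EXISTS }
}

/// Other platforms rely on different guards (e.g. `cef_preflight` on macOS).
#[cfg(not(windows))]
pub fn acquire_cef_init_guard(_name: &str) -> bool {
    true
}

/// Hypothetical shape of the entry point: the guard runs before any builder work.
pub fn run() {
    if !acquire_cef_init_guard("com.openhuman.app-cef-init") {
        // Secondary instance: exit before CEF is ever initialised (exit code assumed).
        std::process::exit(0);
    }
    // ... Tauri builder and CEF initialisation would follow here ...
}
```

Because the check happens before `Builder::build()`, a second launch never reaches `cef::initialize()`, so the `result == 1` assertion in the vendored runtime cannot fire for it.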
Summary
- Adds a pre-build named Win32 mutex guard at the top of `run()` so secondary instances exit before `cef::initialize()` is ever called, eliminating Sentry OPENHUMAN-TAURI-A (598 fatal panics, Windows-only)
- Wraps the backend provider in `ReliableProvider` inside `create_intelligent_routing_provider` so transient 502/503/504 errors are retried instead of surfacing as fatal `agent.run_single` failures
- …`#[cfg(windows)]` only

Problem
1. OPENHUMAN-TAURI-A — Windows CEF init race (598 events)
`tauri_plugin_single_instance` detects duplicate launches inside its `.setup()` hook. But `.setup()` runs after `Builder::build()`, which calls `CefRuntime::init` → `cef::initialize()`. When a second instance launches while the primary is running, `cef::initialize()` returns `0` (the primary holds the CEF user-data-dir cache lock). The vendored runtime then hits `assert_eq!(result, 1)` → fatal panic: `assertion left == right failed left: 0 right: 1`.

The macOS path is protected by `cef_preflight::check_default_cache()`, which inspects Chromium's `SingletonLock` symlink before the builder runs. Windows had no equivalent (that module uses `nix` and Unix symlinks). The `tauri_plugin_single_instance` comment in Cargo.toml claimed the plugin fires before builder work; it doesn't, it fires in `setup()`.

Sentry: https://tinyhumans.sentry.io/issues/7458830272/
2. Agent 502s surfacing as fatal
`create_intelligent_routing_provider` passed a raw `OpenAiCompatibleProvider` as the remote arm of `IntelligentRoutingProvider`, with no retry wrapper. A single transient 502 from the backend propagated directly to `run_single` and logged `[observability] agent.run_single failed: OpenHuman API error (502 Bad Gateway): error code: 502`. The `ReliableProvider` retry layer (used by every other provider path) was bypassed entirely.

Solution
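1. Windows pre-build mutex guard (`app/src-tauri/src/lib.rs`)

At the very top of `run()`, before any CEF or Tauri builder work, a `#[cfg(windows)]` named Win32 mutex is acquired; secondary instances exit cleanly before touching CEF.

- The `-cef-init` mutex is distinct from the plugin's `-sim` mutex, so there is no interference with WM_COPYDATA forwarding for the fully-started case
- Added the `Win32_System_Threading` feature to `windows-sys` in `Cargo.toml`
- Mirrors the macOS `cef_preflight::check_default_cache()` pattern exactly

2. ReliableProvider wrap (`src/openhuman/providers/ops.rs`)

`create_intelligent_routing_provider` now wraps the raw `OpenAiCompatibleProvider` in `ReliableProvider` (same `reliability.provider_retries` / `provider_backoff_ms` config as all other provider paths) before passing it to `IntelligentRoutingProvider` as the remote arm.

Submission Checklist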
- …`#[cfg(windows)]` platform code with no testable surface on macOS CI; the provider retry path is covered by existing `ReliableProvider` tests, which already exercise 502 retry behaviour
- …`ReliableProvider`, Win32 mutex) is already tested

Impact
Related
- …`tauri-plugin-single-instance` to reflect accurate timing

AI Authored PR Metadata
Linear Issue
Commit & Branch
Validation Run
- `pnpm --filter openhuman-app format:check`: passed (pre-push hook)
- `pnpm typecheck`: N/A, no TS changes
- `cargo fmt` applied by pre-push hook; `cargo check` passed

Validation Blocked
Behavior Changes
Parity Contract
- `ReliableProvider` config (retries/backoff) unchanged
- `IntelligentRoutingProvider` remote arm now goes through the same retry layer as `create_resilient_provider_with_options`

Duplicate / Superseded PR Handling
Summary by CodeRabbit