fix(desktop): resolve WSL backend getting stuck connecting (#3611) by Jgratton24 · Pull Request #3623 · pingdotgg/t3code

Jgratton24 · 2026-06-30T21:56:08Z

What Changed

Five related fixes for the WSL backend getting permanently stuck on "Connecting to WSL...":

Hold the WSL bootstrap stdin open instead of closing it right after writing the bootstrap envelope (DesktopBackendManager.ts), fixing a race where the envelope could be lost if stdin closed before the WSL-relayed reader finished reading it. Same mechanism as fix(desktop): prevent WSL backend exiting with code 0 by keeping stdi… #3613.
Validate the WSL Node version against the server's required engine range even when a Node binary does resolve, not just when none resolves at all (DesktopWslEnvironment.ts), so an incompatible nvm default (e.g. Node 18) is caught at preflight instead of the server silently no-op'ing, and fail closed if the probe doesn't report a version at all when a range is required. Same fix as fix(desktop): Validate WSL node version against engine range after probe success #3621.
Cap consecutive post-preflight backend exits and fall back to Windows instead of restarting forever (DesktopBackendManager.ts), so any backend that keeps exiting before becoming ready surfaces an actionable error and recovers automatically — including across reinstall, where a persisted wslOnly: true previously left the app stuck with no way out. Uses a dedicated counter scoped strictly to post-spawn exits, separate from the general restart-backoff counter, so ordinary pre-spawn preflight retries (e.g. WSL cold-start) can't trip it early, and resets on a fresh start so the cap firing once doesn't permanently remove the retry budget.
Detach the bootstrap reader once the envelope is parsed, closing the readline interface and releasing the fd on every terminal path instead of only on interruption (apps/server/src/bootstrap.ts).
Listen for the bootstrap reader's errors on the readline interface, not the raw stream (apps/server/src/bootstrap.ts), closing a gap where an error on the stream before any line ever arrived was still an uncaught exception even with fix light mode #4 in place — Node's readline.Interface re-emits stream errors onto itself, and nothing was listening there.

Why

#3611 has two independent root causes that produce the identical "stuck connecting" symptom, plus a structural gap that turns either one into a permanent trap:

The stdin race (oxc stack #1 above), already identified and fixed by fix(desktop): prevent WSL backend exiting with code 0 by keeping stdi… #3613.
A WSL Node-engine-version preflight gap (Add open-in-editor feature with Cmd+O shortcut #2 above), already identified and fixed by fix(desktop): Validate WSL node version against engine range after probe success #3621 — this is the reporter's own diagnosed cause on their machine.
No cap on post-preflight restart loops (Git integration: branch picker + worktrees #3 above) — this is why restart, and even reinstall, didn't help the reporter; nothing ever flipped wslOnly back to false.

Fixes #4 and #5 exist because of what we found testing #1 against a real WSL2 box rather than just unit tests: holding stdin open forever (#1 / #3613's approach) only works if something on the server side actually stops reading from that now-permanently-open stream once it has what it needs, and never crashes on a stream-level error at any point in its lifecycle. The existing bootstrap reader did neither. We observed this live once: the WSL backend crash-looped 5/5 times with an uncaught EAGAIN before ever logging "Listening on", with #1 applied but not yet #4. We were not able to independently reproduce that exact EAGAIN trigger in isolation afterward (a standalone script holding stdin open against real wsl.exe for several minutes did not reproduce it), so we're not claiming it's a guaranteed, deterministic failure — treat it as a real observed failure, not a proven-deterministic one.

What we have deterministically reproduced, independent of that one observation, is a structurally related bug in the same reader: verified directly against the production WSL Node version (v24.14.0), an error on the bootstrap stream before any line is ever read is an uncaught exception with only a stream-level error listener (fix #4 alone doesn't help here, since cleanup() is never reached — the process crashes before our own handler ever runs), and is handled cleanly once the listener is registered on the readline interface instead (fix #5). Combined, #4 and #5 mean the bootstrap reader no longer crashes the process on a stream error at any point in its lifecycle, before or after the envelope is parsed. #3613 alone leaves both exposed.

We bundled all five here because #4 and #5 only make sense in the context of #1, and #3 is a generic safety net for the failure class #1's and #2's bugs both fall into. We're aware #1 and #2 overlap with #3613 and #3621 respectively and are happy to drop either piece from this PR if those merge first — #3, #4, and #5 are the parts that aren't covered anywhere else yet.

UI Changes

No new UI, but fix #3 means the existing "WSL backend couldn't start, falling back to Windows" dialog (already shown in #3621's after-screenshot) now also fires for failures that happen after a successful preflight, not just preflight failures.

Validation

Validated against the real running app on Windows + WSL2, and against the actual production WSL Node version directly, not just unit tests:

Forced the original stuck-connecting bug on unpatched main.
Forced the EAGAIN crash live once with fix oxc stack #1 applied alone (no fix light mode #4) — confirmed 5/5 crash-loop. This specific trigger was not independently reproduced in isolation afterward; see the "Why" section for what that does and doesn't establish.
Independently and deterministically verified (against production WSL Node v24.14.0, outside the full app) that an error on the bootstrap stream before any line arrives is an uncaught exception without fix feat: Github Integration #5, and is handled cleanly with it.
Confirmed WSL-only mode reaches ready in ~25s with zero crashes once all five fixes are applied, and parallel (Windows + WSL) mode has no regression.
New unit tests added across DesktopBackendManager.test.ts, DesktopWslEnvironment.test.ts, and apps/server/src/bootstrap.test.ts, including regression tests for the never-ready-counter isolation/reset (Git integration: branch picker + worktrees #3) and the fail-closed missing-version case (Add open-in-editor feature with Cmd+O shortcut #2); existing suites pass with no regressions; typecheck and lint clean. (We could not find a reliable way to add an automated in-process unit test for fix feat: Github Integration #5's exact scenario — Node's own fs.ReadStream internals don't honor module-level mocking of node:fs from the test process — so that one rests on the direct production-Node verification above rather than an automated test.)

Checklist

This PR is small and focused — it bundles five fixes; see "Why" for why they're grouped
I explained what changed and why
I included before/after screenshots for any UI changes — no new UI, see "UI Changes"
I included a video for animation/interaction changes — n/a

Note

Medium Risk
Changes desktop child-process bootstrap I/O, WSL preflight gating, and backend restart/fallback behavior—high user impact on Windows+WSL but scoped with fd3/stdin branching and new regression tests.

Overview
Addresses #3611 by fixing several ways the WSL backend could stay on "Connecting" forever.

For stdin-delivered bootstrap (bootstrapDelivery: 'stdin'), the desktop now keeps stdin open after writing the JSON line (Stream.never) so early EOF cannot race the child's readline reader and drop the envelope. The server readBootstrapEnvelope now tears down readline and the fd stream on every exit path and listens for errors on the readline interface instead of the raw stream, which matters once stdin stays open.

DesktopBackendManager adds a neverReadyAttempt counter (separate from preflight/restart backoff): after five consecutive post-spawn exits before HTTP readiness on the stdin path, it surfaces a fatal failure via onPreflightFailed (Windows fallback dialog) and stops looping; fd3 Windows-native backends are not capped. Counters reset on readiness and on a fresh start after parking.

WSL preflight (ensureNodePtyImpl) now emits nodeVersion: from the existing probe and enforces engines.node when configured—fatal with retryLimit: 1 for a known mismatch, fail-closed if the version is missing when a range is required.

Extensive unit tests cover stdin vs fd3 streams, never-ready escalation, counter isolation, and engine-range preflight.

^{Reviewed by Cursor Bugbot for commit 80f8711. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Fix WSL backend getting stuck in connecting state by capping never-ready exits and closing bootstrap stdin

For bootstrapDelivery:'stdin', stdin is now held open indefinitely (via Stream.never) instead of closing after writing the bootstrap line, preventing the backend from seeing premature EOF.
Adds a neverReadyAttempt counter in DesktopBackendManager.ts: after MAX_PREFLIGHT_FAILURE_ATTEMPTS consecutive exits before readiness, a fatal failure is surfaced and the instance either restarts or stops.
The bootstrap reader in bootstrap.ts now closes its fd immediately after parsing the envelope or on error, preventing lingering readers and late error events.
WSL preflight in DesktopWslEnvironment.ts now validates the resolved Node.js version against a configured engine range, returning a fatal failure if the version is missing or out of range.
Risk: WSL instances using bootstrapDelivery:'stdin' that previously restarted indefinitely will now stop after hitting the never-ready cap.

^{Macroscope summarized 80f8711.}

Three independently-mergeable changes addressing two distinct root causes of the same code=0/no-port-bound symptom, plus the missing recovery path that turns either into a permanent hang: 1. Hold the WSL bootstrap stdin stream open (Stream.concat(Stream.never)) instead of closing it the instant the single JSON chunk drains. The prior behavior raced stdin EOF against the child's readline 'line' vs 'close' listeners across the extra wsl.exe relay hop, sometimes losing the bootstrap envelope entirely. Adds a regression test capturing command.options.stdin and asserting it never completes for the WSL path, plus a companion assertion that the Windows-native fd3 path is unaffected. 2. Validate the resolved WSL Node's version against nodeEngineRange during ensureNodePty preflight. Previously, once any node resolved the version was never checked, so a too-old Node (e.g. nvm default 18) would load and run the server bundle, but import.meta.main is undefined pre-20.11/22.x, so the launch gate never fires and the process exits 0 with no output -- identical symptoms to (1), entirely independent cause. 3. Cap post-preflight, pre-readiness restart loops the same way the pre-spawn fatal-preflight path already is. Previously any clean exit before readiness (including both causes above) looped forever with no escalation, which is why a persisted wslOnly:true could trap a user across restarts/reinstalls. Gated to bootstrapDelivery:"stdin" so a Windows-native primary crash-looping for an unrelated reason never shows the WSL-specific "falling back to Windows" dialog.

readBootstrapEnvelope's cleanup() (closes the readline interface, destroys its backing stream) was only ever wired as the Effect async registration's interrupt finalizer, so it never ran on a normal successful parse, decode failure, fd-unavailable, or stream-close outcome. The readline interface and its stream stayed attached to the bootstrap fd for the rest of the process lifetime. That was harmless while the desktop side closed stdin immediately after writing the bootstrap line. It stopped being harmless once the WSL bootstrap path (this branch's earlier stdin-EOF-race fix) started holding stdin open indefinitely: a readline interface left idling on a never-closing, wsl.exe-relayed stdin pipe surfaced a spurious EAGAIN 'error' event well after the envelope was already read. Node's readline.Interface registers its own internal 'error' listener on the input stream and re-emits it onto itself; since nothing here listened for 'error' on the interface itself, the unhandled re-emit crashed the process before it ever logged "Listening on" -- the WSL backend crash-looping observed during live validation. Call cleanup() at the start of every terminal branch (handleError, handleLine, handleClose) so the reader stops touching the fd as soon as it resolves one way or another, instead of only on interruption. Updated bootstrap.test.ts: now that cleanup() always destroys the underlying autoClose stream, two tests that separately closed the same fd in an acquireRelease finalizer raced that destroy and hit EBADF; switched them to the no-acquireRelease pattern already used elsewhere in this file for the same reason. Added a regression test asserting the fd is actually closed after a successful parse.

coderabbitai · 2026-06-30T21:56:19Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7f58174d-3ee5-47b7-852c-e16a1fa1eda1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

macroscopeapp · 2026-06-30T22:01:08Z

Approvability

Verdict: Needs human review

1 blocking correctness issue found. Multiple unresolved review comments identify potential bugs in the never-ready counter logic - it may count exits that already reached readiness and may mishandle fd3 vs stdin paths. These substantive correctness concerns about the fix itself warrant human review before merging.

^{You can customize Macroscope's approvability policy. Learn more.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63c4409383

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-30T22:01:31Z

+                  config.value.bootstrapDelivery === "stdin" &&
+                  nextState.restartAttempt >= MAX_PREFLIGHT_FAILURE_ATTEMPTS


Track never-ready exits separately from restart attempts

When WSL has several transient preflight retries first (for example WSL cold-start/wslpath failures), scheduleRestart has already incremented restartAttempt, and a later clean preflight does not reset it. This new cap therefore treats the first backend process exit before readiness as if it had already exited five times, invoking onPreflightFailed and potentially disabling/falling back from WSL after a single process failure rather than after consecutive never-ready exits. A separate consecutive-exit counter, or resetting this counter once preflight succeeds and the process is spawned, would avoid conflating these paths.

Useful? React with 👍 / 👎.

Check the resolved Node version against nodeEngineRange only once the rest of the preflight probe has succeeded (exitCode === 0), instead of unconditionally as soon as a node path resolves. This avoids masking a more specific, unrelated probe failure (e.g. the exitCode === 3 unpacked-deps case) behind a Node-version error in the rare case both conditions are true at once.

…ing Node version Three independent automated reviewers (Macroscope, Cursor Bugbot, Codex) flagged that the never-ready cap reused restartAttempt, which is also incremented by ordinary pre-spawn preflight retries (e.g. WSL cold-start) and never reset on a fresh start. This let the cap trip on far fewer than 5 actual post-spawn never-ready exits, and gave no fresh retry budget after it fired once. Add neverReadyAttempt, a counter scoped strictly to post-spawn exits, reset on a successful onReady or a clean start from a stopped state (mirroring the existing preflightFailureAttempt pattern). Also fail closed in the WSL Node engine-range check: if a range is required but the probe didn't report a version, reject rather than silently letting an unchecked Node through (Cursor Bugbot).

macroscopeapp · 2026-06-30T22:33:06Z

🟡 Medium

t3code/apps/desktop/src/backend/DesktopBackendManager.ts

Line 625 in 4d3e49b

const next = {

finalizeRun unconditionally increments neverReadyAttempt on every exit, including runs that had already reached readiness. If a healthy WSL backend becomes ready, later exits, and the restarted process starts failing before readiness, the counter starts from 1 instead of 0, so the never-ready fallback trips after only 4 consecutive never-ready exits instead of the intended 5. The counter should only increment for runs that never reached readiness.

🤖 Copy this AI Prompt to have your agent fix this:

In file @apps/desktop/src/backend/DesktopBackendManager.ts around line 625: `finalizeRun` unconditionally increments `neverReadyAttempt` on every exit, including runs that had already reached readiness. If a healthy WSL backend becomes ready, later exits, and the restarted process starts failing before readiness, the counter starts from `1` instead of `0`, so the never-ready fallback trips after only 4 consecutive never-ready exits instead of the intended 5. The counter should only increment for runs that never reached readiness.

cursor · 2026-06-30T22:36:22Z

                    ...latest,
                    active: Option.none<ActiveBackendRun>(),
                    ready: false,
+                    neverReadyAttempt: latest.neverReadyAttempt + 1,


Counts ready run exits

Medium Severity

finalizeRun always increments neverReadyAttempt on every process exit, even when latest.ready was already true for that run. That counts operational shutdowns after a successful HTTP readiness check toward the “exited without becoming ready” stdin cap, which can push a healthy WSL backend toward the Windows fallback after later bootstrap failures.

^{Reviewed by Cursor Bugbot for commit 4d3e49b. Configure here.}

… raw stream An adversarial review correctly found that registering the error handler on the raw stream (rather than the readline Interface) left a gap: an error on the stream before any line ever arrives is an uncaught exception regardless of cleanup(), because readline's own internal error listener is attached to the stream first and re-emits onto the interface with no listener, which throws before our handler ever runs. Verified directly against the production WSL Node version (v24.14.0): an error emitted on the stream before any line/close event crashes the process when only a stream-level listener is registered, and is handled cleanly once the listener moves to the interface. The previous fix's cleanup()-on-every-terminal-path logic still matters for errors *after* a line has been read or the reader has otherwise started winding down; this closes the remaining pre-first-line gap.

…sn't take Extract surfaceFatalFailure as the common tail of the fatal-preflight cap and the never-ready cap, and give the never-ready path the same already-surfaced guard the preflight path has: if the cap fired and the config still resolves a stdin backend that exits before ready, park the instance instead of re-showing the fallback dialog on every exit. Also fix two type errors in the new tests (stdin captures the full CommandInput | StdinConfig union, fd3 keeps its PlatformError channel).

Drop parseNodeVersion in favor of parseToolchainReport().nodeVersion, and source the out-of-range message from formatMissingToolsReason so the two paths can't give conflicting nvm advice (the resolved node path is still appended to pin down which install needs switching). A version mismatch is deterministic between back-to-back preflights, so return retryLimit: 1 and surface the dialog on the first attempt instead of burning the default five rounds; the fail-closed missing-version case keeps the default allowance for one-off probe hiccups. The probe also no longer pays a second Node cold start per preflight: the existing heredoc invocation now reports nodeVersion over fd 3.

…gg#3611) Consolidates fixes from PRs pingdotgg#3613, pingdotgg#3621, and pingdotgg#3623 to address both root causes and add a fail-safe: 1. DesktopWslEnvironment: Parse the Node version in the WSL node-pty probe and validate it against options.nodeEngineRange. Preflight now fails cleanly with an actionable error if the distro's default Node is incompatible. 2. DesktopBackendManager: Track neverReadyAttempt to cap consecutive post-spawn exits on stdin delivery. Prevents the desktop from permanently getting stuck restarting if the WSL backend consistently fails before readiness. 3. bootstrap: Clean up the readline interface on all event paths to prevent leaks when stdin stays open, and register the error listener on the readline interface to safely catch early stream errors.

cursor

Cursor Bugbot has reviewed your changes using high effort and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 80f8711. Configure here.}

cursor · 2026-07-02T13:08:33Z

+                    yield* logInstanceError("backend still never-ready after fallback; stopping", {
+                      reason,
+                      attempt: nextState.neverReadyAttempt,
+                    });


Never-ready cap counts fd3 exits

Medium Severity

neverReadyAttempt increases on every backend exit before readiness, including bootstrapDelivery:'fd3', but the over-cap branch that stops without calling onPreflightFailed runs only for stdin. After a Windows fallback keeps crash-looping with desiredRunning still true, switching back to WSL can exceed the cap on the first failed spawn and park silently with no fallback dialog.

Additional Locations (1)

apps/desktop/src/backend/DesktopBackendManager.ts#L520-L529

^{Reviewed by Cursor Bugbot for commit 80f8711. Configure here.}

Jgratton24 added 2 commits June 30, 2026 17:50

github-actions Bot added vouch:unvouched PR author is not yet trusted in the VOUCHED list. size:L 100-499 changed lines (additions + deletions). labels Jun 30, 2026

macroscopeapp Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread apps/desktop/src/backend/DesktopBackendManager.ts

cursor Bot reviewed Jun 30, 2026

View reviewed changes

Comment thread apps/desktop/src/backend/DesktopBackendManager.ts

Comment thread apps/desktop/src/backend/DesktopBackendManager.ts

Comment thread apps/desktop/src/wsl/DesktopWslEnvironment.ts Outdated

chatgpt-codex-connector Bot reviewed Jun 30, 2026

View reviewed changes

Jgratton24 added 2 commits June 30, 2026 18:13

macroscopeapp Bot reviewed Jun 30, 2026

View reviewed changes

cursor Bot reviewed Jun 30, 2026

View reviewed changes

Jgratton24 added 3 commits June 30, 2026 18:39

cursor Bot reviewed Jul 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(desktop): resolve WSL backend getting stuck connecting (#3611)#3623

fix(desktop): resolve WSL backend getting stuck connecting (#3611)#3623
Jgratton24 wants to merge 7 commits into
pingdotgg:mainfrom
Jgratton24:fix/3611-wsl-stuck-connecting

Jgratton24 commented Jun 30, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

Review skipped

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

macroscopeapp Bot commented Jun 30, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 30, 2026

Uh oh!

macroscopeapp Bot Jun 30, 2026

Uh oh!

cursor Bot Jun 30, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		config.value.bootstrapDelivery === "stdin" &&
		nextState.restartAttempt >= MAX_PREFLIGHT_FAILURE_ATTEMPTS

Uh oh!

Conversation

Jgratton24 commented Jun 30, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What Changed

Why

UI Changes

Validation

Checklist

Fix WSL backend getting stuck in connecting state by capping never-ready exits and closing bootstrap stdin

Uh oh!

coderabbitai Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

macroscopeapp Bot commented Jun 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

macroscopeapp Bot Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 30, 2026

Choose a reason for hiding this comment

Counts ready run exits

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jul 2, 2026

Choose a reason for hiding this comment

Never-ready cap counts fd3 exits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Jgratton24 commented Jun 30, 2026 •

edited by macroscopeapp Bot

Loading

coderabbitai Bot commented Jun 30, 2026 •

edited

Loading

macroscopeapp Bot commented Jun 30, 2026 •

edited

Loading