feat(api): migrate thv API transport from HTTP to UNIX socket / Windows named pipe#1694
feat(api): migrate thv API transport from HTTP to UNIX socket / Windows named pipe#1694
Conversation
Three sites in main were still building http://localhost:${port} for the thv API: fetchWorkloads() in chat/mcp-tools.ts, the workload-validation step in chat/settings-storage.ts, and defaultBuildClient() in chat/agents/builtin-agent-tools/skills.ts. After the socket migration getToolhivePort() returns undefined for the managed thv, so those calls hit http://localhost:undefined and threw — silently breaking the playground MCP-tools toggle (handleToggleTool catches and toasts), the post-install MCP-tools cleanup, and the skills agent.
Add createMainProcessApiClient() and hasToolhiveConnection() helpers in unix-socket-fetch.ts that wrap createMainProcessFetch(). Callers get a hey-api Client with the live transport (socket or TCP fallback) without having to know which one is in use; the baseUrl is a sentinel that the custom fetch ignores.
Replace the three call sites with the helper. Surfaced by playground.spec.ts which used to time out waiting for the 'N tools' badge after enabling an MCP server.
Windows AF_UNIX sockets created in %TEMP% fail with EACCES on connect because of the DACLs Go's os.Chmod(0660) sets in pkg/api/server.go — same-user access is not enough on common Windows builds. Named pipes are the canonical Windows IPC (Podman Desktop uses the same approach), have no filesystem ACL semantics to fight, are released by the kernel when the listener closes, and are accepted by Node's http.request({ socketPath }) natively.
generateSocketPath() now branches on process.platform === 'win32' to return \\.\pipe\toolhive-<pid>; the existing /tmp/toolhive-<pid>.sock path is unchanged on POSIX. The address flows through --socket=<addr> to thv unchanged. cleanupSocketFile() no-ops on Windows since there is no filesystem entry to remove.
Tests stub process.platform per case (Object.defineProperty + restore in finally): the existing .sock assertion is pinned to linux, and a new Windows test asserts the spawned --socket= arg matches /^--socket=\\\\\.\\pipe\\toolhive-\d+$/.
Requires a thv release that includes the matching winio.ListenPipe branch in pkg/api/socket_windows.go; the studio change is a no-op until that lands.
The studio now talks to its managed thv exclusively over a UNIX socket / Windows named pipe; the renderer's hey-api fetch is forwarded over IPC. The previous 'point dev studio at an external thv on TCP' workflow no longer composes cleanly with that — and the TCP fallback path in createMainProcessFetch was dead code in every shipping configuration. Replace THV_PORT with THV_SOCKET, expecting a socket path / named pipe address that the developer's external 'thv serve --socket=...' is listening on. Drop getToolhivePort(), the get-toolhive-port IPC channel, the electronAPI.getToolhivePort / isUsingCustomPort surfaces, and the connect-src http://localhost:<port> branch in CSP. Rename CustomPortBanner -> CustomSocketBanner; the banner now shows the socket / pipe path. THV_MCP_PORT stays because the experimental MCP backend is still TCP. Tests updated accordingly; main 548/548, renderer 1625/1625, knip clean. docs/README has the new external-thv recipe with both POSIX and Windows examples.
Allows iterable DOM types (Headers, FormData, NodeList, etc.) to be consumed by Object.fromEntries / Array.from / for...of in the renderer without forEach helpers.
The hey-api client routes API requests through ipcFetch (Electron IPC → UNIX socket / named pipe) instead of window.fetch, so Sentry browser tracing cannot auto-inject trace headers on outgoing requests. Inject them manually in the renderer using Sentry.getTraceData with propagateTraceparent: true so the W3C traceparent header is emitted alongside sentry-trace / baggage — thv API server uses otelhttp which reads traceparent. In the main process, wrap the socket call in Sentry.suppressTracing to keep @sentry/node httpIntegration from replacing the renderer headers with main own (parentless) trace context.
There was a problem hiding this comment.
Pull request overview
This PR migrates the Studio ↔ managed thv API transport away from a localhost TCP port to an IPC-forwarded HTTP-over-UNIX-socket (macOS/Linux) or Windows named pipe, so the renderer no longer talks to thv directly and no local port needs to be opened.
Changes:
- Added a renderer
fetchadapter (ipcFetch) and main-process IPC forwarder that proxies hey-api requests over Electron IPC and dialsthvviahttp.request({ socketPath }). - Updated
thvlifecycle management to spawnthv serve --socket=...with per-process socket/pipe addressing and updated CSP generation accordingly. - Updated unit tests and E2E helpers to use the new socket/pipe transport and added Windows-platform assertions.
Reviewed changes
Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tsconfig.app.json | Adds DOM.Iterable to support iterable DOM types (e.g., Headers iteration). |
| renderer/src/vite-env.d.ts | Simplifies env typing to ImportMetaEnv and removes the old custom env extension. |
| renderer/src/routes/__root.tsx | Swaps dev banner from custom port to custom socket/pipe. |
| renderer/src/renderer.tsx | Updates renderer bootstrap check to validate the new API bridge (apiFetch). |
| renderer/src/lib/client-config.ts | Configures hey-api to use ipcFetch with a sentinel baseUrl. |
| renderer/src/common/mocks/electronAPI.ts | Extends the Electron API test stub with apiFetch/apiFetchAbort. |
| renderer/src/common/lib/ipc-fetch.ts | New fetch-compatible IPC adapter including null-body status handling + tracing headers. |
| renderer/src/common/components/custom-socket-banner.tsx | Updates dev banner UX for external socket/pipe configuration. |
| preload/src/api/toolhive.ts | Exposes getToolhiveSocketPath, apiFetch, and apiFetchAbort to the renderer. |
| package.json | Adds start:inspect for Electron network inspection during development. |
| main/src/unix-socket-fetch.ts | New main-process socket/pipe HTTP forwarder + IPC handlers + main-process API client helpers. |
| main/src/toolhive-manager.ts | Generates socket/pipe paths, passes --socket, and exposes socket-path getters. |
| main/src/tests/toolhive-manager.test.ts | Updates spawn assertions for --socket and adds Windows pipe test coverage. |
| main/src/tests/graceful-exit.test.ts | Updates stopAllServers calls to the new options signature. |
| main/src/tests/auto-update.test.ts | Updates update-flow tests to the new shutdown path and mocks socket fetch bridge. |
| main/src/ipc-handlers/toolhive.ts | Replaces port IPC with socket-path IPC and registers the API fetch handlers. |
| main/src/graceful-exit.ts | Allows shutdown to use either TCP (port) or a provided custom fetch implementation. |
| main/src/csp.ts | Removes localhost connect-src requirement and tightens CSP directives. |
| main/src/chat/settings-storage.ts | Switches workload validation to use the new main-process API client helper. |
| main/src/chat/mcp-tools.ts | Switches workload listing to use the new main-process API client helper. |
| main/src/chat/agents/builtin-agent-tools/skills.ts | Switches skills agent client creation to the socket/pipe-aware helper. |
| main/src/chat/agents/builtin-agent-tools/tests/skills.test.ts | Updates mocks to target unix-socket-fetch helpers instead of port-based client creation. |
| main/src/chat/tests/mcp-tools.test.ts | Updates mocks to target createMainProcessApiClient. |
| main/src/auto-update.ts | Updates shutdown path to always attempt server teardown via socket/pipe fetch bridge. |
| main/src/app-events/when-ready.ts | Registers the API fetch handlers and updates CSP setup call signature. |
| main/src/app-events/process-signals.ts | Switches teardown to use socket/pipe fetch bridge during signal handling. |
| main/src/app-events/block-quit.ts | Switches teardown to use socket/pipe fetch bridge during quit interception. |
| e2e-tests/mcp-optimizer-startup-cleanup.spec.ts | Updates optimizer cleanup test to seed/verify state via socket/pipe helpers. |
| e2e-tests/helpers/app-relaunch.ts | Rewrites E2E out-of-band API helper to use http.request({ socketPath }). |
| docs/README.md | Updates dev docs for externally-managed thv to use THV_SOCKET and --socket. |
- isToolhiveRunning() now reports true when THV_SOCKET is set so the renderer setupSecretProvider loader and tray UI behave the same as with a managed thv binary. Without this, external dev mode silently skipped required backend setup. - THV_MCP_PORT is parsed via a small helper that validates the integer is in 1..65535 and logs a warning on rejects, falling back to undefined instead of leaking NaN downstream as a number. - createMainProcessFetch now mirrors the renderer-side NULL_BODY_STATUSES guard so 204/304 responses (e.g. /health) do not throw "Response with null body status cannot have body" when a Response is reconstructed in main. - Enable DOM.Iterable in tsconfig.node.json so Object.fromEntries can consume the Headers iterator directly, matching the renderer.
aponcedeleonch
left a comment
There was a problem hiding this comment.
A few comments on lifecycle coupling between studio and the spawned thv. None block the PR landing once the matching thv is in.
| }, | ||
| }) | ||
|
|
||
| log.info(`[startToolhive] Process spawned with PID: ${toolhiveProcess.pid}`) |
There was a problem hiding this comment.
Re: the spawn(...) call just above (line 218 onwards): this config doesn't tie thv's lifetime to studio's. detached: false on POSIX only controls the process group, it doesn't kill the child when the parent dies, and I don't see a parent-death watcher on the thv side either (no PR_SET_PDEATHSIG, no ppid polling).
If studio is SIGKILL'd (force-quit, OOM, debugger detach, hard crash), thv keeps running and keeps holding the discovery file. The next studio launch calls Discover, sees the orphan as healthy via /health, gets back StateRunning, and exits with "another ToolHive server is already running." The user lands on AlreadyRunningError until they manually pkill thv.
Under the 1:1 invariant this is the only failure mode I'd call a real bug introduced by the new architecture, since it's sticky across launches.
Options:
- thv opens a pipe inherited from studio and exits when the read end closes on parent death (cross-platform, no platform-specific calls).
- Linux:
PR_SET_PDEATHSIG(SIGTERM); macOS: kqueueNOTE_EXITon ppid; Windows: studio creates a job object withKILL_ON_JOB_CLOSEand assigns the thv child to it. - Studio-side stopgap: at startup, if
DiscoverreturnsStateRunning, check whether the thv's parent PID is alive; if it's an orphan, reap it before bailing out withALREADY_RUNNING.
I'd lean toward the inherited-pipe approach since it's portable and the contract is easy to reason about. Happy to file a follow-up issue if you'd prefer to land this PR first.
| @@ -74,18 +75,20 @@ export async function launchApp(userDataDir: string): Promise<LaunchedApp> { | |||
|
|
|||
| await window.getByRole('link', { name: /mcp servers/i }).waitFor() | |||
There was a problem hiding this comment.
Re: the env: {} block in electron.launch just above (around line 57): the discovery dir is hardcoded to xdg.StateHome/toolhive/server (pkg/server/discovery/discovery.go:51). No flag, no env override exposed on the thv side, but xdg.StateHome itself honors XDG_STATE_HOME.
This helper isolates userDataDir per launch but not the discovery state, so any parallel run on the same host collides on the global discovery file:
- dev running studio locally + e2e suite on the same box
- two parallel e2e jobs on a shared CI runner
- retry attempts that overlap
Two reasonable fixes:
- Quick: set
XDG_STATE_HOMEto a per-launch tmp dir in the env block here, and mirror that intoolhive-manager.tswhenTOOLHIVE_E2E=true. - Cleaner: have thv expose
--discovery-dir=so studio can pin it explicitly without piggybacking on XDG.
Not blocking, but worth noting in the test plan since the change makes the discovery file load-bearing for startup.
| } | ||
|
|
||
| if (toolhiveSocketPath) { | ||
| cleanupSocketFile(toolhiveSocketPath) |
There was a problem hiding this comment.
After cleanupSocketFile, toolhiveSocketPath is still set to the path we just unlinked. getToolhiveSocketPath() keeps returning it, hasToolhiveConnection() keeps returning true, and requireSocketPath() in unix-socket-fetch happily hands the dead path to http.request, which errors with ENOENT.
Mostly invisible during app shutdown (renderer is going away anyway), but it bites during restartToolhive (auto-update's safeServerShutdown, registry-auth recovery), where the renderer is still live and will surface ENOENT errors during the gap between stop and the next start.
The same issue exists in the 'exit' handler around line 273: it clears toolhiveProcess = undefined but leaves toolhiveSocketPath behind, so an unexpected thv crash leaves studio with stale state.
Suggest clearing it in both places:
// in stopToolhive, after cleanupSocketFile
toolhiveSocketPath = undefined
// in the 'exit' handler
toolhiveProcess = undefined
toolhiveSocketPath = undefined
The
thvbinary previously served the studio API on a localhost TCP port assigned at startup, requiring CSP relaxation, a127.0.0.1listener anyone on the box could probe, and agetToolhivePort()round-trip in every renderer fetch. This PR moves the managed-thv transport to a UNIX domain socket on macOS/Linux and a Windows named pipe on Windows, with the renderer's hey-apifetchforwarded over Electron IPC to the main process — which dials the socket/pipe directly via Node'shttp.request({ socketPath }). The studio is now the only process that can talk to thv; no localhost port is opened for the managed instance.Blocked by stacklok/toolhive#5201 for Windows support.
Summary
thvserves the API on a per-process UNIX socket (POSIX) or named pipe (Windows). Rendererfetchgoes throughipc-fetch.ts→ IPC → mainunix-socket-fetch.ts(http.request({ socketPath })). No localhost listener for the managed API; CSPcsp.tsno longer whiteliststhvinconnect-src.thv—THV_PORT→THV_SOCKET; optionalTHV_MCP_PORT(validated).isToolhiveRunning()is true in this mode so readiness (e.g. secret provider) works. Dev-onlyCustomSocketBannerwhen using an external socket.mcp-tools,settings-storage, and builtinskillsusecreateMainProcessApiClient()instead ofhttp://localhost:${port}(fixesundefinedport / broken playground & skills).sentry-trace/baggage/traceparentviagetTraceData({ propagateTraceparent: true }). Main wraps the socket request inSentry.suppressTracing()so headers aren’t overwritten.thvrelease; see linked PR).app-relaunch.tstalks tothvover the real socket path.ipc-fetch,unix-socket-fetch, CSP,graceful-exitcreateFetchpath,ipc-handlers/toolhive, plus existingtoolhive-managersocket /THV_SOCKETcases.How to validate
pnpm dev. Confirm app boots, MCP list/registry/playground tool toggle work, noconnect-srcviolations.thvships: pipe path fromgetToolhiveSocketPath(), smoke the same flows.thv—THV_SOCKET=… pnpm dev: banner appears, APIs work,isToolhiveRunningtrue.thv, confirm one trace id links renderer and backend spans.pnpm vitest run(or CI) green for the new/updated test files above.Out of scope
npipe://cross-process checks on Windows (follow-up). The studio doesn’t pass--nonce, so it doesn’t write the discovery file.thv.exeinutils/constants.tsuntil a taggedthvrelease includes the pipe listener.