Skip to content

feat(api): migrate thv API transport from HTTP to UNIX socket / Windows named pipe#1694

Open
samuv wants to merge 15 commits intomainfrom
unix-socket
Open

feat(api): migrate thv API transport from HTTP to UNIX socket / Windows named pipe#1694
samuv wants to merge 15 commits intomainfrom
unix-socket

Conversation

@samuv
Copy link
Copy Markdown
Collaborator

@samuv samuv commented Mar 5, 2026

The thv binary previously served the studio API on a localhost TCP port assigned at startup, requiring CSP relaxation, a 127.0.0.1 listener anyone on the box could probe, and a getToolhivePort() round-trip in every renderer fetch. This PR moves the managed-thv transport to a UNIX domain socket on macOS/Linux and a Windows named pipe on Windows, with the renderer's hey-api fetch forwarded over Electron IPC to the main process — which dials the socket/pipe directly via Node's http.request({ socketPath }). The studio is now the only process that can talk to thv; no localhost port is opened for the managed instance.

Blocked by stacklok/toolhive#5201 for Windows support.

Summary

  • Transport — Managed thv serves the API on a per-process UNIX socket (POSIX) or named pipe (Windows). Renderer fetch goes through ipc-fetch.ts → IPC → main unix-socket-fetch.ts (http.request({ socketPath })). No localhost listener for the managed API; CSP csp.ts no longer whitelists thv in connect-src.
  • External thvTHV_PORTTHV_SOCKET; optional THV_MCP_PORT (validated). isToolhiveRunning() is true in this mode so readiness (e.g. secret provider) works. Dev-only CustomSocketBanner when using an external socket.
  • Main-process callersmcp-tools, settings-storage, and builtin skills use createMainProcessApiClient() instead of http://localhost:${port} (fixes undefined port / broken playground & skills).
  • Tracing — Renderer adds sentry-trace / baggage / traceparent via getTraceData({ propagateTraceparent: true }). Main wraps the socket request in Sentry.suppressTracing() so headers aren’t overwritten.
  • Windows — Pipes instead of temp-dir AF_UNIX (blocked on matching thv release; see linked PR).
  • E2Eapp-relaunch.ts talks to thv over the real socket path.
  • Tests — New/extended coverage for ipc-fetch, unix-socket-fetch, CSP, graceful-exit createFetch path, ipc-handlers/toolhive, plus existing toolhive-manager socket / THV_SOCKET cases.

How to validate

  1. macOS / Linuxpnpm dev. Confirm app boots, MCP list/registry/playground tool toggle work, no connect-src violations.
  2. Windows — After matching thv ships: pipe path from getToolhiveSocketPath(), smoke the same flows.
  3. External thvTHV_SOCKET=… pnpm dev: banner appears, APIs work, isToolhiveRunning true.
  4. Tracing — With Sentry on studio + thv, confirm one trace id links renderer and backend spans.
  5. Testspnpm vitest run (or CI) green for the new/updated test files above.

Out of scope

  • thv discovery file / npipe:// cross-process checks on Windows (follow-up). The studio doesn’t pass --nonce, so it doesn’t write the discovery file.
  • Bumping bundled thv.exe in utils/constants.ts until a tagged thv release includes the pipe listener.

@github-actions github-actions Bot added the size/M label Mar 5, 2026
@samuv samuv changed the title feat(api): move from http to socket POC feat(api): move from TCP/IP host:port to UNIX socket POC Mar 13, 2026
@github-actions github-actions Bot added size/L and removed size/M labels May 6, 2026
@samuv samuv changed the title feat(api): move from TCP/IP host:port to UNIX socket POC feat(api): migrate thv API transport from HTTP to UNIX socket / Windows named pipe May 6, 2026
samuv added 12 commits May 6, 2026 15:26
Three sites in main were still building http://localhost:${port} for the thv API: fetchWorkloads() in chat/mcp-tools.ts, the workload-validation step in chat/settings-storage.ts, and defaultBuildClient() in chat/agents/builtin-agent-tools/skills.ts. After the socket migration getToolhivePort() returns undefined for the managed thv, so those calls hit http://localhost:undefined and threw — silently breaking the playground MCP-tools toggle (handleToggleTool catches and toasts), the post-install MCP-tools cleanup, and the skills agent.

Add createMainProcessApiClient() and hasToolhiveConnection() helpers in unix-socket-fetch.ts that wrap createMainProcessFetch(). Callers get a hey-api Client with the live transport (socket or TCP fallback) without having to know which one is in use; the baseUrl is a sentinel that the custom fetch ignores.

Replace the three call sites with the helper. Surfaced by playground.spec.ts which used to time out waiting for the 'N tools' badge after enabling an MCP server.
Windows AF_UNIX sockets created in %TEMP% fail with EACCES on connect because of the DACLs Go's os.Chmod(0660) sets in pkg/api/server.go — same-user access is not enough on common Windows builds. Named pipes are the canonical Windows IPC (Podman Desktop uses the same approach), have no filesystem ACL semantics to fight, are released by the kernel when the listener closes, and are accepted by Node's http.request({ socketPath }) natively.

generateSocketPath() now branches on process.platform === 'win32' to return \\.\pipe\toolhive-<pid>; the existing /tmp/toolhive-<pid>.sock path is unchanged on POSIX. The address flows through --socket=<addr> to thv unchanged. cleanupSocketFile() no-ops on Windows since there is no filesystem entry to remove.

Tests stub process.platform per case (Object.defineProperty + restore in finally): the existing .sock assertion is pinned to linux, and a new Windows test asserts the spawned --socket= arg matches /^--socket=\\\\\.\\pipe\\toolhive-\d+$/.

Requires a thv release that includes the matching winio.ListenPipe branch in pkg/api/socket_windows.go; the studio change is a no-op until that lands.
The studio now talks to its managed thv exclusively over a UNIX socket / Windows named pipe; the renderer's hey-api fetch is forwarded over IPC. The previous 'point dev studio at an external thv on TCP' workflow no longer composes cleanly with that — and the TCP fallback path in createMainProcessFetch was dead code in every shipping configuration.

Replace THV_PORT with THV_SOCKET, expecting a socket path / named pipe address that the developer's external 'thv serve --socket=...' is listening on. Drop getToolhivePort(), the get-toolhive-port IPC channel, the electronAPI.getToolhivePort / isUsingCustomPort surfaces, and the connect-src http://localhost:<port> branch in CSP. Rename CustomPortBanner -> CustomSocketBanner; the banner now shows the socket / pipe path. THV_MCP_PORT stays because the experimental MCP backend is still TCP.

Tests updated accordingly; main 548/548, renderer 1625/1625, knip clean. docs/README has the new external-thv recipe with both POSIX and Windows examples.
Allows iterable DOM types (Headers, FormData, NodeList, etc.) to be
consumed by Object.fromEntries / Array.from / for...of in the renderer
without forEach helpers.
The hey-api client routes API requests through ipcFetch (Electron IPC →
UNIX socket / named pipe) instead of window.fetch, so Sentry browser
tracing cannot auto-inject trace headers on outgoing requests. Inject
them manually in the renderer using Sentry.getTraceData with
propagateTraceparent: true so the W3C traceparent header is emitted
alongside sentry-trace / baggage — thv API server uses otelhttp which
reads traceparent.

In the main process, wrap the socket call in Sentry.suppressTracing to
keep @sentry/node httpIntegration from replacing the renderer headers
with main own (parentless) trace context.
@samuv samuv marked this pull request as ready for review May 6, 2026 13:54
Copilot AI review requested due to automatic review settings May 6, 2026 13:54
@samuv samuv self-assigned this May 6, 2026
@samuv samuv added the blocked label May 6, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the Studio ↔ managed thv API transport away from a localhost TCP port to an IPC-forwarded HTTP-over-UNIX-socket (macOS/Linux) or Windows named pipe, so the renderer no longer talks to thv directly and no local port needs to be opened.

Changes:

  • Added a renderer fetch adapter (ipcFetch) and main-process IPC forwarder that proxies hey-api requests over Electron IPC and dials thv via http.request({ socketPath }).
  • Updated thv lifecycle management to spawn thv serve --socket=... with per-process socket/pipe addressing and updated CSP generation accordingly.
  • Updated unit tests and E2E helpers to use the new socket/pipe transport and added Windows-platform assertions.

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tsconfig.app.json Adds DOM.Iterable to support iterable DOM types (e.g., Headers iteration).
renderer/src/vite-env.d.ts Simplifies env typing to ImportMetaEnv and removes the old custom env extension.
renderer/src/routes/__root.tsx Swaps dev banner from custom port to custom socket/pipe.
renderer/src/renderer.tsx Updates renderer bootstrap check to validate the new API bridge (apiFetch).
renderer/src/lib/client-config.ts Configures hey-api to use ipcFetch with a sentinel baseUrl.
renderer/src/common/mocks/electronAPI.ts Extends the Electron API test stub with apiFetch/apiFetchAbort.
renderer/src/common/lib/ipc-fetch.ts New fetch-compatible IPC adapter including null-body status handling + tracing headers.
renderer/src/common/components/custom-socket-banner.tsx Updates dev banner UX for external socket/pipe configuration.
preload/src/api/toolhive.ts Exposes getToolhiveSocketPath, apiFetch, and apiFetchAbort to the renderer.
package.json Adds start:inspect for Electron network inspection during development.
main/src/unix-socket-fetch.ts New main-process socket/pipe HTTP forwarder + IPC handlers + main-process API client helpers.
main/src/toolhive-manager.ts Generates socket/pipe paths, passes --socket, and exposes socket-path getters.
main/src/tests/toolhive-manager.test.ts Updates spawn assertions for --socket and adds Windows pipe test coverage.
main/src/tests/graceful-exit.test.ts Updates stopAllServers calls to the new options signature.
main/src/tests/auto-update.test.ts Updates update-flow tests to the new shutdown path and mocks socket fetch bridge.
main/src/ipc-handlers/toolhive.ts Replaces port IPC with socket-path IPC and registers the API fetch handlers.
main/src/graceful-exit.ts Allows shutdown to use either TCP (port) or a provided custom fetch implementation.
main/src/csp.ts Removes localhost connect-src requirement and tightens CSP directives.
main/src/chat/settings-storage.ts Switches workload validation to use the new main-process API client helper.
main/src/chat/mcp-tools.ts Switches workload listing to use the new main-process API client helper.
main/src/chat/agents/builtin-agent-tools/skills.ts Switches skills agent client creation to the socket/pipe-aware helper.
main/src/chat/agents/builtin-agent-tools/tests/skills.test.ts Updates mocks to target unix-socket-fetch helpers instead of port-based client creation.
main/src/chat/tests/mcp-tools.test.ts Updates mocks to target createMainProcessApiClient.
main/src/auto-update.ts Updates shutdown path to always attempt server teardown via socket/pipe fetch bridge.
main/src/app-events/when-ready.ts Registers the API fetch handlers and updates CSP setup call signature.
main/src/app-events/process-signals.ts Switches teardown to use socket/pipe fetch bridge during signal handling.
main/src/app-events/block-quit.ts Switches teardown to use socket/pipe fetch bridge during quit interception.
e2e-tests/mcp-optimizer-startup-cleanup.spec.ts Updates optimizer cleanup test to seed/verify state via socket/pipe helpers.
e2e-tests/helpers/app-relaunch.ts Rewrites E2E out-of-band API helper to use http.request({ socketPath }).
docs/README.md Updates dev docs for externally-managed thv to use THV_SOCKET and --socket.

Comment thread main/src/toolhive-manager.ts
Comment thread main/src/toolhive-manager.ts Outdated
Comment thread main/src/unix-socket-fetch.ts
Comment thread docs/README.md
samuv added 2 commits May 6, 2026 17:03
- isToolhiveRunning() now reports true when THV_SOCKET is set so the
  renderer setupSecretProvider loader and tray UI behave the same as
  with a managed thv binary. Without this, external dev mode silently
  skipped required backend setup.
- THV_MCP_PORT is parsed via a small helper that validates the integer
  is in 1..65535 and logs a warning on rejects, falling back to
  undefined instead of leaking NaN downstream as a number.
- createMainProcessFetch now mirrors the renderer-side NULL_BODY_STATUSES
  guard so 204/304 responses (e.g. /health) do not throw
  "Response with null body status cannot have body" when a Response is
  reconstructed in main.
- Enable DOM.Iterable in tsconfig.node.json so Object.fromEntries can
  consume the Headers iterator directly, matching the renderer.
@github-actions github-actions Bot removed the size/L label May 6, 2026
Copy link
Copy Markdown
Member

@aponcedeleonch aponcedeleonch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments on lifecycle coupling between studio and the spawned thv. None block the PR landing once the matching thv is in.

},
})

log.info(`[startToolhive] Process spawned with PID: ${toolhiveProcess.pid}`)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: the spawn(...) call just above (line 218 onwards): this config doesn't tie thv's lifetime to studio's. detached: false on POSIX only controls the process group, it doesn't kill the child when the parent dies, and I don't see a parent-death watcher on the thv side either (no PR_SET_PDEATHSIG, no ppid polling).

If studio is SIGKILL'd (force-quit, OOM, debugger detach, hard crash), thv keeps running and keeps holding the discovery file. The next studio launch calls Discover, sees the orphan as healthy via /health, gets back StateRunning, and exits with "another ToolHive server is already running." The user lands on AlreadyRunningError until they manually pkill thv.

Under the 1:1 invariant this is the only failure mode I'd call a real bug introduced by the new architecture, since it's sticky across launches.

Options:

  • thv opens a pipe inherited from studio and exits when the read end closes on parent death (cross-platform, no platform-specific calls).
  • Linux: PR_SET_PDEATHSIG(SIGTERM); macOS: kqueue NOTE_EXIT on ppid; Windows: studio creates a job object with KILL_ON_JOB_CLOSE and assigns the thv child to it.
  • Studio-side stopgap: at startup, if Discover returns StateRunning, check whether the thv's parent PID is alive; if it's an orphan, reap it before bailing out with ALREADY_RUNNING.

I'd lean toward the inherited-pipe approach since it's portable and the contract is easy to reason about. Happy to file a follow-up issue if you'd prefer to land this PR first.

@@ -74,18 +75,20 @@ export async function launchApp(userDataDir: string): Promise<LaunchedApp> {

await window.getByRole('link', { name: /mcp servers/i }).waitFor()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: the env: {} block in electron.launch just above (around line 57): the discovery dir is hardcoded to xdg.StateHome/toolhive/server (pkg/server/discovery/discovery.go:51). No flag, no env override exposed on the thv side, but xdg.StateHome itself honors XDG_STATE_HOME.

This helper isolates userDataDir per launch but not the discovery state, so any parallel run on the same host collides on the global discovery file:

  • dev running studio locally + e2e suite on the same box
  • two parallel e2e jobs on a shared CI runner
  • retry attempts that overlap

Two reasonable fixes:

  • Quick: set XDG_STATE_HOME to a per-launch tmp dir in the env block here, and mirror that in toolhive-manager.ts when TOOLHIVE_E2E=true.
  • Cleaner: have thv expose --discovery-dir= so studio can pin it explicitly without piggybacking on XDG.

Not blocking, but worth noting in the test plan since the change makes the discovery file load-bearing for startup.

}

if (toolhiveSocketPath) {
cleanupSocketFile(toolhiveSocketPath)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After cleanupSocketFile, toolhiveSocketPath is still set to the path we just unlinked. getToolhiveSocketPath() keeps returning it, hasToolhiveConnection() keeps returning true, and requireSocketPath() in unix-socket-fetch happily hands the dead path to http.request, which errors with ENOENT.

Mostly invisible during app shutdown (renderer is going away anyway), but it bites during restartToolhive (auto-update's safeServerShutdown, registry-auth recovery), where the renderer is still live and will surface ENOENT errors during the gap between stop and the next start.

The same issue exists in the 'exit' handler around line 273: it clears toolhiveProcess = undefined but leaves toolhiveSocketPath behind, so an unexpected thv crash leaves studio with stale state.

Suggest clearing it in both places:

// in stopToolhive, after cleanupSocketFile
toolhiveSocketPath = undefined

// in the 'exit' handler
toolhiveProcess = undefined
toolhiveSocketPath = undefined

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants