fix(backend): route fastify listen/ready/close errors through logger instead of unhandled rejection by calebcgates · Pull Request #17401 · misskey-dev/misskey

calebcgates · 2026-05-14T15:23:45Z

What

packages/backend/src/server/ServerService.ts has three error-handling regressions around fastify.listen(), fastify.ready(), and fastify.close() that cause startup and shutdown failures to be reported as uncaught Node exceptions instead of through the misskey logger the code clearly intended to use.

The four changes:

await fastify.listen() inside a try/catch instead of fire-and-forget on the TCP path and callback-with-ignored-err on the socket path.
Run fs.chmodSync on the socket only after listen() resolves (today the chmod runs even when listen failed, throwing ENOENT and masking the original error).
Wrap await fastify.ready() in the same try/catch so plugin-registration timeouts (ERR_AVVIO_PLUGIN_TIMEOUT) are reported through this.logger.error instead of NestJS's bootstrap stack trace.
Cap fastify.close() at 5s in dispose() so leaked WebSocket upgrades from streamingApiServerService.attach(fastify.server) don't hang OnApplicationShutdown until PM2/systemd SIGKILLs the process.

The intent of the existing fastify.server.on('error', ...) block is preserved — moved into a handleListenError() helper invoked from the new try/catch. The reason this matters: fastify v5's listen() rejects its returned Promise when the underlying http.Server hits EADDRINUSE/EACCES, before (or in a race with) the http.Server emitting 'error'. On Node >= 15 with default --unhandled-rejections=throw, the listen-promise rejection terminates the process with a stack trace before server.on('error') can run. The handler is reachable today only by accident of timing.

Why

Concrete reproductions:

Repro 1 — TCP port in use:

# Terminal 1
nc -l 3000

# Terminal 2 (Misskey configured with `port: 3000`)
pnpm start
# Today: UnhandledPromiseRejectionWarning + stack trace, no friendly message
# After this PR: "Port 3000 is already in use by another process." + clean exit(1)

Repro 2 — Unix socket without permission:

# default.yml
socket: /var/run/misskey.sock
chmodSocket: '777'

Run as a non-root user. fastify.listen({ path }) rejects with EACCES, the callback's err is ignored, then fs.chmodSync('/var/run/misskey.sock', ...) throws ENOENT: no such file or directory because no socket was ever created. User sees the chmod error, not the actual permission error. After this PR: "You do not have permission to listen on /var/run/misskey.sock." + clean exit(1).

Repro 3 — OnApplicationShutdown hangs:

A WebSocket client opens a connection via streamingApiServerService.attach(fastify.server), then disconnects abruptly without a clean close frame. PM2 / systemd / k8s sends SIGTERM to misskey. dispose() calls await this.#fastify.close() which waits indefinitely for the upgraded socket to be reaped. PM2's default kill_timeout then SIGKILLs the process; logs show "process killed" with no diagnostic. After this PR: 5-second cap, then OnApplicationShutdown resolves cleanly so PM2 records a graceful exit.

These are the only four real bugs in this area. The other fastify call sites in the backend (route handlers throwing in ClientServerService / ApiServerService / WellKnownServerService, register() calls deferred to await fastify.ready(), addHook() callbacks) are correctly handled by the existing setErrorHandler (ClientServerService.ts:924) plus the now-properly-try-wrapped await fastify.ready() — they're not part of this PR's scope.

Additional info (optional)

Behavior preservation

Scenario	Old	New
Healthy TCP startup	works	works
Healthy socket startup	works	works (chmod still runs after listen)
`EADDRINUSE`	unhandled rejection trace	logger.error + exit(1)
`EACCES`	unhandled rejection trace	logger.error + exit(1)
Cluster-worker listen failure	`process.send('listenFailed')` only when `server.on('error')` wins the race	`process.send('listenFailed')` always
Plugin timeout in ready()	NestJS bootstrap trace	logger.error + exit(1)
Graceful shutdown w/ idle conns	works	works
Graceful shutdown w/ leaked WS	hangs until SIGKILL	resolves at 5s

On tests

packages/backend/test/unit/ has no ServerService spec today. Adding one requires mocking the NestJS DI graph for ~18 injected services and stubbing 11 sub-fastify-services. I scoped this PR to the bug-fix and the truth table above; happy to add a vitest unit test for handleListenError in a follow-up if useful — please flag if you want it bundled here instead.

Local verification

pnpm install --frozen-lockfile — no lockfile churn
pnpm --filter backend typecheck (tsgo) — 0 errors
pnpm --filter backend exec eslint src/server/ServerService.ts — 0 errors; warning count net-reduced from 9 to 5 (removed (err as any) cast, two unused-param warnings, and the this.config.socket! non-null assertion)

Checklist

Read the contribution guide
Test working in a local environment
Update CHANGELOG.md
(If possible) Add tests — see "On tests" above; offered as follow-up

…instead of unhandled rejection

fix(backend): route fastify listen/ready/close errors through logger …

0945fd6

…instead of unhandled rejection

github-project-automation Bot added this to [実験中] 管理用 May 14, 2026

github-project-automation Bot moved this to Todo in [実験中] 管理用 May 14, 2026

dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label May 14, 2026

github-actions Bot added the packages/backend Server side specific issue/PR label May 14, 2026

calebcgates mentioned this pull request May 14, 2026

ServerService: startup and shutdown failures surface as UnhandledPromiseRejection / SIGKILL instead of via logger #17399

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(backend): route fastify listen/ready/close errors through logger instead of unhandled rejection#17401

fix(backend): route fastify listen/ready/close errors through logger instead of unhandled rejection#17401
calebcgates wants to merge 1 commit into
misskey-dev:developfrom
calebcgates:fix/server-listen-and-shutdown-error-handling

calebcgates commented May 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

calebcgates commented May 14, 2026

What

Why

Additional info (optional)

Behavior preservation

On tests

Local verification

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant