Commit 57604d8

d-cs and claude committed
fix(webapp): stop writer DB connectivity errors leaking to trigger() API clients
During trigger() worker-queue resolution, getWorkerQueue wraps any error from getDefaultWorkerGroupForProject into a client-facing ServiceValidationError (HTTP 422) carrying error.message. That method runs project.findFirst on the *writer*; when the writer is unreachable Prisma throws P1001 ("Can't reach database server at <host>"), and its raw message, including the DB hostname, was echoed to the API client and surfaced in the customer's run view via the SDK's TriggerApiError.

This also mis-classifies a transient outage: a 422 is not retried by the SDK, so triggers failed permanently instead of riding out a brief writer blip.

Add isInfrastructureError() (Prisma connectivity codes P1001/P1002/P1008/P1017 plus init/panic/unknown classes) and, at the wrap site, rethrow infrastructure errors so they hit the route's generic 500 handler (scrubbed + retryable); only genuine domain failures (e.g. "Project not found.") become a 422.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 93532cd commit 57604d8
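
The retry argument in the commit message can be illustrated with a small simulation. This is only a sketch under stated assumptions: `isRetryableStatus` and `simulateTrigger` are hypothetical names, and the rule "retry 5xx, give up on other failures" is a simplification of whatever policy the SDK actually applies.

```typescript
// Hypothetical retry policy for illustration; the real SDK's rules may differ.
function isRetryableStatus(status: number): boolean {
  // Assumption: 5xx responses are treated as transient, everything else as permanent.
  return status >= 500 && status <= 599;
}

// Walks a scripted sequence of HTTP statuses and reports how the call ends.
function simulateTrigger(
  responses: number[],
  maxAttempts = 3
): { ok: boolean; attempts: number } {
  let attempts = 0;
  for (const status of responses.slice(0, maxAttempts)) {
    attempts++;
    if (status >= 200 && status < 300) return { ok: true, attempts };
    if (!isRetryableStatus(status)) return { ok: false, attempts };
  }
  // Every allowed attempt failed with a retryable status.
  return { ok: false, attempts };
}

// A 422 ends the call immediately, even though the next response would succeed.
console.log(simulateTrigger([422, 200]));
// A 500 is retried, so a brief outage is ridden out on the second attempt.
console.log(simulateTrigger([500, 200]));
```

Under these assumptions, the same one-request outage fails permanently when reported as a 422 and recovers when reported as a 500.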

5 files changed

Lines changed: 142 additions & 0 deletions

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+area: webapp
+type: fix
+---
+
+Stop `trigger()` from leaking raw database connection errors to API clients during a database outage; infrastructure errors now return a generic, retryable 500.
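
To make the client-visible difference concrete, here is a minimal sketch of the rethrow-or-wrap decision and the response it leads to. Everything in it is a stand-in: `ValidationErrorSketch`, `ConnectivityErrorSketch`, `resolveWorkerGroup`, and `toClientResponse` are hypothetical names, and the generic 500 body is an assumption about what a scrubbing handler might return, not the webapp's actual route code.

```typescript
// Stand-in for the webapp's ServiceValidationError (mapped to 422 per the commit message).
class ValidationErrorSketch extends Error {}

// Hypothetical marker for connectivity failures; the real check is isInfrastructureError().
class ConnectivityErrorSketch extends Error {}

function resolveWorkerGroup(lookup: () => string): string {
  try {
    return lookup();
  } catch (error) {
    // Infrastructure failures pass through untouched so the generic handler sees them.
    if (error instanceof ConnectivityErrorSketch) throw error;
    const message = error instanceof Error ? error.message : String(error);
    throw new ValidationErrorSketch(message);
  }
}

// Hypothetical route-level mapping: validation errors echo their message,
// anything else gets a generic body (assumed wording).
function toClientResponse(run: () => string): { status: number; body: string } {
  try {
    return { status: 200, body: run() };
  } catch (error) {
    if (error instanceof ValidationErrorSketch) {
      return { status: 422, body: error.message };
    }
    return { status: 500, body: "Internal Server Error" };
  }
}

const outage = toClientResponse(() =>
  resolveWorkerGroup(() => {
    throw new ConnectivityErrorSketch("Can't reach database server at host:5432");
  })
);
const missing = toClientResponse(() =>
  resolveWorkerGroup(() => {
    throw new Error("Project not found.");
  })
);
console.log(outage, missing);
```

In this sketch the outage response carries no hostname, while the domain failure still reaches the client with its original message.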

apps/webapp/app/runEngine/concerns/queues.server.ts

Lines changed: 12 additions & 0 deletions
@@ -15,6 +15,7 @@ import type { RunEngine } from "~/v3/runEngine.server";
 import { env } from "~/env.server";
 import { tryCatch } from "@trigger.dev/core/v3";
 import { ServiceValidationError } from "~/v3/services/common.server";
+import { isInfrastructureError } from "~/utils/prismaErrors";
 import { createCache, createLRUMemoryStore, DefaultStatefulContext, Namespace } from "@internal/cache";
 import { singleton } from "~/utils/singleton";
 import type { TaskMetadataCache, TaskMetadataEntry } from "~/services/taskMetadataCache.server";
@@ -394,6 +395,17 @@ export class DefaultQueueManager implements QueueManager {
       );
 
       if (error) {
+        // getDefaultWorkerGroupForProject queries the writer DB. A Prisma
+        // infrastructure error (e.g. P1001 "Can't reach database server", whose
+        // message carries the DB hostname) must NOT be promoted into a
+        // client-facing ServiceValidationError: that leaks internal infra detail
+        // to the API client (the SDK echoes it into the run view) and
+        // mis-classifies a transient outage as a non-retryable 422. Let it
+        // propagate to the route's generic 500 handler (scrubbed + retryable);
+        // only wrap genuine domain failures.
+        if (isInfrastructureError(error)) {
+          throw error;
+        }
         throw new ServiceValidationError(error.message);
       }
 
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import { Prisma } from "@trigger.dev/database";
+
+// Prisma connectivity / infrastructure error codes — engine- and
+// connection-level failures, not query- or validation-level ones. When the
+// database is unreachable, Prisma 6.x throws a PrismaClientKnownRequestError
+// carrying one of these codes (e.g. P1001 "Can't reach database server").
+const INFRASTRUCTURE_PRISMA_CODES = new Set([
+  "P1001", // Can't reach database server
+  "P1002", // Database server reached but timed out
+  "P1008", // Operations timed out
+  "P1017", // Server has closed the connection
+]);
+
+/**
+ * True when `error` is a Prisma infrastructure/connectivity failure (DB
+ * unreachable, timed out, connection dropped) rather than a query- or
+ * validation-level error.
+ *
+ * These errors carry internal infrastructure detail (e.g. the database
+ * hostname) in their `.message`, so they must never be surfaced to API
+ * clients — callers should let them propagate to the generic 5xx handler
+ * (which both scrubs the message and is retryable by the SDK) instead of
+ * folding `.message` into a client-facing error.
+ */
+export function isInfrastructureError(error: unknown): boolean {
+  if (
+    error instanceof Prisma.PrismaClientInitializationError ||
+    error instanceof Prisma.PrismaClientRustPanicError ||
+    error instanceof Prisma.PrismaClientUnknownRequestError
+  ) {
+    return true;
+  }
+
+  if (error instanceof Prisma.PrismaClientKnownRequestError) {
+    return INFRASTRUCTURE_PRISMA_CODES.has(error.code);
+  }
+
+  return false;
+}
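
The two-tier check above (match on error class first, then on error code) can be tried without a database package using local stand-ins. This is a sketch, not the helper itself: `KnownRequestError`, `InitializationError`, and `isInfrastructureErrorSketch` are hypothetical substitutes for the Prisma classes and the real function, and only one of the three class-level checks is modelled.

```typescript
// Local stand-ins for Prisma's error classes so the sketch runs on its own.
class KnownRequestError extends Error {
  code: string;
  constructor(message: string, code: string) {
    super(message);
    this.code = code;
  }
}
class InitializationError extends Error {}

// Same four connectivity codes as the set in the diff above.
const INFRASTRUCTURE_CODES = new Set(["P1001", "P1002", "P1008", "P1017"]);

function isInfrastructureErrorSketch(error: unknown): boolean {
  // Class-level match: any instance counts, regardless of message.
  if (error instanceof InitializationError) return true;
  // Code-level match: only the listed connectivity codes count.
  if (error instanceof KnownRequestError) return INFRASTRUCTURE_CODES.has(error.code);
  // Plain Errors and non-Error values are never infrastructure errors.
  return false;
}

const samples: Array<[string, unknown]> = [
  ["P1001", new KnownRequestError("Can't reach database server at host:5432", "P1001")],
  ["P1017", new KnownRequestError("Server has closed the connection", "P1017")],
  ["P2002", new KnownRequestError("Unique constraint failed", "P2002")],
  ["init", new InitializationError("init failed")],
  ["domain", new Error("Project not found.")],
  ["non-error", "P1001"],
];

const results = samples.map(
  ([label, value]) => [label, isInfrastructureErrorSketch(value)] as const
);
console.log(results);
```

The last sample shows that a bare string containing a matching code is not classified as infrastructure, because the check depends on the error's class as well as its code.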
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+import { describe, expect, it } from "vitest";
+import { Prisma } from "@trigger.dev/database";
+import { isInfrastructureError } from "../app/utils/prismaErrors.js";
+
+describe("isInfrastructureError", () => {
+  it("treats a P1001 'can't reach database server' (KnownRequestError) as infrastructure", () => {
+    // Prisma 6.x reports P1001 as a PrismaClientKnownRequestError with code P1001 —
+    // this is the exact production shape that leaked the RDS hostname to a customer.
+    const err = new Prisma.PrismaClientKnownRequestError(
+      "Invalid `prisma.project.findFirst()` invocation: Can't reach database server at host:5432",
+      { code: "P1001", clientVersion: "6.14.0" }
+    );
+    expect(isInfrastructureError(err)).toBe(true);
+  });
+
+  it("treats a PrismaClientInitializationError as infrastructure", () => {
+    const err = new Prisma.PrismaClientInitializationError("init failed", "6.14.0");
+    expect(isInfrastructureError(err)).toBe(true);
+  });
+
+  it("does NOT treat a query/validation error (P2002 unique constraint) as infrastructure", () => {
+    const err = new Prisma.PrismaClientKnownRequestError("Unique constraint failed", {
+      code: "P2002",
+      clientVersion: "6.14.0",
+    });
+    expect(isInfrastructureError(err)).toBe(false);
+  });
+
+  it("does NOT treat a plain domain Error as infrastructure", () => {
+    expect(isInfrastructureError(new Error("Project not found."))).toBe(false);
+  });
+});
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+import { describe, expect, it } from "vitest";
+import { Prisma } from "@trigger.dev/database";
+import { DefaultQueueManager } from "../app/runEngine/concerns/queues.server.js";
+import { ServiceValidationError } from "../app/v3/services/common.server.js";
+
+// Minimal non-DEVELOPMENT environment so getWorkerQueue resolves a worker group
+// (DEVELOPMENT short-circuits before touching the DB).
+function productionEnv() {
+  return { type: "PRODUCTION", projectId: "proj_test", id: "env_test" } as any;
+}
+
+describe("DefaultQueueManager.getWorkerQueue — writer DB error handling", () => {
+  it("rethrows a Prisma connectivity error unchanged instead of wrapping it in a client-facing ServiceValidationError", async () => {
+    // The exact production failure: getDefaultWorkerGroupForProject's writer
+    // `project.findFirst` throws P1001 when the DB is unreachable. The raw
+    // message carries the DB hostname and must NOT become a 422 with that text.
+    const prisma = {
+      project: {
+        findFirst: async () => {
+          throw new Prisma.PrismaClientKnownRequestError(
+            "Invalid `prisma.project.findFirst()` invocation: Can't reach database server at host:5432",
+            { code: "P1001", clientVersion: "6.14.0" }
+          );
+        },
+      },
+    } as any;
+
+    const queueManager = new DefaultQueueManager(prisma, {} as any);
+
+    const result = await queueManager.getWorkerQueue(productionEnv()).then(
+      () => ({ ok: true as const }),
+      (error: unknown) => ({ ok: false as const, error })
+    );
+
+    expect(result.ok).toBe(false);
+    if (!result.ok) {
+      expect(result.error).toBeInstanceOf(Prisma.PrismaClientKnownRequestError);
+      expect(result.error).not.toBeInstanceOf(ServiceValidationError);
+    }
+  });
+
+  it("still wraps a genuine domain failure (project not found) as a ServiceValidationError", async () => {
+    const prisma = {
+      project: { findFirst: async () => null },
+    } as any;
+
+    const queueManager = new DefaultQueueManager(prisma, {} as any);
+
+    await expect(queueManager.getWorkerQueue(productionEnv())).rejects.toBeInstanceOf(
+      ServiceValidationError
+    );
+  });
+});
