feat(web): binary file attachments for Ask #1375

Merged
whoisthey merged 4 commits into main from whoisthey/binary-file-attachments
Jun 30, 2026

Conversation

@whoisthey

@whoisthey whoisthey commented Jun 27, 2026

Contributor

Adds support for binary (image) file attachments in Ask Sourcebot, building on the inline-text attachment work in the base branch. Users can attach PNG/JPEG/WebP/GIF images (drag-and-drop or file picker) to a chat message; the bytes are uploaded to app-mediated blob storage and sent to vision-capable models as native image content. Unlike text attachments, image bytes never travel in the messages JSON — they're referenced by id and served through an access-controlled route.

This is an enterprise (ee) feature, gated by the ask entitlement.

What's included

Data model & storage

  • New Attachment model + ChatAttachment join table + AttachmentStatus (PENDING / COMMITTED) enum, with migration. Attachments are uploaded before any chat exists and linked to chats via ChatAttachment, keeping access purely chat-derived.
  • StorageBackend abstraction in @sourcebot/shared with a LocalFsStorageBackend (bytes under DATA_CACHE_DIR/attachments), shared by the web app and the backend orphan pruner. An S3 driver is planned as a follow-up.
  • New blob variant on the AttachmentData discriminated union (references stored bytes by id; bytes stay out of message JSON).
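To make the blob variant concrete, here is a minimal sketch of a discriminated union in which the blob arm carries only a reference and metadata. The field names (`kind`, `attachmentId`, `mediaType`, `content`) and the helper `referencedBlobIds` are assumptions for illustration; the actual shape in types.ts may differ.

```typescript
// Hypothetical shapes; the real AttachmentData in types.ts may differ.
type TextAttachment = {
    kind: "text";
    filename: string;
    content: string; // inline text travels in the message JSON
};

type BlobAttachment = {
    kind: "blob";
    filename: string;
    mediaType: string;
    attachmentId: string; // reference only; the bytes stay in storage
};

type AttachmentData = TextAttachment | BlobAttachment;

// Collect the blob ids a message references, e.g. to validate them on send.
function referencedBlobIds(attachments: AttachmentData[]): string[] {
    const ids: string[] = [];
    for (const attachment of attachments) {
        if (attachment.kind === "blob") {
            ids.push(attachment.attachmentId);
        }
    }
    return ids;
}

const exampleAttachments: AttachmentData[] = [
    { kind: "text", filename: "notes.md", content: "# hello" },
    { kind: "blob", filename: "diagram.png", mediaType: "image/png", attachmentId: "att_123" },
];
```

Because the union is discriminated on one field, the compiler narrows each arm, so code handling text attachments cannot accidentally reach for an id, and vice versa.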

Upload & serving

  • POST /api/ee/chat/attachments: authenticated (no anonymous uploads), entitlement-gated. Decodes the image with sharp to authoritatively determine the format (never the client MIME/extension; SVG excluded) and enforces server-side byte and pixel-dimension caps, the latter guarding against decompression bombs. Returns the attachmentId; images upload on select.
  • Commit-on-send: commitMessageAttachments atomically links referenced blobs to the chat and flips PENDING → COMMITTED, rejecting forged/unauthorized ids before the agent runs.
  • GET /api/ee/chat/{chatId}/attachments/{attachmentId}: serves bytes to the uploader, or to any caller who can view the chat and has a ChatAttachment link for it. Sets X-Content-Type-Options: nosniff, Cache-Control: private, no-store, and a header-safe Content-Disposition.
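The serving rule described above can be sketched as a pure decision function. The input type and the name `canServeAttachment` are hypothetical; the real route presumably resolves these values from the session and the database before deciding.

```typescript
// Hypothetical inputs; the real route derives these from the session and DB.
type ServeRequest = {
    callerId: string | null; // null for an anonymous caller
    uploadedById: string;
    canViewChat: boolean;    // result of the chat access check
    hasChatLink: boolean;    // a ChatAttachment row links this blob to this chat
};

function canServeAttachment(req: ServeRequest): boolean {
    // The uploader may always read their own bytes (covers the pre-commit window).
    if (req.callerId !== null && req.callerId === req.uploadedById) {
        return true;
    }
    // Everyone else needs chat visibility and an explicit link to this chat.
    return req.canViewChat && req.hasChatLink;
}

// The two fixed headers named in the description (Content-Disposition omitted).
const responseHeaders: Record<string, string> = {
    "X-Content-Type-Options": "nosniff",
    "Cache-Control": "private, no-store",
};
```

Requiring both conditions for non-uploaders means a valid attachment id is not enough on its own: the id has to be linked to a chat the caller can already see.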

Agent / model integration

  • Server-authoritative model capability resolution (input modalities from the models.dev catalog); the client signal is never trusted.
  • A single mediaType → modality resolver and a single modality → model content part builder drive attachment handling, so support for additional modalities (PDF/audio/video) can be added in one place.
  • resolveLatestTurnMedia loads bytes from storage for the latest user turn (only blobs linked to that chat and accepted by the model) and buildUserModelMessage attaches native content parts. Media bytes are only sent on the turn they were added; a short marker is left when attachments are dropped, distinguishing an older turn, an unsupported modality, and a failed read.
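A rough sketch of the resolver and part builder, under stated assumptions: the names `resolveModality` and `buildContentPart`, the marker wording, and the part shapes are all hypothetical, and only the image modality builds a native part here.

```typescript
// Hypothetical names; a sketch of the "one place to extend" idea.
type Modality = "image" | "pdf" | "audio" | "video";
type DroppedReason = "older-turn" | "unsupported-modality" | "read-failed";

type ContentPart =
    | { type: "image"; mediaType: string; data: Uint8Array }
    | { type: "text"; text: string };

// mediaType -> modality. Only the image types from the description are mapped.
const MEDIA_TYPE_TO_MODALITY = new Map<string, Modality>([
    ["image/png", "image"],
    ["image/jpeg", "image"],
    ["image/webp", "image"],
    ["image/gif", "image"],
]);

function resolveModality(mediaType: string): Modality | undefined {
    return MEDIA_TYPE_TO_MODALITY.get(mediaType.toLowerCase());
}

type PartInput = {
    filename: string;
    mediaType: string;
    isLatestTurn: boolean;
    modelModalities: Set<Modality>; // resolved server-side, never from the client
    bytes: Uint8Array | null;       // null when the storage read failed
};

function buildContentPart(input: PartInput): ContentPart {
    // A short text marker replaces the media whenever it is dropped.
    const marker = (reason: DroppedReason): ContentPart => ({
        type: "text",
        text: `[attachment ${input.filename} omitted: ${reason}]`,
    });

    if (!input.isLatestTurn) {
        return marker("older-turn");
    }
    const modality = resolveModality(input.mediaType);
    // Only images build a native part in this sketch; new modalities add cases here.
    if (modality !== "image" || !input.modelModalities.has(modality)) {
        return marker("unsupported-modality");
    }
    if (input.bytes === null) {
        return marker("read-failed");
    }
    return { type: "image", mediaType: input.mediaType, data: input.bytes };
}
```

Keeping the three drop reasons as distinct values is what lets the marker tell the model why an attachment is missing rather than collapsing every case into one message.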

Client UI

  • Image attach affordance is gated on the selected model's image capability; the file picker / drag-overlay accept includes image types only when supported.
  • Pending-image tray pill with thumbnail, upload status (uploading / uploaded / error), on-hover preview, and a full viewer dialog. Submit is blocked while uploads are in flight or in an error state.
  • Sent images render from the serving route, which serves the uploader their own bytes during the brief pre-commit window before the ChatAttachment link exists.
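The gating rules in this list might look roughly like the following. The names `isSubmitBlocked` and `buildAccept` are hypothetical, and the base accept list is passed in because the text-attachment types are not stated here.

```typescript
// Upload states named in the description.
type UploadStatus = "uploading" | "uploaded" | "error";
type PendingImage = { id: string; status: UploadStatus };

// Submit is blocked while any image is still uploading or has failed.
function isSubmitBlocked(pendingImages: PendingImage[]): boolean {
    return pendingImages.some((image) => image.status !== "uploaded");
}

// Image types listed in the PR description.
const IMAGE_ACCEPT = ["image/png", "image/jpeg", "image/webp", "image/gif"];

// Image types are offered to the picker only when the selected model supports them.
function buildAccept(baseAccept: string[], modelSupportsImages: boolean): string[] {
    return modelSupportsImages ? baseAccept.concat(IMAGE_ACCEPT) : baseAccept.slice();
}
```

Treating `error` the same as `uploading` avoids the case where a failed image is silently dropped and the message is sent without it.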

Lifecycle & cleanup

  • Orphan pruner (backend worker) periodically deletes PENDING (uploaded-but-never-sent) blobs older than a TTL, along with their bytes, via the shared StorageBackend.
  • Deleting a chat sweeps blobs left with zero links; duplicating a chat copies the attachment links (metadata only, no byte copy).
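The delete and duplicate behaviour can be illustrated with an in-memory model of the link table. This is a sketch only: `orphanedByChatDelete` and `duplicateChatLinks` are hypothetical names, and the real code operates on Prisma rows rather than arrays.

```typescript
// In-memory stand-in for ChatAttachment rows.
type ChatAttachmentLink = { chatId: string; attachmentId: string };

// Attachment ids that would have zero links once the given chat is deleted.
function orphanedByChatDelete(links: ChatAttachmentLink[], deletedChatId: string): string[] {
    const stillLinked = new Set<string>();
    const candidates = new Set<string>();
    for (const link of links) {
        if (link.chatId === deletedChatId) {
            candidates.add(link.attachmentId);
        } else {
            stillLinked.add(link.attachmentId);
        }
    }
    return Array.from(candidates).filter((id) => !stillLinked.has(id));
}

// Duplicating a chat copies link rows only; the stored bytes are shared.
function duplicateChatLinks(
    links: ChatAttachmentLink[],
    sourceChatId: string,
    newChatId: string,
): ChatAttachmentLink[] {
    return links
        .filter((link) => link.chatId === sourceChatId)
        .map((link) => ({ chatId: newChatId, attachmentId: link.attachmentId }));
}
```

Since a duplicate shares bytes with its source, the zero-links check is what keeps deleting the original chat from removing images the copy still shows.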

Config / observability

  • New env vars: SOURCEBOT_CHAT_ATTACHMENT_MAX_IMAGE_BYTES (default 10 MiB) and SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS (default 24, 0 disables). Documented in environment-variables.mdx.
  • New analytics: chat_attachment_uploaded, chat_attachment_degraded.
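As a sketch of how the two settings could be read and applied: the env var names and defaults come from the description above, while the plain parsing, the fall-back-on-invalid behaviour, and the helper names are assumptions (the repo's env.server.ts may validate differently).

```typescript
const DEFAULT_MAX_IMAGE_BYTES = 10 * 1024 * 1024; // 10 MiB
const DEFAULT_ORPHAN_TTL_HOURS = 24;

type Env = Record<string, string | undefined>;

// Assumption: invalid or missing values fall back to the default.
function readNonNegativeInt(env: Env, name: string, fallback: number): number {
    const raw = env[name];
    if (raw === undefined || raw.trim() === "") {
        return fallback;
    }
    const value = Number(raw);
    return Number.isInteger(value) && value >= 0 ? value : fallback;
}

function loadAttachmentConfig(env: Env): { maxImageBytes: number; orphanTtlHours: number } {
    return {
        maxImageBytes: readNonNegativeInt(env, "SOURCEBOT_CHAT_ATTACHMENT_MAX_IMAGE_BYTES", DEFAULT_MAX_IMAGE_BYTES),
        orphanTtlHours: readNonNegativeInt(env, "SOURCEBOT_CHAT_ATTACHMENT_ORPHAN_TTL_HOURS", DEFAULT_ORPHAN_TTL_HOURS),
    };
}

// Returns the createdAt cutoff for pruning, or null when pruning is disabled (TTL 0).
function orphanCutoff(now: Date, ttlHours: number): Date | null {
    if (ttlHours === 0) {
        return null;
    }
    return new Date(now.getTime() - ttlHours * 60 * 60 * 1000);
}
```

Returning `null` for a TTL of 0 gives the caller an explicit "disabled" signal instead of a cutoff equal to the current time, which would prune every pending upload.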
Screen.Recording.2026-06-29.at.9.44.35.AM.mov
Screen.Recording.2026-06-29.at.9.46.31.AM.mov

Summary by CodeRabbit

  • New Features

    • Added image attachments to chat workflows, including client upload with validation, richer previews, and secure download streaming with safe filenames.
    • Enable image attachment input only when the selected model supports images; unsupported images are omitted with user-visible notes.
    • Documented new environment settings for max image size and orphan attachment retention; added support for orphan cleanup behavior configuration.
  • Bug Fixes

    • Improved attachment access enforcement so only linked and permitted attachments are viewable.
    • Added robust orphan-attachment cleanup for unlinked blobs using a configurable TTL, reducing stale storage buildup.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds persisted chat attachments with upload, commit, serving, preview, and cleanup flows. The change also threads image-size limits through the app, updates model capability handling, and adds background pruning for unlinked attachments.

Changes

Chat Image Attachment Feature

| Layer / File(s) | Summary |
| --- | --- |
| Persistence and storage: packages/db/prisma/migrations/..., packages/db/prisma/schema.prisma, packages/shared/src/env.server.ts, packages/shared/src/storage.ts, packages/shared/src/index.server.ts, docs/docs/configuration/environment-variables.mdx, CHANGELOG.md | Adds the attachment lifecycle enum and Prisma models, new server env vars for image size and orphan TTL, the shared storage abstraction, and their shared re-exports. |
| Shared attachment contracts: packages/web/src/features/chat/types.ts, packages/web/src/features/chat/attachments/validation.ts, packages/web/src/features/chat/attachments/filename.ts, packages/web/src/features/chat/attachments/modality.ts, packages/web/src/features/chat/constants.ts, packages/web/src/lib/posthogEvents.ts | Adds blob attachment typing, server-side image validation, filename sanitization, modality mapping helpers, image limits, MIME allowlists, and new PostHog event schemas. |
| Server helpers and chat access: packages/web/src/features/chat/utils.server.ts, packages/web/src/features/chat/utils.ts | Adds shared chat access resolution, attachment commitment, orphan cleanup, and inline text byte counting for server-side enforcement. |
| Attachment upload and serve routes: packages/web/src/app/api/(server)/ee/chat/attachments/route.ts, packages/web/src/app/api/(server)/ee/chat/[chatId]/attachments/[attachmentId]/route.ts | Adds the authenticated image upload route and the attachment byte-serving route with access checks, storage validation, streaming, and response headers. |
| Chat route, capabilities, and agent media loading: packages/web/src/app/api/(server)/ee/chat/route.ts, packages/web/src/features/chat/modelsDevCatalog.server.ts, packages/web/src/features/chat/modelCapabilities.server.ts, packages/web/src/features/chat/modelCapabilities.server.test.ts, packages/web/src/ee/features/chat/agent.ts, packages/web/src/ee/features/chat/agent.test.ts | Updates chat submission enforcement, model capability resolution, and agent message construction to handle attachment bytes and accepted modalities. |
| Client upload pipeline: packages/web/src/features/chat/attachmentUtils.ts, packages/web/src/features/chat/components/chatBox/chatBox.tsx | Extends pending attachment handling, uploads images, gates submission on upload state and model support, and revokes image object URLs. |
| Attachment UI components: packages/web/src/features/chat/components/chatBox/attachmentButton.tsx, packages/web/src/features/chat/components/chatBox/attachmentTray.tsx, packages/web/src/features/chat/components/chatBox/attachmentViewerDialog.tsx, packages/web/src/features/chat/components/chatBox/chatPaneDropzone.tsx, packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx, packages/web/src/ee/features/chat/components/chatThread/messageAttachments.tsx | Updates the attachment picker, tray, viewer dialog, dropzone, and thread attachment rendering for image previews and serving URLs. |
| Max image bytes wiring: packages/web/src/app/(app)/askgh/..., packages/web/src/app/(app)/chat/..., packages/web/src/ee/features/chat/components/chatThread/... | Threads the image size limit from shared env config through the app entrypoints and chat thread components down to ChatBox. |
| Chat actions and attachment cleanup: packages/web/src/features/chat/actions.ts | Uses shared chat access resolution in chat info, deletes orphaned attachments after chat deletion, and recreates attachment links when duplicating chats. |
| Backend AttachmentPruner scheduler: packages/backend/src/attachmentPruner.ts, packages/backend/src/index.ts | Adds a scheduled backend worker that prunes orphaned attachment rows and storage objects, and wires it into backend startup and shutdown. |

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • sourcebot-dev/sourcebot#888: Shares the same ownership and chat visibility access model used by the new attachment read and cleanup flows.
  • sourcebot-dev/sourcebot#1073: Touches the same chat action surface that now resolves access and manages attachment cleanup during delete/duplicate flows.
  • sourcebot-dev/sourcebot#1374: Introduced the earlier chat attachment foundation that this PR extends to persisted blob/image attachments.

Suggested reviewers

  • msukkari
  • jsourcebot
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: binary file attachments for Ask in the web app. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@whoisthey

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/backend/src/attachmentPruner.ts`:
- Around line 38-40: The startup call to pruneOrphanedAttachments() is firing
without any await or error handling, which can surface as an unhandled rejection
and crash the worker during initialization. Update the startup path in
attachmentPruner so the first prune is either awaited from an async
initialization flow or wrapped with explicit catch/log handling, while keeping
the recurring setIntervalAsync scheduling intact.
- Around line 55-80: The orphan cleanup in attachmentPruner’s batch loop deletes
files based only on the initial findMany result, so a PENDING attachment can be
unlinked even if it becomes linked or non-orphaned before deletion. Re-check
each attachment’s current state in the same batch before calling unlink, ideally
by verifying the row still matches the orphan criteria in this method before
file removal and before deleteMany. Use the existing attachmentPruner loop, the
db.attachment queries, and the unlink call to keep only still-orphaned
attachments eligible for byte deletion.

In
`@packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql`:
- Around line 42-43: The Attachment foreign key on uploadedById currently
cascades deletes from User, which causes committed attachments and their
ChatAttachment links to disappear when an uploader is removed. Update the
migration and the corresponding Prisma model for Attachment/uploadedById so
deletion of a User does not delete Attachment rows; use a non-cascading delete
behavior that preserves historical attachments for existing chats.

In
`@packages/web/src/app/api/(server)/ee/chat/[chatId]/attachments/[attachmentId]/route.ts`:
- Line 101: The attachment response in the route handler for chat attachments is
being cached too aggressively via the Cache-Control header. Update the
response-building logic in the attachment route so access-controlled content is
not reused for an hour by the browser; use a no-store/no-cache style policy (or
equivalent short-lived revalidation) for the attachment fetch path, especially
where the route checks authorization before returning the file.

In `@packages/web/src/app/api/(server)/ee/chat/attachments/route.ts`:
- Around line 18-24: The attachment size check in the chat upload route is still
bypassable because it only relies on content-length before calling
req.formData() and file.arrayBuffer() in the route handler. Update the
attachment handling flow in the route’s upload path to enforce an authoritative
limit after parsing by rejecting oversized File objects before buffering their
contents, and keep the existing maxImageBytes check as a secondary guard. Use
the existing attachment route logic and the file size handling around
req.formData(), file.arrayBuffer(), and maxImageBytes to ensure the request is
rejected even when content-length is missing or chunked.

In `@packages/web/src/app/api/(server)/ee/chat/route.ts`:
- Line 79: Reject empty message arrays in the chat route before accessing
latestMessage, since messages: [] currently passes validation and leaves
latestMessage undefined. Update the request handling in the route’s
message-processing logic around latestMessage to explicitly validate that
messages has at least one entry and return a typed 400 for empty arrays before
any downstream dereference of parts.
- Around line 98-104: Validate the model in the chat route before calling
commitMessageAttachments. In the route handler for the chat API, move the
languageModelConfig check ahead of the attachment commit so a bad model request
returns 400 before blobs are linked/flipped. Update the flow around
latestMessage, commitMessageAttachments, and languageModelConfig so attachments
are only persisted after the model request is confirmed.

In `@packages/web/src/ee/features/chat/agent.ts`:
- Around line 139-143: The omission note in agent.ts uses the wrong reason when
the latest user turn has images but all image reads fail. Update the logic
around the imageBlobs.length > 0 branch so the reason distinguishes between
unsupported image models, images added on a different turn, and images that were
added on this turn but could not be loaded. Use the existing identifiers
isLatestUserTurn, supportsImages, and imageBlobs to select the correct message
before appending to baseText.

In `@packages/web/src/features/chat/actions.ts`:
- Around line 306-322: Snapshot the source chat’s attachment links before
persisting the duplicate chat so the copy isn’t affected by a concurrent
delete/cascade. In the chat-duplication flow in actions.ts, read originalLinks
from prisma.chatAttachment.findMany using originalChat.id before creating
newChat, then create newChat and use the saved links in
prisma.chatAttachment.createMany. Keep the fix localized to the duplication
logic that handles originalChat, newChat, and chatAttachment.
- Around line 196-214: The attachment snapshot in the chat delete flow can miss
links created concurrently, causing orphaned blobs after deletion. Update the
delete path in actions.ts around the linkedAttachments fetch and
prisma.chat.delete call so orphan cleanup is based on a post-delete or
transaction-safe view of attachments, and ensure deleteOrphanedAttachments runs
with all attachmentIds still associated with the chat at the moment of deletion.
Use the existing deleteOrphanedAttachments helper and
prisma.chatAttachment/prisma.chat delete logic to keep the cleanup atomic or
re-read after delete before sweeping.

In `@packages/web/src/features/chat/attachments/filename.ts`:
- Around line 1-15: The sanitizeFilename helper only removes control characters
and whitespace, but it still allows markup-significant characters that can break
the <attachment filename="..."> boundary. Update sanitizeFilename to also strip
or escape characters like double quotes, angle brackets, and ampersands while
keeping the basename and existing fallback behavior intact.

In `@packages/web/src/features/chat/components/chatBox/chatBox.tsx`:
- Around line 281-298: The submit gating in chatBox should also catch image
attachments that are failed or malformed, not just `uploading`, because
`attachmentData` later drops those silently and can still allow an empty-text
send. Update the submit-disabled logic in the `chatBox` submit-state helper to
treat non-sendable image attachments the same as uploading ones, using the same
image attachment status checks that feed `attachmentData` so the UI blocks
submit whenever an image won’t actually be included.

In `@packages/web/src/features/chat/constants.ts`:
- Around line 21-24: ATTACHMENT_MAX_IMAGE_BYTES is hard-coded in constants.ts
even though the upload limit is configurable on the server, so the client can
diverge from the authoritative value. Update the client-side early-rejection
logic to source the max image bytes from the same configurable setting used by
the upload route (env.SOURCEBOT_CHAT_ATTACHMENT_MAX_IMAGE_BYTES) rather than a
fixed 10 MiB constant, and keep the existing chat attachment constants
synchronized with the server-facing limit.

In `@packages/web/src/features/chat/modelsDevCatalog.server.ts`:
- Around line 123-129: The cold-start gating in modelsDevCatalog.server.ts only
uses hasAttempted derived from catalogFetchedAt and lastFailedAt, so it stays
false while inFlightFetch is still pending and allows repeated short waits.
Update the awaitWhenEmpty path to track the cold-start wait attempt separately
from fetch settlement, and use that flag in the condition around Promise.race so
only one process-wide COLD_START_BLOCK_BUDGET_MS wait can occur. Keep the
existing inFlightFetch and cachedCatalog behavior, but mark the short-wait as
attempted as soon as it is started.

In `@packages/web/src/features/chat/utils.server.ts`:
- Around line 222-242: The orphan cleanup in the attachment deletion flow needs
a final safety check before removing rows, because a concurrent relink can
happen after the initial lookup. Update the logic in the utility that computes
orphanedIds and calls prisma.attachment.deleteMany so the delete is conditional
on the attachment still having no chatAttachment references at delete time,
rather than deleting by bare id from the earlier snapshot.
- Around line 159-199: The attachment claim flow in the chat utility is
validating and then committing outside a single atomic check, so two concurrent
sends can both attach the same upload. Move the PENDING/ownership check into the
commit path in the `createMany`/`updateMany` transaction inside the chat
attachment helper in `utils.server.ts`, and ensure the `attachment` update only
succeeds when the row is still `AttachmentStatus.PENDING` and belongs to the
expected `userId`/`orgId`. If the conditional update affects fewer rows than
`idsToCommit`, treat it as an invalid request and do not create any
`chatAttachment` links.

In `@packages/web/src/lib/posthogEvents.ts`:
- Around line 207-218: The PostHog event schema for chat attachment events
currently leaves source optional, which allows indistinguishable cross-surface
emissions from the upload flow. Update the type definitions in posthogEvents.ts
for chat_attachment_uploaded and chat_attachment_degraded so source is required,
or alternatively rename these events to use the wa_ prefix if they are truly
web-only; make sure the emitting call sites match the chosen contract,
especially the new upload route, and keep the schema aligned with the intended
event origin.
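For the sanitizeFilename finding in the prompt above, one possible shape of the fix is sketched here. This is not the repository's implementation: the control-character and whitespace handling are guesses at the existing behaviour, and the fallback name `attachment` is hypothetical. The sketch strips the markup-significant characters rather than escaping them.

```typescript
// A sketch only; the real helper in filename.ts may differ.
function sanitizeFilename(name: string, fallback: string = "attachment"): string {
    // Keep only the basename (drop any directory components).
    const basename = name.split(/[\\/]/).pop() ?? "";
    const cleaned = basename
        .replace(/[\u0000-\u001f\u007f]/g, "") // control characters
        .replace(/["<>&]/g, "")                // characters that could break filename="..."
        .replace(/\s+/g, " ")                  // collapse whitespace runs
        .trim();
    return cleaned.length > 0 ? cleaned : fallback;
}
```

Stripping is the simpler option when the filename is only a label for the model. Escaping would preserve the original characters but requires every consumer of the `<attachment filename="...">` text to unescape consistently.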

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 79aa38e4-8079-483a-9868-4da770dbc165

📥 Commits

Reviewing files that changed from the base of the PR and between cb8181d and ac42b73.

📒 Files selected for processing (31)
  • docs/docs/configuration/environment-variables.mdx
  • packages/backend/src/attachmentPruner.ts
  • packages/backend/src/index.ts
  • packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql
  • packages/db/prisma/schema.prisma
  • packages/shared/src/env.server.ts
  • packages/web/src/app/api/(server)/ee/chat/[chatId]/attachments/[attachmentId]/route.ts
  • packages/web/src/app/api/(server)/ee/chat/attachments/route.ts
  • packages/web/src/app/api/(server)/ee/chat/route.ts
  • packages/web/src/ee/features/chat/agent.ts
  • packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx
  • packages/web/src/ee/features/chat/components/chatThread/messageAttachments.tsx
  • packages/web/src/features/chat/actions.ts
  • packages/web/src/features/chat/attachmentUtils.ts
  • packages/web/src/features/chat/attachments/attachmentPreviewCache.ts
  • packages/web/src/features/chat/attachments/filename.ts
  • packages/web/src/features/chat/attachments/storage.ts
  • packages/web/src/features/chat/attachments/validation.ts
  • packages/web/src/features/chat/components/chatBox/attachmentButton.tsx
  • packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
  • packages/web/src/features/chat/components/chatBox/attachmentViewerDialog.tsx
  • packages/web/src/features/chat/components/chatBox/chatBox.tsx
  • packages/web/src/features/chat/components/chatBox/chatPaneDropzone.tsx
  • packages/web/src/features/chat/constants.ts
  • packages/web/src/features/chat/modelCapabilities.server.test.ts
  • packages/web/src/features/chat/modelCapabilities.server.ts
  • packages/web/src/features/chat/modelsDevCatalog.server.ts
  • packages/web/src/features/chat/types.ts
  • packages/web/src/features/chat/utils.server.ts
  • packages/web/src/features/chat/utils.ts
  • packages/web/src/lib/posthogEvents.ts

Comment thread packages/backend/src/attachmentPruner.ts Outdated
Comment thread packages/backend/src/attachmentPruner.ts Outdated
Comment thread packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql Outdated
Comment thread packages/web/src/app/api/(server)/ee/chat/attachments/route.ts
Comment thread packages/web/src/features/chat/constants.ts Outdated
Comment thread packages/web/src/features/chat/modelsDevCatalog.server.ts
Comment thread packages/web/src/features/chat/utils.server.ts Outdated
Comment thread packages/web/src/features/chat/utils.server.ts Outdated
Comment thread packages/web/src/lib/posthogEvents.ts
@whoisthey whoisthey marked this pull request as ready for review June 29, 2026 16:47
Comment thread packages/web/src/ee/features/chat/agent.ts Outdated
jsourcebot previously approved these changes Jun 30, 2026
@whoisthey whoisthey changed the base branch from whoisthey/text-file-attachments to main June 30, 2026 01:02
@whoisthey whoisthey dismissed jsourcebot’s stale review June 30, 2026 01:02

The base branch was changed.

Squashed onto main after PR #1374 (text file attachments) was squash-merged,
which orphaned the stacked text-attachment commits this branch carried.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@whoisthey whoisthey force-pushed the whoisthey/binary-file-attachments branch from 04fdb14 to b30cea1 Compare June 30, 2026 01:10
@mintlify

mintlify Bot commented Jun 30, 2026


Preview deployment for your docs. Learn more about Mintlify Previews.

| Project | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| sourcebot | 🟢 Ready | View Preview | Jun 30, 2026, 1:11 AM |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/web/src/app/api/(server)/ee/chat/route.ts (1)

30-31: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Validate and budget all user turns, not only the latest.

messages is still client-supplied z.any()[], but createMessageStream folds every user turn into the prompt. A caller can place an oversized text attachment in an earlier user message and keep the latest message under the limit. Parse message shape before helper calls and reject any user message over ATTACHMENT_MAX_TURN_TEXT_BYTES.

Suggested direction
 const chatRequestSchema = z.object({
-    messages: z.array(z.any()),
+    messages: z.array(z.object({
+        role: z.string(),
+        parts: z.array(z.any()),
+    }).passthrough()).min(1),
     id: z.string(),
     ...additionalChatRequestParamsSchema.shape,
 })
-            if (
-                latestMessage.role === 'user' &&
-                getMessageTextBytes(latestMessage) > ATTACHMENT_MAX_TURN_TEXT_BYTES
-            ) {
+            const hasOversizedUserMessage = messages.some((message) =>
+                message.role === 'user' &&
+                getMessageTextBytes(message) > ATTACHMENT_MAX_TURN_TEXT_BYTES
+            );
+            if (hasOversizedUserMessage) {

As per coding guidelines, route handlers should validate inputs using Zod schemas for request bodies in POST/PUT/PATCH requests.

Also applies to: 92-98

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/app/api/(server)/ee/chat/route.ts` around lines 30 - 31, The
chat route currently accepts a client-controlled messages array with z.any(), so
earlier user turns can bypass the turn text budget even if the latest message is
small. Update chatRequestSchema and the route handler in chat/route.ts to parse
each message’s shape before calling createMessageStream, then validate every
user message against ATTACHMENT_MAX_TURN_TEXT_BYTES and reject the request if
any user turn exceeds it. Use the existing request-body Zod validation path to
enforce this across all messages, including the logic around the
createMessageStream call.

Source: Coding guidelines

♻️ Duplicate comments (1)
packages/backend/src/attachmentPruner.ts (1)

69-87: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Delete the DB row before unlinking bytes.

The stale-batch race is still present for the blob: line 71 deletes bytes before line 81 re-checks PENDING. If a send commits after findMany, deleteMany will skip the row, but the now-committed attachment’s file is already gone.

Proposed fix
-            await Promise.all(batch.map(async (attachment) => {
-                try {
-                    await this.storage.delete(attachment.storageKey);
-                } catch (error) {
-                    logger.warn(`Failed to delete bytes for orphaned attachment ${attachment.id}: ${error}`);
-                }
-            }));
-
-            // Re-assert the orphan criteria in the delete itself: a concurrent
-            // send could have committed (PENDING -> COMMITTED + linked) a row in
-            // this batch after the findMany, and deleting by bare id would
-            // cascade that live link away.
-            const result = await this.db.attachment.deleteMany({
-                where: {
-                    id: { in: batch.map((attachment) => attachment.id) },
-                    status: AttachmentStatus.PENDING,
-                    createdAt: { lt: cutoff },
-                },
-            });
-            totalDeleted += result.count;
+            const deletedCounts = await Promise.all(batch.map(async (attachment) => {
+                const result = await this.db.attachment.deleteMany({
+                    where: {
+                        id: attachment.id,
+                        status: AttachmentStatus.PENDING,
+                        createdAt: { lt: cutoff },
+                    },
+                });
+
+                if (result.count === 0) {
+                    return 0;
+                }
+
+                try {
+                    await this.storage.delete(attachment.storageKey);
+                } catch (error) {
+                    logger.warn(`Failed to delete bytes for orphaned attachment ${attachment.id}: ${error}`);
+                }
+
+                return result.count;
+            }));
+            totalDeleted += deletedCounts.reduce((sum, count) => sum + count, 0);
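
The ordering in the proposed fix can be checked with a small synchronous, in-memory model. Everything here is a stand-in: `InMemoryStore`, `deleteIfOrphan`, and `pruneBatch` are hypothetical names, and the real pruner uses async Prisma and StorageBackend calls.

```typescript
type AttachmentStatus = "PENDING" | "COMMITTED";
type AttachmentRow = { id: string; status: AttachmentStatus; createdAt: number; storageKey: string };

// Stand-in for the database rows plus blob storage.
class InMemoryStore {
    rows = new Map<string, AttachmentRow>();
    blobs = new Set<string>();

    // Conditional delete: succeeds only if the row is still an expired PENDING orphan.
    deleteIfOrphan(id: string, cutoff: number): boolean {
        const row = this.rows.get(id);
        if (row === undefined || row.status !== "PENDING" || row.createdAt >= cutoff) {
            return false;
        }
        this.rows.delete(id);
        return true;
    }
}

// The batch is a possibly stale snapshot taken by an earlier query.
function pruneBatch(store: InMemoryStore, batch: AttachmentRow[], cutoff: number): number {
    let deleted = 0;
    for (const snapshot of batch) {
        // Row first: if a send committed it since the snapshot, keep its bytes.
        if (!store.deleteIfOrphan(snapshot.id, cutoff)) {
            continue;
        }
        store.blobs.delete(snapshot.storageKey);
        deleted += 1;
    }
    return deleted;
}
```

With this order, a failure between the two steps leaves at worst an unreferenced blob, whereas the reverse order can leave a committed row whose bytes are gone.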
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/backend/src/attachmentPruner.ts` around lines 69 - 87, The orphan
cleanup in attachmentPruner’s batch delete still unlinks storage before
confirming the row is still deletable, leaving a stale-batch race. Rework the
flow in the batch loop and the subsequent attachment.deleteMany call so the
database row is removed/re-checked first, and only delete the blob via
this.storage.delete after the row has been confirmed deleted or otherwise safely
marked orphaned. Keep the existing AttachmentStatus.PENDING and createdAt cutoff
guard in place, and use the attachment.id/storageKey values to drive the
post-delete byte cleanup.
🧹 Nitpick comments (1)
packages/web/src/ee/features/chat/agent.test.ts (1)

34-40: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the storage backend mock stable.

Returning a new object from getStorageBackend() makes it hard to configure get or assert storage reads in attachment tests. Prefer a shared mock object returned by the factory.

Suggested refactor
+const mockStorageBackend = {
+    get: vi.fn(),
+    put: vi.fn(),
+    stat: vi.fn(),
+    createReadStream: vi.fn(),
+    delete: vi.fn(),
+};
+
 // inside vi.mock("@sourcebot/shared", ...)
-    getStorageBackend: () => ({
-        get: vi.fn(),
-        put: vi.fn(),
-        stat: vi.fn(),
-        createReadStream: vi.fn(),
-        delete: vi.fn(),
-    }),
+    getStorageBackend: () => mockStorageBackend,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/ee/features/chat/agent.test.ts` around lines 34 - 40, Make
the storage backend mock stable by returning the same shared mock object from
getStorageBackend() instead of creating a new object each time. Update the test
setup in agent.test.ts so the backend methods (especially get) can be configured
and asserted consistently across attachment tests, while keeping the existing
method stubs on the shared mock.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/web/src/ee/features/chat/agent.ts`:
- Around line 90-112: The attachment loading in getMediaBlobs/acceptedBlobs is
only bounded per message, so long chats can still accumulate too many media
bytes before the model call. Add a global prompt-level cap (byte and/or total
attachment count) in agent.ts before querying Prisma and calling storage.get,
and enforce bounded concurrency instead of unbounded Promise.all when populating
the result map. Use the acceptedBlobs, records, and storage.get flow as the
place to apply the limit so no more than the configured aggregate media budget
is loaded.
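
A minimal sketch of the aggregate budget and bounded concurrency described in this comment. `BlobRef`, `selectWithinBudget`, and `mapWithConcurrency` are hypothetical names, not the code in agent.ts; the `fn` callback stands in for a call such as `storage.get`, and the limits would come from configuration.

```typescript
// Hypothetical shape: only the fields this sketch needs.
interface BlobRef {
    attachmentId: string;
    sizeBytes: number;
}

// Keep the most recent blobs that fit within an aggregate count and byte budget.
// `refs` is assumed to be ordered oldest to newest.
function selectWithinBudget(refs: BlobRef[], maxCount: number, maxTotalBytes: number): BlobRef[] {
    const selected: BlobRef[] = [];
    let totalBytes = 0;
    for (let i = refs.length - 1; i >= 0; i--) {
        const ref = refs[i];
        if (selected.length >= maxCount) break;
        // Skip a blob that would exceed the byte budget; older, smaller blobs may still fit.
        if (totalBytes + ref.sizeBytes > maxTotalBytes) continue;
        selected.push(ref);
        totalBytes += ref.sizeBytes;
    }
    return selected.reverse();
}

// Run `fn` over `items` with at most `limit` calls in flight at once,
// preserving input order in the returned array.
async function mapWithConcurrency<T, R>(
    items: T[],
    limit: number,
    fn: (item: T) => Promise<R>,
): Promise<R[]> {
    const results: R[] = new Array(items.length);
    let next = 0;
    const worker = async (): Promise<void> => {
        while (next < items.length) {
            const index = next++;
            results[index] = await fn(items[index]);
        }
    };
    const workerCount = Math.max(1, Math.min(limit, items.length));
    await Promise.all(Array.from({ length: workerCount }, () => worker()));
    return results;
}
```

Selecting within the budget before any database or storage call means the cap bounds the work done, not just the bytes sent to the model.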

---

Outside diff comments:
In `@packages/web/src/app/api/(server)/ee/chat/route.ts`:
- Around line 30-31: The chat route currently accepts a client-controlled
messages array with z.any(), so earlier user turns can bypass the turn text
budget even if the latest message is small. Update chatRequestSchema and the
route handler in chat/route.ts to parse each message’s shape before calling
createMessageStream, then validate every user message against
ATTACHMENT_MAX_TURN_TEXT_BYTES and reject the request if any user turn exceeds
it. Use the existing request-body Zod validation path to enforce this across all
messages, including the logic around the createMessageStream call.
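
A rough sketch of the per-turn check this comment asks for, written in plain TypeScript so it stands alone. The `ChatMessage` shape, the helper names, and the budget value are assumptions for illustration; the real message type and the real value of `ATTACHMENT_MAX_TURN_TEXT_BYTES` live in the codebase.

```typescript
// Hypothetical message shape; the real type in the PR is not shown here.
interface ChatMessage {
    role: "user" | "assistant" | "system";
    parts: Array<{ type: string; text?: string }>;
}

// Assumed value for illustration only.
const ATTACHMENT_MAX_TURN_TEXT_BYTES = 1024;

const utf8 = new TextEncoder();

// UTF-8 byte length of all text parts in one message.
function turnTextBytes(message: ChatMessage): number {
    return message.parts.reduce(
        (sum, part) => sum + (typeof part.text === "string" ? utf8.encode(part.text).length : 0),
        0,
    );
}

// Index of the first user message over budget, or -1 if every user turn fits.
// Checking every message, not just the last one, closes the bypass via earlier turns.
function findOversizedUserTurn(messages: ChatMessage[], maxBytes: number): number {
    return messages.findIndex((m) => m.role === "user" && turnTextBytes(m) > maxBytes);
}
```

The same predicate could be attached to the request schema as a refinement on the messages array, so an oversized turn is rejected before `createMessageStream` is called; that wiring is not shown here.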


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10fe4848-13b9-4cc8-8e37-578497a8ea97

📥 Commits

Reviewing files that changed from the base of the PR and between ac42b73 and b30cea1.

📒 Files selected for processing (42)
  • CHANGELOG.md
  • docs/docs/configuration/environment-variables.mdx
  • packages/backend/src/attachmentPruner.ts
  • packages/backend/src/index.ts
  • packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql
  • packages/db/prisma/schema.prisma
  • packages/shared/src/env.server.ts
  • packages/shared/src/index.server.ts
  • packages/shared/src/storage.ts
  • packages/web/src/app/(app)/askgh/[owner]/[repo]/components/landingPage.tsx
  • packages/web/src/app/(app)/askgh/[owner]/[repo]/page.tsx
  • packages/web/src/app/(app)/chat/[id]/page.tsx
  • packages/web/src/app/(app)/chat/chatLandingPage.tsx
  • packages/web/src/app/(app)/chat/components/landingPageChatBox.tsx
  • packages/web/src/app/api/(server)/ee/chat/[chatId]/attachments/[attachmentId]/route.ts
  • packages/web/src/app/api/(server)/ee/chat/attachments/route.ts
  • packages/web/src/app/api/(server)/ee/chat/route.ts
  • packages/web/src/ee/features/chat/agent.test.ts
  • packages/web/src/ee/features/chat/agent.ts
  • packages/web/src/ee/features/chat/components/chatThread/chatThread.tsx
  • packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx
  • packages/web/src/ee/features/chat/components/chatThread/messageAttachments.tsx
  • packages/web/src/ee/features/chat/components/chatThreadPanel.test.tsx
  • packages/web/src/ee/features/chat/components/chatThreadPanel.tsx
  • packages/web/src/features/chat/actions.ts
  • packages/web/src/features/chat/attachmentUtils.ts
  • packages/web/src/features/chat/attachments/filename.ts
  • packages/web/src/features/chat/attachments/modality.ts
  • packages/web/src/features/chat/attachments/validation.ts
  • packages/web/src/features/chat/components/chatBox/attachmentButton.tsx
  • packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
  • packages/web/src/features/chat/components/chatBox/attachmentViewerDialog.tsx
  • packages/web/src/features/chat/components/chatBox/chatBox.tsx
  • packages/web/src/features/chat/components/chatBox/chatPaneDropzone.tsx
  • packages/web/src/features/chat/constants.ts
  • packages/web/src/features/chat/modelCapabilities.server.test.ts
  • packages/web/src/features/chat/modelCapabilities.server.ts
  • packages/web/src/features/chat/modelsDevCatalog.server.ts
  • packages/web/src/features/chat/types.ts
  • packages/web/src/features/chat/utils.server.ts
  • packages/web/src/features/chat/utils.ts
  • packages/web/src/lib/posthogEvents.ts
✅ Files skipped from review due to trivial changes (4)
  • packages/web/src/app/(app)/chat/chatLandingPage.tsx
  • docs/docs/configuration/environment-variables.mdx
  • packages/web/src/features/chat/components/chatBox/chatPaneDropzone.tsx
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (18)
  • packages/web/src/features/chat/modelCapabilities.server.ts
  • packages/web/src/features/chat/attachments/filename.ts
  • packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx
  • packages/shared/src/env.server.ts
  • packages/web/src/features/chat/types.ts
  • packages/web/src/features/chat/components/chatBox/attachmentButton.tsx
  • packages/web/src/lib/posthogEvents.ts
  • packages/db/prisma/migrations/20260627000032_add_chat_attachments/migration.sql
  • packages/web/src/features/chat/components/chatBox/attachmentViewerDialog.tsx
  • packages/backend/src/index.ts
  • packages/web/src/features/chat/modelCapabilities.server.test.ts
  • packages/web/src/features/chat/utils.server.ts
  • packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
  • packages/web/src/features/chat/modelsDevCatalog.server.ts
  • packages/web/src/app/api/(server)/ee/chat/attachments/route.ts
  • packages/web/src/features/chat/actions.ts
  • packages/web/src/features/chat/attachmentUtils.ts
  • packages/db/prisma/schema.prisma

Comment thread packages/web/src/ee/features/chat/agent.ts
whoisthey and others added 3 commits June 29, 2026 18:44
…rphans

The orphan sweep deleted blob bytes before the guarded row delete, so a
PENDING attachment committed by a concurrent send (PENDING -> COMMITTED +
linked) between the findMany and the byte delete kept its DB row and link
but lost its bytes — a permanently broken attachment.

Reorder to delete the row first (re-asserting the orphan criteria), then
delete bytes only for batch rows that no longer exist, i.e. the rows the
sweep actually removed. A deleted row can never reappear and a survivor is
never deleted by the loop, so the check cannot misclassify.

Also add a COMMITTED-with-zero-links sweep as a backstop for an interrupted
web-app chat-delete sweep, which would otherwise leak those blobs forever
(the pruner previously only touched PENDING rows).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The add_chat_attachments migration (20260627000032) predated
add_oauth_dpop_binding (20260629193000), which merged to main after this
branch was cut, tripping the CI migration-ordering check. The two
migrations are independent (dpop touches none of the attachment tables;
the attachment migration only references the long-existing Org/User/Chat
tables), so resequencing it to run last is safe. Renamed to
20260629200000_add_chat_attachments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Deletion paths previously removed the DB row, then deleted bytes best-effort.
Since the row is the only durable handle to the bytes, a failed/interrupted
byte delete after the row was gone leaked those bytes with no way to ever
find them again.

Add a DELETING tombstone state. All reclamation now: (1) atomically flips the
orphan to DELETING — the claim doubles as the concurrency guard, replacing the
survivor-recheck — (2) deletes the bytes, (3) removes the row only once the
bytes are confirmed gone. A failed byte delete leaves the row DELETING for the
pruner's reclaim sweep to retry, so a transient storage error can never orphan
bytes.

- schema: add AttachmentStatus.DELETING (+ migration)
- deleteOrphanedAttachments: claim -> DELETING, inline best-effort byte delete,
  remove only reclaimed rows; the rest fall through to the pruner
- pruner: condemn PENDING + zero-link COMMITTED orphans to DELETING, then a
  single reclaim sweep deletes bytes and rows for all tombstones (also picking
  up tombstones the web app left behind)

This unifies byte deletion into one retryable place and matters most ahead of a
remote (S3) storage driver, where delete failures are routine.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
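
The claim, delete bytes, remove row sequence from the commit message above can be sketched with in-memory stand-ins. `FakeDb`, `FakeStorage`, and `reclaim` are hypothetical and synchronous for brevity; the actual code runs against Prisma and the `StorageBackend`, and also condemns zero-link COMMITTED orphans, which this sketch omits.

```typescript
type Status = "PENDING" | "COMMITTED" | "DELETING";

interface Row {
    id: string;
    status: Status;
    storageKey: string;
}

// In-memory stand-in for the attachment table (an assumption for this sketch).
class FakeDb {
    rows = new Map<string, Row>();

    // Compare-and-set: flips PENDING -> DELETING only if the row is still PENDING.
    claimForDeletion(id: string): boolean {
        const row = this.rows.get(id);
        if (!row || row.status !== "PENDING") return false;
        row.status = "DELETING";
        return true;
    }
}

// In-memory stand-in for blob storage, with a switch to simulate one failed delete.
class FakeStorage {
    blobs = new Set<string>();
    failNextDelete = false;

    delete(key: string): void {
        if (this.failNextDelete) {
            this.failNextDelete = false;
            throw new Error("storage unavailable");
        }
        this.blobs.delete(key);
    }
}

// Claim, delete bytes, then remove the row. Returns true once row and bytes are both gone.
function reclaim(db: FakeDb, storage: FakeStorage, id: string): boolean {
    const row = db.rows.get(id);
    if (!row) return false;
    // A row already in DELETING is a tombstone left by an earlier failed attempt.
    if (row.status !== "DELETING" && !db.claimForDeletion(id)) return false;
    try {
        storage.delete(row.storageKey);
    } catch {
        return false; // Row stays DELETING so a later sweep can retry.
    }
    db.rows.delete(id);
    return true;
}
```

Because the row outlives the bytes, a failed byte delete leaves a handle behind for the retry instead of leaking the blob.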

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/backend/src/attachmentPruner.ts`:
- Around line 115-119: The attachment pruning loop in attachmentPruner.ts keeps
appending every failed delete to failedIds and reusing it in the
attachment.findMany notIn filter, which can grow without bound when storage is
down. Update the pruning logic in the prune routine to cap the number of failed
deletions handled per run, stop the loop once that limit is reached, and let the
next scheduled run retry the remaining tombstones. Make sure the bound is
applied consistently across the batch-processing path and the failure handling
around failedIds so the query size stays limited.
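
One possible shape for the bounded sweep described above, assuming the pruner excludes previously failed ids from each batch query. `reclaimSweep` and its callback parameters are hypothetical stand-ins for the `attachment.findMany` query with its `notIn` filter, the storage delete, and the row delete.

```typescript
interface Tombstone {
    id: string;
    storageKey: string;
}

// Sweep tombstones in batches, stopping once `maxFailures` deletes have failed so the
// exclusion list passed to the next query stays bounded. Remaining tombstones are
// left for the next scheduled run.
async function reclaimSweep(
    fetchBatch: (excludeIds: string[], take: number) => Promise<Tombstone[]>,
    deleteBytes: (storageKey: string) => Promise<void>,
    removeRow: (id: string) => Promise<void>,
    batchSize: number,
    maxFailures: number,
): Promise<{ deleted: number; failedIds: string[] }> {
    const failedIds: string[] = [];
    let deleted = 0;
    while (failedIds.length < maxFailures) {
        const batch = await fetchBatch(failedIds, batchSize);
        if (batch.length === 0) break;
        for (const tombstone of batch) {
            try {
                await deleteBytes(tombstone.storageKey);
                await removeRow(tombstone.id);
                deleted++;
            } catch {
                failedIds.push(tombstone.id);
                if (failedIds.length >= maxFailures) break;
            }
        }
    }
    return { deleted, failedIds };
}
```

With storage fully down, the loop stops after `maxFailures` attempts instead of walking every tombstone and growing the exclusion list.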

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: eacd9c6b-92a6-4269-9031-918cc1b3e9a2

📥 Commits

Reviewing files that changed from the base of the PR and between 74791b3 and 6b6b5ed.

📒 Files selected for processing (4)
  • packages/backend/src/attachmentPruner.ts
  • packages/db/prisma/migrations/20260629210000_add_attachment_deleting_status/migration.sql
  • packages/db/prisma/schema.prisma
  • packages/web/src/features/chat/utils.server.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/db/prisma/schema.prisma
  • packages/web/src/features/chat/utils.server.ts

Comment thread packages/backend/src/attachmentPruner.ts
@whoisthey whoisthey merged commit 216c7d8 into main Jun 30, 2026
15 checks passed

3 participants