feat(web): text file attachments for Ask #1374
Add an optional `inputModalities` declaration to language model config and expose a resolved capability set to the client.
- Schema: add optional `inputModalities` (`text` | `image` | `pdf`) to every provider definition in `schemas/v3/languageModel.json` and regenerate the schema types/snippets.
- Add a fail-closed `resolveModelInputModalities` resolver that defaults to text-only when a model does not declare its input modalities.
- Expose the resolved `inputModalities` on the client-safe `LanguageModelInfo` (populated via `getConfiguredLanguageModelsInfo` and the MCP ask path).
This is groundwork for chat file attachments. It adds no attachment UI and no live provider capability probing yet.
Co-authored-by: Cursor <cursoragent@cursor.com>
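A minimal sketch of what a fail-closed resolver of this kind could look like. Only the names `inputModalities` and `resolveModelInputModalities` come from the commit message; the `LanguageModelConfig` shape, the deduplication step, and the treatment of an empty list as "undeclared" are assumptions made for illustration.

```typescript
// Hypothetical config shape; the real schema lives in schemas/v3/languageModel.json.
type InputModality = "text" | "image" | "pdf";

interface LanguageModelConfig {
  provider: string;
  model: string;
  inputModalities?: InputModality[];
}

const TEXT_ONLY: readonly InputModality[] = ["text"];

// Fail closed: anything the config does not explicitly declare is not granted.
function resolveModelInputModalities(config: LanguageModelConfig): InputModality[] {
  const declared = config.inputModalities;
  // Assumption: an empty list is treated the same as a missing declaration.
  if (!declared || declared.length === 0) {
    return [...TEXT_ONLY];
  }
  // Deduplicate while preserving declaration order.
  return Array.from(new Set(declared));
}
```

The point of the default is that an undeclared model can never be offered an attachment type it might not accept; widening capabilities always requires an explicit declaration.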
inputModalities now only enumerates true perceptual channels (text | image | audio | video). Document/container formats like PDF move to a separate fail-closed `supportedDocumentTypes` field, since PDF is not a model modality but a format providers decompose into text/image internally. Co-authored-by: Cursor <cursoragent@cursor.com>
Tighten the inputModalities / supportedDocumentTypes descriptions to remove the implication that omitting supportedDocumentTypes blocks all non-text attachments. Clarify the taxonomy: single-medium files (images, audio, video) and plain-text files (.txt, .md) are governed by inputModalities; supportedDocumentTypes only gates rich compound container formats like PDF. Co-authored-by: Cursor <cursoragent@cursor.com>
LanguageModelInfo now has required inputModalities/supportedDocumentTypes, so a raw LanguageModel config (where those are optional) is no longer assignable to it. getLanguageModelKey only reads provider/model/displayName, so type its parameter as that Pick subset, letting both LanguageModel and LanguageModelInfo be keyed. Fixes the docker build type check. Co-authored-by: Cursor <cursoragent@cursor.com>
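The typing change can be sketched roughly as follows. The two interfaces are trimmed, hypothetical stand-ins for the real `LanguageModel` and `LanguageModelInfo` types, and the key format is an assumption; only the idea of typing the parameter as a `Pick` of `provider`/`model`/`displayName` comes from the commit message.

```typescript
// Trimmed, hypothetical stand-ins for the real types.
interface LanguageModel {
  provider: string;
  model: string;
  displayName?: string;
  inputModalities?: string[];        // optional in raw config
  supportedDocumentTypes?: string[]; // optional in raw config
}

interface LanguageModelInfo {
  provider: string;
  model: string;
  displayName?: string;
  inputModalities: string[];         // required on the client-safe type
  supportedDocumentTypes: string[];  // required on the client-safe type
}

// Only the fields the key actually reads.
type LanguageModelKeyFields = Pick<LanguageModelInfo, "provider" | "model" | "displayName">;

// Assumption: the key format is illustrative, not the project's real one.
function getLanguageModelKey(model: LanguageModelKeyFields): string {
  return `${model.provider}/${model.model}/${model.displayName ?? ""}`;
}

const rawConfig: LanguageModel = { provider: "acme", model: "m1" };
const info: LanguageModelInfo = {
  provider: "acme",
  model: "m1",
  inputModalities: ["text"],
  supportedDocumentTypes: [],
};

// Both shapes are structurally assignable to the Pick subset.
const rawKey = getLanguageModelKey(rawConfig);
const infoKey = getLanguageModelKey(info);
```

Narrowing the parameter to what the function reads avoids coupling the key helper to whichever of the two types happens to have required fields.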
Two dev-experience fixes for the stale-build-output footgun:
- schemas watch now runs `yarn build` (generate + tsc) instead of generate-only, so editing a schema JSON during `yarn dev` refreshes dist (both the .d.ts types and the runtime index.schema.js used by ajv), not just the generated source.
- web tsconfig maps @sourcebot/schemas/v3|v2/* to the package source, so type-checking and the IDE read committed source directly instead of stale built .d.ts. Web only imports .type files (erased at compile), so there is no bundling/runtime impact.
Co-authored-by: Cursor <cursoragent@cursor.com>
…ards, wired into user message via xml-like tags similar to system context
…ey/text-file-attachments
….json
Re-source language model input-modality / document capabilities from the models.dev catalog instead of hand-declared config.json fields, aligning with the move to de-emphasize on-disk config in favor of automatic resolution (the same catalog already backs context-window resolution).
- Revert the inputModalities/supportedDocumentTypes additions to schemas/v3/languageModel.json and all regenerated artifacts; capabilities are no longer declared in config.json.
- Extract the shared models.dev catalog plumbing (fetch/TTL/negative-cache/stale-while-revalidate/provider-id overrides) into modelsDevCatalog.server.ts, now consumed by both context-window and capability resolution.
- Add models.dev-backed resolveModelCapabilities (modelCapabilities.server.ts), partitioning the catalog's modalities.input list into Sourcebot's inputModalities (channels) and supportedDocumentTypes (containers); falls back to text-only for uncatalogued / self-hosted models.
The client-safe LanguageModelInfo contract is unchanged; only the resolution backend moved.
Co-authored-by: Cursor <cursoragent@cursor.com>
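The partitioning step described above might look something like this sketch. The function name `partitionCatalogInputs`, the exact set of recognized strings, and the choice to ignore unknown entries are assumptions; the commit only states that the catalog's `modalities.input` list is split into channels and containers with a text-only fallback.

```typescript
type InputModality = "text" | "image" | "audio" | "video";
type DocumentType = "pdf";

interface ModelCapabilities {
  inputModalities: InputModality[];
  supportedDocumentTypes: DocumentType[];
}

const CHANNELS: readonly InputModality[] = ["text", "image", "audio", "video"];
const DOCUMENT_TYPES: readonly DocumentType[] = ["pdf"];

// `catalogInputs` stands for the catalog's modalities.input list;
// undefined models an uncatalogued or self-hosted model.
function partitionCatalogInputs(catalogInputs: readonly string[] | undefined): ModelCapabilities {
  const result: ModelCapabilities = { inputModalities: [], supportedDocumentTypes: [] };
  for (const entry of catalogInputs ?? []) {
    const channel = CHANNELS.find((c) => c === entry);
    const docType = DOCUMENT_TYPES.find((d) => d === entry);
    if (channel && !result.inputModalities.includes(channel)) {
      result.inputModalities.push(channel);
    } else if (docType && !result.supportedDocumentTypes.includes(docType)) {
      result.supportedDocumentTypes.push(docType);
    }
    // Assumption: unrecognized catalog entries are ignored (fail closed).
  }
  // Fall back to text-only when nothing usable was resolved.
  if (result.inputModalities.length === 0) {
    result.inputModalities.push("text");
  }
  return result;
}
```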
…ey/text-file-attachments
Walkthrough
Adds text attachment support across chat input, message creation, rendering, and agent prompt handling.
Changes
Chat File Attachment Feature
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
License Audit
Weak Copyleft Packages (informational)
Resolved Packages (21)
…ey/text-file-attachments
…ey/text-file-attachments
…er of clear-on-submit
@whoisthey do you mind adding screenshots of the UX?
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/web/src/features/chat/components/chatBox/chatBox.tsx (1)
270-283: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Persist attachments through the login/upgrade redirect.
Lines 270-282 only save `children`, and Line 325 restores the submission with `[]`, so attached files are silently lost on the auth/upsell path.
Suggested direction

    - JSON.stringify({ pathname, children: editor.children }),
    + JSON.stringify({
    +     pathname,
    +     children: editor.children,
    +     attachments: attachments.map(toAttachmentData),
    + }),
    ...
    - const { pathname: storedPathname, children } = JSON.parse(stored) as { pathname: string; children: Descendant[] };
    + const {
    +     pathname: storedPathname,
    +     children,
    +     attachments: storedAttachments = [],
    + } = JSON.parse(stored) as {
    +     pathname: string;
    +     children: Descendant[];
    +     attachments?: AttachmentData[];
    + };
    ...
    - _onSubmit(children, editor, []);
    + _onSubmit(children, editor, storedAttachments);

Also applies to: 324-325
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/features/chat/components/chatBox/chatBox.tsx` around lines 270 - 283, Persist the full pending submission through the auth/upgrade flow in chatBox.tsx, not just editor.children. Update the sessionStorage payload written in the login/upgrade branches of the chat submission path to include attachments (and any other required submission state) alongside pathname and children, then update the restore logic that currently rebuilds the submission with an empty attachments array so it rehydrates from the saved data instead. Use the existing pending submission handling in the chatBox component and the sessionStorage key to keep the login/upgrade redirect path lossless.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/web/src/app/(app)/askgh/[owner]/[repo]/components/landingPage.tsx`:
- Around line 72-81: Attachments are being dropped from the pending submission
flow on the AskGH landing page, so unauthenticated users lose staged files after
passing the login wall. Update the gated submit path in landingPage.tsx where
ChatBox/onSubmit calls createNewChatThread, and ensure the pending-submission
serializer/restorer includes attachments alongside pathname and children so
files survive auth resume. Use the existing ChatBox, createNewChatThread, and
pending-submission handling symbols to wire attachments through end-to-end.
In `@packages/web/src/features/chat/attachmentUtils.ts`:
- Around line 196-237: The attachment limit check in readFilesAsAttachments is
using only the stale existingCount snapshot, which allows overlapping add flows
to bypass ATTACHMENT_MAX_COUNT. Update the add flow in chatBox.tsx’s onAddFiles
and the helper in attachmentUtils.ts so the limit is enforced against the latest
attachment state at commit time, not just at read time; use the current
attachments length when appending and recheck before merging added files. Keep
the logic tied to readFilesAsAttachments, ATTACHMENT_MAX_COUNT, and
setAttachments so concurrent drops/selections cannot overshoot the cap.
- Around line 33-37: The filename truncation logic in attachmentUtils’s
extension-preserving helper can still exceed ATTACHMENT_MAX_FILENAME_LENGTH when
the extension is long. Update the truncation in this helper so the final result,
including the ellipsis and extension, is always capped at the limit by reducing
or omitting the extension as needed before returning the string. Use the
existing ATTACHMENT_MAX_FILENAME_LENGTH, cleaned, dotIndex, and stem logic to
locate the fix.
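One way the filename cap could be enforced, as a sketch only. The names `ATTACHMENT_MAX_FILENAME_LENGTH`, `cleaned`, `dotIndex`, and `stem` are taken from the comment above; the function name `truncateFilename`, the limit value, and the fallback of dropping an over-long extension are assumptions. Lengths are measured in UTF-16 code units, as `String.prototype.length` does.

```typescript
const ATTACHMENT_MAX_FILENAME_LENGTH = 64; // assumed value for illustration
const ELLIPSIS = "…";

function truncateFilename(cleaned: string, max: number = ATTACHMENT_MAX_FILENAME_LENGTH): string {
  if (cleaned.length <= max) {
    return cleaned;
  }
  const dotIndex = cleaned.lastIndexOf(".");
  // Treat ".bashrc"-style names and trailing dots as having no extension.
  const hasExtension = dotIndex > 0 && dotIndex < cleaned.length - 1;
  const extension = hasExtension ? cleaned.slice(dotIndex) : "";
  const stem = hasExtension ? cleaned.slice(0, dotIndex) : cleaned;

  // Reserve room for the ellipsis and the extension before cutting the stem.
  const stemBudget = max - ELLIPSIS.length - extension.length;
  if (stemBudget >= 1) {
    return stem.slice(0, stemBudget) + ELLIPSIS + extension;
  }
  // Extension alone would blow the cap: drop it and truncate the whole name.
  return cleaned.slice(0, Math.max(0, max - ELLIPSIS.length)) + ELLIPSIS;
}
```

With this shape the returned string never exceeds `max` (for `max >= 1`), regardless of how long the extension is.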
In `@packages/web/src/features/chat/components/chatBox/chatBox.tsx`:
- Around line 141-149: The attachment quota check in onAddFiles uses a stale
render-time attachments.length and can be bypassed by overlapping addFiles()
calls. Update chatBox.tsx so readFilesAsAttachments and the subsequent
setAttachments append are both clamped against the latest attachment count,
ideally by computing the remaining slots from the current state inside
setAttachments or by passing a fresh count derived from prev rather than the
closed-over attachments value.
- Around line 464-468: The AttachmentTray in chatBox should not expose remove
actions while rendering submittedAttachments during redirect, because
removeAttachment only updates the pending attachments state. Update the chatBox
conditional around AttachmentTray so the onRemove handler is omitted or disabled
when isRedirecting is true, using the isRedirecting, submittedAttachments, and
removeAttachment symbols to keep the tray controls consistent with the active
state.
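The stale-count issue can be illustrated with a small pure helper that is applied inside the state updater rather than against a closed-over value. `ATTACHMENT_MAX_COUNT` and `setAttachments` are named in the comments above; `appendClamped`, the limit value, and the simulated updater calls are hypothetical.

```typescript
const ATTACHMENT_MAX_COUNT = 3; // assumed value for illustration

// Append as many of `added` as still fit, judged against the state passed in.
function appendClamped<T>(prev: readonly T[], added: readonly T[], max: number = ATTACHMENT_MAX_COUNT): T[] {
  const remaining = Math.max(0, max - prev.length);
  return [...prev, ...added.slice(0, remaining)];
}

// In a React component this would be used as a functional update, e.g.
//   setAttachments((prev) => appendClamped(prev, added));
// so `prev` is the latest state at commit time, not a render-time snapshot.

// Simulate two overlapping add flows that both started from an empty list.
let state: string[] = [];
const firstBatch = ["a.txt", "b.txt"];
const secondBatch = ["c.txt", "d.txt"];
state = appendClamped(state, firstBatch);  // commits first
state = appendClamped(state, secondBatch); // sees the updated state and is clamped
```

Because the clamp runs against `prev`, the second batch is cut to the remaining slot even though both reads began when the list was empty.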
In `@packages/web/src/features/chat/utils.ts`:
- Around line 430-441: Escape or encode the attachment body in the `blocks`
mapping inside `utils.ts` before interpolating `attachment.text` into the
`<attachment>` wrapper. The current `text` value can contain tag-closing
sequences like `</attachment>` and break the prompt structure, so update the
`attachment.text` handling in this block to use a safe structured encoding
rather than raw insertion, while keeping the existing filename sanitization in
place.
- Around line 431-433: The attachment truncation in the text handling logic is
using string slicing by UTF-16 code units instead of enforcing the byte cap, so
update the `maxBytesPerAttachment` branch in `utils.ts` to truncate by UTF-8
bytes. Use `TextEncoder` or equivalent byte counting in the attachment text path
so the `attachment.text` value stays within `ATTACHMENT_MAX_TEXT_BYTES` even for
non-ASCII content, and keep the change localized to the truncation logic around
`text`.
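The two `utils.ts` findings above could be addressed along these lines. This is a sketch under assumptions: the helper names, the `<attachment filename="…">` wrapper shape, and the choice of entity escaping are illustrative, not the project's actual code. `TextEncoder` and `TextDecoder` are standard Web/Node APIs.

```typescript
// Truncate to at most maxBytes of UTF-8 without splitting a multi-byte character.
function truncateUtf8(text: string, maxBytes: number): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) {
    return text;
  }
  let end = Math.max(0, maxBytes);
  // If the first excluded byte is a continuation byte (10xxxxxx), the cut is
  // mid-character: back up until it sits on a character boundary.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return new TextDecoder().decode(bytes.subarray(0, end));
}

// Escape markup-significant characters so the body cannot close the wrapper.
function escapeAttachmentText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Assumption: `filename` has already been sanitized by the existing logic.
function renderAttachmentBlock(filename: string, text: string, maxBytes: number): string {
  const body = escapeAttachmentText(truncateUtf8(text, maxBytes));
  return `<attachment filename="${filename}">\n${body}\n</attachment>`;
}
```

Note that the byte cap here applies to the raw text before escaping, so the escaped body can be somewhat larger than `maxBytes`; whether the cap should instead apply after escaping is a design choice this sketch does not settle.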
⛔ Files ignored due to path filters (1)
- yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (21)
- packages/web/package.json
- packages/web/src/app/(app)/askgh/[owner]/[repo]/components/landingPage.tsx
- packages/web/src/app/(app)/chat/chatLandingPage.tsx
- packages/web/src/app/(app)/chat/components/chatLandingDropzone.tsx
- packages/web/src/app/(app)/chat/components/landingPageChatBox.tsx
- packages/web/src/ee/features/chat/agent.ts
- packages/web/src/ee/features/chat/components/chatThread/chatThread.tsx
- packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx
- packages/web/src/ee/features/chat/components/chatThread/detailsCard.tsx
- packages/web/src/ee/features/chat/components/chatThread/messageAttachments.tsx
- packages/web/src/features/chat/attachmentUtils.ts
- packages/web/src/features/chat/components/chatBox/attachmentButton.tsx
- packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
- packages/web/src/features/chat/components/chatBox/attachmentViewerDialog.tsx
- packages/web/src/features/chat/components/chatBox/chatBox.tsx
- packages/web/src/features/chat/components/chatBox/chatPaneDropzone.tsx
- packages/web/src/features/chat/components/chatBox/index.ts
- packages/web/src/features/chat/constants.ts
- packages/web/src/features/chat/types.ts
- packages/web/src/features/chat/useCreateNewChatThread.ts
- packages/web/src/features/chat/utils.ts
Added two vids showing flows (drag and drop, paste file/text, paste inline, and click-to-attach)
…ge cap of around 60-80k tokens
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/web/src/features/chat/attachmentUtils.ts (1)
183-190: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Budget attachments from decoded text, not raw file bytes.
Line 189 stores `file.size`, but `packages/web/src/features/chat/components/chatBox/chatBox.tsx` enforces the turn limit through `getSubmittedTextBytes()`, which is documented as UTF-8 prompt bytes. That makes file attachments use on-disk bytes while pasted attachments use text bytes, so UTF-16/BOM text files can be rejected or admitted incorrectly.
Suggested fix

      try {
          const text = await readAsText(file);
    +     const sizeBytes = new TextEncoder().encode(text).length;
          attachments.push({
              id: uuidv4(),
              kind: 'text',
              filename: sanitizeFilename(file.name),
              mediaType: file.type || 'text/plain',
    -         sizeBytes: file.size,
    +         sizeBytes,
              text,
          });
      } catch {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/web/src/features/chat/attachmentUtils.ts` around lines 183 - 190, The attachment size accounting in attachmentUtils should use the decoded text bytes instead of the raw file.size value so it matches the UTF-8 prompt budgeting used by chatBox and getSubmittedTextBytes(). Update the text attachment path in attachmentUtils to derive sizeBytes from the text content (after readAsText/file decoding) and keep the existing filename/mediaType handling unchanged, so text files are accepted or rejected consistently with pasted text.
📒 Files selected for processing (6)
- packages/web/src/ee/features/chat/agent.ts
- packages/web/src/features/chat/attachmentUtils.ts
- packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
- packages/web/src/features/chat/components/chatBox/chatBox.tsx
- packages/web/src/features/chat/constants.ts
- packages/web/src/features/chat/utils.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/web/src/features/chat/components/chatBox/attachmentTray.tsx
- packages/web/src/ee/features/chat/agent.ts
- packages/web/src/features/chat/components/chatBox/chatBox.tsx
Carry the pending attachment's client id through to the persisted message instead of stripping it. This gives every text attachment a durable handle from the moment the feature ships, so later attachment-referencing work has no backwards-compatibility gap for attachments created before the field existed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
27b33e6
Squashed onto main after PR #1374 (text file attachments) was squash-merged, which orphaned the stacked text-attachment commits this branch carried. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* feat(web): binary file attachments for Ask
  Squashed onto main after PR #1374 (text file attachments) was squash-merged, which orphaned the stacked text-attachment commits this branch carried.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(worker): make attachment pruner byte-safe and reclaim committed orphans
  The orphan sweep deleted blob bytes before the guarded row delete, so a PENDING attachment committed by a concurrent send (PENDING -> COMMITTED + linked) between the findMany and the byte delete kept its DB row and link but lost its bytes, leaving a permanently broken attachment. Reorder to delete the row first (re-asserting the orphan criteria), then delete bytes only for batch rows that no longer exist, i.e. the rows the sweep actually removed. A deleted row can never reappear and a survivor is never deleted by the loop, so the check cannot misclassify.
  Also add a COMMITTED-with-zero-links sweep as a backstop for an interrupted web-app chat-delete sweep, which would otherwise leak those blobs forever (the pruner previously only touched PENDING rows).
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(db): resequence chat attachments migration after main's latest
  The add_chat_attachments migration (20260627000032) predated add_oauth_dpop_binding (20260629193000), which merged to main after this branch was cut, tripping the CI migration-ordering check. The two migrations are independent (dpop touches none of the attachment tables; the attachment migration only references the long-existing Org/User/Chat tables), so resequencing it to run last is safe. Renamed to 20260629200000_add_chat_attachments.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* fix(worker): tombstone protocol for attachment byte reclamation
  Deletion paths previously removed the DB row, then deleted bytes best-effort. Since the row is the only durable handle to the bytes, a failed/interrupted byte delete after the row was gone leaked those bytes with no way to ever find them again.
  Add a DELETING tombstone state. All reclamation now: (1) atomically flips the orphan to DELETING (the claim doubles as the concurrency guard, replacing the survivor-recheck), (2) deletes the bytes, (3) removes the row only once the bytes are confirmed gone. A failed byte delete leaves the row DELETING for the pruner's reclaim sweep to retry, so a transient storage error can never orphan bytes.
  - schema: add AttachmentStatus.DELETING (+ migration)
  - deleteOrphanedAttachments: claim -> DELETING, inline best-effort byte delete, remove only reclaimed rows; the rest fall through to the pruner
  - pruner: condemn PENDING + zero-link COMMITTED orphans to DELETING, then a single reclaim sweep deletes bytes and rows for all tombstones (also picking up tombstones the web app left behind)
  This unifies byte deletion into one retryable place and matters most ahead of a remote (S3) storage driver, where delete failures are routine.
  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
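The tombstone protocol described in the last commit can be sketched with an in-memory stand-in. Everything here is illustrative: `InMemoryAttachmentDb`, `BlobStore`, and `reclaimTombstones` are hypothetical names, the store is synchronous for brevity, and a real implementation would perform the claim as a single conditional database update and would likely apply an age threshold to PENDING rows. Only the three statuses and the claim, delete bytes, delete row ordering come from the commit message.

```typescript
type AttachmentStatus = "PENDING" | "COMMITTED" | "DELETING";

interface AttachmentRow {
  id: string;
  status: AttachmentStatus;
  linkCount: number;
}

// Hypothetical blob store; delete throws on a storage failure.
interface BlobStore {
  delete(id: string): void;
}

class InMemoryAttachmentDb {
  rows = new Map<string, AttachmentRow>();

  // Step 1: claim. Flip to DELETING only if the row is still an orphan, so a
  // row committed and linked by a concurrent send is never claimed.
  claimOrphan(id: string): boolean {
    const row = this.rows.get(id);
    if (!row) return false;
    const isOrphan =
      row.status === "PENDING" || (row.status === "COMMITTED" && row.linkCount === 0);
    if (!isOrphan) return false;
    row.status = "DELETING";
    return true;
  }

  tombstoneIds(): string[] {
    return Array.from(this.rows.values())
      .filter((row) => row.status === "DELETING")
      .map((row) => row.id);
  }
}

// Steps 2 and 3: delete bytes, then remove the row only once that succeeded.
function reclaimTombstones(db: InMemoryAttachmentDb, blobs: BlobStore): string[] {
  const reclaimed: string[] = [];
  for (const id of db.tombstoneIds()) {
    try {
      blobs.delete(id);
    } catch {
      continue; // row stays DELETING and is retried by the next sweep
    }
    db.rows.delete(id);
    reclaimed.push(id);
  }
  return reclaimed;
}
```

The row outlives the bytes in every failure mode, so there is always a durable handle from which a later sweep can retry the byte delete.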
Adds text file attachments to Ask. Users can attach text/code/config files to a
chat message and the contents are folded into that turn's prompt.
Ways to attach
Large-paste auto-conversion
`pasted.txt` attachment instead of being inserted inline, with a toast hinting that ⌘⇧V / Ctrl+Shift+V pastes inline.
Attachment UX
Plumbing & limits
`data-attachment` parts and are re-emitted per turn into the content as an
`<attachments>` block, keeping them bound to the turn.
`react-dropzone`.
Potential Followup
Screen.Recording.2026-06-27.at.2.29.59.PM.mov
Screen.Recording.2026-06-27.at.2.32.59.PM.mov
Summary by CodeRabbit
New Features
Bug Fixes