
Prevent base64 preview tool results from re-entering AI chat context #141

@Sportinger


Context

Follow-up to #140.

Issue #140 described a token blow-up caused by base64 video/preview image payloads entering the model context. The specific implementation described there does not match the current MasterSelects checkout: the referenced workers/agent/src/* files, the generateText/streamText calls, the AI SDK toModelOutput usage, and the scan-video handler are not present on staging or on the current remote branches.

However, the underlying risk is real in the current app.

Verified Current-Code Risk

Current preview tools return base64 PNG data URLs:

  • src/services/aiTools/handlers/preview.ts
    • captureFrame returns dataUrl
    • getCutPreviewQuad delegates to frame-grid capture
    • getFramesAtTimes delegates to frame-grid capture
  • src/services/aiTools/utils.ts
    • captureFrameGrid() serializes a PNG grid via gridCanvas.toDataURL('image/png')
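One way to check for this payload shape (in tests or in a sanitizer) is a small recursive detector. The sketch below is hypothetical: containsBase64Image is not an existing MasterSelects helper, and it assumes the payloads use the standard data:image/<type>;base64, prefix that toDataURL emits.

```typescript
// Matches the prefix of a base64-encoded image data URL, e.g. the output of
// canvas.toDataURL('image/png'), which starts with "data:image/png;base64,".
// No g flag, so .test() is stateless across calls.
const BASE64_IMAGE_DATA_URL = /data:image\/[a-z0-9.+-]+;base64,/i;

// Returns true if any string anywhere inside `value` (nested objects and
// arrays included) contains a base64 image data URL.
function containsBase64Image(value: unknown): boolean {
  if (typeof value === "string") return BASE64_IMAGE_DATA_URL.test(value);
  if (Array.isArray(value)) return value.some(containsBase64Image);
  if (value !== null && typeof value === "object") {
    return Object.values(value as Record<string, unknown>).some(containsBase64Image);
  }
  return false;
}

// Illustrative captureFrame-style result (field names assumed).
const captured = { success: true, data: { dataUrl: "data:image/png;base64,iVBORw0KGgo=" } };
console.log(containsBase64Image(captured)); // true
console.log(containsBase64Image({ success: true, data: { width: 1280 } })); // false
```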

The immediate tool follow-up path is partially protected:

  • src/components/panels/AIChatPanel.tsx
    • formatToolResultForApi() truncates tool results before sending them back to the model in the same loop
    • current caps: MAX_TOOL_RESULT_MESSAGE_CHARS = 12000, MAX_TOOL_RESULT_STRING_CHARS = 1200
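For reference, a per-string cap plus a whole-message cap behaves roughly like the sketch below. Only the two constant values come from AIChatPanel.tsx; the traversal, the truncation markers, and the name sketchFormatToolResultForApi are assumptions, and this is not the actual formatToolResultForApi() body.

```typescript
// Cap values as quoted from AIChatPanel.tsx; everything else is illustrative.
const MAX_TOOL_RESULT_MESSAGE_CHARS = 12000;
const MAX_TOOL_RESULT_STRING_CHARS = 1200;

// Recursively clips every string longer than the per-string cap.
function truncateStrings(value: unknown): unknown {
  if (typeof value === "string") {
    if (value.length <= MAX_TOOL_RESULT_STRING_CHARS) return value;
    const dropped = value.length - MAX_TOOL_RESULT_STRING_CHARS;
    return `${value.slice(0, MAX_TOOL_RESULT_STRING_CHARS)}...[${dropped} chars truncated]`;
  }
  if (Array.isArray(value)) return value.map(truncateStrings);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      out[key] = truncateStrings(v);
    }
    return out;
  }
  return value;
}

// Serializes a tool result for the same-turn follow-up, then enforces the
// whole-message cap. JSON.stringify(undefined) yields undefined, hence "null".
function sketchFormatToolResultForApi(result: unknown): string {
  const text = JSON.stringify(truncateStrings(result)) ?? "null";
  if (text.length <= MAX_TOOL_RESULT_MESSAGE_CHARS) return text;
  return `${text.slice(0, MAX_TOOL_RESULT_MESSAGE_CHARS)}...[truncated]`;
}

const big = { success: true, data: { dataUrl: "data:image/png;base64," + "A".repeat(50000) } };
console.log(sketchFormatToolResultForApi(big).length); // well under 12000
```

In this sketch a clipped dataUrl still forwards up to 1200 characters of base64 noise, so capping bounds the damage on the same-turn path without removing it.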

But the full tool result is still stored in chat state for UI/history:

content: JSON.stringify(result, null, 2)

When API messages are later rebuilt from chat history, old tool messages can be included as-is:

content: msg.content

So a full base64 dataUrl from captureFrame, getCutPreviewQuad, or getFramesAtTimes can re-enter the model context on a later user turn through conversation history, even if the immediate tool-result follow-up was truncated.
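A minimal reproduction of the history path, assuming simplified message types (real OpenAI tool messages also carry a tool_call_id and need a preceding assistant tool_calls entry). storeToolResult and rebuildApiMessages are hypothetical stand-ins that mirror the two quoted lines, not the actual AIChatPanel code.

```typescript
// Hypothetical, simplified message shape; the real AIChatPanel types may differ.
interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Mirrors the storage line quoted above: the full result is stringified.
function storeToolResult(result: unknown): ChatMessage {
  return { role: "tool", content: JSON.stringify(result, null, 2) };
}

// Mirrors the rebuild line quoted above: old messages are copied verbatim.
function rebuildApiMessages(history: ChatMessage[]): ChatMessage[] {
  return history.map((msg) => ({ role: msg.role, content: msg.content }));
}

const history: ChatMessage[] = [
  { role: "user", content: "Show me the frame at 00:12" },
  storeToolResult({ success: true, data: { dataUrl: "data:image/png;base64,iVBORw0KGgo..." } }),
  { role: "user", content: "Now trim the clip" }, // a later user turn
];

// The base64 payload from the earlier turn is back in the outgoing request.
const outgoing = rebuildApiMessages(history);
console.log(outgoing.some((m) => m.content.includes(";base64,"))); // true
```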

Why This Matters

A single preview/grid image can be hundreds of KB to multiple MB as base64 text. If preserved in chat history and sent to a text model as a tool message, it can cause:

  • excessive prompt tokens
  • model/API context overflows
  • avoidable hosted-AI cost
  • degraded editor-agent reliability after visual tool usage
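The size claim follows from base64 arithmetic: every 3 input bytes become 4 characters, with the last group padded, plus the data: prefix. The payload sizes below are illustrative assumptions, not measurements from MasterSelects.

```typescript
// Length of a base64 data URL for a binary payload of `byteLength` bytes.
// Base64 emits 4 characters per 3 input bytes, padding the final group.
function dataUrlLength(byteLength: number, mimeType = "image/png"): number {
  const prefix = `data:${mimeType};base64,`; // 22 chars for image/png
  return prefix.length + 4 * Math.ceil(byteLength / 3);
}

// Illustrative payload sizes (assumed, not measured).
const KiB = 1024;
const MiB = 1024 * KiB;
console.log(dataUrlLength(300 * KiB)); // 409622
console.log(dataUrlLength(2 * MiB)); // 2796226
```

Even the 300 KiB case is roughly 34 times the 12000-character MAX_TOOL_RESULT_MESSAGE_CHARS cap, and the full string is what sits in chat history.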

Proposed Fix

Separate UI-visible tool results from model-visible tool results.
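As a sketch of what that separation could look like, assuming a hypothetical ToolResultRecord type (not an existing MasterSelects type): each tool call keeps the full result for rendering and a sanitized string for the model, and only the latter is ever mapped into an API message. The { role: "tool", tool_call_id, content } shape follows the OpenAI Chat Completions tool-message format.

```typescript
// Hypothetical split record: one tool call, two views of its result.
interface ToolResultRecord {
  toolCallId: string;
  toolName: string;
  // Full result, including any dataUrl; used only to render previews in the UI.
  uiResult: unknown;
  // Sanitized JSON string; the only view that is ever sent to the model.
  modelContent: string;
}

interface ApiToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// The API path can only see modelContent, so uiResult cannot leak by accident.
function toApiToolMessage(rec: ToolResultRecord): ApiToolMessage {
  return { role: "tool", tool_call_id: rec.toolCallId, content: rec.modelContent };
}

const rec: ToolResultRecord = {
  toolCallId: "call_1",
  toolName: "captureFrame",
  uiResult: { success: true, data: { dataUrl: "data:image/png;base64,iVBORw0KGgo=" } },
  modelContent: JSON.stringify({ success: true, data: { image: "[preview image omitted from text context]" } }),
};
console.log(toApiToolMessage(rec).content);
```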

Suggested direction:

  1. Keep full image data only in UI/local state where needed for preview display.
  2. Store model-visible tool messages in sanitized form immediately, not just at send time.
  3. Replace base64 dataUrl fields with metadata/handles, for example:
{
  "success": true,
  "data": {
    "width": 1280,
    "height": 360,
    "frameCount": 8,
    "gridSize": "4x2",
    "image": "[preview image omitted from text context]"
  }
}
  4. If visual model input is needed later, add an explicit image-part path rather than serializing base64 as text.
  5. Add regression tests around formatToolResultForApi() / chat-history serialization so data:image/...;base64,... cannot appear in outgoing messages.
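A sketch of steps 2 and 3, assuming tool results are plain JSON-serializable objects. sanitizeForModel and toModelContent are hypothetical names, and the placeholder string is the one from the example JSON in step 3.

```typescript
// Matches a whole base64 image data URL (prefix plus payload). The g flag lets
// String.prototype.replace swap every occurrence within one string.
const BASE64_IMAGE_DATA_URL = /data:image\/[a-z0-9.+-]+;base64,[A-Za-z0-9+/=]*/gi;

// Placeholder text taken from the example JSON in this issue.
const OMITTED_IMAGE = "[preview image omitted from text context]";

// Returns a deep copy of `value` with every embedded base64 image replaced by
// the placeholder. Numbers, booleans, and other metadata pass through unchanged.
function sanitizeForModel(value: unknown): unknown {
  if (typeof value === "string") return value.replace(BASE64_IMAGE_DATA_URL, OMITTED_IMAGE);
  if (Array.isArray(value)) return value.map(sanitizeForModel);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      out[key] = sanitizeForModel(v);
    }
    return out;
  }
  return value;
}

// Builds the model-visible tool message content once, when the result is stored.
function toModelContent(result: unknown): string {
  return JSON.stringify(sanitizeForModel(result), null, 2) ?? "null";
}

// Illustrative grid result; the dataUrl field name is assumed.
const gridResult = {
  success: true,
  data: { width: 1280, height: 360, frameCount: 8, gridSize: "4x2", dataUrl: "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg==" },
};
// dataUrl becomes the placeholder; width/height/frameCount/gridSize survive.
console.log(toModelContent(gridResult));
```

Matching on the data URL pattern rather than on a dataUrl key means grid results are covered even if their field name differs. Running this when the tool message is first stored (step 2), rather than at send time, keeps the history itself clean.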

Acceptance Criteria

  • Outgoing OpenAI/hosted chat messages never contain raw data:image/...;base64, strings from tool history.
  • Preview tools still show captured images in the UI where intended.
  • Same-turn tool follow-up remains concise and useful.
  • A regression test covers a prior captureFrame or getCutPreviewQuad result in chat history being rebuilt into API messages.
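A framework-agnostic sketch of the regression check in the last criterion. The repo's test runner isn't stated here, so plain throws stand in for expect(...); toModelContent and rebuildApiMessages below are hypothetical stand-ins that a real test would replace with imports of the actual functions.

```typescript
// Simplified outgoing message shape (a real tool message also needs a
// preceding assistant tool_calls entry; omitted here for brevity).
interface ApiMessage {
  role: "user" | "assistant" | "tool";
  content: string;
  tool_call_id?: string;
}

const DATA_URL = /data:image\/[a-z0-9.+-]+;base64,/i;
const PLACEHOLDER = "[preview image omitted from text context]";

// Stand-in sanitizer: a JSON.stringify replacer swaps any data-URL string.
function toModelContent(result: unknown): string {
  return JSON.stringify(result, (_key, v) => (typeof v === "string" && DATA_URL.test(v) ? PLACEHOLDER : v), 2);
}

// Stand-in for the history rebuild: copies stored messages as-is.
function rebuildApiMessages(history: ApiMessage[]): ApiMessage[] {
  return history.map((m) => ({ ...m }));
}

// The regression check: no outgoing message may carry a base64 image.
function assertNoBase64InOutgoing(messages: ApiMessage[]): void {
  messages.forEach((m, i) => {
    if (DATA_URL.test(m.content)) {
      throw new Error(`message ${i} (${m.role}) contains a base64 image data URL`);
    }
  });
}

// Scenario: a prior captureFrame result sits in history when a new user turn is sent.
const priorCapture = { success: true, data: { width: 1280, height: 720, dataUrl: "data:image/png;base64,iVBORw0KGgo=" } };
const history: ApiMessage[] = [
  { role: "user", content: "Capture the current frame" },
  { role: "tool", tool_call_id: "call_1", content: toModelContent(priorCapture) },
  { role: "user", content: "Now cut at 00:05" },
];
assertNoBase64InOutgoing(rebuildApiMessages(history));
console.log("ok");
```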

Notes

This is related to #140's base64-token issue, but the correct fix for the current codebase is not AI SDK toModelOutput; it is sanitized chat-history/model-context serialization for preview tool results.
