feat: add end-to-end multimodal image turn flow with view_image fallback and persistence redaction #80

Merged
GCWing merged 2 commits into GCWing:main from wgqqqqq:feature/layout-redesign on Mar 6, 2026

wgqqqqq (Collaborator) commented on Mar 6, 2026

Summary

This PR upgrades the image handling pipeline across the agentic chat flow.

It replaces the old image-analysis tool path with view_image, enables direct
image attachment for multimodal-capable models, keeps a vision pre-analysis
fallback for text-only models, and redacts inline image payloads from
persisted session data.

What Changed

  • replaced the legacy image analysis tool with view_image
  • extracted shared image-processing and vision-model resolution utilities
  • added multimodal user message support in the core message model
  • enabled direct image turn input when the selected model supports image
    understanding
  • kept pre-analysis as a fallback path for text-only primary models
  • added resolution of temporarily uploaded image contexts for clipboard and
    image-cache flows
  • sanitized persisted messages and context snapshots to remove stored
    data_url payloads
  • extended AI connection testing to verify multimodal image-input capability

Details

Tooling and image pipeline

  • migrated from the old analyze_image tool implementation to view_image
  • centralized image loading, path resolution, MIME detection, optimization,
    and vision-model lookup
  • updated tool registration and related frontend tool-card wiring
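
As an illustration of the MIME detection step, here is a minimal sketch that sniffs the image type from the leading bytes of a file. The byte signatures are the standard magic numbers for PNG, JPEG, GIF, and WebP; the function name `detectImageMime` and the set of supported formats are assumptions for this example and are not taken from the PR.

```typescript
// Hypothetical helper, not the PR's implementation: detect an image MIME
// type from the file's leading bytes (standard magic numbers).
function detectImageMime(bytes: Uint8Array): string | null {
  // True when `sig` appears in `bytes` starting at `offset`.
  const startsWith = (sig: number[], offset = 0): boolean =>
    bytes.length >= offset + sig.length &&
    sig.every((b, i) => bytes[offset + i] === b);

  // PNG: 89 50 4E 47 0D 0A 1A 0A
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "image/png";
  // JPEG: FF D8 FF
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  // GIF: "GIF8" (covers both GIF87a and GIF89a)
  if (startsWith([0x47, 0x49, 0x46, 0x38])) return "image/gif";
  // WebP: "RIFF", a 4-byte size field, then "WEBP" at offset 8
  if (startsWith([0x52, 0x49, 0x46, 0x46]) && startsWith([0x57, 0x45, 0x42, 0x50], 8)) {
    return "image/webp";
  }
  return null; // unknown or unsupported format
}
```

Sniffing bytes rather than trusting the file extension matters most for clipboard images, which may arrive without a meaningful name.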

Multimodal turn support

  • added imageContexts support to the dialog-turn API
  • allowed the frontend sender to choose between:
    • direct attach for multimodal-capable models
    • vision pre-analysis for text-only models
  • uploaded clipboard images into a temporary cache so turn requests can
    resolve missing image payloads safely
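
The sender-side choice described above could be sketched roughly as follows. Only the `imageContexts` field name comes from this PR; the `ImageContext` shape, the `supportsImageInput` flag, and `planImageTurn` are hypothetical names used for illustration.

```typescript
// Assumed shape of one image attached to a turn. `dataUrl` is optional
// because a clipboard image may live only in the temporary cache.
type ImageContext = {
  id: string;
  mimeType: string;
  dataUrl?: string;
};

// Assumed capability flag on the selected model.
type ModelInfo = { name: string; supportsImageInput: boolean };

type TurnPlan =
  | { mode: "direct-attach"; imageContexts: ImageContext[] }
  | { mode: "pre-analysis"; imageContexts: ImageContext[] }
  | { mode: "text-only" };

// Decide how the images of a turn reach the model.
function planImageTurn(model: ModelInfo, images: ImageContext[]): TurnPlan {
  if (images.length === 0) return { mode: "text-only" };
  return model.supportsImageInput
    ? { mode: "direct-attach", imageContexts: images } // model reads the image itself
    : { mode: "pre-analysis", imageContexts: images }; // a vision model describes it first
}
```

Keeping the decision in one function means the rest of the turn pipeline only has to branch on `mode`.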

Execution and model integration

  • added multimodal user message construction in core messaging
  • updated execution flow and tool pipeline state to carry image context
    through the round lifecycle
  • added image-aware token estimation and client-side multimodal request
    support
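
To make the idea of a multimodal user message with image-aware token estimation concrete, here is a minimal sketch. All type and function names are hypothetical, and the two constants are placeholder heuristics chosen for the example; the PR does not state the actual estimation rule.

```typescript
// Hypothetical message model: a user message is a list of typed parts.
type ContentPart =
  | { kind: "text"; text: string }
  | { kind: "image"; mimeType: string; dataUrl: string };

type UserMessage = { role: "user"; parts: ContentPart[] };

// Build a user message with the text first, followed by one part per image.
function buildUserMessage(
  text: string,
  images: { mimeType: string; dataUrl: string }[],
): UserMessage {
  const parts: ContentPart[] = [{ kind: "text", text }];
  for (const img of images) {
    parts.push({ kind: "image", mimeType: img.mimeType, dataUrl: img.dataUrl });
  }
  return { role: "user", parts };
}

// Placeholder heuristics (assumptions): about 4 characters per text token
// and a flat cost per image, so a base64 payload is never counted as text.
const CHARS_PER_TOKEN = 4;
const TOKENS_PER_IMAGE = 1000;

function estimateTokens(message: UserMessage): number {
  let total = 0;
  for (const part of message.parts) {
    total += part.kind === "text"
      ? Math.ceil(part.text.length / CHARS_PER_TOKEN)
      : TOKENS_PER_IMAGE;
  }
  return total;
}
```

The point of the sketch is that image parts are charged separately: counting a data URL by its character length would grossly overestimate the context size.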

Persistence and safety

  • redacted base64/data URL image payloads before writing messages or turn
    context snapshots to disk
  • preserved useful image metadata for replay/debugging without storing the raw
    image content
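
A redaction pass of this kind might look like the sketch below, which drops the inline payload and keeps the surrounding metadata. The field names (`dataUrl`, `approxBytes`, `redacted`) and function names are assumptions made for the example, not the PR's actual schema.

```typescript
// Hypothetical stored-image shape; field names are assumptions.
type StoredImage = {
  id: string;
  mimeType: string;
  dataUrl?: string;     // inline payload, must not reach disk
  approxBytes?: number; // size hint kept for replay/debugging
  redacted?: boolean;
};

type StoredMessage = { role: string; text: string; images?: StoredImage[] };

// Rough payload size: decoded byte count for base64 data URLs, otherwise
// the raw character count of the payload.
function approxDecodedBytes(dataUrl: string): number {
  const comma = dataUrl.indexOf(",");
  if (comma === -1) return 0;
  const header = dataUrl.slice(0, comma);
  const payload = dataUrl.slice(comma + 1);
  if (!header.endsWith(";base64")) return payload.length;
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  return Math.floor((payload.length * 3) / 4) - padding;
}

// Remove the data URL from one image while preserving its metadata.
function redactImage(image: StoredImage): StoredImage {
  const { dataUrl, ...metadata } = image;
  if (dataUrl === undefined || !dataUrl.startsWith("data:")) return image;
  return { ...metadata, approxBytes: approxDecodedBytes(dataUrl), redacted: true };
}

// Apply redaction to every message before it is written to disk.
function sanitizeForPersistence(messages: StoredMessage[]): StoredMessage[] {
  return messages.map((m) =>
    m.images ? { ...m, images: m.images.map(redactImage) } : m,
  );
}
```

Returning new objects instead of mutating in place keeps the in-memory turn usable while only the persisted copy loses its payload.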

cc @wsp1911

wgqqqqq requested a review from wsp1911 on March 6, 2026 09:50
GCWing merged commit a534de4 into GCWing:main on Mar 6, 2026
1 of 2 checks passed