Improve context awareness and image handling for multimodal understanding#69

Merged
quiet-node merged 1 commit into main from worktree-ancient-fluttering-river on Apr 8, 2026
Conversation

@quiet-node (Owner)

Summary

This PR significantly enhances Thuki's ability to understand and reason across multiple context signals (highlighted text, images, user messages) simultaneously, making it smarter at inferring intent and reducing the need for clarification.

Changes

1. Preserve images in conversation history

  • Images are no longer stripped from persisted conversation history
  • Follow-up questions can now reference earlier screenshots without them being lost
  • Full multimodal context available for every response
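A minimal sketch of what "no longer stripping images" could look like. The types and function names (`ChatMessage`, `ContentPart`, `persistHistory`) are illustrative assumptions, not Thuki's actual API:

```typescript
// Hypothetical multimodal message shape, assumed for illustration.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; dataUrl: string };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentPart[];
}

// Old behavior (sketch): image parts were dropped before persisting,
// so follow-up questions lost access to earlier screenshots.
function stripImages(history: ChatMessage[]): ChatMessage[] {
  return history.map((m) => ({
    ...m,
    content: m.content.filter((p) => p.type === "text"),
  }));
}

// New behavior (sketch): history is persisted verbatim, keeping both
// text and image parts so every response sees full multimodal context.
function persistHistory(history: ChatMessage[]): ChatMessage[] {
  return history;
}
```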

2. Structured message format

  • Replaced ambiguous Context: "..." format with explicit [Highlighted Text] and [Request] labels
  • Helps smaller models like Gemma clearly parse boundaries between subject, context, and intent
  • Cleaner separation for multimodal reasoning
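The labeled format could be produced by a wrapper like the following. `buildUserMessage` is a hypothetical helper shown only to illustrate the label structure described above:

```typescript
// Sketch: wrap the user's request with explicit section labels when
// highlighted text is present, replacing the old `Context: "..."` prefix.
function buildUserMessage(request: string, highlighted?: string): string {
  if (!highlighted) return request;
  return `[Highlighted Text]\n${highlighted}\n\n[Request]\n${request}`;
}
```

The explicit labels give small models an unambiguous boundary between the subject (the selection) and the intent (the request), rather than forcing them to infer where quoted context ends.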

3. Enhanced system prompt

Three major additions:

  • Multi-signal reasoning: Teaches the model that highlighted text = primary subject, images = supporting context, message = user intent. When all three are present, synthesize them holistically.

  • Conversational pattern recognition: When users establish a structural pattern (e.g., "What is the population of the US?" then "What about Vietnam?"), recognize and apply the pattern directly instead of asking for clarification.

  • Intelligent multi-image handling: When users reference "the previous image" in a conversation with multiple images, don't assume the immediately preceding one. If it's blank/irrelevant, scan history to find the most recent visually relevant image for comparison.
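The three additions above might be expressed in the system prompt along these lines. This excerpt is an illustrative paraphrase of the rules described in this PR, not the verbatim prompt text it ships:

```typescript
// Illustrative system-prompt excerpt (assumed wording, not the shipped prompt).
const MULTI_SIGNAL_RULES = `
When highlighted text, images, and a user message are all present:
- Treat the highlighted text as the primary subject.
- Treat images as supporting context.
- Treat the user message as the intent; synthesize all three holistically.
If the user establishes a question pattern (e.g. "What is the population
of the US?" then "What about Vietnam?"), apply the pattern to the new
topic instead of asking for clarification.
When the user says "the previous image" and the immediately preceding
image is blank or irrelevant, scan history for the most recent visually
relevant image before comparing.
`.trim();
```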

Impact

  • Users can now send follow-up questions about images without re-attaching them
  • Fewer clarification requests; more direct answers
  • Better handling of quoted text + images together (Thuki's core feature)
  • More natural conversational flow with pattern inference
  • Smarter image comparisons in multi-image conversations

Testing

  • All 472 frontend tests pass
  • All 149 backend tests pass
  • Lint, format, typecheck, and build validation: zero warnings, zero errors

Generated with Claude Code

feat: improve context awareness and image handling for better multimodal understanding

Preserve images in conversation history so follow-up messages can reference earlier screenshots. When users ask follow-up questions about images, Thuki now has access to all previously shared images, enabling richer context for responses.

Restructure message format for clarity. When quoted text is present (from highlighted selections), the message is now explicitly labeled as [Highlighted Text] and [Request], replacing the ambiguous 'Context: ' prefix. This helps smaller models like Gemma parse intent boundaries clearly.

Enhance system prompt with three major improvements:

1. Multi-signal reasoning: Teach the model that highlighted text is the primary subject, images are supporting context, and the user message is the intent. When all three are present, synthesize them holistically.

2. Conversational pattern recognition: Recognize when users establish a pattern (e.g., 'What is the population of X?' then 'What about Y?') and apply the pattern to the new topic instead of asking for clarification.

3. Intelligent multi-image handling: When users reference 'the previous image' in a conversation with multiple images, don't assume the immediately preceding one. If that image is blank or irrelevant, look back through history to find the most recent visually relevant image for comparison.

These changes leverage the full conversation-history context (images are now preserved) to deliver smarter, more context-aware responses that require fewer clarification prompts.

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>
quiet-node merged commit 7f64352 into main Apr 8, 2026
3 checks passed
quiet-node deleted the worktree-ancient-fluttering-river branch April 8, 2026 20:53
quiet-node added a commit that referenced this pull request Apr 10, 2026
feat: improve context awareness and image handling for better multimodal understanding (#69)
quiet-node added a commit that referenced this pull request Apr 10, 2026
feat: improve context awareness and image handling for better multimodal understanding (#69)
quiet-node added a commit that referenced this pull request Apr 11, 2026
feat: improve context awareness and image handling for better multimodal understanding (#69)