Improve context awareness and image handling for multimodal understanding#69
Merged
quiet-node merged 1 commit into main on Apr 8, 2026
Conversation
feat: improve context awareness and image handling for better multimodal understanding

Preserve images in conversation history so follow-up messages can reference earlier screenshots. When users ask follow-up questions about images, Thuki now has access to all previously shared images, enabling richer context for responses.

Restructure the message format for clarity. When quoted text is present (from highlighted selections), the message is now explicitly labeled with [Highlighted Text] and [Request], replacing the ambiguous "Context: " prefix. This helps smaller models like Gemma parse intent boundaries clearly.

Enhance the system prompt with three major improvements:

1. Multi-signal reasoning: teach the model that highlighted text is the primary subject, images are supporting context, and the user message is the intent. When all three are present, synthesize them holistically.
2. Conversational pattern recognition: recognize when users establish a pattern (e.g., "What is the population of X?" then "What about Y?") and apply the pattern to the new topic instead of asking for clarification.
3. Intelligent multi-image handling: when users reference "the previous image" in a conversation with multiple images, don't assume the immediately preceding one. If that image is blank or irrelevant, look back through history to find the most recent visually relevant image for comparison.

These changes leverage the full conversation history (images are now preserved) to deliver smarter, more context-aware responses that require fewer clarification prompts.

Signed-off-by: Logan Nguyen <lg.131.dev@gmail.com>
quiet-node added a commit that referenced this pull request on Apr 10, 2026.
quiet-node added a commit that referenced this pull request on Apr 10, 2026.
quiet-node added a commit that referenced this pull request on Apr 11, 2026.
Summary
This PR significantly enhances Thuki's ability to understand and reason across multiple context signals (highlighted text, images, user messages) simultaneously, making it smarter at inferring intent and reducing the need for clarification.
Changes
1. Preserve images in conversation history
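A minimal sketch of what preserving images in history could look like when assembling the model request. The function and field names (`build_history`, `turn["images"]`, the content-part dicts) are illustrative assumptions, not Thuki's actual code:

```python
# Hypothetical sketch: keep image parts attached to every prior turn
# when converting stored conversation turns into model messages, so
# follow-up questions can reference earlier screenshots.

def build_history(turns):
    """Convert stored turns into model messages, preserving images."""
    messages = []
    for turn in turns:
        parts = [{"type": "text", "text": turn["text"]}]
        # Images from older turns are no longer dropped; each turn
        # carries its image parts forward into the request.
        for img in turn.get("images", []):
            parts.append({"type": "image", "data": img})
        messages.append({"role": turn["role"], "content": parts})
    return messages
```

With this shape, a follow-up like "what about the previous image?" sees every earlier image part, not just the latest one.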
2. Structured message format
Replaces the ambiguous Context: "..." format with explicit [Highlighted Text] and [Request] labels.

3. Enhanced system prompt
Three major additions:
Multi-signal reasoning: Teaches the model that highlighted text = primary subject, images = supporting context, message = user intent. When all three are present, synthesize them holistically.
Conversational pattern recognition: When users establish a structural pattern (e.g., "What is the population of the US?" then "What about Vietnam?"), recognize and apply the pattern directly instead of asking for clarification.
Intelligent multi-image handling: When users reference "the previous image" in a conversation with multiple images, don't assume the immediately preceding one. If it's blank/irrelevant, scan history to find the most recent visually relevant image for comparison.
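The structured message format from change 2 can be sketched as a small formatter. The function name is hypothetical; only the [Highlighted Text] and [Request] labels come from the PR itself:

```python
# Illustrative sketch of the relabeled message format: explicit
# section labels instead of the old ambiguous "Context: " prefix,
# so smaller models can parse intent boundaries clearly.

def format_user_message(request, highlighted=None):
    """Label highlighted text and the user's request as separate sections."""
    if highlighted is None:
        return request  # no selection: send the request as-is
    return f"[Highlighted Text]\n{highlighted}\n\n[Request]\n{request}"
```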
Impact
Testing
Generated with Claude Code