Feat/add why context enhancement #22
Merged
…rnal why context

- Introduce new prompt template for integrating external business/contextual information
- Add method to enhance existing commit messages with why context
- Modify service layer to accept optional why_context parameter
- Separate why context handling from initial commit message generation to reduce noise and improve focus

This change addresses inconsistent "why" scores in commit messages by implementing a two-stage prompting approach. Previously, the model would prioritize the core diff over external context when both were provided in a single prompt, resulting in strong "what" descriptions but weak "whys." The solution separates the concerns: first generating a preliminary message from the diff, then enhancing it with a second prompt that specifically focuses on integrating the "why" context. This approach improves accuracy and reduces noise, aligning with RAG patterns for better context-aware commit messages.
…mprehensive tests

- Modify CommitMessageGenerator to use message content instead of full object when enhancing with why context
- Add test coverage for why context enhancement including success, empty context, and error scenarios
- Extend GenerationService to support why context integration in commit message generation
- Add service layer tests for commit message generation with and without why context
- Improve error handling for AI enhancement failures

The previous implementation passed the full result object instead of just the message content, causing unnecessary data transfer and potential serialization issues. This change ensures only the relevant message content is used for AI enhancement, improving efficiency and reducing complexity. Additionally, comprehensive tests were added to validate that the new why context feature works correctly across different scenarios, including success cases, empty contexts, and error conditions. This provides confidence in the feature's reliability and makes future maintenance easier.
…ontext in commit messages

- Specify that preliminary message content should not be repeated
- Add criteria for including WHY context (relevance, helpfulness, impact level)
- Refine instruction to emphasize problem-solution-benefit structure
- Maintain focus on concise, conventional commit format output

The previous WHY context was too verbose and unfocused, leading to less effective commit message generation. This change streamlines the guidance to focus on impact and conciseness. By clearly stating the problem, solution, and benefit, developers can generate more precise and useful commit messages that improve code maintainability and collaboration.
- Add / option to command
- Pass context to
- Improve commit message generation with additional context

This enables users to provide custom context for better commit message generation, solving the issue of generic or irrelevant commit messages. By allowing users to pass context, the generated commit messages become more accurate and meaningful, leading to better code documentation and easier maintenance.
…ples

- Add detailed scoring rubrics for WHAT and WHY components
- Include comprehensive examples covering excellent to poor quality messages
- Improve prompt logic to better distinguish valuable context from technical noise
- Update validation cases with more realistic scenarios and business impact descriptions
- Refine evaluation thresholds and criteria descriptions for consistency

This change improves the accuracy of commit message quality assessment by providing clearer guidelines and more representative test cases.
…-context-enhancement
…and instructions

- Update evaluation criteria to reference <ORIGINAL_COMMIT_MESSAGE> and <EXTERNAL_CONTEXT> placeholders
- Clarify definition of WHY in commit messages
- Improve instructions for handling external context
- Remove outdated examples that were causing confusion
- Focus on preserving original commit message when context doesn't add value
- Emphasize not making up information not present in provided context
- Streamline prompt to reduce redundancy and improve clarity

External context: Improved why context decision accuracy from 57.1% to 71.4% in benchmarks. This change correctly skips enhancement for cases like test_coverage_context that shouldn't be processed, marking the first successful implementation of this behavior.
…prompt

Some models weren't returning JSON responses consistently. Explicitly stating the JSON requirement ensures all models comply, addressing compatibility issues across different model providers. This change prevents parsing errors and ensures reliable evaluation results.
… suite

Adds a benchmarking suite to evaluate the effectiveness of WHY context enhancement in commit messages, ensuring it improves the WHY score without introducing noise or redundant technical details. Currently achieves a 75% success rate across most models, with areas for improvement identified in simple and average bug fix cases.
…ion output

Fixes an issue where the entire reasoning chain was being sent to commit message generation, causing verbose and confusing output. This change ensures only the essential prompt instructions are used, improving the quality and relevance of generated commit messages.
… truthfulness

- Add requirement for messages to focus on accuracy, validity, and truthfulness
- Update evaluation criteria to score 1 for WHAT/WHY when messages are untruthful or inaccurate
- Clarify that score 1 applies when changes are misrepresented, omitted, or described inaccurately
- Improve prompt guidance to ensure high-quality, honest commit message assessment
…hmarks

- Update validation suite with new test cases for documentation, feature implementations, and security fixes
- Refactor validation cases to better categorize quality levels (good, average, poor, very poor)
- Improve evaluation criteria for WHAT/WHY scoring with clearer guidelines
- Add WHY context guidance for lenient scoring of low-impact changes
- Update benchmark suite to use enum-based context quality classification
- Enhance result reporting with WHAT/WHY score breakdown
- Fix service logic to properly return enhanced results when why_context is provided
No functional changes; improves readability and conciseness of benchmark documentation
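The two-stage approach described in the commits above (draft from the diff first, then enhance with the why context) could look roughly like the sketch below. This is an illustration under assumptions, not the PR's code: the prompt texts, the `generate_commit_message` function, and the `llm` callable are hypothetical stand-ins, and the fallback to the preliminary message on failure is an assumed behavior.

```python
from typing import Callable, Optional

# Hypothetical prompt templates; the PR's real templates are not shown on this page.
GENERATE_PROMPT = "Write a conventional commit message for this diff:\n{diff}"
ENHANCE_PROMPT = (
    "Preliminary commit message:\n{message}\n\n"
    "External context:\n{why_context}\n\n"
    "Add the WHY only if the context is relevant. Do not repeat the "
    "preliminary message and do not invent information."
)


def generate_commit_message(
    diff: str,
    llm: Callable[[str], str],
    why_context: Optional[str] = None,
) -> str:
    """Stage 1 drafts from the diff; stage 2 optionally folds in the why context."""
    preliminary = llm(GENERATE_PROMPT.format(diff=diff)).strip()

    # No usable context: return the stage-1 message unchanged.
    if not why_context or not why_context.strip():
        return preliminary

    try:
        # Only the message text is passed on, not a full result object.
        enhanced = llm(
            ENHANCE_PROMPT.format(message=preliminary, why_context=why_context)
        ).strip()
    except Exception:
        # Assumed fallback: keep the preliminary message if enhancement fails.
        return preliminary

    return enhanced or preliminary
```

Because `why_context` defaults to `None`, callers that never pass it get the single-stage behavior, which is one way the optional parameter can stay backward compatible.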
This PR improves the commit message evaluation system by refining scoring criteria, enhancing validation test cases, and adding robust WHY context enhancement capabilities.
These changes make the evaluation more accurate, consistent, and comprehensive.
Key Features
Why (External) Context Processing
Enhanced Validation Suite
Improved WHAT/WHY Scoring System
Comprehensive Examples: Added detailed examples showing different score ranges with reasoning
Chain-of-Thought Evaluation: Implemented structured evaluation process for more consistent scoring
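As a rough illustration of how a structured evaluation reply might be consumed, the sketch below parses a JSON object holding WHAT and WHY scores. The field names (`what_score`, `why_score`, `reasoning`), the `Evaluation` class, and `parse_evaluation` are assumptions made for this example; the PR's actual response schema is not shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class Evaluation:
    what_score: int
    why_score: int
    reasoning: str


def parse_evaluation(raw: str) -> Evaluation:
    """Parse a model reply that is expected to contain one JSON object."""
    # Some models wrap JSON in prose or code fences, so keep only the
    # outermost {...} span before decoding.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(raw[start : end + 1])
    return Evaluation(
        what_score=int(data["what_score"]),
        why_score=int(data["why_score"]),
        reasoning=str(data.get("reasoning", "")),
    )
```

Asking for the reasoning field before the scores is one common way to get chain-of-thought style output while keeping the reply machine-readable.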
WHY Context Enhancement
Enum-Based Classification: Introduced ContextQuality enum for better categorization:
Lenient Scoring Guidance: Added specific guidance for scoring low-impact changes appropriately
Enhanced Decision Logic: Improved criteria for when to enhance messages with external context
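A minimal sketch of how an enum-based classification could drive the enhance-or-skip decision and a benchmark's decision accuracy. The enum members below are placeholders, since the PR's actual `ContextQuality` values are not listed on this page; `should_enhance` and `decision_accuracy` are likewise hypothetical helpers.

```python
from enum import Enum
from typing import Iterable, Tuple


class ContextQuality(Enum):
    # Placeholder members; the real enum values are not shown in the source.
    RELEVANT = "relevant"
    REDUNDANT = "redundant"
    IRRELEVANT = "irrelevant"


def should_enhance(quality: ContextQuality) -> bool:
    """Enhance only when the external context adds a WHY the message lacks."""
    return quality is ContextQuality.RELEVANT


def decision_accuracy(cases: Iterable[Tuple[ContextQuality, bool]]) -> float:
    """Fraction of cases where the observed decision matches the expected one.

    Each case pairs the context quality with whether the system enhanced.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("no benchmark cases given")
    correct = sum(1 for quality, enhanced in cases if should_enhance(quality) == enhanced)
    return correct / len(cases)
```

Keeping the decision in one small function makes it straightforward to benchmark the skip behavior separately from the quality of the enhanced text.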
Service Logic Improvements
Enhanced Reporting
Impact
This enhancement improves the accuracy and consistency of commit message evaluation, providing:
The changes maintain backward compatibility while providing a more robust foundation for commit message quality assessment.