Feat/add why context enhancement #22

nick-galluzzo · 2025-08-07T06:05:49Z

This PR improves the commit message evaluation system by refining scoring criteria, enhancing validation test cases, and adding robust WHY context enhancement capabilities.

These changes make the evaluation more accurate, consistent, and comprehensive.

Key Features

Why (External) Context Processing

Users can now supply external context to the generator, which uses it to improve the WHY score of the generated commit message.

Enhanced Validation Suite

New Test Cases: Added comprehensive validation cases for documentation, feature implementations, and security fixes
Quality Categorization: Refactored validation cases with better categorization across quality levels:
- Excellent (4.5-5.0) - Security fixes with business impact, performance optimizations with metrics
- Good (3.5-4.4) - Clear feature implementations, bug fixes with context
- Average (2.5-3.4) - Basic changes with minimal context
- Poor (1.5-2.4) - Unclear or incomplete descriptions
- Very Poor (0.5-1.4) - Misleading or uninformative messages
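The quality bands above can be read as a lookup from score to label. The following is a minimal sketch, not the project's implementation: `QUALITY_BANDS` and `quality_level` are hypothetical names, and it assumes that a score falling between two listed ranges (for example 4.45) belongs to the lower band.

```python
from typing import Optional

# Lower bound of each band, highest first. Labels and bounds are copied from
# the list above; the names and the tuple layout are hypothetical.
QUALITY_BANDS = [
    (4.5, "Excellent"),
    (3.5, "Good"),
    (2.5, "Average"),
    (1.5, "Poor"),
    (0.5, "Very Poor"),
]


def quality_level(score: float) -> Optional[str]:
    """Return the band label for a score, or None if it is outside 0.5-5.0."""
    if score > 5.0:
        return None
    for lower_bound, label in QUALITY_BANDS:
        if score >= lower_bound:
            return label
    # Scores below 0.5 are not covered by any band in the list.
    return None
```

A validation case could then assert that an evaluated message lands in its expected band, for example `quality_level(4.7) == "Excellent"`.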

Improved WHAT/WHY Scoring System

Clearer Guidelines: Enhanced evaluation criteria with specific examples and scoring rationale
Comprehensive Examples: Added detailed examples showing different score ranges with reasoning
Chain-of-Thought Evaluation: Implemented structured evaluation process for more consistent scoring

WHY Context Enhancement

Enum-Based Classification: Introduced ContextQuality enum for better categorization:
- GOOD - Adds meaningful user/business context
- BAD - Irrelevant or confusing information
- TECHNICAL - Implementation details already clear from diff
- REDUNDANT - Repeats information from commit message

Lenient Scoring Guidance: Added specific guidance for scoring low-impact changes appropriately
Enhanced Decision Logic: Improved criteria for when to enhance messages with external context
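As an illustration of the classification above, here is a minimal sketch of what such an enum might look like. Only the enum name and its four members come from this PR; the string values and the `should_enhance` helper are hypothetical, and the rule that only GOOD context triggers enhancement is an assumption based on the descriptions of the four categories.

```python
from enum import Enum


class ContextQuality(Enum):
    # Member names come from the PR description; the values are assumed.
    GOOD = "good"            # adds meaningful user/business context
    BAD = "bad"              # irrelevant or confusing information
    TECHNICAL = "technical"  # implementation details already clear from diff
    REDUNDANT = "redundant"  # repeats information from commit message


def should_enhance(quality: ContextQuality) -> bool:
    """Hypothetical decision rule: enhance only when the context is GOOD."""
    return quality is ContextQuality.GOOD
```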

Service Logic Improvements

Fixed Enhancement Flow: Corrected generation service to properly return enhanced results when WHY context is provided
Separation of Concerns: Better handling of WHY context enhancement as separate step to reduce noise in initial generation
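The two points above can be sketched as a two-stage flow: generate a preliminary message from the diff, then enhance it only when WHY context is supplied. This is a rough sketch under stated assumptions, not the actual service code: `GenerationResult`, `generate_commit_message`, and the `generate`/`enhance` callables (stand-ins for the AI calls) are hypothetical, and falling back to the preliminary message when enhancement fails is an assumed behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    message: str
    enhanced: bool = False


def generate_commit_message(
    diff: str,
    generate: Callable[[str], str],
    enhance: Callable[[str, str], str],
    why_context: Optional[str] = None,
) -> GenerationResult:
    # Stage 1: build the preliminary message from the diff alone, so the
    # external context cannot add noise to the WHAT description.
    preliminary = generate(diff)

    # No usable context: return the preliminary message unchanged.
    if not why_context or not why_context.strip():
        return GenerationResult(message=preliminary)

    # Stage 2: pass only the message text (not a result object) to the
    # enhancement step, together with the WHY context.
    try:
        enhanced = enhance(preliminary, why_context)
    except Exception:
        # Assumed fallback: keep the preliminary message if enhancement fails.
        return GenerationResult(message=preliminary)

    # Return the enhanced result, not the preliminary one.
    return GenerationResult(message=enhanced, enhanced=True)
```

Passing the model calls in as callables keeps the sketch testable with simple stubs and does not assume any particular provider API.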

Enhanced Reporting

WHAT/WHY Score Breakdown: Detailed reporting of individual scoring components
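A breakdown report of this kind could be modelled roughly as follows. This is a sketch only: `ScoreBreakdown`, its field names, and `format_breakdown` are hypothetical, and no combined score is computed because the PR does not say how the two components are weighted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreBreakdown:
    what_score: float
    why_score: float
    reasoning: str = ""


def format_breakdown(breakdown: ScoreBreakdown) -> str:
    """Render the individual components, one per line."""
    lines = [
        f"WHAT: {breakdown.what_score:.1f}",
        f"WHY:  {breakdown.why_score:.1f}",
    ]
    if breakdown.reasoning:
        lines.append(f"Reasoning: {breakdown.reasoning}")
    return "\n".join(lines)
```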

Impact

This enhancement improves the accuracy and consistency of commit message evaluation, providing:

More reliable scoring across different types of changes
Better guidance for when and how to enhance commit messages with context
Clearer evaluation criteria that align with industry best practices
Enhanced user experience with detailed score breakdowns and reasoning

The changes maintain backward compatibility while providing a more robust foundation for commit message quality assessment.

…rnal why context

- Introduce new prompt template for integrating external business/contextual information
- Add method to enhance existing commit messages with why context
- Modify service layer to accept optional why_context parameter
- Separate why context handling from initial commit message generation to reduce noise and improve focus

This change addresses inconsistent "why" scores in commit messages by implementing a two-stage prompting approach. Previously, the model would prioritize the core diff over external context when both were provided in a single prompt, resulting in strong "what" descriptions but weak "whys." The solution separates the concerns: first generating a preliminary message from the diff, then enhancing it with a second prompt that specifically focuses on integrating the "why" context. This approach improves accuracy and reduces noise, aligning with RAG patterns for better context-aware commit messages.

…mprehensive tests

- Modify CommitMessageGenerator to use message content instead of full object when enhancing with why context
- Add test coverage for why context enhancement including success, empty context, and error scenarios
- Extend GenerationService to support why context integration in commit message generation
- Add service layer tests for commit message generation with and without why context
- Improve error handling for AI enhancement failures

The previous implementation passed the full result object instead of just the message content, causing unnecessary data transfer and potential serialization issues. This change ensures only the relevant message content is used for AI enhancement, improving efficiency and reducing complexity. Additionally, comprehensive tests were added to validate that the new why context feature works correctly across different scenarios, including success cases, empty contexts, and error conditions. This provides confidence in the feature's reliability and makes future maintenance easier.

…ontext in commit messages

- Specify that preliminary message content should not be repeated
- Add criteria for including WHY context (relevance, helpfulness, impact level)
- Refine instruction to emphasize problem-solution-benefit structure
- Maintain focus on concise, conventional commit format output

The previous WHY context was too verbose and unfocused, leading to less effective commit message generation. This change streamlines the guidance to focus on impact and conciseness. By clearly stating the problem, solution, and benefit, developers can generate more precise and useful commit messages that improve code maintainability and collaboration.

- Add / option to command
- Pass context to
- Improve commit message generation with additional context

This enables users to provide custom context for better commit message generation, solving the issue of generic or irrelevant commit messages. By allowing users to pass context, the generated commit messages become more accurate and meaningful, leading to better code documentation and easier maintenance.

…ples

- Add detailed scoring rubrics for WHAT and WHY components
- Include comprehensive examples covering excellent to poor quality messages
- Improve prompt logic to better distinguish valuable context from technical noise
- Update validation cases with more realistic scenarios and business impact descriptions
- Refine evaluation thresholds and criteria descriptions for consistency

This change improves the accuracy of commit message quality assessment by providing clearer guidelines and more representative test cases.

…-context-enhancement

…and instructions

- Update evaluation criteria to reference <ORIGINAL_COMMIT_MESSAGE> and <EXTERNAL_CONTEXT> placeholders
- Clarify definition of WHY in commit messages
- Improve instructions for handling external context
- Remove outdated examples that were causing confusion
- Focus on preserving original commit message when context doesn't add value
- Emphasize not making up information not present in provided context
- Streamline prompt to reduce redundancy and improve clarity

External context: Improved why context decision accuracy from 57.1% to 71.4% in benchmarks. This change correctly skips enhancement for cases like test_coverage_context that shouldn't be processed, marking the first successful implementation of this behavior.

…prompt

Some models weren't returning JSON responses consistently. Explicitly stating the JSON requirement ensures all models comply, addressing compatibility issues across different model providers. This change prevents parsing errors and ensures reliable evaluation results.
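The commit above addresses the problem at the prompt level. As a complementary illustration (not something this PR is known to include), a caller could also parse the reply defensively. In this sketch `parse_evaluation` is a hypothetical helper, and the assumption is that the reply contains a single JSON object, possibly surrounded by extra prose.

```python
import json
from typing import Any, Dict


def parse_evaluation(raw: str) -> Dict[str, Any]:
    """Extract and parse the JSON object embedded in a model reply."""
    # Locate the outermost braces so leading or trailing prose is ignored.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("model reply contains no JSON object")
    try:
        return json.loads(raw[start : end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
```

Raising a single `ValueError` for both failure modes gives the evaluation code one error type to handle when a provider ignores the JSON instruction.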

… suite

Adds a benchmarking suite to evaluate the effectiveness of WHY context enhancement in commit messages, ensuring it improves the WHY score without introducing noise or redundant technical details. Currently achieves a 75% success rate across most models, with areas for improvement identified in simple and average bug fix cases.

…ion output

Fixes an issue where the entire reasoning chain was being sent to commit message generation, causing verbose and confusing output. This change ensures only the essential prompt instructions are used, improving the quality and relevance of generated commit messages.

… truthfulness

- Add requirement for messages to focus on accuracy, validity, and truthfulness
- Update evaluation criteria to score 1 for WHAT/WHY when messages are untruthful or inaccurate
- Clarify that score 1 applies when changes are misrepresented, omitted, or described inaccurately
- Improve prompt guidance to ensure high-quality, honest commit message assessment

…hmarks

- Update validation suite with new test cases for documentation, feature implementations, and security fixes
- Refactor validation cases to better categorize quality levels (good, average, poor, very poor)
- Improve evaluation criteria for WHAT/WHY scoring with clearer guidelines
- Add WHY context guidance for lenient scoring of low-impact changes
- Update benchmark suite to use enum-based context quality classification
- Enhance result reporting with WHAT/WHY score breakdown
- Fix service logic to properly return enhanced results when why_context is provided

No functional changes; improves readability and conciseness of benchmark documentation

nick-galluzzo added 16 commits August 5, 2025 21:01

Merge branch 'main' into feat/add-why-context-enhancement

eb52ec0

Merge branch 'main' into feat/improve-evaluation-benchmarking

701b3e1

Merge branch 'feat/improve-evaluation-benchmarking' into feat/add-why…

32065cc

…-context-enhancement

chore(ai): set default temperature to 0.0 for deterministic AI responses

c2422ca

docs(benchmarks): remove redundant comments and streamline descriptions

fb9a66c

No functional changes; improves readability and conciseness of benchmark documentation

nick-galluzzo merged commit 303a53a into main Aug 7, 2025
3 checks passed

nick-galluzzo deleted the feat/add-why-context-enhancement branch August 7, 2025 06:54
