@suhaibmujahid
Member

These improvements could be reviewed commit by commit.

@suhaibmujahid suhaibmujahid marked this pull request as ready for review January 10, 2026 22:34
Copilot AI left a comment

Pull request overview

This PR refactors the code review evaluation infrastructure by replacing old script-based evaluation tools with a more modular architecture and W&B Weave integration for tracking evaluations.

Changes:

  • Removes legacy evaluation scripts (code_review_tool_evaluator.py, code_review_tool_evaluator_report.py) and experimental files
  • Introduces new modular tools for patch summarization, suggestion filtering, and comment matching
  • Adds Jupyter notebooks for dataset creation and evaluation using W&B Weave
  • Refactors CodeReviewTool to use Protocol-based dependency injection for better testability
  • Updates platform base classes to accept both str and int for patch_id parameters
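
As a rough illustration of the Protocol-based dependency injection mentioned in the list above, the following minimal sketch shows why the pattern helps testability. The names `PatchSummarizer`, `ReviewTool`, and `FakeSummarizer` are hypothetical and are not taken from the PR; the actual `CodeReviewTool` interface may differ.

```python
from dataclasses import dataclass
from typing import Protocol


class PatchSummarizer(Protocol):
    """Hypothetical interface: any object with a matching `run` method qualifies."""

    def run(self, patch: str) -> str: ...


@dataclass
class ReviewTool:
    """Illustrative stand-in for a review tool; not the actual CodeReviewTool."""

    summarizer: PatchSummarizer

    def review(self, patch: str) -> str:
        # The tool depends only on the protocol, not on a concrete class.
        summary = self.summarizer.run(patch)
        return f"Review based on summary: {summary}"


class FakeSummarizer:
    """Test double; no inheritance is needed because Protocols are structural."""

    def run(self, patch: str) -> str:
        return f"{len(patch.splitlines())} line(s) changed"


tool = ReviewTool(summarizer=FakeSummarizer())
result = tool.review("+ added line\n- removed line")
```

Because the dependency is typed as a protocol, a test can pass in a lightweight fake without calling a model or patching imports.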

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| scripts/code_review_tool_evaluator_report.py | Removed legacy evaluation report generator |
| scripts/code_review_tool_evaluator.py | Removed legacy evaluation script (613 lines) |
| experiments/review_helper_modify_filtering_step.ipy | Removed experimental filtering modification script |
| requirements.txt | Added weave>=0.50.0 for evaluation tracking |
| notebooks/code_review_evaluation.ipynb | New notebook for running W&B Weave evaluations |
| notebooks/code_review_create_dataset.ipynb | New notebook for creating evaluation datasets |
| bugbug/tools/suggestion_filtering/prompts.py | Extracted filtering prompts to dedicated module |
| bugbug/tools/suggestion_filtering/agent.py | New modular suggestion filtering tool |
| bugbug/tools/patch_summarization/prompts.py | Extracted summarization prompts to dedicated module |
| bugbug/tools/patch_summarization/agent.py | New modular patch summarization tool |
| bugbug/tools/comment_matching/prompts.py | New prompts for LLM-based comment matching |
| bugbug/tools/comment_matching/agent.py | New tool for matching generated vs ground truth comments |
| bugbug/tools/code_review/scorer.py | New Weave scorers for evaluation metrics |
| bugbug/tools/code_review/utils.py | Refactored to work with structured comment objects |
| bugbug/tools/code_review/prompts.py | Removed prompts moved to specialized modules |
| bugbug/tools/code_review/agent.py | Refactored to use Protocol-based dependencies |
| bugbug/tools/base.py | Simplified by removing version property and print method |
| bugbug/tools/core/platforms/base.py | Updated signature to accept str or int for patch_id |
| bugbug/tools/core/platforms/phabricator.py | Updated signature to accept str or int for patch_id |
| bugbug/tools/core/platforms/swarm.py | Updated signature to accept str or int for patch_id |
| bugbug/code_search/searchfox_api.py | Made get_file parameter optional with default implementation |
| bugbug/code_search/mozilla.py | Made get_file parameter optional with fallback |

from pydantic import BaseModel, Field

from bugbug.tools.base import GenerativeModelTool
from bugbug.tools.code_review.agent import GeneratedReviewComment

Copilot AI Jan 10, 2026

Potential circular import issue: this module imports GeneratedReviewComment from bugbug.tools.code_review.agent, while bugbug.tools.code_review.agent imports SuggestionFilteringTool from this module. This circular dependency could cause import errors. Consider moving GeneratedReviewComment to a shared data types module or using TYPE_CHECKING with forward references.
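
For reference, the `TYPE_CHECKING` option suggested above could look roughly like the sketch below. The helper `first_n` is hypothetical and only illustrates the annotation style; it is not code from this PR.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so this import cannot take part in an import cycle.
    from bugbug.tools.code_review.agent import GeneratedReviewComment


def first_n(
    comments: "list[GeneratedReviewComment]", limit: int
) -> "list[GeneratedReviewComment]":
    """Hypothetical helper: keep at most `limit` comments.

    The quoted annotations are forward references, so the name
    GeneratedReviewComment does not need to exist at runtime.
    """
    return comments[:limit]
```

One caveat: this only works when the type is needed for annotations alone. If `GeneratedReviewComment` is used as a field type on a pydantic model in this module, pydantic has to resolve it at runtime, in which case moving the class to a shared data types module is probably the better fit.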

Introduces a run_by_diff_id method that retrieves a patch by diff ID from review_data and runs the review process.
Eliminated the abstract version property from GenerativeModelTool and removed the version attribute from CodeReviewTool since it is not used.
Moved suggestion filtering logic from code_review/agent.py to a new suggestion_filtering module. Introduced SuggestionFilteringTool for filtering review comments, updated CodeReviewTool to use the new filterer, and relocated related prompt templates. This improves modularity and separation of concerns for suggestion filtering.
Wrapped comments and rejected examples in <comments-to-filter> and <rejected-examples> tags to improve prompt structure and clarity.
It will be replaced with the W&B Weave evaluation pipeline.
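
The commit that wraps comments and rejected examples in `<comments-to-filter>` and `<rejected-examples>` tags can be pictured with a small sketch like the one below. Only the two tag names come from the commit message; the function `build_filter_prompt`, the instruction sentence, and the bullet formatting are assumptions made for illustration and do not reproduce the actual prompt template.

```python
def build_filter_prompt(comments: list[str], rejected_examples: list[str]) -> str:
    """Hypothetical prompt builder that delimits each input section with tags."""
    comments_block = "\n".join(f"- {c}" for c in comments)
    rejected_block = "\n".join(f"- {e}" for e in rejected_examples)
    return (
        # Placeholder instruction text; the real prompt wording is not shown in the PR page.
        "Filter the review comments below.\n\n"
        f"<comments-to-filter>\n{comments_block}\n</comments-to-filter>\n\n"
        f"<rejected-examples>\n{rejected_block}\n</rejected-examples>"
    )


prompt = build_filter_prompt(
    comments=["Use a context manager when opening the file."],
    rejected_examples=["Nit: rename this variable."],
)
```

Explicit delimiters make it unambiguous to the model where the comments to filter end and where the rejected examples begin.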
@suhaibmujahid suhaibmujahid force-pushed the improve-revew-helper-4 branch from 7e1ca6b to 595cb8c Compare January 12, 2026 02:30