[code_review] Misc improvements part 4 #5588

suhaibmujahid · 2026-01-05T02:51:41Z

These improvements could be reviewed commit by commit.

Copilot

Pull request overview

This PR refactors the code review evaluation infrastructure by replacing old script-based evaluation tools with a more modular architecture and W&B Weave integration for tracking evaluations.

Changes:

Removes legacy evaluation scripts (code_review_tool_evaluator.py, code_review_tool_evaluator_report.py) and experimental files
Introduces new modular tools for patch summarization, suggestion filtering, and comment matching
Adds Jupyter notebooks for dataset creation and evaluation using W&B Weave
Refactors CodeReviewTool to use Protocol-based dependency injection for better testability
Updates platform base classes to accept both str and int for patch_id parameters
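As a rough illustration of the Protocol-based dependency injection mentioned in the overview, here is a minimal sketch. The names `PatchSummarizer`, `ReviewTool`, and `FakeSummarizer` are hypothetical and are not taken from the PR; the actual `CodeReviewTool` interfaces may look different.

```python
from typing import Protocol


class PatchSummarizer(Protocol):
    """Hypothetical interface: any object with a matching `run` method qualifies."""

    def run(self, patch: str) -> str: ...


class ReviewTool:
    """Hypothetical stand-in for a tool that depends on a summarizer."""

    def __init__(self, summarizer: PatchSummarizer) -> None:
        self.summarizer = summarizer

    def review(self, patch: str) -> str:
        summary = self.summarizer.run(patch)
        return f"Reviewing patch: {summary}"


class FakeSummarizer:
    """Test double: no inheritance needed, only the same method shape."""

    def run(self, patch: str) -> str:
        return f"{len(patch.splitlines())} line(s) changed"


# The fake is injected in place of a model-backed component.
tool = ReviewTool(FakeSummarizer())
result = tool.review("+a = 1\n+b = 2")
```

Because the dependency is typed structurally, a test can pass in a lightweight fake without touching a real model, which is presumably the testability benefit the overview refers to.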

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 14 comments.

File	Description
scripts/code_review_tool_evaluator_report.py	Removed legacy evaluation report generator
scripts/code_review_tool_evaluator.py	Removed legacy evaluation script (613 lines)
experiments/review_helper_modify_filtering_step.ipy	Removed experimental filtering modification script
requirements.txt	Added weave>=0.50.0 for evaluation tracking
notebooks/code_review_evaluation.ipynb	New notebook for running W&B Weave evaluations
notebooks/code_review_create_dataset.ipynb	New notebook for creating evaluation datasets
bugbug/tools/suggestion_filtering/prompts.py	Extracted filtering prompts to dedicated module
bugbug/tools/suggestion_filtering/agent.py	New modular suggestion filtering tool
bugbug/tools/patch_summarization/prompts.py	Extracted summarization prompts to dedicated module
bugbug/tools/patch_summarization/agent.py	New modular patch summarization tool
bugbug/tools/comment_matching/prompts.py	New prompts for LLM-based comment matching
bugbug/tools/comment_matching/agent.py	New tool for matching generated vs ground truth comments
bugbug/tools/code_review/scorer.py	New Weave scorers for evaluation metrics
bugbug/tools/code_review/utils.py	Refactored to work with structured comment objects
bugbug/tools/code_review/prompts.py	Removed prompts moved to specialized modules
bugbug/tools/code_review/agent.py	Refactored to use Protocol-based dependencies
bugbug/tools/base.py	Simplified by removing version property and print method
bugbug/tools/core/platforms/base.py	Updated signature to accept str or int for patch_id
bugbug/tools/core/platforms/phabricator.py	Updated signature to accept str or int for patch_id
bugbug/tools/core/platforms/swarm.py	Updated signature to accept str or int for patch_id
bugbug/code_search/searchfox_api.py	Made get_file parameter optional with default implementation
bugbug/code_search/mozilla.py	Made get_file parameter optional with fallback

Copilot · 2026-01-10T22:41:03Z

bugbug/tools/suggestion_filtering/agent.py

+from pydantic import BaseModel, Field
+
+from bugbug.tools.base import GenerativeModelTool
+from bugbug.tools.code_review.agent import GeneratedReviewComment


Potential circular import issue: this module imports GeneratedReviewComment from bugbug.tools.code_review.agent, while bugbug.tools.code_review.agent imports SuggestionFilteringTool from this module. This circular dependency could cause import errors. Consider moving GeneratedReviewComment to a shared data types module or using TYPE_CHECKING with forward references.
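One of the two options suggested above, `TYPE_CHECKING` with forward references, could look roughly like the following sketch. The import path is the one quoted in the diff; the helper `keep_selected` is hypothetical and exists only to show the annotation style.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so this import cannot take part in an import cycle.
    from bugbug.tools.code_review.agent import GeneratedReviewComment


def keep_selected(
    comments: "list[GeneratedReviewComment]", indices: "list[int]"
) -> "list[GeneratedReviewComment]":
    """Hypothetical helper; quoted annotations are not evaluated at import time."""
    return [comments[i] for i in indices]
```

This only helps where the name is needed for annotations. If `GeneratedReviewComment` is used at runtime, for example as a pydantic field type, moving it to a shared data types module is likely the safer choice.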

Moved suggestion filtering logic from code_review/agent.py to a new suggestion_filtering module. Introduced SuggestionFilteringTool for filtering review comments, updated CodeReviewTool to use the new filterer, and relocated related prompt templates. This improves modularity and separation of concerns for suggestion filtering.

suhaibmujahid added 3 commits December 24, 2025 14:10

Remove unused _print_answer method and related calls

2fb736c

We can have better tracking with W&B Weave

Refactor FunctionSearchMozilla file retrieval logic

9fe827b

Improve the factory method to create a CodeReviewTool with default co…

bd85288

…mponents

suhaibmujahid marked this pull request as ready for review January 10, 2026 22:34

suhaibmujahid requested review from Copilot and marco-c January 10, 2026 22:34

Copilot started reviewing on behalf of suhaibmujahid January 10, 2026 22:35

Copilot AI reviewed Jan 10, 2026

suhaibmujahid added 13 commits January 11, 2026 21:25

Add run_by_diff_id method to CodeReviewTool

95678ef

Introduces a run_by_diff_id method that retrieves a patch by diff ID from review_data and runs the review process.

Support providing the patch summary when generating suggestions

3794dad

Update get_patch_by_id to accept str or int patch_id

8d64fe6

Remove version property from GenerativeModelTool

2458610

Eliminated the abstract version property from GenerativeModelTool and removed the version attribute from CodeReviewTool since it is not used.

Refactor patch summarization into separate tool

c9923f3

Add XML-like tags to filtering prompt template

b3d4942

Wrapped comments and rejected examples in <comments-to-filter> and <rejected-examples> tags to improve prompt structure and clarity.
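A prompt template using these tags might be shaped like the sketch below. The tag names come from the commit description; the surrounding wording, the `FILTERING_TEMPLATE` name, and the placeholder names are assumptions, since the actual template is not shown here.

```python
# Hypothetical template: only the two tag names are taken from the commit message.
FILTERING_TEMPLATE = """Filter the review comments below.

<comments-to-filter>
{comments}
</comments-to-filter>

<rejected-examples>
{rejected_examples}
</rejected-examples>
"""

prompt = FILTERING_TEMPLATE.format(
    comments="0: Consider renaming this variable.",
    rejected_examples="Nit: add a blank line.",
)
```

Delimiting each input with its own tag makes it unambiguous where the comments end and the rejected examples begin.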

Refactor filtering to return indices instead of full comments

6bb85f1
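The idea of returning indices rather than full comments can be sketched as follows. `Comment` and `filter_indices` are hypothetical stand-ins; in the PR the selection is made by a model, whereas this sketch uses a simple length rule so that it runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    """Hypothetical stand-in for a generated review comment."""

    file: str
    line: int
    text: str


def filter_indices(comments: list[Comment]) -> list[int]:
    """Stand-in for the model call: keep comments longer than 20 characters."""
    return [i for i, c in enumerate(comments) if len(c.text) > 20]


comments = [
    Comment("a.py", 3, "Nit."),
    Comment("a.py", 10, "This loop can raise KeyError on empty input."),
]

# The caller maps indices back to the original, unmodified objects.
kept = [comments[i] for i in filter_indices(comments)]
```

One plausible benefit of this design is that the filter never has to reproduce comment text, so the surviving comments are guaranteed to be identical to the generated ones.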

Add comment matching tool

f5d3dd1

Remove code review evaluation scripts

4856ffb

They will be replaced with the W&B Weave evaluation pipeline.

Add code review evaluation pipeline

78af5a2

Refactor create method for CodeReviewTool to use classmethod

4699a2c
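A classmethod-based factory with default components might look like this sketch. `ReviewTool`, `DefaultSummarizer`, and `SubTool` are hypothetical names; the real `CodeReviewTool.create` signature is not shown in this conversation and may differ.

```python
class DefaultSummarizer:
    """Hypothetical default component."""

    def run(self, patch: str) -> str:
        return patch[:20]


class ReviewTool:
    """Hypothetical stand-in for a tool built from injectable components."""

    def __init__(self, summarizer) -> None:
        self.summarizer = summarizer

    @classmethod
    def create(cls, summarizer=None) -> "ReviewTool":
        # Fall back to a default component when none is supplied.
        if summarizer is None:
            summarizer = DefaultSummarizer()
        return cls(summarizer)


class SubTool(ReviewTool):
    pass


# Because `create` receives `cls`, subclasses get instances of their own type.
tool = SubTool.create()
```

Compared with a static factory, a classmethod keeps working for subclasses without needing to be overridden.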

Fix typo in summarization prompt template

595cb8c

suhaibmujahid force-pushed the improve-revew-helper-4 branch from 7e1ca6b to 595cb8c January 12, 2026 02:30
