-
Notifications
You must be signed in to change notification settings - Fork 321
[code_review] Misc improvements part 4 #5588
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[code_review] Misc improvements part 4 #5588
Conversation
We can have better tracking with W&B Weave
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the code review evaluation infrastructure by replacing old script-based evaluation tools with a more modular architecture and W&B Weave integration for tracking evaluations.
Changes:
- Removes legacy evaluation scripts (
code_review_tool_evaluator.py,code_review_tool_evaluator_report.py) and experimental files - Introduces new modular tools for patch summarization, suggestion filtering, and comment matching
- Adds Jupyter notebooks for dataset creation and evaluation using W&B Weave
- Refactors
CodeReviewToolto use Protocol-based dependency injection for better testability - Updates platform base classes to accept both
strandintforpatch_idparameters
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 14 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/code_review_tool_evaluator_report.py | Removed legacy evaluation report generator |
| scripts/code_review_tool_evaluator.py | Removed legacy evaluation script (613 lines) |
| experiments/review_helper_modify_filtering_step.ipy | Removed experimental filtering modification script |
| requirements.txt | Added weave>=0.50.0 for evaluation tracking |
| notebooks/code_review_evaluation.ipynb | New notebook for running W&B Weave evaluations |
| notebooks/code_review_create_dataset.ipynb | New notebook for creating evaluation datasets |
| bugbug/tools/suggestion_filtering/prompts.py | Extracted filtering prompts to dedicated module |
| bugbug/tools/suggestion_filtering/agent.py | New modular suggestion filtering tool |
| bugbug/tools/patch_summarization/prompts.py | Extracted summarization prompts to dedicated module |
| bugbug/tools/patch_summarization/agent.py | New modular patch summarization tool |
| bugbug/tools/comment_matching/prompts.py | New prompts for LLM-based comment matching |
| bugbug/tools/comment_matching/agent.py | New tool for matching generated vs ground truth comments |
| bugbug/tools/code_review/scorer.py | New Weave scorers for evaluation metrics |
| bugbug/tools/code_review/utils.py | Refactored to work with structured comment objects |
| bugbug/tools/code_review/prompts.py | Removed prompts moved to specialized modules |
| bugbug/tools/code_review/agent.py | Refactored to use Protocol-based dependencies |
| bugbug/tools/base.py | Simplified by removing version property and print method |
| bugbug/tools/core/platforms/base.py | Updated signature to accept str or int for patch_id |
| bugbug/tools/core/platforms/phabricator.py | Updated signature to accept str or int for patch_id |
| bugbug/tools/core/platforms/swarm.py | Updated signature to accept str or int for patch_id |
| bugbug/code_search/searchfox_api.py | Made get_file parameter optional with default implementation |
| bugbug/code_search/mozilla.py | Made get_file parameter optional with fallback |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from pydantic import BaseModel, Field | ||
|
|
||
| from bugbug.tools.base import GenerativeModelTool | ||
| from bugbug.tools.code_review.agent import GeneratedReviewComment |
Copilot
AI
Jan 10, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential circular import issue: this module imports GeneratedReviewComment from bugbug.tools.code_review.agent, while bugbug.tools.code_review.agent imports SuggestionFilteringTool from this module. This circular dependency could cause import errors. Consider moving GeneratedReviewComment to a shared data types module or using TYPE_CHECKING with forward references.
Introduces a run_by_diff_id method that retrieves a patch by diff ID from review_data and runs the review process.
Eliminated the abstract version property from GenerativeModelTool and removed the version attribute from CodeReviewTool since it is not used.
Moved suggestion filtering logic from code_review/agent.py to a new suggestion_filtering module. Introduced SuggestionFilteringTool for filtering review comments, updated CodeReviewTool to use the new filterer, and relocated related prompt templates. This improves modularity and separation of concerns for suggestion filtering.
Wrapped comments and rejected examples in <comments-to-filter> and <rejected-examples> tags to improve prompt structure and clarity.
It will be replaced with W&B Weave evaluation pipeline
7e1ca6b to
595cb8c
Compare
These improvements could be reviewed commit by commit.