agentic-review-benchmarks/benchmark-pr-mapping

benchmark-pr-mapping

Summary

This benchmark mapping tracks agentic code review PRs across 9 tool configurations: 8 AI-powered code review tools, with Qodo evaluated in two modes (exhaustive and precise). Each row represents a unique PR from a source repository, and each column shows the corresponding review PR generated by each tool for the same code change, enabling direct comparison of review quality across tools.

Quick Facts

| Metric | Value |
| --- | --- |
| Total PRs | 100 |
| Repositories | 8 |
| Tools Compared | 9 |

Tools Under Evaluation

qodo (exhaustive and precise modes) · augment · copilot · cursor · greptile · codex · coderabbit · sentry

PR Distribution by Repository

| Repository | Language | PRs |
| --- | --- | --- |
| cal.com | TypeScript | 16 |
| Ghost | JavaScript | 13 |
| dify | Python, Go | 13 |
| firefox-ios | Swift | 13 |
| prefect | Python | 13 |
| tauri | Rust | 13 |
| aspnetcore | C# | 10 |
| redis | C | 9 |

Metric Definitions

| Metric | Description |
| --- | --- |
| Precision | "When I flag something, am I right?" Of all issues reported, how many were actual bugs. High precision = few false alarms. |
| Recall | "Did I catch everything?" Of all real bugs that existed, how many the tool found. High recall = few things slip through. |
| F1 Score | The harmonic mean of precision and recall. It balances both metrics to show overall effectiveness at finding real issues without crying wolf. |
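
The three definitions can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own scoring code: the function name and the true-positive / false-positive / false-negative counts are assumptions, and how a reported issue is matched to a ground-truth finding is not specified here.

```python
def precision_recall_f1(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall, f1) as fractions between 0 and 1."""
    reported = true_positives + false_positives   # everything the tool flagged
    actual = true_positives + false_negatives     # every real bug that existed
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not taken from the benchmark).
p, r, f1 = precision_recall_f1(true_positives=30, false_positives=10, false_negatives=20)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

With these made-up counts, 30 of 40 reported issues are real (precision 0.75) and 30 of 50 real bugs are caught (recall 0.60).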

Benchmark Results

| Agent | # GT Findings | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- |
| Qodo - Exhaustive | 580 | 63.8 | 56.7 | 60.1 |
| Qodo - Precise | 580 | 74.5 | 44.2 | 55.4 |
| Augment | 580 | 70.6 | 32.1 | 44.1 |
| Copilot | 580 | 50.1 | 37.4 | 42.8 |
| Cursor | 580 | 78.5 | 26.2 | 39.3 |
| Greptile | 580 | 68.5 | 27.2 | 39.0 |
| Codex | 580 | 83.0 | 24.3 | 37.6 |
| Coderabbit | 580 | 53.7 | 19.0 | 28.0 |
| Sentry | 580 | 85.3 | 13.8 | 23.7 |
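
The F1 column can be cross-checked against the precision and recall columns. The sketch below recomputes F1 from the rounded percentages in the results table. The helper name and the 0.1-point tolerance are assumptions; small last-digit differences are expected if the published F1 was computed from unrounded precision and recall, which is itself an assumption.

```python
# (precision %, recall %, published F1 %) copied from the results table.
RESULTS = {
    "Qodo - Exhaustive": (63.8, 56.7, 60.1),
    "Qodo - Precise": (74.5, 44.2, 55.4),
    "Augment": (70.6, 32.1, 44.1),
    "Copilot": (50.1, 37.4, 42.8),
    "Cursor": (78.5, 26.2, 39.3),
    "Greptile": (68.5, 27.2, 39.0),
    "Codex": (83.0, 24.3, 37.6),
    "Coderabbit": (53.7, 19.0, 28.0),
    "Sentry": (85.3, 13.8, 23.7),
}


def f1_from_percentages(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of two percentages, returned as a percentage."""
    total = precision_pct + recall_pct
    return 2 * precision_pct * recall_pct / total if total else 0.0


for agent, (precision_pct, recall_pct, published_f1) in RESULTS.items():
    recomputed = f1_from_percentages(precision_pct, recall_pct)
    # Flag any row where the recomputed value drifts beyond rounding noise.
    status = "ok" if abs(recomputed - published_f1) <= 0.1 else "check"
    print(f"{agent:18s} published={published_f1:5.1f} recomputed={recomputed:6.2f} {status}")
```

The table also shows the usual trade-off: the agents with the highest precision (Sentry, Codex) have the lowest recall, which is why their F1 scores sit well below their precision.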

PR Mapping Table

| Repo Name | PR Title | qodo-exhaustive | qodo-precise | augment | copilot | cursor | greptile | codex | coderabbit | sentry |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Ghost | Removed xmlrpc/pingomatic ping service | PR #3 | PR #16 | PR #2 | PR #3 | PR #3 | PR #4 | PR #3 | PR #1 | PR #3 |
| Ghost | 🐛 Fixed reply form showing parent comment author's details | PR #4 | PR #17 | PR #3 | PR #4 | PR #4 | PR #5 | PR #4 | PR #2 | PR #5 |
| Ghost | Updated the scaling factors for domain warmup | PR #5 | PR #18 | PR #4 | PR #5 | PR #5 | PR #6 | PR #5 | PR #3 | PR #6 |
| Ghost | Redesigned comment moderation list layout | PR #6 | PR #19 | PR #5 | PR #6 | PR #6 | PR #7 | PR #6 | PR #4 | PR #7 |
| Ghost | Moved to kebab-case formatting signup-form | PR #7 | PR #20 | PR #6 | PR #7 | PR #7 | PR #8 | PR #7 | PR #5 | PR #8 |
| Ghost | Changed member welcome email job to run based on config | PR #8 | PR #21 | PR #7 | PR #8 | PR #8 | PR #9 | PR #8 | PR #6 | PR #9 |
| Ghost | Renamed files to kebab-case - core - services - part 2 | PR #9 | PR #22 | PR #8 | PR #9 | PR #9 | PR #10 | PR #9 | PR #7 | PR #10 |
| Ghost | Align zh translations with China mainland terminology. | PR #10 | PR #23 | PR #9 | PR #10 | PR #10 | PR #11 | PR #10 | PR #8 | PR #11 |
| Ghost | Switched to time-based domain warmup | PR #11 | PR #24 | PR #10 | PR #11 | PR #11 | PR #12 | PR #11 | PR #9 | PR #12 |
| Ghost | Updated activitypub Bluesky sharing enablement flow | PR #12 | PR #25 | PR #11 | PR #12 | PR #12 | PR #13 | PR #12 | PR #10 | PR #13 |
| Ghost | Analytics filter refinements | PR #13 | PR #26 | PR #12 | PR #13 | PR #13 | PR #14 | PR #13 | PR #11 | PR #14 |
| Ghost | Removed unused tinybird filters | PR #14 | PR #27 | PR #13 | PR #14 | PR #14 | PR #15 | PR #14 | PR #12 | PR #15 |
| Ghost | Added materialized view and duplicated v2 Tinybird endpoints | PR #15 | PR #28 | PR #14 | PR #15 | PR #15 | PR #16 | PR #15 | PR #13 | PR #16 |
| aspnetcore | [release/10.0] Source code updates from dotnet/dotnet | PR #9 | PR #19 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| aspnetcore | [release/9.0] Update dependencies from dotnet/arcade | PR #2 | PR #12 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| aspnetcore | Ensure SSL_CERT_DIR messages are always shown and check for ... | PR #3 | PR #13 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| aspnetcore | Blazor supports DisplayName for models | PR #4 | PR #14 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| aspnetcore | Add test coverage for prerendering closed generic components | PR #5 | PR #15 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| aspnetcore | [Blazor] Remove obsolete APIs from Components | PR #6 | PR #16 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| aspnetcore | Allow JS root components to reinitialize on circuit restart | PR #7 | PR #17 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| aspnetcore | [release/10.0] Fix ModelMetadata null reference exception in... | PR #8 | PR #18 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| aspnetcore | [release/10.0] Source code updates from dotnet/dotnet | PR #9 | PR #19 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| aspnetcore | Doc updates | PR #10 | PR #20 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| cal.com | fix: get bookings handler for pbac and fallback roles | PR #1 | PR #16 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| cal.com | fix: add server-side redirect for users with pending invites... | PR #2 | PR #17 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| cal.com | chore: [Booking Cancellation Refactor - 2] Inject repositori... | PR #3 | PR #18 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| cal.com | fix(api): return original email without OAuth suffix in book... | PR #4 | PR #19 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| cal.com | fix(companion): event type links for org user | PR #5 | PR #20 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| cal.com | feat: api v2 team invite link endpoint | PR #6 | PR #21 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| cal.com | feat: CalendarCache - filter generic calendars from subscrip... | PR #7 | PR #22 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| cal.com | feat: limit badges to 2 with hover/click popover in UserList... | PR #8 | PR #23 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| cal.com | refactor: move WebWrapper files from packages/platform to ap... | PR #9 | PR #24 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| cal.com | chore: Integrate confirmation booking audit | PR #10 | PR #25 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| cal.com | fix: Make `BOOKING_CANCELLED` webhook payload consistent for ... | PR #11 | PR #26 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 |
| cal.com | refactor: Remove trpc/react dependency from @calcom/atoms | PR #12 | PR #27 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 |
| cal.com | fix: enable DI for FeatureOptInService | PR #13 | PR #28 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| cal.com | feat: add scope configuration for feature opt-in | PR #14 | PR #29 | PR #14 | PR #14 | PR #14 | PR #14 | PR #14 | PR #14 | PR #14 |
| cal.com | refactor: Remove trpc/server dependency from @calcom/atoms | PR #15 | PR #30 | PR #15 | PR #15 | PR #15 | PR #15 | PR #15 | PR #15 | PR #15 |
| cal.com | feat: update recording, transcript endpoint and add tests | PR #15 | PR #31 | PR #16 | PR #15 | PR #15 | PR #16 | PR #15 | PR #16 | PR #16 |
| dify | fix(api): defer streaming response until referenced variable... | PR #1 | PR #14 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| dify | fix(web): enable JSON_OBJECT type support in console UI | PR #2 | PR #15 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| dify | refactor(workflow): add Jinja2 renderer abstraction for temp... | PR #3 | PR #16 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| dify | feat: Add conversation variable persistence layer | PR #4 | PR #17 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| dify | fix: remove hardcoded 48-character limit from text inputs | PR #5 | PR #18 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| dify | feat: allow pass hostname in docker env | PR #6 | PR #19 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| dify | test: unify i18next mocks into centralized helpers | PR #7 | PR #20 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| dify | refactor(web): organize devtools components | PR #8 | PR #21 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| dify | refactor: always preserve marketplace search state in URL | PR #9 | PR #22 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| dify | feat: get plan bulk with cache | PR #10 | PR #23 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| dify | refactor(models): Refine MessageAgentThought SQLAlchemy typi... | PR #11 | PR #24 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 |
| dify | fix(api): refactors the SQL LIKE pattern escaping logic to u... | PR #12 | PR #25 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 |
| dify | fix: workflow incorrectly marked as completed while nodes ar... | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| firefox-ios | EXP-5874 Add strings for Rollouts toggle and update Studies ... | PR #1 | PR #14 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| firefox-ios | Refactor FXIOS-12796 [Swift 6 Migration] Fix main actor iso... | PR #2 | PR #15 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| firefox-ios | Refactor FXIOS-14485 FXIOS-14472 [Swift 6 Migration] Turn on... | PR #3 | PR #16 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| firefox-ios | [MTE-5149] - delete auto tests | PR #4 | PR #17 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| firefox-ios | Remove FXIOS-14327 [Clean up] Inactive tabs removal part 2 (... | PR #5 | PR #18 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| firefox-ios | Refactor FXIOS-14344 [Swift 6 migration] Fixing warnings in ... | PR #8 | PR #21 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| firefox-ios | Refactor FXIOS-13122 [Swift 6] Remove @Sendable from closure... | PR #7 | PR #20 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| firefox-ios | Refactor FXIOS-14344 [Swift 6 migration] Fixing DependencyHe... | PR #8 | PR #21 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| firefox-ios | Remove FXIOS-14593 [BVC] Clean up qr code related code from ... | PR #9 | PR #22 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| firefox-ios | Refactor FXIOS-14466 FXIOS-14463 [Swift 6 Migration] Migrate... | PR #10 | PR #23 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| firefox-ios | Refactor NO_TICKET Various improvements | PR #11 | PR #24 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 |
| firefox-ios | Bugfix FXIOS-14456 #31308 ⁃ Fix test failing on XCode 26.2 t... | PR #12 | PR #25 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 |
| firefox-ios | [MTE-5047] - auto tests improvements | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| prefect | Add documentation for prefect sdk generate CLI | PR #1 | PR #14 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| prefect | Link automation.action.* events to automation.triggered even... | PR #2 | PR #15 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| prefect | Add drag-and-drop reordering for schema form arrays | PR #3 | PR #16 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| prefect | Add block document reference support to schema form | PR #4 | PR #17 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| prefect | fix(prefect-gcp): prevent double-nesting in GcsBucket._resol... | PR #5 | PR #18 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| prefect | Add arun_deployment and replace @sync_compatible with `@... | PR #6 | PR #19 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| prefect | fix: resolve race condition in compound trigger evaluation | PR #7 | PR #20 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| prefect | Add V2 UI packaging and Docker integration | PR #8 | PR #21 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| prefect | Add FlowIconText component for UI v2 migration | PR #9 | PR #22 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| prefect | feat(ui-v2): Add trigger form templates | PR #10 | PR #23 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| prefect | Add semaphore to limit concurrent API calls during Kubernete... | PR #11 | PR #24 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 |
| prefect | feat(ui-v2): improve task run details page parity with Vue | PR #12 | PR #25 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 |
| prefect | feat(ui-v2): add WorkPoolEditForm component for editing work... | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| redis | Replace fragile dict stored-key API with getKeyId callback | PR #1 | PR #10 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| redis | Fix adjacent slot range behavior in ASM operations | PR #2 | PR #11 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| redis | Clean up lookahead-related code | PR #3 | PR #12 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| redis | Refactor some of ASM and slot-stats functions | PR #4 | PR #13 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| redis | Fix MEMORY USAGE command | PR #5 | PR #14 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| redis | Redis 8.2.1 | PR #6 | PR #15 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| redis | Add auto-repair options for broken AOF tail on startup | PR #7 | PR #16 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 | PR #7 |
| redis | Fix HINCRBYFLOAT removes field expiration on replica | PR #9 | PR #18 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| redis | Fix HINCRBYFLOAT removes field expiration on replica | PR #9 | PR #18 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 | PR #9 |
| tauri | feat(android): add auto_increment_version_code option for An... | PR #1 | PR #14 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 | PR #1 |
| tauri | fix(bundler): inline linuxdeploy plugin scripts | PR #2 | PR #15 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 | PR #2 |
| tauri | chore(deps): update rust crate toml to 0.9 | PR #3 | PR #16 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 | PR #3 |
| tauri | refactor(cli): disable jsonschema resolving external resourc... | PR #4 | PR #17 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 | PR #4 |
| tauri | Less statics | PR #5 | PR #18 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 | PR #5 |
| tauri | feat(core): back button event on Android, closes #8142 | PR #6 | PR #19 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 | PR #6 |
| tauri | Apply Version Updates From Current Changes | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| tauri | chore: fix new clippy warnings (derive default) | PR #8 | PR #21 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 | PR #8 |
| tauri | Apply Version Updates From Current Changes | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
| tauri | fix: a few regressions from previous PRs | PR #10 | PR #23 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 | PR #10 |
| tauri | feat(bundler/cli): Add feature flag to use system certificat... | PR #11 | PR #24 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 | PR #11 |
| tauri | feat(cli): UTExportedTypeDeclarations support for file assoc... | PR #12 | PR #25 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 | PR #12 |
| tauri | Apply Version Updates From Current Changes | PR #13 | PR #26 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 | PR #13 |
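
To work with the mapping programmatically, one row can be split into a repository name, a PR title, and a tool-to-PR-number lookup. The following is a rough sketch, not part of the benchmark tooling: `parse_row` and `TOOLS` are hypothetical names, the tool order is copied from the table header, and the parser assumes the repository name contains no spaces and that the last nine `PR #N` tokens in a row are the tool columns. It accepts rows written either as whitespace-separated text or as pipe-delimited cells.

```python
import re

# Column order taken from the mapping table header.
TOOLS = ["qodo-exhaustive", "qodo-precise", "augment", "copilot", "cursor",
         "greptile", "codex", "coderabbit", "sentry"]

PR_TOKEN = re.compile(r"PR #(\d+)")


def parse_row(row: str) -> dict:
    """Split one mapping row into repo, title, and per-tool PR numbers."""
    # Treat pipes as plain separators and collapse runs of whitespace.
    text = " ".join(row.replace("|", " ").split())
    matches = list(PR_TOKEN.finditer(text))
    if len(matches) < len(TOOLS):
        raise ValueError(f"expected {len(TOOLS)} PR references, found {len(matches)}")
    tool_matches = matches[-len(TOOLS):]          # the trailing tool columns
    head = text[:tool_matches[0].start()].strip()  # repo name plus PR title
    repo, _, title = head.partition(" ")
    prs = {tool: int(m.group(1)) for tool, m in zip(TOOLS, tool_matches)}
    return {"repo": repo, "title": title, "prs": prs}


sample = ("Ghost Removed xmlrpc/pingomatic ping service "
          "PR #3 PR #16 PR #2 PR #3 PR #3 PR #4 PR #3 PR #1 PR #3")
parsed = parse_row(sample)
print(parsed["repo"], "|", parsed["title"], "|", parsed["prs"]["greptile"])
```

Because each tool's review PRs appear to be numbered independently, the same source PR can carry a different number in each column, so lookups should always go through the tool name rather than assume a shared number.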
