Conversation
- Create benchmark.yml with workflow_run trigger on Tests+main
- Single Python 3.13 job with MongoDB+Redis services
- github-action-benchmark auto-push to gh-pages at dev/bench
- Document CI-01 (gh-pages setup) and CI-04 (release strategy)
- Add .benchmarks/ to .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… files
- Remove Run benchmarks, Visualize benchmarks, Upload benchmark results steps
- Delete docs/visualize_benchmarks.py (superseded by gh-pages dashboard)
- Delete .benchmarks/ local cache directory
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add 05-01-SUMMARY.md with execution results
- Update STATE.md with plan completion and decisions
- Update ROADMAP.md with phase 5 progress
- Mark CI-01 through CI-04 requirements complete
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add pull_request trigger (opened, synchronize) alongside workflow_run
- Add concurrency group to cancel in-progress PR benchmark runs
- Split benchmark-action into main (auto-push) and PR (compare-only) steps
- PR step: summary-always, comment-on-alert, fail-on-alert at 150% threshold
- PR step: auto-push false, save-data-file false to avoid polluting gh-pages
- Document PR behavior and branch protection setup in header comments
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 Walkthrough
This pull request separates benchmarking into a dedicated CI workflow, removes legacy local benchmark artifacts and visualization, updates ignore rules, and adds extensive planning, research, validation, and roadmap documentation for benchmark and PR-feedback phases.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 10
🧹 Nitpick comments (6)
.planning/REQUIREMENTS.md (1)
21-21: PR-03 may already be implemented.
`benchmark.yml` line 107 sets `fail-on-alert: true`, which blocks PR merges on benchmark regressions. This appears to implement PR-03 (fail-on-regression gate). Consider marking this complete or clarifying whether additional work is needed for Phase 6.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.planning/REQUIREMENTS.md at line 21, The PR-03 item in REQUIREMENTS.md appears outdated because benchmark.yml already enforces a fail-on-regression gate via the setting fail-on-alert: true; update the PR-03 checklist entry to mark it complete or add clarifying text that references the existing implementation (benchmark.yml's fail-on-alert setting) and whether any additional Phase 6 enhancements are required so reviewers know this requirement is satisfied or what remains.
.github/workflows/benchmark.yml (1)
27-28: Consider: PR benchmarks run independently of test results.
The `pull_request` trigger runs benchmarks directly without waiting for the Tests workflow to pass. If tests fail, benchmark resources are wasted. This is a design trade-off favoring faster feedback.
To chain PR benchmarks after tests, you could use `workflow_run` for PRs too, but this adds complexity around accessing PR context from `workflow_run`. The current approach is simpler and acceptable for most projects.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml around lines 27 - 28, The current benchmark workflow uses the pull_request trigger (pull_request: types: [opened, synchronize]) which runs benchmarks immediately; to run benchmarks only after tests pass, replace the pull_request trigger with a workflow_run trigger that listens for the Tests workflow completion (e.g., workflow_run: workflows: ["Tests"] types: [completed]) and conditionally run when github.event.workflow_run.conclusion == 'success'; when you need PR context in that workflow, read PR info from github.event.workflow_run.pull_requests (or fetch via the GitHub API using the run's commit/branch) to avoid losing PR association.
.planning/phases/06-pr-feedback/06-VALIDATION.md (1)
41-42: Automate the `PR-02` threshold check instead of treating it as manual-only.
Verifying a fixed YAML value is a cheap deterministic check, so keeping `PR-02` manual weakens the validation contract for no real benefit. A grep/yq-style assertion here would also align better with the sign-off item about automated verification coverage.
Proposed update
-| 06-01-02 | 01 | 1 | PR-02 | manual | Verify `alert-threshold: "150%"` in YAML | N/A | ⬜ pending |
+| 06-01-02 | 01 | 1 | PR-02 | automated | `rg -n 'alert-threshold:\s*"150%"' .github/workflows/benchmark.yml` | N/A | ⬜ pending |
...
-| Alert threshold is configurable at 150% | PR-02 | Configuration verification in YAML | Read `alert-threshold` value in workflow file |
+| Alert threshold is configurable at 150% | PR-02 | None — static workflow config is directly checkable | Assert `alert-threshold: "150%"` is present in `.github/workflows/benchmark.yml` |

Also applies to: 60-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.planning/phases/06-pr-feedback/06-VALIDATION.md around lines 41 - 42, The PR-02 validation entry currently marked as manual should be changed to an automated check: update the test matrix row for PR-02 so the "manual" column becomes "automated"/"auto" and the description notes an automated YAML assertion; implement a deterministic assertion that verifies the YAML contains alert-threshold: "150%" (e.g., a yq/grep check in the validation script invoked by the pipeline) and link that automation as the verification step for test id PR-02 so the table and pipeline reflect the automated coverage for the alert-threshold check.
.planning/STATE.md (2)
35-41: Consider standardizing duration formatting.The duration format "1min" appears twice (lines 39-40). Standard formatting would be "1 min" (with space), "1m", or "1 minute" for better readability in planning documents.
📝 Suggested formatting improvement
 **Velocity:**
 - Total plans completed: 1 (v0.3.1)
-- Average duration: 1min
-- Total execution time: 1min
+- Average duration: 1 min
+- Total execution time: 1 min
Verify each finding against the current code and only fix it if needed. In @.planning/STATE.md around lines 35 - 41, Update the duration formatting in the "Performance Metrics" -> "Velocity" block by replacing the repeated "1min" strings used for "Average duration" and "Total execution time" with a standardized, readable format (e.g., "1 min" or "1 minute"); locate the "Average duration" and "Total execution time" entries and change their values to the chosen format so both lines match consistently.
10-31: Clarify dual phase numbering system.The document uses two different phase numbering systems that may cause confusion:
- Lines 10-13: Progress shows "3 total phases, 1 completed" (milestone-specific)
- Line 28: "Phase: 5 of 7" (project-wide numbering)
While this dual system is internally consistent (Phase 5 is the first phase of v0.3.1 milestone), future readers may find it confusing. Consider adding a brief note explaining that "total_phases" refers to the current milestone's phases, while the "Phase: X of Y" notation refers to overall project phases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.planning/STATE.md around lines 10 - 31, The document shows two phase numbering systems (the YAML keys total_phases/completed_phases/percent and the human-readable "Phase: 5 of 7" line) which can confuse readers; add a single clarifying sentence or parenthetical near the top (e.g., after the "Project State" header or directly beneath the Project Reference) that explains that total_phases/completed_phases refer to the current milestone's phases while the "Phase: X of Y" line denotes overall project phases, and update any adjacent wording to make this distinction explicit (reference total_phases, completed_phases, and the "Phase: 5 of 7" text so reviewers can locate the change).
.planning/phases/06-pr-feedback/06-01-PLAN.md (1)
136-136: Consider breaking verification script into multiple lines for readability.The verification script is a 400+ character one-liner with 8 chained grep commands. While functionally correct, splitting it into multiple lines with comments would improve maintainability and make it easier to identify which check failed.
♻️ More readable verification script
#!/bin/bash
# Verify benchmark.yml has all required PR comparison elements
cd /Users/fzills/tools/asebytes

echo "Checking for pull_request trigger..."
grep -q 'pull_request' .github/workflows/benchmark.yml || exit 1
echo "Checking for summary-always option..."
grep -q 'summary-always: true' .github/workflows/benchmark.yml || exit 1
echo "Checking for fail-on-alert option..."
grep -q 'fail-on-alert: true' .github/workflows/benchmark.yml || exit 1
echo "Checking for alert-threshold..."
grep -q 'alert-threshold' .github/workflows/benchmark.yml || exit 1
echo "Checking for auto-push: false (PR mode)..."
grep -q 'auto-push: false' .github/workflows/benchmark.yml || exit 1
echo "Checking for save-data-file: false..."
grep -q 'save-data-file: false' .github/workflows/benchmark.yml || exit 1
echo "Checking for concurrency group..."
grep -q 'concurrency' .github/workflows/benchmark.yml || exit 1
echo "ALL CHECKS PASSED"
Verify each finding against the current code and only fix it if needed. In @.planning/phases/06-pr-feedback/06-01-PLAN.md at line 136, The one-line verification command in the automated block should be split into a multi-line shell script for readability and debuggability: replace the long chained command (the current automated line containing multiple grep -q checks) with a bash script that cd's to the repo, runs each grep -q as its own step with a preceding echo message and a short comment, and uses "|| exit 1" after each grep so failures surface immediately; keep the same exact grep patterns ('pull_request', 'summary-always: true', 'fail-on-alert: true', 'alert-threshold', 'auto-push: false', 'save-data-file: false', 'concurrency') and ensure a final echo "ALL CHECKS PASSED" is printed on success to preserve behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/benchmark.yml:
- Around line 13-14: There's a contradiction between the CI-04 description in
the benchmark workflow header and the CI-04 entry in REQUIREMENTS.md; pick the
intended behavior and make them consistent: either update the CI-04 comment in
the benchmark workflow file to state that release/tag events DO trigger a
benchmark snapshot on gh-pages, or update the CI-04 entry in REQUIREMENTS.md to
mark that release/tag events do NOT get a separate benchmark run. Locate the
CI-04 comment in the workflow header and the CI-04 line in REQUIREMENTS.md and
edit the one that is incorrect so both documents state the same behavior.
In @.planning/phases/05-benchmark-pipeline/05-01-PLAN.md:
- Around line 52-55: The execution_context block contains machine-specific
absolute paths (the strings starting with /Users/...) which must be replaced
with repo-relative references or plain document identifiers; edit the
execution_context entry to remove those absolute paths and use repo-relative
paths like ./workflows/execute-plan.md and ./templates/summary.md or simple
identifiers (e.g., "workflows/execute-plan" and "templates/summary") so the plan
contains no user-specific filesystem information.
In @.planning/phases/05-benchmark-pipeline/05-RESEARCH.md:
- Line 13: Remove the obsolete "artifact-handoff" architecture text from the
research doc: delete or rewrite the "Primary recommendation" paragraph that
advises passing benchmark JSON across workflows (the sentence starting "Use
`workflow_run` trigger with `actions/download-artifact@v4`...") and any other
conflicting passages that describe cross-workflow artifact handoff (also remove
the blocks referenced around lines 70-72 and 89-101). Replace those with a clear
single recommendation that `benchmark.yml` runs benchmarks directly (matching
the "IMPORTANT DECISION" and the cleaned `tests.yml`), and mention using
`benchmark-action/github-action-benchmark@v1` with `auto-push: true` to publish
results to gh-pages if needed; ensure the document no longer presents both
models side-by-side.
In @.planning/phases/05-benchmark-pipeline/05-VERIFICATION.md:
- Around line 22-25: The verification doc is out of date: update
.planning/phases/05-benchmark-pipeline/05-VERIFICATION.md so its statements
about the benchmark workflow match the current .github/workflows/benchmark.yml
(which now includes a pull_request trigger and a PR-only comparison step) —
either mark the listed items (e.g., the “PR does NOT trigger benchmark workflow”
entry and related lines referenced in the diff) as historical snapshots, or
change their VERIFIED status and text to reflect that benchmark.yml contains a
pull_request trigger and PR-only comparison; apply the same updates to the other
affected sections called out (lines referenced as 35-36, 43-45, 50-53) so the
document consistently matches the branch behavior.
In @.planning/phases/06-pr-feedback/06-CONTEXT.md:
- Around line 17-21: The concurrency group currently uses 'benchmark-${{
github.event.pull_request.number }}' which is undefined for workflow_run events
and causes unrelated runs to collide; change the concurrency group to use a
fallback expression so when github.event.pull_request.number is missing it falls
back to a stable per-run identifier (for example github.run_id or
github.ref/sha). Update the concurrency: group value to incorporate that
fallback (keeping the 'benchmark-' prefix) so both pull_request and workflow_run
triggers produce unique, non-conflicting keys.
In @.planning/REQUIREMENTS.md:
- Line 15: The CI-04 check is marked complete but the benchmark workflow lacks
release/tag triggers; either update the REQUIREMENTS.md entry for CI-04 to
"pending" or modify the benchmark workflow (benchmark.yml) to include release
and tag push triggers (add an on: release: types: [published] and/or on: push:
tags: ['*']) so that release/tag events will run the benchmark and publish the
snapshot to gh-pages; update the CI-04 status in .planning/REQUIREMENTS.md if
you choose not to change the workflow.
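If the workflow-change option is taken, the added triggers might look like the following sketch; the event names are standard GitHub Actions triggers, while attaching them to this particular workflow is the suggestion under review, not an existing fact:

```yaml
# Hypothetical addition to .github/workflows/benchmark.yml:
# also run the benchmark job for published releases and for tag pushes,
# so CI-04's "snapshot on release" claim would hold.
on:
  release:
    types: [published]
  push:
    tags:
      - "*"
```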
In @.planning/research/ARCHITECTURE.md:
- Around line 112-120: The ARCHITECTURE.md section uses an `auto-push`
conditional of `github.event_name == 'push' && github.ref == 'refs/heads/main'`
which conflicts with the Phase 6 plan that uses `workflow_run` for main-branch
benchmarks and `pull_request` for PRs; update the ARCHITECTURE.md text to match
the Phase 6 approach (or vice versa) by replacing the current `auto-push`
conditional with the `workflow_run`-based logic (`github.event_name ==
'workflow_run'` for main branch runs triggered after Tests and `pull_request`
for PR runs) and explicitly document that `comment-on-alert: true` remains
enabled for PR comparisons so readers can trace the `workflow_run` vs
`pull_request` trigger strategy consistently with the Phase 6 plan.
- Around line 122-125: Decision 4 in ARCHITECTURE.md claims we kept
visualize_benchmarks.py, but the file docs/visualize_benchmarks.py is missing
and docs/README.md still references running it; either restore the script and
its artifact-upload behavior and add it back at docs/visualize_benchmarks.py
(ensuring it accepts benchmark_results.json as the CLI arg and is documented in
docs/README.md), or change Decision 4 text in .planning/research/ARCHITECTURE.md
and remove or update the docs/README.md reference so both documentation and
architecture reflect that the visualization script was removed; update whichever
files you change to keep the repository consistent (Decision 4, docs/README.md,
and presence or absence of docs/visualize_benchmarks.py).
In @.planning/research/FEATURES.md:
- Line 84: Update the stale reference in the FEATURES.md line that mentions
"`--benchmark-json=benchmark_results.json` in CI (tests.yml line 61)`" so it
points to the new workflow location: replace the `tests.yml line 61` reference
with the appropriate `benchmark.yml` workflow (or remove the line-number mention
and reference `benchmark.yml`/the benchmark workflow step), ensuring the text
still mentions `--benchmark-json=benchmark_results.json` as the direct input to
github-action-benchmark `output-file-path`.
In @.planning/ROADMAP.md:
- Around line 87-89: The "Benchmark Pipeline" (Phase 5) table row is missing its
Milestone cell, causing column misalignment; update the Phase 5 row (the line
containing "5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | -") to
include the Milestone value (match the format used by lines with "v0.3.1") so
all pipes align and the table columns are consistent.
---
Nitpick comments:
In @.github/workflows/benchmark.yml:
- Around line 27-28: The current benchmark workflow uses the pull_request
trigger (pull_request: types: [opened, synchronize]) which runs benchmarks
immediately; to run benchmarks only after tests pass, replace the pull_request
trigger with a workflow_run trigger that listens for the Tests workflow
completion (e.g., workflow_run: workflows: ["Tests"] types: [completed]) and
conditionally run when github.event.workflow_run.conclusion == 'success'; when
you need PR context in that workflow, read PR info from
github.event.workflow_run.pull_requests (or fetch via the GitHub API using the
run's commit/branch) to avoid losing PR association.
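The chained variant described above can be sketched as follows; the workflow name "Tests" comes from this review, while the job and step layout are illustrative assumptions:

```yaml
# Hypothetical benchmark.yml excerpt: run only after the Tests workflow
# completes successfully, instead of on pull_request directly.
on:
  workflow_run:
    workflows: ["Tests"]
    types: [completed]

jobs:
  benchmark:
    # Skip entirely when the triggering Tests run did not succeed.
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - name: Resolve PR number from the triggering run
        # pull_requests may be empty for fork PRs; fall back to querying the
        # API with github.event.workflow_run.head_sha in that case.
        run: echo "PR: ${{ github.event.workflow_run.pull_requests[0].number }}"
```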
In @.planning/phases/06-pr-feedback/06-01-PLAN.md:
- Line 136: The one-line verification command in the automated block should be
split into a multi-line shell script for readability and debuggability: replace
the long chained command (the current automated line containing multiple grep -q
checks) with a bash script that cd's to the repo, runs each grep -q as its own
step with a preceding echo message and a short comment, and uses "|| exit 1"
after each grep so failures surface immediately; keep the same exact grep
patterns ('pull_request', 'summary-always: true', 'fail-on-alert: true',
'alert-threshold', 'auto-push: false', 'save-data-file: false', 'concurrency')
and ensure a final echo "ALL CHECKS PASSED" is printed on success to preserve
behavior.
In @.planning/phases/06-pr-feedback/06-VALIDATION.md:
- Around line 41-42: The PR-02 validation entry currently marked as manual
should be changed to an automated check: update the test matrix row for PR-02 so
the "manual" column becomes "automated"/"auto" and the description notes an
automated YAML assertion; implement a deterministic assertion that verifies the
YAML contains alert-threshold: "150%" (e.g., a yq/grep check in the validation
script invoked by the pipeline) and link that automation as the verification
step for test id PR-02 so the table and pipeline reflect the automated coverage
for the alert-threshold check.
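The deterministic assertion proposed here could be a small shell check along these lines; the sample file below stands in for the real workflow path, which is an assumption in this sketch:

```shell
# Hypothetical PR-02 check: assert the workflow pins alert-threshold to 150%.
# A generated sample fragment stands in for .github/workflows/benchmark.yml.
workflow=$(mktemp)
printf 'alert-threshold: "150%%"\n' > "$workflow"

# grep -Eq exits 0 only when the exact option/value pair is present.
if grep -Eq 'alert-threshold:[[:space:]]*"150%"' "$workflow"; then
  result="PR-02 OK"
else
  result="PR-02 FAIL"
fi
echo "$result"
rm -f "$workflow"
```

In a real validation script the `$workflow` variable would simply point at the checked-in workflow file instead of a temp file.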
In @.planning/REQUIREMENTS.md:
- Line 21: The PR-03 item in REQUIREMENTS.md appears outdated because
benchmark.yml already enforces a fail-on-regression gate via the setting
fail-on-alert: true; update the PR-03 checklist entry to mark it complete or add
clarifying text that references the existing implementation (benchmark.yml's
fail-on-alert setting) and whether any additional Phase 6 enhancements are
required so reviewers know this requirement is satisfied or what remains.
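Pulling together the option names mentioned across this review (summary-always, comment-on-alert, fail-on-alert, alert-threshold, auto-push, save-data-file), the PR-only step might look like this sketch; the step name, `tool`, and `output-file-path` values are illustrative assumptions:

```yaml
# Hypothetical PR-only comparison step in benchmark.yml.
- name: Compare benchmark against baseline (PR only)
  if: github.event_name == 'pull_request'
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: pytest                     # assumed; matches pytest-benchmark JSON
    output-file-path: benchmark_results.json
    summary-always: true             # always write a job summary
    comment-on-alert: true           # comment on the PR when regressing
    fail-on-alert: true              # fail the check on regression (PR-03)
    alert-threshold: "150%"          # alert when 1.5x slower than baseline
    auto-push: false                 # never push to gh-pages from PRs
    save-data-file: false            # keep gh-pages history PR-free
```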
In @.planning/STATE.md:
- Around line 35-41: Update the duration formatting in the "Performance Metrics"
-> "Velocity" block by replacing the repeated "1min" strings used for "Average
duration" and "Total execution time" with a standardized, readable format (e.g.,
"1 min" or "1 minute"); locate the "Average duration" and "Total execution time"
entries and change their values to the chosen format so both lines match
consistently.
- Around line 10-31: The document shows two phase numbering systems (the YAML
keys total_phases/completed_phases/percent and the human-readable "Phase: 5 of
7" line) which can confuse readers; add a single clarifying sentence or
parenthetical near the top (e.g., after the "Project State" header or directly
beneath the Project Reference) that explains that total_phases/completed_phases
refer to the current milestone's phases while the "Phase: X of Y" line denotes
overall project phases, and update any adjacent wording to make this distinction
explicit (reference total_phases, completed_phases, and the "Phase: 5 of 7" text
so reviewers can locate the change).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c48e66ed-ebfe-4379-a279-afa1e27deda6
📒 Files selected for processing (25)
- .benchmarks/Darwin-CPython-3.11-64bit/0001_baseline.json
- .benchmarks/Darwin-CPython-3.11-64bit/0002_perf_analysis.json
- .github/workflows/benchmark.yml
- .github/workflows/tests.yml
- .gitignore
- .planning/PROJECT.md
- .planning/REQUIREMENTS.md
- .planning/ROADMAP.md
- .planning/STATE.md
- .planning/phases/05-benchmark-pipeline/05-01-PLAN.md
- .planning/phases/05-benchmark-pipeline/05-01-SUMMARY.md
- .planning/phases/05-benchmark-pipeline/05-CONTEXT.md
- .planning/phases/05-benchmark-pipeline/05-RESEARCH.md
- .planning/phases/05-benchmark-pipeline/05-VALIDATION.md
- .planning/phases/05-benchmark-pipeline/05-VERIFICATION.md
- .planning/phases/06-pr-feedback/06-01-PLAN.md
- .planning/phases/06-pr-feedback/06-CONTEXT.md
- .planning/phases/06-pr-feedback/06-RESEARCH.md
- .planning/phases/06-pr-feedback/06-VALIDATION.md
- .planning/research/ARCHITECTURE.md
- .planning/research/FEATURES.md
- .planning/research/PITFALLS.md
- .planning/research/STACK.md
- .planning/research/SUMMARY.md
- docs/visualize_benchmarks.py
💤 Files with no reviewable changes (4)
- .benchmarks/Darwin-CPython-3.11-64bit/0001_baseline.json
- .benchmarks/Darwin-CPython-3.11-64bit/0002_perf_analysis.json
- .github/workflows/tests.yml
- docs/visualize_benchmarks.py
# CI-04: Release/tag events do NOT get a separate benchmark run. Every push to
# main updates the gh-pages dashboard, so releases inherit the latest baseline.
Contradiction between CI-04 documentation and REQUIREMENTS.md.
The workflow comment states release/tag events do NOT get a separate benchmark run, but .planning/REQUIREMENTS.md line 15 marks CI-04: Release/tag events trigger a benchmark snapshot on gh-pages as complete. One of these needs to be corrected to maintain consistency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/benchmark.yml around lines 13 - 14, There's a
contradiction between the CI-04 description in the benchmark workflow header and
the CI-04 entry in REQUIREMENTS.md; pick the intended behavior and make them
consistent: either update the CI-04 comment in the benchmark workflow file to
state that release/tag events DO trigger a benchmark snapshot on gh-pages, or
update the CI-04 entry in REQUIREMENTS.md to mark that release/tag events do NOT
get a separate benchmark run. Locate the CI-04 comment in the workflow header
and the CI-04 line in REQUIREMENTS.md and edit the one that is incorrect so both
documents state the same behavior.
<execution_context>
@/Users/fzills/.claude/get-shit-done/workflows/execute-plan.md
@/Users/fzills/.claude/get-shit-done/templates/summary.md
</execution_context>
Remove machine-specific absolute paths from the committed plan.
These paths leak the local username/workstation layout and are unusable for anyone else reading or replaying the plan. Please replace them with repo-relative references or plain document identifiers instead of committing /Users/... paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/phases/05-benchmark-pipeline/05-01-PLAN.md around lines 52 - 55,
The execution_context block contains machine-specific absolute paths (the
strings starting with /Users/...) which must be replaced with repo-relative
references or plain document identifiers; edit the execution_context entry to
remove those absolute paths and use repo-relative paths like
./workflows/execute-plan.md and ./templates/summary.md or simple identifiers
(e.g., "workflows/execute-plan" and "templates/summary") so the plan contains no
user-specific filesystem information.
The user explicitly decided against release-triggered benchmark runs -- every main push updates the dashboard, so releases just link to the existing data. CI-04 is satisfied by documentation rather than a separate release workflow trigger.

**Primary recommendation:** Use `workflow_run` trigger with `actions/download-artifact@v4` (cross-workflow download via `run-id`) to decouple benchmarks from the test matrix, then `benchmark-action/github-action-benchmark@v1` with `auto-push: true` to commit results to gh-pages.
Remove the obsolete artifact-handoff architecture from this research doc.
The current recommendation says to pass benchmark JSON across workflows, but the later “IMPORTANT DECISION” section and the actual cleaned tests.yml both say benchmark.yml runs benchmarks directly. Keeping both models in the same document makes the research guidance self-contradictory and easy to implement incorrectly.
Also applies to: 70-72, 89-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/phases/05-benchmark-pipeline/05-RESEARCH.md at line 13, Remove the
obsolete "artifact-handoff" architecture text from the research doc: delete or
rewrite the "Primary recommendation" paragraph that advises passing benchmark
JSON across workflows (the sentence starting "Use `workflow_run` trigger with
`actions/download-artifact@v4`...") and any other conflicting passages that
describe cross-workflow artifact handoff (also remove the blocks referenced
around lines 70-72 and 89-101). Replace those with a clear single recommendation
that `benchmark.yml` runs benchmarks directly (matching the "IMPORTANT DECISION"
and the cleaned `tests.yml`), and mention using
`benchmark-action/github-action-benchmark@v1` with `auto-push: true` to publish
results to gh-pages if needed; ensure the document no longer presents both
models side-by-side.
| 1 | Pushing to main triggers a benchmark workflow after tests pass | VERIFIED | benchmark.yml: `workflow_run` on `["Tests"]`, `types: [completed]`, `branches: [main]`, condition `github.event.workflow_run.conclusion == 'success'` |
| 2 | Benchmark results are auto-pushed to gh-pages at /dev/bench/ | VERIFIED | benchmark.yml: `auto-push: true`, `gh-pages-branch: gh-pages`, `benchmark-data-dir-path: dev/bench` |
| 3 | Opening or updating a PR does NOT trigger benchmark workflow | VERIFIED | benchmark.yml only has `workflow_run` trigger with `branches: [main]` -- no `pull_request` trigger |
| 4 | Release/tag behavior is documented (no separate trigger; main pushes cover it) | VERIFIED | benchmark.yml lines 7-8: CI-04 comment explaining releases inherit latest baseline |
This verification report is stale against the workflow shipped in this branch.
The current .github/workflows/benchmark.yml already includes a pull_request trigger and a PR-only comparison step, so statements like “PR does NOT trigger benchmark workflow” and “benchmark.yml only has workflow_run” are no longer true. Please either snapshot this as a historical Phase 5 artifact or update the verified truths/status so the report matches the actual branch contents.
Also applies to: 35-36, 43-45, 50-53
🧰 Tools
🪛 LanguageTool
[uncategorized] ~22-~22: The official name of this software platform is spelled with a capital “H”.
Context: ...pleted], branches: [main], condition github.event.workflow_run.conclusion == 'succe...
(GITHUB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/phases/05-benchmark-pipeline/05-VERIFICATION.md around lines 22 -
25, The verification doc is out of date: update
.planning/phases/05-benchmark-pipeline/05-VERIFICATION.md so its statements
about the benchmark workflow match the current .github/workflows/benchmark.yml
(which now includes a pull_request trigger and a PR-only comparison step) —
either mark the listed items (e.g., the “PR does NOT trigger benchmark workflow”
entry and related lines referenced in the diff) as historical snapshots, or
change their VERIFIED status and text to reflect that benchmark.yml contains a
pull_request trigger and PR-only comparison; apply the same updates to the other
affected sections called out (lines referenced as 35-36, 43-45, 50-53) so the
document consistently matches the branch behavior.
- Add `pull_request` trigger (opened + synchronize) to existing `benchmark.yml` — same file handles both main and PR flows
- Main pushes continue using `workflow_run` trigger (existing Phase 5 behavior)
- PR benchmarks run independently — do NOT wait for tests to pass
- Concurrency cancel per PR number: `concurrency: { group: 'benchmark-${{ github.event.pull_request.number }}', cancel-in-progress: true }`
- Use `if` conditions on `github.event_name` to distinguish main vs PR behavior at the step level
🧩 Analysis chain
🏁 Script executed:
find . -name "*.yml" -o -name "*.yaml" | grep -i benchmark | head -20
Repository: zincware/asebytes
Length of output: 94
🌐 Web query:
GitHub Actions github.event.pull_request.number undefined workflow_run trigger
💡 Result:
github.event.pull_request.number is only populated for workflows triggered by pull_request / pull_request_target. When your workflow is triggered by workflow_run, the event payload is a workflow_run payload, so github.event.pull_request is typically undefined. [1]
If the completed run was associated with a PR
Use the PR list attached to the run:
`${{ github.event.workflow_run.pull_requests[0].number }}`
The workflow-run payload (and the workflow-runs API response) exposes a `pull_requests` array for runs whose `head_sha`/`head_branch` matches an open PR. [2]
If pull_requests is empty (common with forks / some setups)
This is a known limitation: workflow_run.pull_requests can be empty (especially for PRs from forks), so you must pass the PR number yourself (artifact/output) from the first workflow or resolve it via API using the run’s head_sha. [3]
Practical pattern: In workflow A (triggered by pull_request), save ${{ github.event.pull_request.number }} as an artifact; in workflow B (workflow_run), download it and use that PR number. GitHub’s own docs show this “upload PR number then use it in workflow_run” approach. [1]
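The "upload the PR number, then read it under workflow_run" pattern above can be sketched as two fragments; the artifact and file names here are illustrative, not taken from any real workflow:

```yaml
# Workflow A (trigger: pull_request) — save the PR number as an artifact.
- name: Save PR number
  run: echo "${{ github.event.pull_request.number }}" > pr_number.txt
- uses: actions/upload-artifact@v4
  with:
    name: pr-number
    path: pr_number.txt

# Workflow B (trigger: workflow_run) — download it from the triggering run.
- uses: actions/download-artifact@v4
  with:
    name: pr-number
    run-id: ${{ github.event.workflow_run.id }}   # cross-run download
    github-token: ${{ secrets.GITHUB_TOKEN }}     # required with run-id
- name: Read PR number
  run: echo "PR=$(cat pr_number.txt)"
```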
Sources
[1] GitHub Docs – Events that trigger workflows (workflow_run example passing PR number via artifact) (docs.github.com)
[2] Workflow run fields (pull_requests array) (github-provider.stackql.io)
[3] GitHub Community – workflow_run.pull_requests empty for fork PRs / limitations (github.com)
Citations:
- 1: https://docs.github.com/enterprise-server%403.16/actions/automating-your-workflow-with-github-actions/events-that-trigger-workflows
- 2: https://github-provider.stackql.io/services/actions/workflow_runs/?utm_source=openai
- 3: https://github.com/orgs/community/discussions/170143?utm_source=openai
🌐 Web query:
GitHub Actions concurrency group mixed triggers pull_request workflow_run
💡 Result:
concurrency.group is just a string (or expression) evaluated per workflow run, regardless of which trigger (pull_request, workflow_run, etc.) started it. If the expression resolves to different values across triggers, the runs won’t share a concurrency bucket and therefore won’t cancel/serialize each other. (docs.github.com)
Why this bites with pull_request + workflow_run
- For `pull_request`, people often use `github.ref`/`github.head_ref`. But `github.ref` is typically `refs/pull/<n>/merge`, not the branch name.
- For `workflow_run`, the triggered workflow runs using the workflow file from the default branch, and `github.ref` is commonly that default-branch ref, not the PR ref, so `github.ref` no longer matches what you got under `pull_request`. (stackoverflow.com)
A robust pattern: group by PR number (else head branch)
Use the PR number when available; otherwise fall back to the originating branch from the workflow_run payload:
```yaml
concurrency:
  group: ci-${{
    github.event.pull_request.number
    || github.event.workflow_run.pull_requests[0].number
    || format('{0}:{1}',
         github.event.workflow_run.head_repository.owner.login,
         github.event.workflow_run.head_branch)
    || github.head_ref
    || github.ref_name
  }}
  cancel-in-progress: true
```

Notes:
- `github.event.workflow_run.head_branch` is the branch that triggered the upstream workflow. (stackoverflow.com)
- `workflow_run.pull_requests[0]` is convenient when populated, but it may be missing/empty in some cases (notably fork-related scenarios), so keep a fallback (like `owner:branch`). (github.com)
- The expression uses standard contexts/expressions supported in workflow syntax. (docs.github.com)
Citations:
- 1: https://docs.github.com/actions/concepts/workflows-and-actions/concurrency?utm_source=openai
- 2: https://stackoverflow.com/questions/63343937/how-to-use-the-github-actions-workflow-run-event?utm_source=openai
- 3: https://stackoverflow.com/questions/63343937/how-to-use-the-github-actions-workflow-run-event?utm_source=openai
- 4: https://github.com/orgs/community/discussions/25220?utm_source=openai
- 5: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/accessing-contextual-information-about-workflow-runs?ref=engineering-empathy-co&utm_source=openai
🏁 Script executed:
cat -n .github/workflows/benchmark.yml | head -60

Repository: zincware/asebytes
Length of output: 2220
🏁 Script executed:
wc -l .github/workflows/benchmark.yml

Repository: zincware/asebytes
Length of output: 96
Use a fallback expression for the concurrency group to handle mixed triggers safely.
The guidance in lines 19–20 prescribes using only github.event.pull_request.number in the concurrency group, but this workflow also listens to workflow_run. The github.event.pull_request.number context is undefined for workflow_run events, causing all non-PR runs to share the same empty concurrency key and unintentionally cancel each other. Use a fallback that works across both triggers:
Safer pattern
-- Concurrency cancel per PR number: `concurrency: { group: 'benchmark-${{ github.event.pull_request.number }}', cancel-in-progress: true }`
+- Concurrency cancel per event: `concurrency: { group: 'benchmark-${{ github.event.pull_request.number || github.sha }}', cancel-in-progress: true }`

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- Add `pull_request` trigger (opened + synchronize) to existing `benchmark.yml` — same file handles both main and PR flows
- Main pushes continue using `workflow_run` trigger (existing Phase 5 behavior)
- PR benchmarks run independently — do NOT wait for tests to pass
- Concurrency cancel per PR number: `concurrency: { group: 'benchmark-${{ github.event.pull_request.number }}', cancel-in-progress: true }`
- Use `if` conditions on `github.event_name` to distinguish main vs PR behavior at the step level
- Add `pull_request` trigger (opened + synchronize) to existing `benchmark.yml` — same file handles both main and PR flows
- Main pushes continue using `workflow_run` trigger (existing Phase 5 behavior)
- PR benchmarks run independently — do NOT wait for tests to pass
- Concurrency cancel per event: `concurrency: { group: 'benchmark-${{ github.event.pull_request.number || github.sha }}', cancel-in-progress: true }`
- Use `if` conditions on `github.event_name` to distinguish main vs PR behavior at the step level
🧰 Tools
🪛 LanguageTool
[uncategorized] ~21-~21: The official name of this software platform is spelled with a capital “H”.
Context: `...gress: true }` - Use `if` conditions on `github.event_name` to distinguish main vs PR b...
(GITHUB)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/phases/06-pr-feedback/06-CONTEXT.md around lines 17 - 21, The
concurrency group currently uses 'benchmark-${{ github.event.pull_request.number
}}' which is undefined for workflow_run events and causes unrelated runs to
collide; change the concurrency group to use a fallback expression so when
github.event.pull_request.number is missing it falls back to a stable per-run
identifier (for example github.run_id or github.ref/sha). Update the
concurrency: group value to incorporate that fallback (keeping the 'benchmark-'
prefix) so both pull_request and workflow_run triggers produce unique,
non-conflicting keys.
- [x] **CI-01**: gh-pages branch exists with GitHub Pages enabled serving benchmark dashboard
- [x] **CI-02**: Post-matrix benchmark job runs github-action-benchmark for a single Python version (latest)
- [x] **CI-03**: Auto-push to gh-pages only on main branch pushes, not PRs
- [x] **CI-04**: Release/tag events trigger a benchmark snapshot on gh-pages
CI-04 marked complete but not implemented.
The requirement states "Release/tag events trigger a benchmark snapshot on gh-pages" but benchmark.yml has no release or push.tags trigger. The workflow comments explicitly state releases do NOT get separate benchmark runs. Either update the requirement status to pending or add the tag trigger to the workflow.
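If the workflow route is chosen, the missing trigger is a small addition. A sketch only; this block is not present in the repository's `benchmark.yml`, and the snapshot job itself would reuse the existing benchmark steps:

```yaml
# Hypothetical trigger addition for CI-04 (illustrative, not current repo content).
on:
  release:
    types: [published]   # fires when a GitHub release is published
  push:
    tags:
      - "v*"             # or restrict to the project's actual tag scheme
```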
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/REQUIREMENTS.md at line 15, The CI-04 check is marked complete but
the benchmark workflow lacks release/tag triggers; either update the
REQUIREMENTS.md entry for CI-04 to "pending" or modify the benchmark workflow
(benchmark.yml) to include release and tag push triggers (add an on: release:
types: [published] and/or on: push: tags: ['*']) so that release/tag events will
run the benchmark and publish the snapshot to gh-pages; update the CI-04 status
in .planning/REQUIREMENTS.md if you choose not to change the workflow.
### Decision 3: `auto-push: true` only on main branch pushes

```python
# Padded (explicit opt-in via extension suffix)
_RegistryEntry("pattern", "*.h5p", "object", "asebytes.columnar", "PaddedColumnarBackend", ...),
_RegistryEntry("pattern", "*.zarrp", "object", "asebytes.columnar", "PaddedColumnarBackend", ...),
```

**Why:** On PRs, you want comparison comments but should NOT push results to gh-pages (PR benchmarks are noisy, transient, and would pollute the historical record). Use a conditional:

```python
# H5MD (spec-compliant, always padded)
_RegistryEntry("pattern", "*.h5md", "object", "asebytes.h5md", "H5MDBackend", ...),
]
```

```yaml
auto-push: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
```

**Rationale for `.h5p`/`.zarrp` over `.h5-padded`:** File extensions with hyphens break shell glob patterns and confuse some filesystem tools. Single-suffix extensions are conventional. The `p` suffix is short for "padded" and unambiguous in context. However, the exact naming is a user decision -- what matters architecturally is that padded and ragged are separate registry entries pointing to separate backend classes.

**Alternative considered: parameter-based dispatch** (`ASEIO("data.h5", strategy="padded")`). Rejected because: (1) the registry is extension-based and adding constructor parameters would require registry protocol changes, (2) a file's storage strategy is an intrinsic property of the file, not a user preference at open time, (3) re-opening a file should auto-detect its strategy without user knowledge.

## Patterns to Follow

### Pattern 1: Template Method for Columnar Backends

**What:** `BaseColumnarBackend` implements the full `ReadWriteBackend` contract. Subclasses override only the per-atom-specific methods.

**When:** Any columnar backend (ragged, padded, H5MD).

**Why:** The current ColumnarBackend and H5MDBackend duplicate ~300 lines of scalar column handling, JSON serialization, and postprocessing. A Template Method base class eliminates this.

```python
class BaseColumnarBackend(ReadWriteBackend[str, Any], ABC):
    """Shared logic for all columnar backends."""

    _returns_mutable: bool = True

    # --- Concrete methods (shared) ---

    def _postprocess(self, val, col_name, *, is_per_atom=False):
        """Shared read postprocessing: NaN->None, JSON decode, numpy scalar unwrap."""
        ...

    def _prepare_scalar_column(self, values):
        """Shared scalar column serialization."""
        ...

    def _write_scalar_columns(self, keys, batch_values):
        """Write non-per-atom columns to store."""
        ...

    def _classify_columns(self, data):
        """Determine which columns are per-atom vs scalar."""
        ...

    def _serialize_value(self, val):
        """JSON-encode dicts/lists/strings."""
        ...

    # --- Abstract methods (per-atom strategy) ---

    @abstractmethod
    def _write_per_atom_columns(self, keys, batch_values, n_atoms_list):
        """Write per-atom columns using strategy-specific layout."""
        ...

    @abstractmethod
    def _read_per_atom_value(self, col_name, index):
        """Read a single per-atom value for one frame."""
        ...

    @abstractmethod
    def _read_per_atom_bulk(self, col_name, indices):
        """Read per-atom values for multiple frames."""
        ...
```

PR runs still get `comment-on-alert: true` which compares against the last stored result and comments on the PR if regression is detected.
Inconsistency in trigger strategy between documents.
This section describes using github.event_name == 'push' && github.ref == 'refs/heads/main' for conditional auto-push, implying the workflow runs on direct push events to main.
However, .planning/phases/06-pr-feedback/06-01-PLAN.md describes a different approach:
- Main branch benchmarks trigger via `workflow_run` (after Tests workflow completes)
- PR benchmarks trigger via `pull_request` event
- Conditions check `github.event_name == 'workflow_run'` vs `github.event_name == 'pull_request'`
These are two different architectural approaches. Clarify which approach is being implemented, or update this document to reflect the workflow_run-based approach described in the Phase 6 plan.
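If the `workflow_run`-based approach wins out, the step-level gating it implies looks roughly like this (step names and `run` bodies are illustrative placeholders, not the repository's actual workflow):

```yaml
# Illustrative sketch: one job, steps gated on github.event_name.
jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - name: Push results to gh-pages (main only)
        if: github.event_name == 'workflow_run'
        run: echo "auto-push path"
      - name: Compare against baseline (PR only)
        if: github.event_name == 'pull_request'
        run: echo "compare-only path"
```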
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/research/ARCHITECTURE.md around lines 112 - 120, The
ARCHITECTURE.md section uses an `auto-push` conditional of `github.event_name ==
'push' && github.ref == 'refs/heads/main'` which conflicts with the Phase 6 plan
that uses `workflow_run` for main-branch benchmarks and `pull_request` for PRs;
update the ARCHITECTURE.md text to match the Phase 6 approach (or vice versa) by
replacing the current `auto-push` conditional with the `workflow_run`-based
logic (`github.event_name == 'workflow_run'` for main branch runs triggered
after Tests and `pull_request` for PR runs) and explicitly document that
`comment-on-alert: true` remains enabled for PR comparisons so readers can trace
the `workflow_run` vs `pull_request` trigger strategy consistently with the
Phase 6 plan.
### Decision 4: Keep existing visualize_benchmarks.py + artifact uploads

**Why:** The PNGs serve a different purpose (static per-run snapshots in artifact archives). The github-action-benchmark dashboard provides historical trends. Both are valuable. No reason to remove existing functionality.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if visualize_benchmarks.py exists in the repository
fd -t f "visualize_benchmarks.py"

Repository: zincware/asebytes
Length of output: 43
🏁 Script executed:
# Check git history for visualize_benchmarks.py
git log --oneline --all -- "*visualize_benchmarks.py" 2>/dev/null || echo "No git history for this file"

Repository: zincware/asebytes
Length of output: 229
🏁 Script executed:
# Search for references to visualize_benchmarks in the codebase
rg "visualize_benchmarks" --type md --type py --type yaml --type txt

Repository: zincware/asebytes
Length of output: 134
🏁 Script executed:
# Read the ARCHITECTURE.md file to see the exact context of Decision 4
cat -n ".planning/research/ARCHITECTURE.md" | sed -n '115,135p'

Repository: zincware/asebytes
Length of output: 1446
🏁 Script executed:
ls -la docs/ | grep -i visualize

Repository: zincware/asebytes
Length of output: 43
🏁 Script executed:
# Check if docs/visualize_benchmarks.py exists
test -f "docs/visualize_benchmarks.py" && echo "File exists" || echo "File does not exist"

Repository: zincware/asebytes
Length of output: 80
🏁 Script executed:
# Get more context from docs/README.md about the visualize_benchmarks reference
grep -n "visualize_benchmarks" docs/README.md -B 2 -A 2

Repository: zincware/asebytes
Length of output: 225
Fix Decision 4: Visualization script was removed, not kept.
Decision 4 states "Keep existing visualize_benchmarks.py + artifact uploads," but the file docs/visualize_benchmarks.py does not exist in the repository. Additionally, docs/README.md still contains a reference to this non-existent script (uv run docs/visualize_benchmarks.py benchmark_results.json).
Either restore the visualization script if it should be kept, or rewrite Decision 4 to reflect that the visualization script and its documentation have been removed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/research/ARCHITECTURE.md around lines 122 - 125, Decision 4 in
ARCHITECTURE.md claims we kept visualize_benchmarks.py, but the file
docs/visualize_benchmarks.py is missing and docs/README.md still references
running it; either restore the script and its artifact-upload behavior and add
it back at docs/visualize_benchmarks.py (ensuring it accepts
benchmark_results.json as the CLI arg and is documented in docs/README.md), or
change Decision 4 text in .planning/research/ARCHITECTURE.md and remove or
update the docs/README.md reference so both documentation and architecture
reflect that the visualization script was removed; update whichever files you
change to keep the repository consistent (Decision 4, docs/README.md, and
presence or absence of docs/visualize_benchmarks.py).
5. **Backend-specific optimizations** (MongoDB TTL, Redis Lua) - These are validated wins (1.9-3.5x) but lower priority than correctness.

| Existing Asset | How It Feeds New Features |
|----------------|--------------------------|
| `--benchmark-json=benchmark_results.json` in CI (tests.yml line 61) | Direct input to github-action-benchmark `output-file-path` |
Stale reference: benchmark JSON is no longer generated in tests.yml.
The reference to tests.yml line 61 is outdated. Per the relevant code snippet, benchmark steps have been moved to the new benchmark.yml workflow. Consider updating this line to reference the new location.
📝 Proposed fix
-| `--benchmark-json=benchmark_results.json` in CI (tests.yml line 61) | Direct input to github-action-benchmark `output-file-path` |
+| `--benchmark-json=benchmark_results.json` in CI (benchmark.yml line 81) | Direct input to github-action-benchmark `output-file-path` |

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| `--benchmark-json=benchmark_results.json` in CI (tests.yml line 61) | Direct input to github-action-benchmark `output-file-path` |
| `--benchmark-json=benchmark_results.json` in CI (benchmark.yml line 81) | Direct input to github-action-benchmark `output-file-path` |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/research/FEATURES.md at line 84, Update the stale reference in the
FEATURES.md line that mentions "`--benchmark-json=benchmark_results.json` in CI
(tests.yml line 61)`" so it points to the new workflow location: replace the
`tests.yml line 61` reference with the appropriate `benchmark.yml` workflow (or
remove the line-number mention and reference `benchmark.yml`/the benchmark
workflow step), ensuring the text still mentions
`--benchmark-json=benchmark_results.json` as the direct input to
github-action-benchmark `output-file-path`.
| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |
| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
Table row misalignment in Phase 5 progress entry.
Line 87 is missing the Milestone column value and has columns shifted. Compare to the consistent format in lines 88-89.
📝 Proposed fix
-| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |
+| 5. Benchmark Pipeline | v0.3.1 | 1/1 | Complete | 2026-03-09 |

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |
| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
| 5. Benchmark Pipeline | v0.3.1 | 1/1 | Complete | 2026-03-09 |
| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.planning/ROADMAP.md around lines 87 - 89, The "Benchmark Pipeline" (Phase
5) table row is missing its Milestone cell, causing column misalignment; update
the Phase 5 row (the line containing "5. Benchmark Pipeline | 1/1 | Complete |
2026-03-09 | -") to include the Milestone value (match the format used by lines
with "v0.3.1") so all pipes align and the table columns are consistent.
…terations

write_single tests created the DB once and appended in each benchmark iteration, causing quadratic slowdown (2h+ hangs on CI). Two fixes:
- Create fresh DB per iteration (matches write_trajectory pattern)
- Cap to 10 frames (per-row overhead is the signal, not throughput)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/benchmarks/test_bench_write.py (2)
27-30: ⚠️ Potential issue | 🟠 Major

Clean up local benchmark artifacts per invocation.

These UUID-based paths fix cross-run reuse, but every benchmark round now leaves another file/store behind until `tmp_path` is torn down. On the larger datasets, `pytest-benchmark` will call `fn()` many times, so later samples run against a progressively dirtier filesystem and can start burning a lot of disk in CI.

Please add per-iteration teardown for the local targets, or reuse a single cleaned target between iterations so the benchmark stays measuring write cost rather than accumulated temp data.
Also applies to: 39-42, 51-54, 95-99, 110-113, 122-124, 133-137, 157-161, 171-175, 185-189, 235-239, 251-255, 265-268, 278-282
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/benchmarks/test_bench_write.py` around lines 27 - 30, The benchmark helper fn currently creates UUID-named LMDB stores via tmp_path and ASEIO and never removes them, causing accumulation across iterations; modify fn (and the other similar helpers noted) to either (A) create a single SAEIO/LMDB target once outside the benchmark loop and clear/truncate its contents between iterations, or (B) after db.extend(frames) call the ASEIO close/cleanup API and remove the created path (tmp_path / f"wt_{name}_lmdb_...") so each invocation deletes its artifact; ensure you reference the ASEIO instance (db), call its proper close/flush method, and unlink/rmdir the created filesystem path (or reuse a shared path created before benchmarking) for all the functions that create tmp_path/ASEIO stores.
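The per-iteration teardown idea can be shown with a stdlib-only sketch. The plain loop stands in for pytest-benchmark's repeated calls to `fn()`, and the file write stands in for `db.extend(frames)`; nothing here uses the repository's actual ASEIO API.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def run_iteration(tmp_path: Path) -> None:
    """One benchmark iteration: create a uniquely named store, write, clean up."""
    target = tmp_path / f"wt_lmdb_{uuid.uuid4().hex[:8]}"
    try:
        target.mkdir()
        # Stand-in for the real write workload (db.extend(frames)).
        (target / "data.bin").write_bytes(b"\x00" * 1024)
    finally:
        # Per-iteration teardown: the artifact never survives into the next round.
        shutil.rmtree(target, ignore_errors=True)


tmp = Path(tempfile.mkdtemp())
for _ in range(5):  # pytest-benchmark would drive these rounds
    run_iteration(tmp)
leftovers = list(tmp.iterdir())
print(len(leftovers))  # → 0: no artifacts accumulate across rounds
shutil.rmtree(tmp)
```

The same shape applies to each helper flagged above: keep the UUID naming for isolation, but pair every create with a `finally`-guarded remove.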
64-72: ⚠️ Potential issue | 🟠 Major

Make MongoDB/Redis teardown failure-safe.

Right now cleanup only runs after `benchmark(fn)`, and the handle is appended to `dbs` only after the writes finish. If a benchmark iteration fails partway through, the collection/prefix is leaked and can pollute later CI runs.

Please append the handle immediately after construction and wrap `benchmark(fn)` in `try/finally` so teardown always runs.

Suggested pattern
```diff
 dbs = []

 def fn():
     uri = f"{mongo_uri}/bench_ws_{name}_{uuid.uuid4().hex[:8]}"
     db = ASEIO(uri)
+    dbs.append(db)
     for mol in frames:
         db.extend([mol])
-    dbs.append(db)

-benchmark(fn)
-for db in dbs:
-    db.remove()
+try:
+    benchmark(fn)
+finally:
+    for db in dbs:
+        db.remove()
```

Also applies to: 80-88, 200-209, 218-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/benchmarks/test_bench_write.py` around lines 64 - 72, The teardown can leak resources if a benchmark iteration fails because db handles are only appended to dbs after writes complete and benchmark(fn) isn't guaranteed to run cleanup; modify the fn closure so it constructs the ASEIO instance and appends it to dbs immediately after creation (right after db = ASEIO(uri)) and then perform db.extend(frames); also wrap the call to benchmark(fn) in a try/finally where the finally iterates over dbs and calls db.remove() to ensure teardown always runs; apply the same change to the other similar blocks referenced (lines handling the same pattern).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/benchmarks/test_bench_write.py`:
- Around line 27-30: The benchmark helper fn currently creates UUID-named LMDB
stores via tmp_path and ASEIO and never removes them, causing accumulation
across iterations; modify fn (and the other similar helpers noted) to either (A)
create a single SAEIO/LMDB target once outside the benchmark loop and
clear/truncate its contents between iterations, or (B) after db.extend(frames)
call the ASEIO close/cleanup API and remove the created path (tmp_path /
f"wt_{name}_lmdb_...") so each invocation deletes its artifact; ensure you
reference the ASEIO instance (db), call its proper close/flush method, and
unlink/rmdir the created filesystem path (or reuse a shared path created before
benchmarking) for all the functions that create tmp_path/ASEIO stores.
- Around line 64-72: The teardown can leak resources if a benchmark iteration
fails because db handles are only appended to dbs after writes complete and
benchmark(fn) isn't guaranteed to run cleanup; modify the fn closure so it
constructs the ASEIO instance and appends it to dbs immediately after creation
(right after db = ASEIO(uri)) and then perform db.extend(frames); also wrap the
call to benchmark(fn) in a try/finally where the finally iterates over dbs and
calls db.remove() to ensure teardown always runs; apply the same change to the
other similar blocks referenced (lines handling the same pattern).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9f34ee4b-344e-4f93-8e75-ef1597d6ec79
📒 Files selected for processing (1)
tests/benchmarks/test_bench_write.py
Summary by CodeRabbit
New Features
Chores
Documentation
Tests