
Gsd/auto bench#12

Merged
PythonFZ merged 28 commits into main from
gsd/auto-bench
Mar 9, 2026

Conversation

PythonFZ (Member) commented Mar 9, 2026

Summary by CodeRabbit

  • New Features

    • CI benchmark pipeline: baseline storage on main, PR comparisons with regression alerting and summary reporting.
  • Bug Fixes

    • Enforced per-test isolation in backend tests to prevent cross-test data leakage; contract suite stability improved.
  • Chores

    • Removed legacy local benchmark visualization and local benchmark cache; added ignore rule for benchmark artifacts.
  • Documentation

    • Added planning, roadmap, validation, and research docs describing the benchmark pipeline and PR feedback strategy.

PythonFZ and others added 20 commits March 9, 2026 15:43
- Create benchmark.yml with workflow_run trigger on Tests+main
- Single Python 3.13 job with MongoDB+Redis services
- github-action-benchmark auto-push to gh-pages at dev/bench
- Document CI-01 (gh-pages setup) and CI-04 (release strategy)
- Add .benchmarks/ to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… files

- Remove the "Run benchmarks", "Visualize benchmarks", and "Upload benchmark results" steps
- Delete docs/visualize_benchmarks.py (superseded by gh-pages dashboard)
- Delete .benchmarks/ local cache directory

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add 05-01-SUMMARY.md with execution results
- Update STATE.md with plan completion and decisions
- Update ROADMAP.md with phase 5 progress
- Mark CI-01 through CI-04 requirements complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add pull_request trigger (opened, synchronize) alongside workflow_run
- Add concurrency group to cancel in-progress PR benchmark runs
- Split benchmark-action into main (auto-push) and PR (compare-only) steps
- PR step: summary-always, comment-on-alert, fail-on-alert at 150% threshold
- PR step: auto-push false, save-data-file false to avoid polluting gh-pages
- Document PR behavior and branch protection setup in header comments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
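The trigger and step split described in this commit could look roughly like the following. This is a minimal, untested outline: the input names follow github-action-benchmark's documented options, but the workflow name, job layout, and output file path are assumptions, and the checkout/services/pytest steps are elided.

```yaml
name: Benchmark

on:
  workflow_run:
    workflows: [Tests]
    types: [completed]
    branches: [main]
  pull_request:
    types: [opened, synchronize]

# Cancel superseded PR benchmark runs.
concurrency:
  group: benchmark-${{ github.ref }}
  cancel-in-progress: true

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      # ... checkout, Python 3.13, MongoDB/Redis services,
      #     pytest --benchmark-json=output.json ...

      - name: Store baseline (main)
        if: github.event_name == 'workflow_run'
        uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: pytest
          output-file-path: output.json
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true

      - name: Compare against baseline (PR)
        if: github.event_name == 'pull_request'
        uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: pytest
          output-file-path: output.json
          summary-always: true
          comment-on-alert: true
          fail-on-alert: true
          alert-threshold: '150%'
          auto-push: false
          save-data-file: false
```

The `auto-push: false` / `save-data-file: false` pair on the PR step is what keeps PR runs from polluting the gh-pages baseline.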
…terations

write_single tests created the DB once and appended in each benchmark
iteration, causing quadratic slowdown (2h+ hangs on CI). Two fixes:
- Create fresh DB per iteration (matches write_trajectory pattern)
- Cap to 10 frames (per-row overhead is the signal, not throughput)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
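The quadratic-vs-linear difference can be illustrated with a stand-in database (`FakeDB` is hypothetical; the real tests use the package's ASEIO handles):

```python
WRITE_SINGLE_FRAMES = 10  # cap applied by the fix


class FakeDB:
    """Stand-in for the real ASEIO database handle (hypothetical)."""

    def __init__(self):
        self.frames = []

    def extend(self, frames):
        self.frames.extend(frames)


def run_iteration():
    # Fixed pattern: a fresh DB per benchmark iteration, so each
    # iteration appends exactly WRITE_SINGLE_FRAMES rows. The broken
    # pattern reused one DB, so iteration N appended on top of all
    # earlier rows and total work grew quadratically with N.
    db = FakeDB()
    db.extend(range(WRITE_SINGLE_FRAMES))
    return len(db.frames)


# Every iteration does the same bounded amount of work.
assert all(run_iteration() == WRITE_SINGLE_FRAMES for _ in range(5))
```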

coderabbitai bot commented Mar 9, 2026

Warning

Rate limit exceeded

@PythonFZ has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63dabcd2-c1d9-48f3-96bf-7496598f1600

📥 Commits

Reviewing files that changed from the base of the PR and between 7e37111 and 4848760.

📒 Files selected for processing (4)
  • .planning/STATE.md
  • .planning/quick/1-make-mongodb-backend-cache-ttl-configura/1-SUMMARY.md
  • src/asebytes/mongodb/_backend.py
  • tests/test_mongodb.py
📝 Walkthrough

Walkthrough

This PR implements a CI benchmarking infrastructure: removes legacy local benchmark artifacts, adds a new GitHub Actions benchmark workflow triggered after tests (and for PRs), updates .gitignore and planning docs for benchmark phases, adjusts benchmark tests for isolation, and adds per-test UUID group isolation for stateful backends.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Workflows**<br>`.github/workflows/benchmark.yml`, `.github/workflows/tests.yml` | Add a new benchmark.yml job (runs after Tests or on pull_request) with Redis/MongoDB services, Python 3.13 setup, pytest-benchmark execution, and github-action-benchmark steps for storing (main) and comparing (PR) results; remove legacy benchmark steps from tests.yml. |
| **Benchmark artifacts & visualization**<br>`.benchmarks/.../0001_baseline.json`, `.benchmarks/.../0002_perf_analysis.json`, `docs/visualize_benchmarks.py` | Delete large local benchmark JSON blobs and the visualization script that parsed benchmark JSON and produced charts/stats. |
| **VCS ignores**<br>`.gitignore` | Add `.benchmarks/` to ignore rules to prevent committing local benchmark caches. |
| **Planning & roadmaps**<br>`.planning/PROJECT.md`, `.planning/STATE.md`, `.planning/REQUIREMENTS.md`, `.planning/ROADMAP.md` | Add and update planning, requirements, state, and roadmap documents to reflect v0.3.1 Phase 5–8 benchmark pipeline work and related milestones. |
| **Phase 5 benchmark docs**<br>`.planning/phases/05-benchmark-pipeline/...` | Add the comprehensive Phase 5 plan, context, research, validation, summary, and verification artifacts describing the decoupled benchmark workflow, triggers, and acceptance criteria. |
| **Phase 6 PR feedback docs**<br>`.planning/phases/06-pr-feedback/...` | Add Phase 6 planning/research/validation for PR-level benchmark comparisons, summary behavior, concurrency, and fail-on-regression gating (150% threshold). |
| **Phase 8 test isolation docs**<br>`.planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/...` | Add plans, context, research, and validation for per-test UUID-based group isolation across stateful backends. |
| **Research & stack docs**<br>`.planning/research/ARCHITECTURE.md`, `FEATURES.md`, `PITFALLS.md`, `STACK.md`, `SUMMARY.md` | Reframe research docs from backend-refactor orientation to CI benchmarking infrastructure, covering architecture, pitfalls (fork PR tokens, gh-pages), stack recommendations, and feature priorities. |
| **Test changes**<br>`tests/benchmarks/test_bench_write.py` | Add `WRITE_SINGLE_FRAMES = 10`; limit frames per single-write benchmark, create per-benchmark unique temp paths/DB handles, and perform explicit cleanup to improve isolation. |
| **Contract tests fixtures**<br>`tests/contract/conftest.py` | Pass a generated per-test group (format `test_<8hex>`) into all six facade/backend constructors to isolate stateful backends (UUID-based group names). |
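The per-test group generation used by the conftest fixture amounts to something like the sketch below (the helper name `make_test_group` is assumed for illustration; only the `test_<8hex>` format comes from the PR):

```python
import uuid


def make_test_group() -> str:
    """Return a per-test group name in the form test_<8hex>."""
    # uuid4().hex is 32 lowercase hex chars; the first 8 are enough
    # to make collisions between concurrent tests vanishingly rare.
    return f"test_{uuid.uuid4().hex[:8]}"


group = make_test_group()
assert group.startswith("test_") and len(group) == len("test_") + 8
assert make_test_group() != make_test_group()  # tests never share a group
```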

Sequence Diagram(s)

```mermaid
sequenceDiagram
participant TestsWorkflow as Tests Workflow
participant BenchmarkWorkflow as Benchmark Workflow (GH Actions)
participant Runner as GitHub Runner
participant Services as Redis / MongoDB
participant Pytest as pytest-benchmark
participant BenchAction as github-action-benchmark
participant GhPages as GitHub Pages (gh-pages)

TestsWorkflow->>Runner: run tests (matrix)
Runner->>TestsWorkflow: upload artifacts (benchmark inputs)
note over TestsWorkflow,BenchmarkWorkflow: workflow_run trigger on Tests success
TestsWorkflow->>BenchmarkWorkflow: workflow_run event
BenchmarkWorkflow->>Runner: provision job (ubuntu, Python 3.13)
Runner->>Services: start Redis / MongoDB (healthchecks)
Runner->>Pytest: run pytest with benchmark suite, produce JSON
Pytest-->>Runner: benchmark JSON artifact
Runner->>BenchAction: run comparison/store step
alt main branch
    BenchAction->>GhPages: push results (auto-push)
else pull request
    BenchAction->>BenchmarkWorkflow: compare vs baseline, post PR summary/comment, fail-on-alert if threshold exceeded
end
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • feat: unified group parameter across all backends #8 — Related: also introduces/uses a unified per-test group parameter and touches backend group/listing behavior; strongly connected to the conftest group isolation changes.
  • Gsd/auto bench #11 — Related: overlaps on benchmark infrastructure work (removal of .benchmarks/ artifacts, new benchmark workflow, viz script removal, and .gitignore updates).

Poem

🐰
I hopped through CI with nimble feet,
Removed old JSON, made workflows neat,
Now benchmarks run after tests complete,
PR alerts and gh-pages dashboards greet,
A little rabbit dance — performance sweet!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Gsd/auto bench' is vague and does not clearly convey the main purpose of the changeset. While the PR does implement automated benchmarking infrastructure, the title uses non-descriptive terms that don't meaningfully summarize the primary changes. | Consider renaming to a more descriptive title that captures the main objective, such as 'Add automated CI benchmark pipeline with GitHub Actions' or 'Implement benchmark workflow and test isolation improvements'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch gsd/auto-bench

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/benchmarks/test_bench_write.py (1)

60-72: ⚠️ Potential issue | 🔴 Critical

ASEIO does not expose a close() method—the suggested fix will fail at runtime.

The underlying concern about unclosed connections is valid: remove() only drops the collection without closing the MongoDB/Redis client connections, and with multiple benchmark iterations, connections accumulate.

However, the proposed solution (db.close()) is incorrect because ASEIO does not have a public close() method. Although ASEIO implements the context manager protocol (__enter__ and __exit__), the __exit__ method performs no cleanup.

To fix this properly, either:

  • Call the private backend's close method: db._backend.close() (not ideal; uses internal API)
  • OR modify ASEIO to expose a public close() method that delegates to self._backend.close() (if it exists)
  • OR update ASEIO's __exit__ to call self._backend.close() if the backend provides it

Apply the same pattern to Redis tests and all other affected benchmarks once a solution is chosen.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/benchmarks/test_bench_write.py` around lines 60 - 72, The benchmark
leaks DB connections because ASEIO lacks a public close() and its __exit__ does
not close the backend; update tests and/or ASEIO so connections are closed after
each iteration: either call the backend close method directly in the test
(db._backend.close()) after db.remove(), or better, add a public ASEIO.close()
that delegates to self._backend.close() (and update ASEIO.__exit__ to call
self.close()), then replace test teardown to call db.close() (and db.remove())
for Mongo and analogous calls for Redis and other benchmarks.
🧹 Nitpick comments (1)
.planning/research/ARCHITECTURE.md (1)

1-56: Architecture document describes a more elaborate multi-version setup than implemented.

The current benchmark.yml runs benchmarks on a single Python version (3.13) per requirement CI-02, while this architecture document describes a multi-Python-version approach with artifact aggregation. Consider updating this document to reflect the actual simplified implementation, or add a note clarifying this describes an alternative/future approach.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/research/ARCHITECTURE.md around lines 1 - 56, ARCHITECTURE.md
currently describes a multi-Python-version benchmark aggregation pipeline but
the implemented workflow (benchmark.yml) only runs benchmarks on a single Python
version per CI-02; update ARCHITECTURE.md to either (a) reflect the actual
implemented design by replacing the multi-version architecture diagram and
wording with a concise description of the single-version 3.13 workflow and its
artifact handling, or (b) add a clear “Future work / Alternative design” section
that states the multi-version aggregation diagram is optional and not
implemented, referencing CI-02 and benchmark.yml as the current single-version
behavior. Ensure the file names/symbols "ARCHITECTURE.md", "benchmark.yml", and
"CI-02" are mentioned so reviewers can locate and understand the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.planning/phases/05-benchmark-pipeline/05-01-PLAN.md:
- Around line 7-12: The execution manifest's files_modified list in
.planning/phases/05-benchmark-pipeline/05-01-PLAN.md is missing the .benchmarks/
directory even though Task 2 deletes it; update the files_modified array (the
YAML block listing files_modified) to include ".benchmarks/" so the declared
change set matches the actual operations (also ensure the same addition is made
where the files_modified block is duplicated around the later section referenced
by lines 127-129).

In @.planning/phases/05-benchmark-pipeline/05-CONTEXT.md:
- Around line 16-20: The "Backend selection" phase is internally inconsistent
with the research docs: update the Phase 5 definition under the "Backend
selection" heading to match the research recommendations by removing
service-backed systems from the tracked CI benchmark list (drop MongoDB and
Redis from "All backends in CI benchmarks" and instead mark them as
optional/bench-only with a note that they are excluded from tracked CI noise),
align the single Python version to 3.12 (replace "latest (3.13)" with "3.12"),
and add a short clarification sentence referencing that competitor baselines
(znh5md, extxyz, sqlite, aselmdb) remain included but service-backed systems
require separate, non-tracked runs; ensure the research doc reference or
rationale is cited in the same section so the phase definition is no longer
contradictory.
- Around line 34-39: The doc incorrectly states github-action-benchmark will
auto-create the gh-pages branch; update the wording in the 05-CONTEXT.md
gh-pages setup so it requires pre-creating the gh-pages branch and add a concise
fix: instruct maintainers to create the branch before workflow run (via UI or
git push --set-upstream origin gh-pages) and ensure GitHub Pages source is
configured, and optionally add a note to the workflow using
github-action-benchmark to either verify the branch exists or fail fast with a
clear message referencing gh-pages and github-action-benchmark.

In @.planning/phases/05-benchmark-pipeline/05-RESEARCH.md:
- Line 13: Remove the obsolete artifact-handoff architecture: replace any
primary recommendation that promotes upload-artifact/download-artifact
cross-workflow handoff (e.g., mentions of actions/download-artifact@v4 and
upload-artifact) with a single, consistent recommendation to use the
workflow_run trigger and the benchmark-action/github-action-benchmark@v1 with
auto-push: true to commit results to gh-pages; update or delete the contradicted
paragraphs in the “Primary recommendation” section and any references that
conflict with the “IMPORTANT DECISION” section so the doc exclusively describes
the self-contained benchmark workflow approach (workflow_run + benchmark-action)
and removes the cross-workflow artifact handoff guidance.

In @.planning/phases/05-benchmark-pipeline/05-VALIDATION.md:
- Around line 20-23: The "quick run" instruction uses python -c "import yaml;
..." but assumes PyYAML is present; update the guidance to either (A) wrap the
quick-run command with uv run to ensure the managed env (i.e., use uv run python
-c "import yaml; yaml.safe_load(...)") or (B) add PyYAML to the dev dependencies
in pyproject.toml under the dev group (so pytest/benchmark tooling can import
yaml on a clean checkout) and include a brief rationale that PyYAML is required
for the quick-run check; modify the text for the quick run command and mention
which option was chosen.

In @.planning/phases/05-benchmark-pipeline/05-VERIFICATION.md:
- Around line 3-6: The verification frontmatter currently marks Phase 5 as fully
passed (keys verified/status/score/re_verification) even though the human-only
checks section that lists pending end-to-end validations is not completed;
update the frontmatter so verified=false, status="pending-human-verification"
(or similar), and adjust score to reflect incomplete must-haves until the
human-only checks section is completed and re_verification=true is set after
re-run; ensure the frontmatter and the human-only checks section are kept
consistent so the phase cannot be marked passed before those validations finish.

In @.planning/phases/06-pr-feedback/06-CONTEXT.md:
- Around line 17-21: The PR benchmarks should only run after tests succeed, so
remove the independent pull_request trigger from benchmark.yml and instead
trigger the benchmark workflow for PRs via a workflow_run that listens to the
test workflow completion with conclusion: "success"; keep the existing
workflow_run for main pushes and preserve the concurrency group expression
(concurrency: { group: 'benchmark-${{ github.event.pull_request.number }}',
cancel-in-progress: true }) and any step-level if conditions on
github.event_name so behavior for main vs PR remains distinguishable.

In @.planning/phases/06-pr-feedback/06-VALIDATION.md:
- Around line 39-45: The sign-off criteria conflict with the current task map:
tasks 06-01-01, 06-01-02, and 06-01-03 are all marked as manual but the phase
validation requires automated continuity (no 3 consecutive manual verifies).
Update the task rows in the Job Summary table so at least one of these tasks
includes an automated check (for example set "Automated Command" to the YAML
lint command and mark "Test Type" or "Automated" accordingly for task IDs
06-01-01/06-01-02/06-01-03), or alternatively relax the continuity rule
referenced later by adjusting the requirement that forbids 3 consecutive manual
tasks; ensure the change is applied consistently (also update the similar block
referenced at the later section covering lines 67-72).

In @.planning/research/FEATURES.md:
- Around line 80-90: Remove the deleted PNG tooling entry from the "Existing
Asset" table: delete the row that references docs/visualize_benchmarks.py
generating PNGs (and its note about embedding PNGs in PR comments) so the
FEATURES.md "Existing Asset" section no longer lists a script that Phase 5
removes; keep other rows intact and ensure the table remains syntactically valid
after removing that row.

In @.planning/research/PITFALLS.md:
- Around line 96-100: The guidance mixes two deployment models—branch-based
auto-push (github-action-benchmark pushing gh-pages) and the newer artifact
deployment (actions/deploy-pages@v4)—causing ambiguity; choose one model and
update the doc and Phase 5 workflow accordingly: either standardize on
branch-based deployment by removing references to actions/deploy-pages@v4 and
keep github-action-benchmark pushing the gh-pages branch plus the curl smoke
test and setup instructions, or standardize on actions/deploy-pages@v4 by
removing the gh-pages auto-push steps from the Phase 5 plan and replace them
with artifact build-and-deploy steps using actions/deploy-pages@v4 and the same
curl smoke test; update any mentions of “gh-pages”, “actions/deploy-pages@v4”,
and “github-action-benchmark” to reflect the chosen approach.

In @.planning/research/STACK.md:
- Around line 97-105: The workflow snippet incorrectly references
`.github/workflows/tests.yml`; update the documentation and example to point to
the dedicated benchmarking workflow `.github/workflows/benchmark.yml` so it
matches the Phase 5 plan (i.e., replace mention of tests.yml with benchmark.yml
in the YAML example and any explanatory text), ensuring the new GitHub Actions
step (uses: benchmark-action/github-action-benchmark@v1 with tool: pytest and
output-file-path: benchmark_results.json) is shown as part of `benchmark.yml`
and not `tests.yml`.

In @.planning/research/SUMMARY.md:
- Around line 50-58: The Architecture Approach text incorrectly mixes two
execution models; update the summary to match the chosen model (either the
in-workflow approach using tests.yml with the benchmark job that uses needs:
[test] or the separate workflow approach using benchmark.yml triggered by
workflow_run). Edit the paragraph mentioning "adds a single new `benchmark` job
to the existing `tests.yml` workflow" and the reference to `needs: [test]` (or
alternatively replace with `benchmark.yml` and `workflow_run`) so the
description, list item `#2` ("benchmark job (new, post-matrix)"), and the Phase 5
references consistently describe the same model (in-workflow `tests.yml` +
`needs: [test]` OR standalone `benchmark.yml` + `workflow_run`).

In @.planning/ROADMAP.md:
- Around line 50-57: The Phase 6 acceptance criteria overpromise by stating
every PR will receive "a comment with a full benchmark comparison table", but
the implementation uses "summary-always" for the full table and
"comment-on-alert" to only comment when a regression crosses the threshold;
update the ROADMAP.md Phase 6 section (heading "Phase 6: PR Feedback") to
reflect the actual behavior: change criterion 1 to state the full benchmark
comparison table is available in the Job Summary via "summary-always" and
clarify that "comment-on-alert" only posts a commit comment when a regression
exceeds the configurable alert threshold (default 150%), and adjust success
criteria text to remove the requirement that every PR receives a full-table
comment.
- Line 30: Update the Phase 5 entry in ROADMAP.md (the "Phase 5: Benchmark
Pipeline" checklist item) to remove the claim that benchmark snapshots are
triggered by actual releases/tags and instead state that benchmarks inherit the
latest main baseline and CI-04 is satisfied via documentation; adjust the
wording for items covering lines 36-45 similarly so the roadmap matches the rest
of the PR (no separate tag/release trigger, only auto-push on main with
documentation-based compliance).
- Around line 87-89: The Phase 5 table row has its cells misordered; update the
row for "5. Benchmark Pipeline" so the columns match the header (Phase |
Milestone | Plans Complete | Status | Completed) — e.g. change the current "| 5.
Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |" to place a Milestone
(use "-" if none), then "1/1" under Plans Complete, "Complete" under Status, and
the date "2026-03-09" under Completed.

---

Outside diff comments:
In `@tests/benchmarks/test_bench_write.py`:
- Around line 60-72: The benchmark leaks DB connections because ASEIO lacks a
public close() and its __exit__ does not close the backend; update tests and/or
ASEIO so connections are closed after each iteration: either call the backend
close method directly in the test (db._backend.close()) after db.remove(), or
better, add a public ASEIO.close() that delegates to self._backend.close() (and
update ASEIO.__exit__ to call self.close()), then replace test teardown to call
db.close() (and db.remove()) for Mongo and analogous calls for Redis and other
benchmarks.

---

Nitpick comments:
In @.planning/research/ARCHITECTURE.md:
- Around line 1-56: ARCHITECTURE.md currently describes a multi-Python-version
benchmark aggregation pipeline but the implemented workflow (benchmark.yml) only
runs benchmarks on a single Python version per CI-02; update ARCHITECTURE.md to
either (a) reflect the actual implemented design by replacing the multi-version
architecture diagram and wording with a concise description of the
single-version 3.13 workflow and its artifact handling, or (b) add a clear
“Future work / Alternative design” section that states the multi-version
aggregation diagram is optional and not implemented, referencing CI-02 and
benchmark.yml as the current single-version behavior. Ensure the file
names/symbols "ARCHITECTURE.md", "benchmark.yml", and "CI-02" are mentioned so
reviewers can locate and understand the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f388034-94ec-4065-b7e2-8a4040d773e0

📥 Commits

Reviewing files that changed from the base of the PR and between 7b4b9c9 and 857dc47.

📒 Files selected for processing (26)
  • .benchmarks/Darwin-CPython-3.11-64bit/0001_baseline.json
  • .benchmarks/Darwin-CPython-3.11-64bit/0002_perf_analysis.json
  • .github/workflows/benchmark.yml
  • .github/workflows/tests.yml
  • .gitignore
  • .planning/PROJECT.md
  • .planning/REQUIREMENTS.md
  • .planning/ROADMAP.md
  • .planning/STATE.md
  • .planning/phases/05-benchmark-pipeline/05-01-PLAN.md
  • .planning/phases/05-benchmark-pipeline/05-01-SUMMARY.md
  • .planning/phases/05-benchmark-pipeline/05-CONTEXT.md
  • .planning/phases/05-benchmark-pipeline/05-RESEARCH.md
  • .planning/phases/05-benchmark-pipeline/05-VALIDATION.md
  • .planning/phases/05-benchmark-pipeline/05-VERIFICATION.md
  • .planning/phases/06-pr-feedback/06-01-PLAN.md
  • .planning/phases/06-pr-feedback/06-CONTEXT.md
  • .planning/phases/06-pr-feedback/06-RESEARCH.md
  • .planning/phases/06-pr-feedback/06-VALIDATION.md
  • .planning/research/ARCHITECTURE.md
  • .planning/research/FEATURES.md
  • .planning/research/PITFALLS.md
  • .planning/research/STACK.md
  • .planning/research/SUMMARY.md
  • docs/visualize_benchmarks.py
  • tests/benchmarks/test_bench_write.py
💤 Files with no reviewable changes (4)
  • .benchmarks/Darwin-CPython-3.11-64bit/0002_perf_analysis.json
  • .github/workflows/tests.yml
  • .benchmarks/Darwin-CPython-3.11-64bit/0001_baseline.json
  • docs/visualize_benchmarks.py

Comment on lines +7 to +12
```yaml
files_modified:
- .github/workflows/benchmark.yml
- .github/workflows/tests.yml
- .gitignore
- docs/visualize_benchmarks.py
autonomous: true
```

⚠️ Potential issue | 🟡 Minor

Include .benchmarks/ in the execution manifest.

Task 2 explicitly deletes the .benchmarks/ directory, but it is missing from files_modified. That makes the plan’s declared change set incomplete and weakens the traceability between the manifest and the actual work.

Also applies to: 127-129

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/phases/05-benchmark-pipeline/05-01-PLAN.md around lines 7 - 12,
The execution manifest's files_modified list in
.planning/phases/05-benchmark-pipeline/05-01-PLAN.md is missing the .benchmarks/
directory even though Task 2 deletes it; update the files_modified array (the
YAML block listing files_modified) to include ".benchmarks/" so the declared
change set matches the actual operations (also ensure the same addition is made
where the files_modified block is duplicated around the later section referenced
by lines 127-129).

Comment on lines +16 to +20
```markdown
### Backend selection
- All backends in CI benchmarks: LMDB, HDF5 (ragged), Zarr (ragged), H5MD, MongoDB, Redis
- Include competitor baselines: znh5md, extxyz, sqlite, aselmdb
- MongoDB and Redis Docker services needed in benchmark job (same as test job)
- Single Python version: latest (3.13) per CI-02
```

⚠️ Potential issue | 🟠 Major

Reconcile benchmark scope with the research docs before implementation.

This locks Phase 5 into all backends, MongoDB/Redis services, and Python 3.13, but the research docs in this PR recommend excluding service-backed benchmarks from tracked CI noise and standardizing on a single tracked version at 3.12. That leaves the phase definition internally contradictory and changes both runtime cost and signal quality.

🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 16-16: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3

(MD001, heading-increment)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/phases/05-benchmark-pipeline/05-CONTEXT.md around lines 16 - 20,
The "Backend selection" phase is internally inconsistent with the research docs:
update the Phase 5 definition under the "Backend selection" heading to match the
research recommendations by removing service-backed systems from the tracked CI
benchmark list (drop MongoDB and Redis from "All backends in CI benchmarks" and
instead mark them as optional/bench-only with a note that they are excluded from
tracked CI noise), align the single Python version to 3.12 (replace "latest
(3.13)" with "3.12"), and add a short clarification sentence referencing that
competitor baselines (znh5md, extxyz, sqlite, aselmdb) remain included but
service-backed systems require separate, non-tracked runs; ensure the research
doc reference or rationale is cited in the same section so the phase definition
is no longer contradictory.

Comment on lines +34 to +39
```markdown
### gh-pages setup
- Let github-action-benchmark auto-create the gh-pages branch on first push
- Benchmark data stored at `/dev/bench/` on gh-pages
- GitHub Pages enablement is a manual step (documented, not automated)
- Use GITHUB_TOKEN for pushing to gh-pages (no deploy key)
```


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

github-action-benchmark setup requirements gh-pages branch

💡 Result:

To use benchmark-action/github-action-benchmark with a gh-pages branch, the key setup requirements are:

  • Create the Pages branch (must already exist): create and push a gh-pages branch (or set the action input gh-pages-branch to whatever branch you use). The action’s own setup docs explicitly require creating the branch first. [1]
  • Configure GitHub Pages to publish from that branch: in Repo → Settings → Pages → Build and deployment → Source = “Deploy from a branch”, select Branch = gh-pages (and typically Folder = /(root)). GitHub Docs notes the branch must already exist. [2]
  • Give the workflow permission to push + deploy: the action’s example workflow sets permissions: contents: write (to update the gh-pages branch) and deployments: write (to deploy Pages), and passes ${{ secrets.GITHUB_TOKEN }} via github-token with auto-push: true to push updates automatically. [1]
  • Don’t run it on PRs from forks/untrusted code: the action warns to avoid running this workflow on pull requests because it can modify the Pages branch. [1]

Sources: [1], [2]


🌐 Web query:

benchmark-action github-action-benchmark documentation gh-pages auto-create

💡 Result:

benchmark-action/github-action-benchmark does not auto-create the gh-pages branch. The documentation explicitly says you must create the GitHub Pages branch first (example: git checkout --orphan gh-pages then git push origin gh-pages:gh-pages) before enabling Pages publishing. [1]

What it does handle automatically (once the branch exists):

  • If you set auto-push: true (and provide github-token + appropriate workflow permissions), the action will switch to the gh-pages branch, update the benchmark data (default under dev/bench/data.js), commit, and push the update automatically. [1]
  • It will auto-generate index.html in the directory specified by benchmark-data-dir-path if it doesn’t already exist (default dir is dev/bench). [1]

If you want a different Pages setup (e.g., Pages served from docs/ on master/main), the docs say to set gh-pages-branch to that branch and adjust benchmark-data-dir-path accordingly. [1]


🏁 Script executed:

# Check if there's relevant context in the repository about benchmark setup
find . -name "*benchmark*" -o -name "*gh-pages*" | head -20

Repository: zincware/asebytes

Length of output: 711


🏁 Script executed:

# Look for any configuration or setup files that might reference this
rg -i "github-action-benchmark|gh-pages" --type md --max-count 20

Repository: zincware/asebytes

Length of output: 43


🏁 Script executed:

cat -n ".planning/phases/05-benchmark-pipeline/05-CONTEXT.md" | sed -n '30,45p'

Repository: zincware/asebytes

Length of output: 957


The gh-pages branch must be pre-created before workflow execution; it will not auto-create.

Line 35 assumes github-action-benchmark will auto-create the gh-pages branch on first push. Official action documentation explicitly requires the branch to exist beforehand. The action only auto-pushes updates and generates index.html if the branch already exists. GitHub Pages publishing also requires the source branch to exist in the repository. Without pre-creating gh-pages, the initial Phase 5 rollout will fail before any dashboard is available.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/phases/05-benchmark-pipeline/05-CONTEXT.md around lines 34 - 39,
The doc incorrectly states github-action-benchmark will auto-create the gh-pages
branch; update the wording in the 05-CONTEXT.md gh-pages setup so it requires
pre-creating the gh-pages branch and add a concise fix: instruct maintainers to
create the branch before workflow run (via UI or git push --set-upstream origin
gh-pages) and ensure GitHub Pages source is configured, and optionally add a
note to the workflow using github-action-benchmark to either verify the branch
exists or fail fast with a clear message referencing gh-pages and
github-action-benchmark.
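
The one-time branch creation the docs describe can be sketched as a short shell step. This runs in a throwaway repo purely for demonstration (the identity flags and commit messages are illustrative); in the real repository you would run the same orphan-branch steps and then `git push origin gh-pages:gh-pages`:

```shell
# One-time setup sketch: create an empty gh-pages branch so
# github-action-benchmark has a branch to push benchmark data to.
# Demonstrated in a throwaway repo created under mktemp.
set -e
repo="$(mktemp -d)"
git init -q -b main "$repo"
cd "$repo"
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "init"
git checkout -q --orphan gh-pages
git -c user.email=ci@example.com -c user.name=ci \
    commit -q --allow-empty -m "Initialize gh-pages for benchmark data"
# In the real repo: git push origin gh-pages:gh-pages
git branch --list gh-pages
```

After this, GitHub Pages can be pointed at the branch in Settings → Pages, as the review notes.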


The user explicitly decided against release-triggered benchmark runs -- every main push updates the dashboard, so releases just link to the existing data. CI-04 is satisfied by documentation rather than a separate release workflow trigger.

**Primary recommendation:** Use `workflow_run` trigger with `actions/download-artifact@v4` (cross-workflow download via `run-id`) to decouple benchmarks from the test matrix, then `benchmark-action/github-action-benchmark@v1` with `auto-push: true` to commit results to gh-pages.
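
A minimal sketch of that recommended shape follows. The workflow and job names and the pytest invocation are assumptions rather than the merged workflow; the action inputs (`auto-push`, `benchmark-data-dir-path`, etc.) are documented github-action-benchmark options:

```yaml
# Hypothetical benchmark.yml: run after the Tests workflow succeeds on main,
# then publish results to gh-pages via github-action-benchmark.
name: Benchmark
on:
  workflow_run:
    workflows: [Tests]
    types: [completed]
    branches: [main]

permissions:
  contents: write      # push benchmark data to gh-pages
  deployments: write   # deploy GitHub Pages

jobs:
  benchmark:
    if: ${{ github.event.workflow_run.conclusion == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: uv run pytest --benchmark-json=benchmark_results.json
      - uses: benchmark-action/github-action-benchmark@v1
        with:
          tool: pytest
          output-file-path: benchmark_results.json
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true
          benchmark-data-dir-path: dev/bench
```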

⚠️ Potential issue | 🟠 Major

Remove the obsolete artifact-handoff architecture from this research doc.

The file still recommends upload-artifact/download-artifact as the primary pattern, but the later “IMPORTANT DECISION” section says the benchmark workflow runs benchmarks itself and does not need cross-workflow artifacts. Keeping both architectures here makes the guidance contradictory and can send follow-up work down the wrong path.

Also applies to: 70-72, 89-112

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/phases/05-benchmark-pipeline/05-RESEARCH.md at line 13, Remove the
obsolete artifact-handoff architecture: replace any primary recommendation that
promotes upload-artifact/download-artifact cross-workflow handoff (e.g.,
mentions of actions/download-artifact@v4 and upload-artifact) with a single,
consistent recommendation to use the workflow_run trigger and the
benchmark-action/github-action-benchmark@v1 with auto-push: true to commit
results to gh-pages; update or delete the contradicted paragraphs in the
“Primary recommendation” section and any references that conflict with the
“IMPORTANT DECISION” section so the doc exclusively describes the self-contained
benchmark workflow approach (workflow_run + benchmark-action) and removes the
cross-workflow artifact handoff guidance.

Comment on lines +20 to +23
| **Framework** | pytest 8.x + pytest-benchmark >=5.2.1 |
| **Config file** | pyproject.toml `[tool.pytest.ini_options]` |
| **Quick run command** | `python -c "import yaml; yaml.safe_load(open('.github/workflows/benchmark.yml'))"` |
| **Full suite command** | `uv run pytest` |

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check the file content around lines 20-33
head -35 ".planning/phases/05-benchmark-pipeline/05-VALIDATION.md" | tail -20

Repository: zincware/asebytes

Length of output: 913


🏁 Script executed:

# Check for pyproject.toml to see declared dependencies
find . -maxdepth 2 -name "pyproject.toml" -type f

Repository: zincware/asebytes

Length of output: 77


🏁 Script executed:

# Look for any requirements files or dependency declarations
find . -maxdepth 2 \( -name "requirements*.txt" -o -name "*.lock" -o -name "uv.lock" \) -type f

Repository: zincware/asebytes

Length of output: 70


🏁 Script executed:

# Read pyproject.toml to check for PyYAML in dependencies
cat pyproject.toml

Repository: zincware/asebytes

Length of output: 2009


🏁 Script executed:

# Search for PyYAML or pyyaml in uv.lock
grep -i "pyyaml\|yaml" uv.lock | head -20

Repository: zincware/asebytes

Length of output: 5359


🏁 Script executed:

# Check the full context of the VALIDATION.md file to understand the setup
cat -n ".planning/phases/05-benchmark-pipeline/05-VALIDATION.md"

Repository: zincware/asebytes

Length of output: 3732


The "quick run" command assumes PyYAML is installed but doesn't explicitly require it.

The command on lines 22 and 30 uses python -c "import yaml; ..." without uv run, while PyYAML is not listed as an explicit dependency in pyproject.toml—it only appears as a transitive dependency in uv.lock. This will fail on a clean checkout unless the developer has previously run uv sync. Either wrap the command with uv run to guarantee the managed environment, or add PyYAML to the dev dependency group with explicit rationale.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~22-~22: The official name of this software platform is spelled with a capital “H”.
Context: ...i_options]| | **Quick run command** |python -c "import yaml; yaml.safe_load(open('.github/workflows/benchmark.yml'))"` | | **Full...

(GITHUB)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/phases/05-benchmark-pipeline/05-VALIDATION.md around lines 20 -
23, The "quick run" instruction uses python -c "import yaml; ..." but assumes
PyYAML is present; update the guidance to either (A) wrap the quick-run command
with uv run to ensure the managed env (i.e., use uv run python -c "import yaml;
yaml.safe_load(...)") or (B) add PyYAML to the dev dependencies in
pyproject.toml under the dev group (so pytest/benchmark tooling can import yaml
on a clean checkout) and include a brief rationale that PyYAML is required for
the quick-run check; modify the text for the quick run command and mention which
option was chosen.

Comment on lines +97 to +105
No Python packages to install. The only addition is a GitHub Actions step:

```bash
# Storage backends (extras, already configured)
uv add --optional h5md "h5py>=3.12"
uv add --optional zarr "zarr>=3.0"
uv add --optional lmdb "lmdb>=1.6.0"

# Dev / testing
uv add --group dev "pytest>=8.4.2" "pytest-benchmark>=5.2.1" "anyio>=4.9" "molify>=0.0.1a0"

# Optional: CI perf regression
uv add --group dev "pytest-codspeed>=4.0"
```

```yaml
# In .github/workflows/tests.yml, new benchmark job
- uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: pytest
    output-file-path: benchmark_results.json
    # ... (see configuration inputs above)
```

⚠️ Potential issue | 🟠 Major

Point the setup snippet at benchmark.yml, not tests.yml.

This block tells readers to add the benchmark action to .github/workflows/tests.yml, while the rest of the Phase 5 plan moves benchmarking into a dedicated .github/workflows/benchmark.yml triggered after tests. That mismatch will send the implementation down the wrong workflow shape.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/research/STACK.md around lines 97 - 105, The workflow snippet
incorrectly references `.github/workflows/tests.yml`; update the documentation
and example to point to the dedicated benchmarking workflow
`.github/workflows/benchmark.yml` so it matches the Phase 5 plan (i.e., replace
mention of tests.yml with benchmark.yml in the YAML example and any explanatory
text), ensuring the new GitHub Actions step (uses:
benchmark-action/github-action-benchmark@v1 with tool: pytest and
output-file-path: benchmark_results.json) is shown as part of `benchmark.yml`
and not `tests.yml`.

Comment on lines 50 to +58
### Architecture Approach

The target architecture introduces a BaseColumnarBackend template method class that owns shared logic (~60% of current ColumnarBackend: scalar columns, JSON serialization, fill values, postprocessing, classification). RaggedColumnarBackend and PaddedColumnarBackend inherit from it, overriding only per-atom storage methods. H5MDBackend also inherits shared logic but uses h5py directly instead of ColumnarStore due to H5MD's incompatible nested layout. The registry dispatches based on file extension to separate variants.
The architecture adds a single new `benchmark` job to the existing `tests.yml` workflow. This job uses `needs: [test]` to wait for all matrix legs, downloads all benchmark artifacts, then runs github-action-benchmark once per Python version. Each version gets its own `benchmark-data-dir-path` on gh-pages for clean data separation. The `auto-push` flag is conditional on main branch pushes only.

**Major components:**
1. **BaseColumnarBackend** -- shared columnar logic: classification, scalar write/read, serialization, postprocessing
2. **RaggedColumnarBackend** -- offset+flat ragged storage for variable-size molecular data (default)
3. **PaddedColumnarBackend** -- NaN-padded storage for uniform-size data (opt-in via extension)
4. **H5MDBackend (refactored)** -- H5MD 1.1 spec compliance, inherits from BaseColumnarBackend, uses h5py direct
5. **ColumnarStore** -- array-level I/O abstraction (HDF5Store, ZarrStore) -- keep as-is, it works well
6. **Contract test suite** -- single parametrized test suite replacing 40+ per-feature test files
1. **test job (existing, unchanged)** -- runs benchmarks per Python version, uploads JSON artifacts
2. **benchmark job (new, post-matrix)** -- downloads artifacts, runs github-action-benchmark per version, pushes to gh-pages
3. **gh-pages branch** -- stores historical data.js + auto-generated index.html dashboard
4. **GitHub Pages** -- serves the interactive Chart.js dashboard

⚠️ Potential issue | 🟠 Major

The architecture summary still describes an in-workflow needs job.

Line 52 says the benchmark job is added to tests.yml and waits via needs: [test], but the Phase 5 docs elsewhere define a separate benchmark.yml driven by workflow_run. Those are materially different execution models; keep this summary aligned with the chosen one before Phase 5 gets implemented.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/research/SUMMARY.md around lines 50 - 58, The Architecture
Approach text incorrectly mixes two execution models; update the summary to
match the chosen model (either the in-workflow approach using tests.yml with the
benchmark job that uses needs: [test] or the separate workflow approach using
benchmark.yml triggered by workflow_run). Edit the paragraph mentioning "adds a
single new `benchmark` job to the existing `tests.yml` workflow" and the
reference to `needs: [test]` (or alternatively replace with `benchmark.yml` and
`workflow_run`) so the description, list item `#2` ("benchmark job (new,
post-matrix)"), and the Phase 5 references consistently describe the same model
(in-workflow `tests.yml` + `needs: [test]` OR standalone `benchmark.yml` +
`workflow_run`).


**Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.

- [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)

⚠️ Potential issue | 🟠 Major

Phase 5’s release behavior is described incorrectly here.

The rest of the Phase 5 docs in this PR say there is no separate tag/release trigger; releases inherit the latest main baseline and CI-04 is satisfied by documentation. This roadmap currently promises actual release-triggered benchmark snapshots, which overstates the shipped behavior.

Also applies to: 36-45

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md at line 30, Update the Phase 5 entry in ROADMAP.md (the
"Phase 5: Benchmark Pipeline" checklist item) to remove the claim that benchmark
snapshots are triggered by actual releases/tags and instead state that
benchmarks inherit the latest main baseline and CI-04 is satisfied via
documentation; adjust the wording for items covering lines 36-45 similarly so
the roadmap matches the rest of the PR (no separate tag/release trigger, only
auto-push on main with documentation-based compliance).

Comment on lines +50 to +57
### Phase 6: PR Feedback
**Goal**: PR authors see benchmark comparison results and regressions block merge
**Depends on**: Phase 5 (baseline data must exist on gh-pages)
**Requirements**: PR-01, PR-02, PR-03
**Success Criteria** (what must be TRUE):
1. PRs receive a comment with a full benchmark comparison table showing deltas (regressions and improvements) vs main
2. The alert threshold percentage is configurable in the workflow YAML (default 150%)
3. A PR with a benchmark regression beyond the threshold is blocked from merging

⚠️ Potential issue | 🟡 Minor

Phase 6 is overpromising PR comments for every run.

The research in this PR says the full comparison table comes from summary-always in the Job Summary, while comment-on-alert only adds a commit comment when a regression crosses the threshold. Requiring “a comment with a full benchmark comparison table” changes the planned behavior and acceptance criteria.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md around lines 50 - 57, The Phase 6 acceptance criteria
overpromise by stating every PR will receive "a comment with a full benchmark
comparison table", but the implementation uses "summary-always" for the full
table and "comment-on-alert" to only comment when a regression crosses the
threshold; update the ROADMAP.md Phase 6 section (heading "Phase 6: PR
Feedback") to reflect the actual behavior: change criterion 1 to state the full
benchmark comparison table is available in the Job Summary via "summary-always"
and clarify that "comment-on-alert" only posts a commit comment when a
regression exceeds the configurable alert threshold (default 150%), and adjust
success criteria text to remove the requirement that every PR receives a
full-table comment.
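
Under that model, the PR-only comparison step could look roughly like this. The step name and `if:` condition are illustrative; the inputs are documented github-action-benchmark options matching the summary-always / comment-on-alert split described above:

```yaml
# Hypothetical PR comparison step: compare-only, never pushes to gh-pages.
- name: Compare benchmarks against main (PR only)
  if: ${{ github.event_name == 'pull_request' }}
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: pytest
    output-file-path: benchmark_results.json
    github-token: ${{ secrets.GITHUB_TOKEN }}
    auto-push: false          # never modify gh-pages from a PR
    save-data-file: false     # compare-only; do not persist PR results
    summary-always: true      # full comparison table in the Job Summary
    comment-on-alert: true    # comment only when the threshold is exceeded
    fail-on-alert: true       # fail the job to block the merge
    alert-threshold: "150%"
```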

Comment on lines +87 to +89
| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |
| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |

⚠️ Potential issue | 🟡 Minor

The Phase 5 progress row is shifted under the wrong columns.

Given the header Phase | Milestone | Plans Complete | Status | Completed, the Phase 5 row currently places 1/1 under Milestone and Complete under Plans Complete. It should follow the same column order as the Phase 6 and 7 rows.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md around lines 87 - 89, The Phase 5 table row has its
cells misordered; update the row for "5. Benchmark Pipeline" so the columns
match the header (Phase | Milestone | Plans Complete | Status | Completed) —
e.g. change the current "| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 |
- |" to place a Milestone (use "-" if none), then "1/1" under Plans Complete,
"Complete" under Status, and the date "2026-03-09" under Completed.

PythonFZ and others added 8 commits March 9, 2026 21:46
- Generate unique UUID-based group name per test invocation
- Pass group= uniformly to all 6 facade constructors (ASEIO, ObjectIO, BlobIO, AsyncASEIO, AsyncObjectIO, AsyncBlobIO)
- Prevents data leakage between MongoDB/Redis tests sharing a single server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- SUMMARY.md documents UUID-based group isolation for all facade fixtures
- STATE.md updated with phase 8 completion
- ROADMAP.md marks phase 8 plan complete
- REQUIREMENTS.md marks ISO-01, ISO-02, ISO-03 complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 1s TTL cache improves performance for typical use but prevents
cross-instance visibility in the stale-cache test. Adding cache_ttl
parameter (default 1.0, None=disabled) lets callers opt out when
they need immediate consistency across backend instances.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/contract/conftest.py (1)

203-214: ⚠️ Potential issue | 🟠 Major

Fix unawaited async remove() call in _sync_cleanup for native async backends.

The function attempts to call backend.remove() synchronously on all backends, but AsyncMongoObjectBackend.remove() is an async coroutine that requires awaiting. Since mongodb:// uses the native async backend (not wrapped by SyncToAsyncAdapter), the function directly receives AsyncMongoObjectBackend without the wrapping check succeeding, and calls its async remove() without awaiting, causing a RuntimeWarning: coroutine was never awaited.

This leaves MongoDB collections uncleaned after async tests despite group isolation, risking test pollution.

🔧 Proposed fix
 def _sync_cleanup(db, path: str) -> None:
     """Synchronously clean up network/memory backends after async tests."""
     if path.startswith(("mongodb://", "redis://", "memory://")):
         # Access the underlying sync backend wrapped by SyncToAsyncAdapter
         backend = db._backend
         if hasattr(backend, "_backend"):
             # SyncToAsyncReadWriteAdapter wraps sync backend in _backend
             backend = backend._backend
         try:
+            import asyncio
+            import inspect
+            if inspect.iscoroutinefunction(backend.remove):
+                asyncio.run(backend.remove())
+            else:
+                backend.remove()
-            backend.remove()
         except Exception:
             pass
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/contract/conftest.py` around lines 203 - 214, _sync_cleanup is calling
backend.remove() synchronously but for native async backends (e.g.,
AsyncMongoObjectBackend) remove() is a coroutine and must be awaited; update
_sync_cleanup to detect when backend.remove() returns a coroutine (or is an
async function) and run it to completion (e.g., via
asyncio.get_event_loop().run_until_complete or similar) after resolving the
underlying backend (db._backend and possible ._backend unwrap used for
SyncToAsyncReadWriteAdapter) so both sync and async remove() calls are handled
safely and exceptions still caught.
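
The coroutine-aware dispatch this fix describes can be sketched independently of the real backends. The `Fake*` classes below are hypothetical stand-ins for testing the pattern; the real code would first unwrap `db._backend` (and a possible inner `._backend`) as shown in the diff:

```python
# Sketch: call backend.remove(), awaiting it when it is a coroutine
# function, so native-async backends are cleaned up without
# "coroutine was never awaited" warnings.
import asyncio
import inspect


class FakeSyncBackend:
    """Stand-in for a sync backend with a plain remove()."""
    def __init__(self):
        self.removed = False

    def remove(self):
        self.removed = True


class FakeAsyncBackend:
    """Stand-in for a native-async backend (async remove())."""
    def __init__(self):
        self.removed = False

    async def remove(self):
        self.removed = True


def cleanup_backend(backend) -> None:
    """Best-effort cleanup handling both sync and async remove()."""
    try:
        if inspect.iscoroutinefunction(backend.remove):
            asyncio.run(backend.remove())
        else:
            backend.remove()
    except Exception:
        pass  # mirror the original try/except: never fail the teardown


sync_b, async_b = FakeSyncBackend(), FakeAsyncBackend()
cleanup_backend(sync_b)
cleanup_backend(async_b)
print(sync_b.removed, async_b.removed)  # → True True
```

Note that `asyncio.run()` assumes no event loop is already running in the calling thread; a fixture that tears down inside a running loop would need a different strategy.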
♻️ Duplicate comments (3)
.planning/ROADMAP.md (3)

30-30: ⚠️ Potential issue | 🟡 Minor

Phase 5 description mentions "release snapshots" but implementation doesn't include a separate release trigger.

The research and summary documents indicate there's no separate tag/release trigger — benchmarks inherit the latest main baseline. Consider updating this line to match the actual implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md at line 30, Update the Phase 5 checklist line in
ROADMAP.md to remove or reword "release snapshots" to reflect the actual
implementation (benchmarks inherit the latest main baseline and there is no
separate tag/release trigger); locate the bullet "- [x] **Phase 5: Benchmark
Pipeline**" and change the phrase so it accurately states "benchmarks use latest
main baseline (no separate release/tag trigger)" or similar concise wording.

54-57: ⚠️ Potential issue | 🟡 Minor

Phase 6 success criteria may overstate the PR comment behavior.

The implementation uses summary-always for the full comparison table in the Job Summary and comment-on-alert for commit comments only when regressions cross the threshold. Criterion 1 stating "PRs receive a comment with a full benchmark comparison table" should clarify this distinction.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md around lines 54 - 57, Update the Success Criteria text
to clarify that the full benchmark comparison table is always posted to the Job
Summary using the "summary-always" behavior, while PR commit comments are only
posted when regressions exceed the configured threshold using the
"comment-on-alert" behavior; specifically edit criterion 1 (under "Success
Criteria") to state that PRs will see a full comparison table in the Job Summary
and that commit comments on the PR occur only for alerts (not for every run),
and mention the alert threshold configurability referenced in criterion 2 to tie
the behaviors together.

100-100: ⚠️ Potential issue | 🟡 Minor

The Phase 5 progress row has misaligned columns.

Given the header Phase | Milestone | Plans Complete | Status | Completed, the Phase 5 row places values under incorrect columns. Compare with the Phase 6 and 7 rows which follow the correct order.

🔧 Proposed fix for column alignment
-| 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |
+| 5. Benchmark Pipeline | v0.3.1 | 1/1 | Complete | 2026-03-09 |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/ROADMAP.md at line 100, The Phase 5 table row is misaligned;
update the Markdown row so each cell maps to the header "Phase | Milestone |
Plans Complete | Status | Completed" — set the first cell to the phase number
(5), the second to the milestone name (Benchmark Pipeline), the third to "1/1",
the fourth to "Complete", and the fifth to "2026-03-09" (or "-" if not
completed), replacing the current '| 5. Benchmark Pipeline | 1/1 | Complete   |
2026-03-09 | - |' entry accordingly.
🧹 Nitpick comments (1)
.planning/STATE.md (1)

4-4: Milestone name appears to be a placeholder.

milestone_name: milestone seems like it should contain an actual milestone name (e.g., "v0.3.1 CI Benchmark Infrastructure").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/STATE.md at line 4, The milestone_name entry in STATE.md is a
placeholder; replace the value for the key milestone_name with the actual
milestone string (for example "v0.3.1 CI Benchmark Infrastructure" or whatever
project-specific release/feature name applies) so the plan reflects the real
milestone—update the literal value for milestone_name in .planning/STATE.md and
ensure it follows the project's naming convention used elsewhere (e.g., version
prefix and descriptive title).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.planning/STATE.md:
- Around line 10-14: The progress metrics block (keys total_phases,
completed_phases, total_plans, completed_plans, percent) is inconsistent:
percent is 100 while completed_plans is 1 of 2; either recompute percent to
reflect plan completion (e.g., set percent to 50) or change the percent key
name/value to clarify its scope (e.g., percent_phase or percent_overall) and
update its value accordingly so the numbers align with
total_plans/completed_plans.

---

Outside diff comments:
In `@tests/contract/conftest.py`:
- Around line 203-214: _sync_cleanup is calling backend.remove() synchronously
but for native async backends (e.g., AsyncMongoObjectBackend) remove() is a
coroutine and must be awaited; update _sync_cleanup to detect when
backend.remove() returns a coroutine (or is an async function) and run it to
completion (e.g., via asyncio.get_event_loop().run_until_complete or similar)
after resolving the underlying backend (db._backend and possible ._backend
unwrap used for SyncToAsyncReadWriteAdapter) so both sync and async remove()
calls are handled safely and exceptions still caught.

---

Duplicate comments:
In @.planning/ROADMAP.md:
- Line 30: Update the Phase 5 checklist line in ROADMAP.md to remove or reword
"release snapshots" to reflect the actual implementation (benchmarks inherit the
latest main baseline and there is no separate tag/release trigger); locate the
bullet "- [x] **Phase 5: Benchmark Pipeline**" and change the phrase so it
accurately states "benchmarks use latest main baseline (no separate release/tag
trigger)" or similar concise wording.
- Around line 54-57: Update the Success Criteria text to clarify that the full
benchmark comparison table is always posted to the Job Summary using the
"summary-always" behavior, while PR commit comments are only posted when
regressions exceed the configured threshold using the "comment-on-alert"
behavior; specifically edit criterion 1 (under "Success Criteria") to state that
PRs will see a full comparison table in the Job Summary and that commit comments
on the PR occur only for alerts (not for every run), and mention the alert
threshold configurability referenced in criterion 2 to tie the behaviors
together.
- Line 100: The Phase 5 table row is misaligned; update the Markdown row so each
cell maps to the header "Phase | Milestone | Plans Complete | Status |
Completed" — set the first cell to the phase number (5), the second to the
milestone name (Benchmark Pipeline), the third to "1/1", the fourth to
"Complete", and the fifth to "2026-03-09" (or "-" if not completed), replacing
the current '| 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |'
entry accordingly.

---

Nitpick comments:
In @.planning/STATE.md:
- Line 4: The milestone_name entry in STATE.md is a placeholder; replace the
value for the key milestone_name with the actual milestone string (for example
"v0.3.1 CI Benchmark Infrastructure" or whatever project-specific
release/feature name applies) so the plan reflects the real milestone—update the
literal value for milestone_name in .planning/STATE.md and ensure it follows the
project's naming convention used elsewhere (e.g., version prefix and descriptive
title).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fdb9738b-f744-497a-978c-267cb2afdb53

📥 Commits

Reviewing files that changed from the base of the PR and between 857dc47 and 7e37111.

📒 Files selected for processing (9)
  • .planning/REQUIREMENTS.md
  • .planning/ROADMAP.md
  • .planning/STATE.md
  • .planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/08-01-PLAN.md
  • .planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/08-01-SUMMARY.md
  • .planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/08-CONTEXT.md
  • .planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/08-RESEARCH.md
  • .planning/phases/08-fix-failing-tests-in-redis-mongo-backends-test-isolation/08-VALIDATION.md
  • tests/contract/conftest.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • .planning/REQUIREMENTS.md

Comment on lines 10 to 14
total_phases: 4
completed_phases: 4
total_plans: 13
completed_plans: 13
completed_phases: 1
total_plans: 2
completed_plans: 1
percent: 100

⚠️ Potential issue | 🟡 Minor

Progress metrics appear inconsistent.

The metrics show total_plans: 2, completed_plans: 1, but percent: 100. If only 1 of 2 plans is complete, the percentage should reflect that (50%), unless the 100% refers to a different scope.

🔧 Suggested clarification

Either update the percentage to match the plan completion ratio, or clarify what scope the 100% refers to (e.g., Phase 8 specifically vs overall milestone).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/STATE.md around lines 10 - 14, The progress metrics block (keys
total_phases, completed_phases, total_plans, completed_plans, percent) is
inconsistent: percent is 100 while completed_plans is 1 of 2; either recompute
percent to reflect plan completion (e.g., set percent to 50) or change the
percent key name/value to clarify its scope (e.g., percent_phase or
percent_overall) and update its value accordingly so the numbers align with
total_plans/completed_plans.
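
The consistent value follows directly from the quoted counters, assuming `percent` is meant to track plan completion:

```python
# Recompute the percent field from the quoted STATE.md counters.
total_plans = 2
completed_plans = 1
percent = round(100 * completed_plans / total_plans)
print(percent)  # → 50
```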

@PythonFZ PythonFZ merged commit 765174a into main Mar 9, 2026
4 of 5 checks passed
@PythonFZ PythonFZ deleted the gsd/auto-bench branch March 9, 2026 21:26
@coderabbitai coderabbitai bot mentioned this pull request Mar 10, 2026
