Commit 765174a

PythonFZ and claude authored
Gsd/auto bench (#12)
* docs: start milestone v0.3.1 CI Benchmark Infrastructure
* docs: complete project research
* docs: define milestone v0.3.1 requirements
* docs: create milestone v0.3.1 roadmap (3 phases)
* docs(05): capture phase context
* docs(state): record phase 5 context session
* docs(05): research benchmark pipeline phase
* docs(05): add validation strategy
* docs(05): create phase plan
* feat(05-01): add benchmark CI workflow and update .gitignore
  - Create benchmark.yml with workflow_run trigger on Tests+main
  - Single Python 3.13 job with MongoDB+Redis services
  - github-action-benchmark auto-push to gh-pages at dev/bench
  - Document CI-01 (gh-pages setup) and CI-04 (release strategy)
  - Add .benchmarks/ to .gitignore
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore(05-01): remove benchmark steps from tests.yml and delete legacy files
  - Remove Run benchmarks, Visualize benchmarks, Upload benchmark results steps
  - Delete docs/visualize_benchmarks.py (superseded by gh-pages dashboard)
  - Delete .benchmarks/ local cache directory
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(05-01): complete benchmark pipeline plan
  - Add 05-01-SUMMARY.md with execution results
  - Update STATE.md with plan completion and decisions
  - Update ROADMAP.md with phase 5 progress
  - Mark CI-01 through CI-04 requirements complete
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(phase-05): complete phase execution
* docs(06): capture phase context
* docs(state): record phase 6 context session
* docs(06): research phase domain
* docs(phase-6): add validation strategy
* docs(06): create phase plan
* feat(06-01): add PR benchmark comparison and fail-on-regression gate
  - Add pull_request trigger (opened, synchronize) alongside workflow_run
  - Add concurrency group to cancel in-progress PR benchmark runs
  - Split benchmark-action into main (auto-push) and PR (compare-only) steps
  - PR step: summary-always, comment-on-alert, fail-on-alert at 150% threshold
  - PR step: auto-push false, save-data-file false to avoid polluting gh-pages
  - Document PR behavior and branch protection setup in header comments
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(benchmarks): prevent write_single from accumulating data across iterations
  write_single tests created the DB once and appended in each benchmark
  iteration, causing quadratic slowdown (2h+ hangs on CI). Two fixes:
  - Create fresh DB per iteration (matches write_trajectory pattern)
  - Cap to 10 frames (per-row overhead is the signal, not throughput)
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(08): capture phase context
* docs(state): record phase 8 context session
* docs(08): research phase domain
* docs(phase-8): add validation strategy
* docs(08): create phase plan for test isolation fix
* fix(08-01): add per-test group isolation to all facade fixtures
  - Generate unique UUID-based group name per test invocation
  - Pass group= uniformly to all 6 facade constructors (ASEIO, ObjectIO, BlobIO, AsyncASEIO, AsyncObjectIO, AsyncBlobIO)
  - Prevents data leakage between MongoDB/Redis tests sharing a single server
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs(08-01): complete test isolation plan
  - SUMMARY.md documents UUID-based group isolation for all facade fixtures
  - STATE.md updated with phase 8 completion
  - ROADMAP.md marks phase 8 plan complete
  - REQUIREMENTS.md marks ISO-01, ISO-02, ISO-03 complete
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(mongodb): make cache_ttl configurable with None to disable caching
  The 1s TTL cache improves performance for typical use but prevents
  cross-instance visibility in the stale-cache test. Adding cache_ttl
  parameter (default 1.0, None=disabled) lets callers opt out when they
  need immediate consistency across backend instances.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
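
The cache_ttl change at the end of the message can be illustrated with a small sketch. The MongoDB backend code is not part of the diff shown here, so the class below is a hypothetical stand-in: `CachedCounter`, `fetch`, and `clock` are names invented for illustration. Only the `cache_ttl` parameter, its default of 1.0, and the meaning of `None` come from the commit message.

```python
import time
from typing import Callable, Optional


class CachedCounter:
    """Hypothetical stand-in for a backend that caches an expensive read."""

    def __init__(
        self,
        fetch: Callable[[], int],
        cache_ttl: Optional[float] = 1.0,  # seconds; None disables caching
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self._fetch = fetch
        self._cache_ttl = cache_ttl
        self._clock = clock
        self._value: Optional[int] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> int:
        # With caching disabled, every call goes to the source, so writes
        # made through another instance are visible immediately.
        if self._cache_ttl is None:
            return self._fetch()
        now = self._clock()
        fresh = (
            self._fetched_at is not None
            and now - self._fetched_at < self._cache_ttl
        )
        if not fresh:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With the default TTL, a second reader may see a value up to one second old. Passing `cache_ttl=None` trades that speed-up for immediate consistency, which is what the stale-cache test described above needs.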
1 parent 7b4b9c9 commit 765174a

35 files changed

Lines changed: 3181 additions & 5073 deletions

.benchmarks/Darwin-CPython-3.11-64bit/0001_baseline.json

Lines changed: 0 additions & 1967 deletions
This file was deleted.

.benchmarks/Darwin-CPython-3.11-64bit/0002_perf_analysis.json

Lines changed: 0 additions & 1967 deletions
This file was deleted.

.github/workflows/benchmark.yml

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+# Benchmark CI Pipeline
+#
+# Runs after the "Tests" workflow succeeds on main. Executes the full benchmark
+# suite on Python 3.13 and pushes results to gh-pages at /dev/bench/ via
+# github-action-benchmark.
+#
+# PRs receive a benchmark comparison table in Job Summary and fail on
+# regressions beyond 150% (PR-01, PR-02, PR-03).
+#
+# To enforce the merge gate, enable branch protection requiring the
+# 'Benchmarks' check to pass: Settings > Branches > Branch protection rules.
+#
+# CI-04: Release/tag events do NOT get a separate benchmark run. Every push to
+# main updates the gh-pages dashboard, so releases inherit the latest baseline.
+#
+# CI-01: github-action-benchmark with auto-push: true auto-creates the gh-pages
+# branch on first run. GitHub Pages must be manually enabled once:
+# Settings > Pages > Source: Deploy from a branch > gh-pages / root.
+
+name: Benchmarks
+
+on:
+  workflow_run:
+    workflows: ["Tests"]
+    types: [completed]
+    branches: [main]
+  pull_request:
+    types: [opened, synchronize]
+
+permissions:
+  contents: write
+  deployments: write
+
+concurrency:
+  group: benchmark-${{ github.event.pull_request.number || github.sha }}
+  cancel-in-progress: true
+
+jobs:
+  benchmark:
+    runs-on: ubuntu-latest
+    if: github.event_name == 'pull_request' || github.event.workflow_run.conclusion == 'success'
+
+    services:
+      redis:
+        image: redis:7
+        ports:
+          - 6379:6379
+        options: >-
+          --health-cmd "redis-cli ping"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+
+      mongodb:
+        image: mongo:7
+        env:
+          MONGO_INITDB_ROOT_USERNAME: root
+          MONGO_INITDB_ROOT_PASSWORD: example
+        ports:
+          - 27017:27017
+        options: >-
+          --health-cmd "mongosh --eval 'db.runCommand(\"ping\").ok' --quiet"
+          --health-interval 10s
+          --health-timeout 5s
+          --health-retries 5
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv and set the python version
+        uses: astral-sh/setup-uv@v5
+        with:
+          python-version: "3.13"
+
+      - name: Install package
+        run: |
+          uv sync --all-extras --dev
+
+      - name: Run benchmarks
+        run: |
+          uv run pytest -m benchmark --benchmark-only --benchmark-json=benchmark_results.json
+
+      - name: Store benchmark results (main)
+        if: github.event_name == 'workflow_run'
+        uses: benchmark-action/github-action-benchmark@v1
+        with:
+          tool: "pytest"
+          output-file-path: benchmark_results.json
+          gh-pages-branch: gh-pages
+          benchmark-data-dir-path: dev/bench
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          auto-push: true
+
+      - name: Compare benchmark results (PR)
+        if: github.event_name == 'pull_request'
+        uses: benchmark-action/github-action-benchmark@v1
+        with:
+          tool: "pytest"
+          output-file-path: benchmark_results.json
+          gh-pages-branch: gh-pages
+          benchmark-data-dir-path: dev/bench
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          auto-push: false
+          save-data-file: false
+          summary-always: true
+          comment-on-alert: true
+          fail-on-alert: true
+          alert-threshold: "150%"
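
To make the 150% alert threshold concrete, here is a rough sketch of the kind of comparison such a gate performs. The action's own comparison code is not part of this commit, so this is an assumption-laden illustration. It assumes the compared values are throughput figures where higher is better, so the slowdown ratio is baseline divided by current. The function name `is_regression` and its parameters are hypothetical.

```python
def is_regression(
    baseline_ops: float,
    current_ops: float,
    threshold_pct: float = 150.0,
) -> bool:
    """Return True when the slowdown ratio exceeds the threshold.

    Assumption: higher ops/sec is better, so ratio = baseline / current.
    A ratio of 1.5 (150%) means the PR is 1.5x slower than the baseline.
    """
    if baseline_ops <= 0 or current_ops <= 0:
        raise ValueError("throughput values must be positive")
    ratio = baseline_ops / current_ops
    return ratio * 100.0 > threshold_pct


# A benchmark dropping from 300 to 100 ops/sec is 3x slower and trips the gate;
# a drop from 100 to 80 ops/sec (1.25x slower) does not.
print(is_regression(300.0, 100.0))
print(is_regression(100.0, 80.0))
```

Under this reading, a threshold of 150% tolerates moderate noise between runner machines while still catching large slowdowns. Lowering `alert-threshold` in the workflow makes the gate stricter.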

.github/workflows/tests.yml

Lines changed: 0 additions & 18 deletions
@@ -55,21 +55,3 @@ jobs:
           uv run python --version
           uv run pytest
 
-      - name: Run benchmarks
-        run: |
-          uv run python --version
-          uv run pytest -m benchmark --benchmark-only --benchmark-json=benchmark_results.json
-
-      - name: Visualize benchmarks
-        run: |
-          uv run docs/visualize_benchmarks.py benchmark_results.json
-        if: always()
-
-      - name: Upload benchmark results
-        uses: actions/upload-artifact@v4
-        if: always()
-        with:
-          name: benchmark-results-${{ matrix.python-version }}
-          path: |
-            benchmark_results.json
-            *.png

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ tests/data/
 
 # Benchmark results (machine-specific)
 benchmark_results.json
+.benchmarks/
 
 # Git worktrees
 .worktrees/

.planning/PROJECT.md

Lines changed: 8 additions & 1 deletion
@@ -26,6 +26,13 @@ Every storage backend must be fast, correct, and tested through a single paramet
 
 ### Active
 
+- [ ] PR benchmark comments showing perf diff vs base branch (BENCH-01)
+- [ ] Benchmark JSON committed to repo, overwritten per merge/tag (BENCH-02)
+- [ ] GitHub Pages dashboard tracking performance over releases (BENCH-03)
+- [ ] Evaluate and select CI benchmark tooling (CML, github-action-benchmark, etc.) (BENCH-04)
+
+### Backlog
+
 - [ ] Store schema in backend metadata at write time for O(1) introspection (OPT-01)
 - [ ] Improve cache-to secondary backend pattern in ASEIO (OPT-02)
 - [ ] Investigate pytest-codspeed for CI-stable benchmarks (OPT-03)
@@ -83,4 +90,4 @@ Known performance characteristics:
 | Facade bounds-check elimination | Delegate IndexError to backend instead of pre-checking len() | ✓ Good — saves round-trip for positive indices |
 
 ---
-*Last updated: 2026-03-06 after v1.0 milestone*
+*Last updated: 2026-03-09 after v0.3.1 milestone start*

.planning/REQUIREMENTS.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# Requirements: asebytes
+
+**Defined:** 2026-03-09
+**Core Value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite
+
+## v0.3.1 Requirements
+
+Requirements for CI benchmark infrastructure milestone. Each maps to roadmap phases.
+
+### CI Infrastructure
+
+- [x] **CI-01**: gh-pages branch exists with GitHub Pages enabled serving benchmark dashboard
+- [x] **CI-02**: Post-matrix benchmark job runs github-action-benchmark for a single Python version (latest)
+- [x] **CI-03**: Auto-push to gh-pages only on main branch pushes, not PRs
+- [x] **CI-04**: Release/tag events trigger a benchmark snapshot on gh-pages
+
+### PR Feedback
+
+- [ ] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
+- [ ] **PR-02**: Alert threshold is configurable (starting at 150%)
+- [ ] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
+
+### Dashboard
+
+- [ ] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
+- [ ] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
+- [ ] **DASH-03**: max-items-in-chart limits data growth on gh-pages
+
+## Maintenance Requirements
+
+### Test Isolation (Phase 8)
+
+- [x] **ISO-01**: MongoDB contract tests pass without data leaking between tests
+- [x] **ISO-02**: Redis contract tests pass without data leaking between tests
+- [x] **ISO-03**: All other backend contract tests remain green after isolation changes (no regressions)
+
+## Future Requirements
+
+### Enhanced PR Comments
+
+- **PR-04**: Per-backend grouping in PR comparison tables
+- **PR-05**: Visualization PNGs embedded in PR comments
+
+### Dashboard Enhancements
+
+- **DASH-04**: Release-tagged benchmark snapshots with comparison view
+- **DASH-05**: Memory profiling pipeline integrated into dashboard
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| Per-Python-version benchmark tracking | Adds complexity without proportional regression detection benefit |
+| Hosted SaaS dashboard (codspeed, bencher) | External dependency; Chart.js on gh-pages is sufficient |
+| Fork PR benchmark comments | GitHub token scoping prevents it; low fork contribution volume |
+| Custom React dashboard | Maintenance overhead; Chart.js auto-generation covers needs |
+| pytest-codspeed integration | Orthogonal to CI tracking; codspeed measures CPU not I/O |
+
+## Traceability
+
+Which phases cover which requirements. Updated during roadmap creation.
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| CI-01 | Phase 5 | Complete |
+| CI-02 | Phase 5 | Complete |
+| CI-03 | Phase 5 | Complete |
+| CI-04 | Phase 5 | Complete |
+| PR-01 | Phase 6 | Pending |
+| PR-02 | Phase 6 | Pending |
+| PR-03 | Phase 6 | Pending |
+| DASH-01 | Phase 7 | Pending |
+| DASH-02 | Phase 7 | Pending |
+| DASH-03 | Phase 7 | Pending |
+| ISO-01 | Phase 8 | Complete |
+| ISO-02 | Phase 8 | Complete |
+| ISO-03 | Phase 8 | Complete |
+
+**Coverage:**
+- v0.3.1 requirements: 10 total
+- Maintenance requirements: 3 total
+- Mapped to phases: 13
+- Unmapped: 0
+
+---
+*Requirements defined: 2026-03-09*
+*Last updated: 2026-03-09 after phase 8 planning*

.planning/ROADMAP.md

Lines changed: 81 additions & 6 deletions
@@ -2,27 +2,102 @@
 
 ## Milestones
 
--**v1.0 Maintenance & Performance Overhaul** — Phases 1-4 (shipped 2026-03-06)
+- v1.0 Maintenance & Performance Overhaul -- Phases 1-4 (shipped 2026-03-06)
+- v0.3.1 CI Benchmark Infrastructure -- Phases 5-7 (in progress)
 
 ## Phases
 
+**Phase Numbering:**
+- Integer phases (1, 2, 3): Planned milestone work
+- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
+
 <details>
-<summary>v1.0 Maintenance & Performance Overhaul (Phases 1-4) SHIPPED 2026-03-06</summary>
+<summary>v1.0 Maintenance & Performance Overhaul (Phases 1-4) -- SHIPPED 2026-03-06</summary>
 
-- [x] Phase 1: Backend Architecture (3/3 plans) completed 2026-03-06
-- [x] Phase 2: H5MD Compliance (4/4 plans) completed 2026-03-06
-- [x] Phase 3: Contract Test Suite (4/4 plans) completed 2026-03-06
-- [x] Phase 4: Benchmarks & Performance (2/2 plans) completed 2026-03-06
+- [x] Phase 1: Backend Architecture (3/3 plans) -- completed 2026-03-06
+- [x] Phase 2: H5MD Compliance (4/4 plans) -- completed 2026-03-06
+- [x] Phase 3: Contract Test Suite (4/4 plans) -- completed 2026-03-06
+- [x] Phase 4: Benchmarks & Performance (2/2 plans) -- completed 2026-03-06
 
 Full details: `.planning/milestones/v1.0-ROADMAP.md`
 
 </details>
 
+### v0.3.1 CI Benchmark Infrastructure (In Progress)
+
+**Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.
+
+- [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)
+- [ ] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate
+- [ ] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits
+
+## Phase Details
+
+### Phase 5: Benchmark Pipeline
+**Goal**: Every push to main and every release tag produces benchmark results stored on gh-pages, building a historical baseline
+**Depends on**: Nothing (first phase of v0.3.1)
+**Requirements**: CI-01, CI-02, CI-03, CI-04
+**Success Criteria** (what must be TRUE):
+1. gh-pages branch exists and GitHub Pages serves content from it
+2. Pushing a commit to main triggers a post-matrix benchmark job that stores results on gh-pages
+3. Opening or updating a PR does NOT push benchmark data to gh-pages
+4. Tagging a release triggers a benchmark snapshot committed to gh-pages
+**Plans**: 1 plan
+
+Plans:
+- [ ] 05-01-PLAN.md — Create benchmark.yml workflow, clean up tests.yml and legacy files
+
+### Phase 6: PR Feedback
+**Goal**: PR authors see benchmark comparison results and regressions block merge
+**Depends on**: Phase 5 (baseline data must exist on gh-pages)
+**Requirements**: PR-01, PR-02, PR-03
+**Success Criteria** (what must be TRUE):
+1. PRs receive a comment with a full benchmark comparison table showing deltas (regressions and improvements) vs main
+2. The alert threshold percentage is configurable in the workflow YAML (default 150%)
+3. A PR with a benchmark regression beyond the threshold is blocked from merging
+**Plans**: 1 plan
+
+Plans:
+- [ ] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
+
+### Phase 7: Dashboard and README
+**Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
+**Depends on**: Phase 5 (dashboard auto-generated by github-action-benchmark)
+**Requirements**: DASH-01, DASH-02, DASH-03
+**Success Criteria** (what must be TRUE):
+1. GitHub Pages serves a Chart.js time-series dashboard with project description, usage, and links
+2. README displays live benchmark figures sourced from GitHub Pages, replacing any static visualization PNGs
+3. max-items-in-chart is configured to limit data growth on gh-pages
+**Plans**: TBD
+
+Plans:
+- [ ] 07-01: TBD
+
+### Phase 8: Fix failing tests in Redis/Mongo backends (test isolation)
+**Goal:** MongoDB and Redis contract tests pass reliably with per-test data isolation via unique group names
+**Depends on:** Nothing (independent bugfix)
+**Requirements**: ISO-01, ISO-02, ISO-03
+**Success Criteria** (what must be TRUE):
+1. MongoDB tests pass without data leaking between tests
+2. Redis tests pass without data leaking between tests
+3. All other backend tests remain green (no regressions)
+**Plans**: 1 plan
+
+Plans:
+- [x] 08-01-PLAN.md — Add unique group= to all facade fixtures for per-test isolation
+
 ## Progress
 
+**Execution Order:**
+Phases execute in numeric order: 5 -> 6 -> 7
+
 | Phase | Milestone | Plans Complete | Status | Completed |
 |-------|-----------|----------------|--------|-----------|
 | 1. Backend Architecture | v1.0 | 3/3 | Complete | 2026-03-06 |
 | 2. H5MD Compliance | v1.0 | 4/4 | Complete | 2026-03-06 |
 | 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
 | 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
+| 5. Benchmark Pipeline | v0.3.1 | 1/1 | Complete | 2026-03-09 |
+| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
+| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
+| 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |
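
The Phase 8 approach (a unique group name per test so that tests sharing one MongoDB or Redis server cannot see each other's data) can be sketched as follows. The fixture code itself is not shown in this diff, so everything here is illustrative: `unique_group` and `FakeSharedServer` are hypothetical names, and the in-memory class only stands in for a shared server.

```python
import uuid
from typing import Dict, List


def unique_group(prefix: str = "test") -> str:
    """Build a group name that is practically unique per call."""
    return f"{prefix}-{uuid.uuid4().hex}"


class FakeSharedServer:
    """In-memory stand-in for one server shared by every test."""

    def __init__(self) -> None:
        self._data: Dict[str, List[str]] = {}

    def append(self, group: str, item: str) -> None:
        # All writes are namespaced by group, as a real backend keyed
        # on a group name would do.
        self._data.setdefault(group, []).append(item)

    def read(self, group: str) -> List[str]:
        return list(self._data.get(group, []))


# Two "tests" write to the same server but under different groups.
server = FakeSharedServer()
group_a, group_b = unique_group(), unique_group()
server.append(group_a, "frame-0")
print(server.read(group_a))  # data written by test A
print(server.read(group_b))  # test B sees nothing from test A
```

In a pytest fixture, the same idea would amount to calling something like `unique_group()` once per test and passing the result as `group=` to the facade constructor. The exact constructor signatures are not visible in this commit, so treat that usage as an assumption.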
