From f3b9cbb03639e7ec9af07d80e8ab898a50718004 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Mon, 9 Mar 2026 22:47:22 +0100
Subject: [PATCH 01/16] docs(06-01): complete PR feedback plan

- Add 06-01-SUMMARY.md with execution results
- Update STATE.md with phase 6 completion and decisions
- Update ROADMAP.md with phase 6 progress
- Mark PR-01, PR-02, PR-03 requirements complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .planning/REQUIREMENTS.md                     |  14 +--
 .planning/ROADMAP.md                          |   6 +-
 .planning/STATE.md                            |  17 +--
 .../phases/06-pr-feedback/06-01-SUMMARY.md    | 103 ++++++++++++++++++
 4 files changed, 123 insertions(+), 17 deletions(-)
 create mode 100644 .planning/phases/06-pr-feedback/06-01-SUMMARY.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 3401cf6..7bde08e 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -16,9 +16,9 @@ Requirements for CI benchmark infrastructure milestone. Each maps to roadmap pha
 
 ### PR Feedback
 
-- [ ] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
-- [ ] **PR-02**: Alert threshold is configurable (starting at 150%)
-- [ ] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
+- [x] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
+- [x] **PR-02**: Alert threshold is configurable (starting at 150%)
+- [x] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
 
 ### Dashboard
 
@@ -66,9 +66,9 @@ Which phases cover which requirements. Updated during roadmap creation.
 | CI-02 | Phase 5 | Complete |
 | CI-03 | Phase 5 | Complete |
 | CI-04 | Phase 5 | Complete |
-| PR-01 | Phase 6 | Pending |
-| PR-02 | Phase 6 | Pending |
-| PR-03 | Phase 6 | Pending |
+| PR-01 | Phase 6 | Complete |
+| PR-02 | Phase 6 | Complete |
+| PR-03 | Phase 6 | Complete |
 | DASH-01 | Phase 7 | Pending |
 | DASH-02 | Phase 7 | Pending |
 | DASH-03 | Phase 7 | Pending |
@@ -84,4 +84,4 @@ Which phases cover which requirements. Updated during roadmap creation.
 
 ---
 *Requirements defined: 2026-03-09*
-*Last updated: 2026-03-09 after phase 8 planning*
+*Last updated: 2026-03-09 after phase 6 completion*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 6ad9f0b..91405d9 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -28,7 +28,7 @@ Full details: `.planning/milestones/v1.0-ROADMAP.md`
 **Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.
 
 - [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)
-- [ ] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate
+- [x] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate (completed 2026-03-09)
 - [ ] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits
 
 ## Phase Details
@@ -58,7 +58,7 @@ Plans:
 **Plans**: 1 plan
 
 Plans:
-- [ ] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
+- [x] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
 
 ### Phase 7: Dashboard and README
 **Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
@@ -98,6 +98,6 @@ Phases execute in numeric order: 5 -> 6 -> 7
 | 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
 | 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
 | 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |
-| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
+| 6. PR Feedback | v0.3.1 | 1/1 | Complete | 2026-03-09 |
 | 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
 | 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 8b26404..fcce7a8 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -25,10 +25,10 @@ See: .planning/PROJECT.md (updated 2026-03-09)
 
 ## Current Position
 
-Phase: 8 of 8 (Test Isolation Fix) -- maintenance
+Phase: 6 of 8 (PR Feedback)
 Plan: 1 of 1 (complete)
-Status: Phase 8 complete
-Last activity: 2026-03-09 - Completed quick task 1: Make MongoDB backend cache_ttl configurable
+Status: Phase 6 complete
+Last activity: 2026-03-09 - Completed 06-01: PR benchmark comparison and fail-on-regression gate
 
 Progress: [██████████] 100%
 
@@ -49,6 +49,9 @@ Recent: github-action-benchmark selected as sole CI benchmark tool (research pha
 - Single Python 3.13 for benchmarks -- consistent baseline (CI-02, 05-01)
 - No separate release/tag trigger -- main pushes cover it (CI-04, 05-01)
 - Uniform group= on all backends, no conditional logic per backend type (08-01)
+- Dual benchmark-action steps: main auto-push vs PR compare-only (06-01)
+- 150% alert threshold as configurable YAML value (06-01)
+- Branch protection documented as manual one-time setup (06-01)
 
 ### Pending Todos
 
@@ -67,10 +70,10 @@ None.
 
 | # | Description | Date | Commit | Directory |
 |---|-------------|------|--------|-----------|
-| 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | pending | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) |
+| 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | 4848760 | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) |
 
 ## Session Continuity
 
-Last session: 2026-03-09T20:55:36Z
-Stopped at: Completed 08-01-PLAN.md
-Next action: Next phase or plan
+Last session: 2026-03-09T21:23:00Z
+Stopped at: Completed 06-01-PLAN.md
+Next action: Phase 7 (Dashboard and README)
diff --git a/.planning/phases/06-pr-feedback/06-01-SUMMARY.md b/.planning/phases/06-pr-feedback/06-01-SUMMARY.md
new file mode 100644
index 0000000..58eef25
--- /dev/null
+++ b/.planning/phases/06-pr-feedback/06-01-SUMMARY.md
@@ -0,0 +1,103 @@
+---
+phase: 06-pr-feedback
+plan: 01
+subsystem: infra
+tags: [github-actions, benchmarks, ci, pr-feedback, regression-gate]
+
+# Dependency graph
+requires:
+  - phase: 05-benchmark-pipeline
+    provides: "gh-pages baseline data and benchmark.yml workflow"
+provides:
+  - "PR benchmark comparison with Job Summary tables"
+  - "Fail-on-regression gate at configurable 150% threshold"
+  - "Concurrency group for PR benchmark runs"
+affects: [07-dashboard-and-readme]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns: ["Dual-step benchmark action (main auto-push vs PR compare-only)"]
+
+key-files:
+  created: []
+  modified: [".github/workflows/benchmark.yml"]
+
+key-decisions:
+  - "Two separate benchmark-action steps (main vs PR) because GitHub Actions cannot conditionally set with: inputs"
+  - "PR step uses auto-push: false and save-data-file: false to avoid polluting gh-pages"
+  - "Alert threshold set to 150% as configurable YAML value"
+
+patterns-established:
+  - "Dual-path CI pattern: same workflow file handles both main push (deploy) and PR (compare-only) via event_name conditionals"
+
+requirements-completed: [PR-01, PR-02, PR-03]
+
+# Metrics
+duration: 3min
+completed: 2026-03-09
+---
+
+# Phase 6 Plan 01: PR Feedback Summary
+
+**PR benchmark comparison via dual-step github-action-benchmark with 150% fail-on-regression gate and Job Summary tables**
+
+## Performance
+
+- **Duration:** 3 min
+- **Started:** 2026-03-09T21:20:00Z
+- **Completed:** 2026-03-09T21:23:00Z
+- **Tasks:** 2
+- **Files modified:** 1
+
+## Accomplishments
+- Added pull_request trigger (opened, synchronize) to benchmark.yml alongside existing workflow_run
+- Added concurrency group to cancel in-progress PR benchmark runs
+- Split benchmark-action into main (auto-push to gh-pages) and PR (compare-only) steps
+- PR step configured with summary-always, comment-on-alert, fail-on-alert at 150% threshold
+- Documented branch protection setup for merge gate enforcement
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Add PR trigger, concurrency, and comparison step to benchmark.yml** - `5c674d7` (feat) -- merged in `765174a`
+2. **Task 2: Verify PR benchmark comparison workflow** - checkpoint:human-verify, approved
+
+## Files Created/Modified
+- `.github/workflows/benchmark.yml` - Added PR trigger, concurrency, dual benchmark-action steps (main auto-push vs PR compare-only)
+
+## Decisions Made
+- Two separate benchmark-action steps (main vs PR) because GitHub Actions cannot conditionally set `with:` inputs
+- PR step uses `auto-push: false` and `save-data-file: false` to avoid polluting gh-pages
+- Alert threshold set to 150% as a configurable YAML value (easy to adjust)
+- Branch protection documented but left as manual one-time setup (Settings > Branches)
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None.
+
+## User Setup Required
+
+**Branch protection (one-time manual setup):**
+To enforce the merge gate (PR-03), go to GitHub Settings > Branches > Add rule:
+- Branch name pattern: `main`
+- Check "Require status checks to pass before merging"
+- Search for and select "Benchmarks"
+
+## Next Phase Readiness
+- PR feedback infrastructure complete, ready for Phase 7 (Dashboard and README)
+- gh-pages data accumulates on main pushes; dashboard can visualize it
+- No blockers
+
+## Self-Check: PASSED
+
+- SUMMARY.md: FOUND
+- Commit 765174a: FOUND
+
+---
+*Phase: 06-pr-feedback*
+*Completed: 2026-03-09*

From 3c00017dcedf8e55eb8d1ff056937b7c6aa845d0 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Mon, 9 Mar 2026 22:53:03 +0100
Subject: [PATCH 02/16] docs(phase-06): complete phase execution

---
 .planning/STATE.md                            |  12 +-
 .../phases/06-pr-feedback/06-VERIFICATION.md  | 103 ++++++++++++++++++
 2 files changed, 109 insertions(+), 6 deletions(-)
 create mode 100644 .planning/phases/06-pr-feedback/06-VERIFICATION.md

diff --git a/.planning/STATE.md b/.planning/STATE.md
index fcce7a8..4ecc422 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: completed
-stopped_at: Phase 8 context gathered
-last_updated: "2026-03-09T20:46:43.140Z"
-last_activity: 2026-03-09 -- Completed 05-01 benchmark pipeline
+stopped_at: Completed 06-01-PLAN.md
+last_updated: "2026-03-09T21:53:00.062Z"
+last_activity: "2026-03-09 - Completed 06-01: PR benchmark comparison and fail-on-regression gate"
 progress:
   total_phases: 4
-  completed_phases: 1
-  total_plans: 2
-  completed_plans: 1
+  completed_phases: 3
+  total_plans: 3
+  completed_plans: 3
   percent: 100
 ---
 
diff --git a/.planning/phases/06-pr-feedback/06-VERIFICATION.md b/.planning/phases/06-pr-feedback/06-VERIFICATION.md
new file mode 100644
index 0000000..0e9018f
--- /dev/null
+++ b/.planning/phases/06-pr-feedback/06-VERIFICATION.md
@@ -0,0 +1,103 @@
+---
+phase: 06-pr-feedback
+verified: 2026-03-09T22:00:00Z
+status: passed
+score: 5/5 must-haves verified
+re_verification: false
+must_haves:
+  truths:
+    - "PRs receive a full benchmark comparison table in GitHub Actions Job Summary"
+    - "Alert threshold is configurable and defaults to 150%"
+    - "A PR with a benchmark regression beyond threshold fails the workflow check"
+    - "PR benchmark runs do NOT push data to gh-pages"
+    - "Main push behavior is unchanged from Phase 5"
+  artifacts:
+    - path: ".github/workflows/benchmark.yml"
+      provides: "Combined main+PR benchmark workflow"
+      contains: "pull_request"
+  key_links:
+    - from: ".github/workflows/benchmark.yml"
+      to: "gh-pages branch /dev/bench/"
+      via: "github-action-benchmark fetches baseline for comparison"
+      pattern: "gh-pages-branch: gh-pages"
+---
+
+# Phase 6: PR Feedback Verification Report
+
+**Phase Goal:** PR authors see benchmark comparison results and regressions block merge
+**Verified:** 2026-03-09T22:00:00Z
+**Status:** passed
+**Re-verification:** No -- initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | PRs receive a full benchmark comparison table in GitHub Actions Job Summary | VERIFIED | `summary-always: true` on line 105 of benchmark.yml, inside `if: github.event_name == 'pull_request'` step |
+| 2 | Alert threshold is configurable and defaults to 150% | VERIFIED | `alert-threshold: "150%"` on line 108, plain YAML value easily editable |
+| 3 | A PR with a benchmark regression beyond threshold fails the workflow check | VERIFIED | `fail-on-alert: true` on line 107; branch protection documented in header comments (lines 10-11) |
+| 4 | PR benchmark runs do NOT push data to gh-pages | VERIFIED | `auto-push: false` (line 103) and `save-data-file: false` (line 104) in PR step |
+| 5 | Main push behavior is unchanged from Phase 5 | VERIFIED | Main step (lines 83-92) retains `auto-push: true`, guarded by `if: github.event_name == 'workflow_run'`; `workflow_run` trigger preserved (lines 23-26) |
+
+**Score:** 5/5 truths verified
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `.github/workflows/benchmark.yml` | Combined main+PR benchmark workflow | VERIFIED | 109 lines, contains both `workflow_run` and `pull_request` triggers, dual benchmark-action steps with event-based conditionals |
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `.github/workflows/benchmark.yml` | gh-pages branch `/dev/bench/` | github-action-benchmark fetches baseline | WIRED | Both main and PR steps reference `gh-pages-branch: gh-pages` and `benchmark-data-dir-path: dev/bench`; PR step fetches baseline for comparison without pushing |
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|----------|
+| PR-01 | 06-01-PLAN | PRs receive full benchmark comparison summary with deltas | SATISFIED | `summary-always: true` and `comment-on-alert: true` in PR step |
+| PR-02 | 06-01-PLAN | Alert threshold configurable at 150% | SATISFIED | `alert-threshold: "150%"` as plain YAML value |
+| PR-03 | 06-01-PLAN | Fail-on-regression gate blocks PR merge | SATISFIED | `fail-on-alert: true` in PR step; branch protection setup documented in header comments |
+
+No orphaned requirements -- REQUIREMENTS.md maps PR-01, PR-02, PR-03 to Phase 6, and all three are claimed in 06-01-PLAN.md.
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| (none) | - | - | - | No anti-patterns detected |
+
+### Human Verification Required
+
+#### 1. PR Job Summary Table Rendering
+
+**Test:** Open a PR against main and wait for the Benchmarks workflow to complete. Check the Job Summary tab.
+**Expected:** A full comparison table showing all benchmarks with current vs baseline values and ratios.
+**Why human:** Cannot programmatically verify GitHub Actions renders the Job Summary table correctly; depends on gh-pages baseline data existing.
+
+#### 2. Fail-on-Regression Gate
+
+**Test:** Temporarily lower `alert-threshold` to `"100%"` and push a commit with an intentionally slower benchmark.
+**Expected:** The Benchmarks workflow fails with a regression alert, blocking the PR merge check.
+**Why human:** Requires actual CI execution with a regression to trigger the gate.
+
+#### 3. Branch Protection Configuration
+
+**Test:** Navigate to Settings > Branches > Branch protection rules for `main`. Add "Benchmarks" as a required status check.
+**Expected:** PRs with failing Benchmarks workflow cannot be merged.
+**Why human:** Branch protection is a manual one-time GitHub repo setting, not automatable via workflow YAML.
+
+### Gaps Summary
+
+No gaps found. All five observable truths are verified in the codebase. The workflow file contains both main (auto-push) and PR (compare-only with fail gate) paths, correctly separated by `github.event_name` conditionals. The commit `5c674d7` (merged in `765174a`) is present in the repository.
+
+The only items requiring human verification are runtime behaviors (Job Summary rendering, actual regression detection) and the one-time branch protection setup, which are inherent to CI workflow changes and cannot be verified statically.
+
+---
+
+_Verified: 2026-03-09T22:00:00Z_
+_Verifier: Claude (gsd-verifier)_

From 4d402dccfa4b273572350af7afea7228ca1b5c3f Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:28:16 +0100
Subject: [PATCH 03/16] docs(07): capture phase context

---
 .../07-dashboard-and-readme/07-CONTEXT.md     | 75 +++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-CONTEXT.md

diff --git a/.planning/phases/07-dashboard-and-readme/07-CONTEXT.md b/.planning/phases/07-dashboard-and-readme/07-CONTEXT.md
new file mode 100644
index 0000000..04c798d
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-CONTEXT.md
@@ -0,0 +1,75 @@
+# Phase 7: Dashboard and README - Context
+
+**Gathered:** 2026-03-10
+**Status:** Ready for planning
+
+<domain>
+## Phase Boundary
+
+Users can view benchmark trends over time on a public GitHub Pages dashboard and see live figures in the README. Static PNG references in the README are replaced with a dashboard link. Data growth on gh-pages is limited via max-items-in-chart.
+
+</domain>
+
+<decisions>
+## Implementation Decisions
+
+### Dashboard page content
+- Custom wrapper index.html at gh-pages root with project name, one-line description, install command, and links (repo, PyPI, benchmark charts at /dev/bench/)
+- Minimal content — fits in a single screen, just links and a tagline
+- Links to /dev/bench/ for charts (no iframe embedding)
+- Commit index.html once to gh-pages manually — github-action-benchmark only writes to /dev/bench/ so root is untouched by CI
+
+### README benchmark figures
+- Replace all static PNG references in README Benchmarks section with a "View benchmark dashboard" link to GitHub Pages
+- Keep summary text describing what's benchmarked (backends, operations, dataset matrix)
+- Dashboard link only — no mention of PR comparison feature in README
+- Keep docs/benchmark_*.png files and docs/visualize_benchmarks.py in the repo (used for publication/local exploration) — just remove PNG image references from README
+
+### Data growth limits
+- max-items-in-chart: 200 (covers ~6 months of history at daily pushes)
+- Apply only to the main (store) step — PR comparison step doesn't write data so the setting is a no-op there
+
+### Claude's Discretion
+- Exact HTML/CSS styling for the landing page
+- How to structure the dashboard link in README (badge vs plain link vs button-style)
+- Whether to add a brief "Contributing" note about benchmark checks on PRs
+
+</decisions>
+
+<specifics>
+## Specific Ideas
+
+- User wants docs/benchmark_*.png and visualize_benchmarks.py kept for potential publication use — these are not deleted, just no longer referenced from the README
+- Landing page should be minimal — not a condensed README, just the essentials to orient visitors
+
+</specifics>
+
+<code_context>
+## Existing Code Insights
+
+### Reusable Assets
+- `.github/workflows/benchmark.yml` — existing workflow with main (store) and PR (compare) steps; max-items-in-chart needs to be added to the main step
+- gh-pages branch at `dev/bench/` — auto-generated Chart.js dashboard already served by GitHub Pages
+
+### Established Patterns
+- `benchmark-action/github-action-benchmark@v1` with `tool: "pytest"`, `gh-pages-branch: gh-pages`, `benchmark-data-dir-path: dev/bench`
+- GITHUB_TOKEN for gh-pages pushes (no deploy key)
+
+### Integration Points
+- `.github/workflows/benchmark.yml` — add `max-items-in-chart: 200` to the main store step
+- `README.md` — replace Benchmarks section PNG references with dashboard link
+- gh-pages branch root — commit new index.html (one-time, outside normal CI flow)
+
+</code_context>
+
+<deferred>
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+</deferred>
+
+---
+
+*Phase: 07-dashboard-and-readme*
+*Context gathered: 2026-03-10*

From 6435d154e14e38283359e91930b6094aaaa97498 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:28:22 +0100
Subject: [PATCH 04/16] docs(state): record phase 7 context session

---
 .planning/STATE.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.planning/STATE.md b/.planning/STATE.md
index 4ecc422..2a302e5 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,8 +3,8 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: completed
-stopped_at: Completed 06-01-PLAN.md
-last_updated: "2026-03-09T21:53:00.062Z"
+stopped_at: Phase 7 context gathered
+last_updated: "2026-03-10T12:28:19.695Z"
 last_activity: "2026-03-09 - Completed 06-01: PR benchmark comparison and fail-on-regression gate"
 progress:
   total_phases: 4
@@ -74,6 +74,6 @@ None.
 
 ## Session Continuity
 
-Last session: 2026-03-09T21:23:00Z
-Stopped at: Completed 06-01-PLAN.md
+Last session: 2026-03-10T12:28:19.687Z
+Stopped at: Phase 7 context gathered
 Next action: Phase 7 (Dashboard and README)

From 84fa645d3a8279bb94f7a6411026b95414f7d2ed Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:31:26 +0100
Subject: [PATCH 05/16] docs(07): research phase domain

---
 .../07-dashboard-and-readme/07-RESEARCH.md    | 263 ++++++++++++++++++
 1 file changed, 263 insertions(+)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-RESEARCH.md

diff --git a/.planning/phases/07-dashboard-and-readme/07-RESEARCH.md b/.planning/phases/07-dashboard-and-readme/07-RESEARCH.md
new file mode 100644
index 0000000..6ecec0b
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-RESEARCH.md
@@ -0,0 +1,263 @@
+# Phase 7: Dashboard and README - Research
+
+**Researched:** 2026-03-10
+**Domain:** GitHub Pages dashboard, README updates, github-action-benchmark configuration
+**Confidence:** HIGH
+
+## Summary
+
+Phase 7 is a documentation and configuration phase with no library code changes. It involves three distinct tasks: (1) adding `max-items-in-chart: 200` to the existing benchmark workflow, (2) creating a minimal landing page on gh-pages, and (3) replacing 10 static PNG image references in the README Benchmarks section with a dashboard link.
+
+The gh-pages branch already exists with `dev/bench/` auto-generated by github-action-benchmark. The root of gh-pages is untouched by CI, so a manually committed `index.html` will persist across benchmark auto-pushes. All changes are straightforward file edits with no dependency on external libraries or complex integrations.
+
+**Primary recommendation:** Execute as a single plan with three small tasks -- workflow YAML edit, gh-pages index.html commit, and README update.
+
+<user_constraints>
+## User Constraints (from CONTEXT.md)
+
+### Locked Decisions
+- Custom wrapper index.html at gh-pages root with project name, one-line description, install command, and links (repo, PyPI, benchmark charts at /dev/bench/)
+- Minimal content -- fits in a single screen, just links and a tagline
+- Links to /dev/bench/ for charts (no iframe embedding)
+- Commit index.html once to gh-pages manually -- github-action-benchmark only writes to /dev/bench/ so root is untouched by CI
+- Replace all static PNG references in README Benchmarks section with a "View benchmark dashboard" link to GitHub Pages
+- Keep summary text describing what's benchmarked (backends, operations, dataset matrix)
+- Dashboard link only -- no mention of PR comparison feature in README
+- Keep docs/benchmark_*.png files and docs/visualize_benchmarks.py in the repo (used for publication/local exploration) -- just remove PNG image references from README
+- max-items-in-chart: 200 (covers ~6 months of history at daily pushes)
+- Apply only to the main (store) step -- PR comparison step doesn't write data so the setting is a no-op there
+
+### Claude's Discretion
+- Exact HTML/CSS styling for the landing page
+- How to structure the dashboard link in README (badge vs plain link vs button-style)
+- Whether to add a brief "Contributing" note about benchmark checks on PRs
+
+### Deferred Ideas (OUT OF SCOPE)
+None -- discussion stayed within phase scope
+</user_constraints>
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| DASH-01 | GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs | gh-pages branch exists at origin/gh-pages, dev/bench/ has Chart.js dashboard from github-action-benchmark. Landing page index.html at root provides project docs with link to /dev/bench/. |
+| DASH-02 | README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs | 10 PNG references on lines 453-472 of README.md need to be replaced with a dashboard link. PNG files and visualize script stay in repo. |
+| DASH-03 | max-items-in-chart limits data growth on gh-pages | Verified `max-items-in-chart` input exists in github-action-benchmark action.yml. Accepts unsigned integer, no limit by default. Add to the "Store benchmark results (main)" step only. |
+</phase_requirements>
+
+## Standard Stack
+
+### Core
+| Tool | Version | Purpose | Why Standard |
+|------|---------|---------|--------------|
+| github-action-benchmark | v1 | Chart.js dashboard generation + benchmark tracking | Already in use, generates /dev/bench/ content |
+| GitHub Pages | N/A | Static site hosting from gh-pages branch | Already enabled, serves from gh-pages root |
+
+### Supporting
+No additional libraries needed. This phase involves only YAML config edits, HTML file creation, and README markdown edits.
+
+## Architecture Patterns
+
+### Current gh-pages Structure
+```
+gh-pages branch:
+├── dev/
+│   └── bench/
+│       ├── index.html    # Auto-generated Chart.js dashboard
+│       └── data.js       # Benchmark data (auto-updated by CI)
+└── (root is empty)       # Landing page goes here
+```
+
+### Target gh-pages Structure
+```
+gh-pages branch:
+├── index.html            # NEW: minimal landing page with project info + link to /dev/bench/
+└── dev/
+    └── bench/
+        ├── index.html    # Auto-generated (unchanged)
+        └── data.js       # Auto-updated (unchanged, now capped at 200 items)
+```
+
+### Pattern 1: Landing Page as Static HTML
+**What:** A single self-contained index.html at the gh-pages root with inline CSS (no external dependencies)
+**When to use:** Always -- this is the only approach that works without a build step
+**Key points:**
+- Must be committed directly to gh-pages branch (not main)
+- CI only writes to `dev/bench/` so root files are never overwritten
+- No Jekyll processing needed (could add `.nojekyll` if needed, but plain HTML works)
+
+### Pattern 2: max-items-in-chart in Workflow YAML
+**What:** Adding `max-items-in-chart: 200` to the benchmark store step
+**Where:** `.github/workflows/benchmark.yml`, "Store benchmark results (main)" step only
+**Key points:**
+- Only applies to the step that writes data: `auto-push: true` with `save-data-file` left at its default of `true` (the main/store step)
+- The PR comparison step has `save-data-file: false` so max-items-in-chart is irrelevant there
+- When data points exceed 200, the oldest are removed before the commit to gh-pages
+
+### Anti-Patterns to Avoid
+- **Embedding charts in README via iframes or raw HTML:** GitHub's markdown renderer strips iframes and most HTML. Use a plain link instead.
+- **Committing index.html to main branch:** It must go on gh-pages. Committing to main would not be served by GitHub Pages.
+- **Deleting docs/benchmark_*.png files:** User explicitly wants these kept for publication use.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| Benchmark charts | Custom Chart.js dashboard | github-action-benchmark auto-generated dashboard | Already working at /dev/bench/ |
+| Data pruning | Custom script to trim old data | `max-items-in-chart` input | Built into the action, handles it automatically |
+
+## Common Pitfalls
+
+### Pitfall 1: Forgetting .nojekyll
+**What goes wrong:** GitHub Pages runs Jekyll by default, which can ignore files starting with underscores or cause unexpected processing
+**Why it happens:** The auto-generated Chart.js dashboard may include files that Jekyll would skip
+**How to avoid:** Check if `.nojekyll` already exists on gh-pages; if not, add it when committing index.html
+**Warning signs:** Dashboard or landing page 404s despite being committed
+
+### Pitfall 2: Committing to Wrong Branch
+**What goes wrong:** index.html committed to main instead of gh-pages, so it's not served
+**Why it happens:** Normal workflow is committing to main
+**How to avoid:** Explicitly check out gh-pages, commit, and push, or use a `gh api` / git worktree approach
+**Warning signs:** File visible in main but 404 on GitHub Pages URL
+
+### Pitfall 3: Breaking Auto-Push with Manual Commit
+**What goes wrong:** Manual commit to gh-pages conflicts with github-action-benchmark auto-push
+**Why it happens:** Force-push or non-fast-forward push to gh-pages
+**How to avoid:** Fetch latest gh-pages before committing, use a regular push (not force). The action only writes to `dev/bench/` so there's no path conflict with root index.html
+
+### Pitfall 4: README Image References Left Behind
+**What goes wrong:** Some PNG references remain in README after editing
+**Why it happens:** There are 10 PNG references spread across lines 453-472, easy to miss one
+**How to avoid:** Remove the entire image-reference block (lines 451-472) and replace with a concise section containing the dashboard link
+**Warning signs:** Broken image icons in README if PNGs are ever moved
+
+## Code Examples
+
+### Workflow YAML Change (DASH-03)
+```yaml
+# In .github/workflows/benchmark.yml, "Store benchmark results (main)" step
+# Add max-items-in-chart: 200 to the with: block
+      - name: Store benchmark results (main)
+        if: github.event_name == 'workflow_run'
+        uses: benchmark-action/github-action-benchmark@v1
+        with:
+          tool: "pytest"
+          output-file-path: benchmark_results.json
+          gh-pages-branch: gh-pages
+          benchmark-data-dir-path: dev/bench
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          auto-push: true
+          max-items-in-chart: 200
+```
+
+### Landing Page HTML (DASH-01)
+```html
+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="utf-8">
+  <meta name="viewport" content="width=device-width, initial-scale=1">
+  <title>asebytes</title>
+  <style>
+    /* Claude's discretion: minimal, clean styling */
+    body { font-family: system-ui, sans-serif; max-width: 600px; margin: 80px auto; padding: 0 20px; color: #333; }
+    h1 { margin-bottom: 0.25em; }
+    .tagline { color: #666; margin-top: 0; }
+    code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; }
+    .links { margin-top: 2em; }
+    .links a { display: inline-block; margin-right: 1.5em; color: #0366d6; }
+  </style>
+</head>
+<body>
+  <h1>asebytes</h1>
+  <p class="tagline">Storage-agnostic, lazy-loading data layer for atomistic simulations</p>
+  <p><code>pip install asebytes</code></p>
+  <div class="links">
+    <a href="https://github.com/zincware/asebytes">GitHub</a>
+    <a href="https://pypi.org/project/asebytes/">PyPI</a>
+    <a href="./dev/bench/">Benchmark Dashboard</a>
+  </div>
+</body>
+</html>
+```
+
+### README Benchmarks Section Replacement (DASH-02)
+The current Benchmarks section (lines 441-472) contains descriptive text followed by 10 PNG image embeds across 6 subsections. Replace the image subsections with a single dashboard link while keeping the introductory description.
+
+**Before (lines 452-472):**
+```markdown
+### Write
+![Write Trajectory](docs/benchmark_write_trajectory.png)
+![Write Single](docs/benchmark_write_single.png)
+### Read
+... (8 more PNG references)
+```
+
+**After:**
+```markdown
+[View benchmark dashboard](https://zincware.github.io/asebytes/dev/bench/)
+```
+
+The GitHub Pages URL for this repo is: `https://zincware.github.io/asebytes/`
+
+## State of the Art
+
+No technology changes relevant to this phase. github-action-benchmark v1 with `max-items-in-chart` is the current stable approach. The action uses Node 20 runtime.
+
+## Open Questions
+
+1. **Does .nojekyll exist on gh-pages?**
+   - What we know: github-action-benchmark may or may not create it
+   - What's unclear: Whether it's needed for the landing page to work
+   - Recommendation: Check during implementation; add if missing (zero-cost insurance)
+
+2. **Exact GitHub Pages URL**
+   - What we know: Repo is `zincware/asebytes`, so URL should be `https://zincware.github.io/asebytes/`
+   - What's unclear: Whether GitHub Pages is actually enabled and serving
+   - Recommendation: Verify with `gh api repos/zincware/asebytes/pages` during implementation
+
+## Validation Architecture
+
+### Test Framework
+| Property | Value |
+|----------|-------|
+| Framework | Manual verification (no automated tests for docs/CI config changes) |
+| Config file | N/A |
+| Quick run command | `gh api repos/zincware/asebytes/pages` to check Pages status |
+| Full suite command | Manual: visit GitHub Pages URL and verify dashboard loads |
+
+### Phase Requirements to Test Map
+| Req ID | Behavior | Test Type | Automated Command | File Exists? |
+|--------|----------|-----------|-------------------|-------------|
+| DASH-01 | Landing page served at GitHub Pages root | manual-only | Visit `https://zincware.github.io/asebytes/` | N/A |
+| DASH-02 | README has dashboard link, no PNG embeds | smoke | `grep -c '!\[.*\](docs/benchmark_' README.md` should return 0 | N/A |
+| DASH-03 | max-items-in-chart set to 200 | smoke | `grep 'max-items-in-chart' .github/workflows/benchmark.yml` | N/A |
+
+### Sampling Rate
+- **Per task commit:** Grep-based smoke checks on modified files
+- **Per wave merge:** N/A (single wave)
+- **Phase gate:** Manual verification of GitHub Pages URL after gh-pages push
+
+### Wave 0 Gaps
+None -- no test infrastructure needed. This phase is documentation and configuration only. Verification is via grep checks and manual URL inspection.
+
+## Sources
+
+### Primary (HIGH confidence)
+- [github-action-benchmark action.yml](https://github.com/benchmark-action/github-action-benchmark/blob/master/action.yml) - Verified `max-items-in-chart` input definition via GitHub API
+- Existing `.github/workflows/benchmark.yml` in repo - Current workflow configuration
+- Existing `README.md` in repo - Current benchmark section with 10 PNG references
+
+### Secondary (MEDIUM confidence)
+- [github-action-benchmark repo](https://github.com/benchmark-action/github-action-benchmark) - General documentation on Chart.js dashboard generation
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH - already using github-action-benchmark, just adding one config key
+- Architecture: HIGH - gh-pages structure verified, changes are additive
+- Pitfalls: HIGH - well-known GitHub Pages patterns, verified action.yml inputs
+
+**Research date:** 2026-03-10
+**Valid until:** 2026-04-10 (stable tooling, unlikely to change)

From edf75e8567de3919d7b8ed28d8c14c008916f231 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:32:07 +0100
Subject: [PATCH 06/16] docs(phase-7): add validation strategy

---
 .../07-dashboard-and-readme/07-VALIDATION.md  | 72 +++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-VALIDATION.md
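
Reviewer note (not part of the commit): the DASH-02 smoke check in this
file greps all of README.md, so it cannot tell a PNG embed in the
Benchmarks section from one elsewhere in the file. Below is a minimal
sketch of a section-scoped check, assuming the section starts at a line
that reads exactly `## Benchmarks` and ends at the next `## ` heading.
The helper names `section_lines` and `count_png_embeds` are hypothetical
and do not exist in the repo.

```python
import re

# Matches a Markdown image embed whose target ends in .png,
# for example ![Write Single](docs/benchmark_write_single.png)
PNG_EMBED = re.compile(r"!\[[^\]]*\]\([^)]*\.png\)")


def section_lines(markdown: str, heading: str = "## Benchmarks") -> list[str]:
    """Return the lines under `heading`, up to the next level-2 heading."""
    lines = markdown.splitlines()
    try:
        start = lines.index(heading)
    except ValueError:
        return []  # heading not found: treat the section as empty
    body = []
    for line in lines[start + 1:]:
        # Simplification: a "## " line inside a fenced code block would
        # also end the section here.
        if line.startswith("## "):
            break
        body.append(line)
    return body


def count_png_embeds(markdown: str, heading: str = "## Benchmarks") -> int:
    """Count PNG image embeds inside one section only."""
    return sum(len(PNG_EMBED.findall(line)) for line in section_lines(markdown, heading))
```

To use it, read README.md into a string, call `count_png_embeds` on it,
and treat any non-zero result as a DASH-02 failure.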

diff --git a/.planning/phases/07-dashboard-and-readme/07-VALIDATION.md b/.planning/phases/07-dashboard-and-readme/07-VALIDATION.md
new file mode 100644
index 0000000..6a25135
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-VALIDATION.md
@@ -0,0 +1,72 @@
+---
+phase: 7
+slug: dashboard-and-readme
+status: draft
+nyquist_compliant: false
+wave_0_complete: false
+created: 2026-03-10
+---
+
+# Phase 7 — Validation Strategy
+
+> Per-phase validation contract for feedback sampling during execution.
+
+---
+
+## Test Infrastructure
+
+| Property | Value |
+|----------|-------|
+| **Framework** | Manual verification + grep smoke checks (docs/CI config phase) |
+| **Config file** | N/A |
+| **Quick run command** | `grep 'max-items-in-chart' .github/workflows/benchmark.yml` |
+| **Full suite command** | Manual: visit `https://zincware.github.io/asebytes/` and verify the landing page and its dashboard link load |
+| **Estimated runtime** | ~5 seconds (grep checks) |
+
+---
+
+## Sampling Rate
+
+- **After every task commit:** Run grep-based smoke checks on modified files
+- **After every plan wave:** N/A (single wave expected)
+- **Before `/gsd:verify-work`:** Full manual verification of GitHub Pages URL after gh-pages push
+- **Max feedback latency:** 5 seconds
+
+---
+
+## Per-Task Verification Map
+
+| Task ID | Plan | Wave | Requirement | Test Type | Automated Command | File Exists | Status |
+|---------|------|------|-------------|-----------|-------------------|-------------|--------|
+| 07-01-01 | 01 | 1 | DASH-03 | smoke | `grep 'max-items-in-chart' .github/workflows/benchmark.yml` | N/A | ⬜ pending |
+| 07-01-02 | 01 | 1 | DASH-01 | manual-only | Visit `https://zincware.github.io/asebytes/` | N/A | ⬜ pending |
+| 07-01-03 | 01 | 1 | DASH-02 | smoke | `grep -c '!\[.*\](docs/benchmark_' README.md` (should print 0) | N/A | ⬜ pending |
+
+*Status: ⬜ pending · ✅ green · ❌ red · ⚠️ flaky*
+
+---
+
+## Wave 0 Requirements
+
+*Existing infrastructure covers all phase requirements. No test infrastructure needed — this phase is documentation and configuration only.*
+
+---
+
+## Manual-Only Verifications
+
+| Behavior | Requirement | Why Manual | Test Instructions |
+|----------|-------------|------------|-------------------|
+| GitHub Pages serves landing page at root URL | DASH-01 | Requires browser/network access to verify live site | Visit `https://zincware.github.io/asebytes/`, confirm the landing page shows the project description, install command, and links, then follow the Benchmark Dashboard link and confirm the Chart.js dashboard loads at `dev/bench/` |
+
+---
+
+## Validation Sign-Off
+
+- [ ] All tasks have `<automated>` verify or Wave 0 dependencies
+- [ ] Sampling continuity: no 3 consecutive tasks without automated verify
+- [ ] Wave 0 covers all MISSING references
+- [ ] No watch-mode flags
+- [ ] Feedback latency < 5s
+- [ ] `nyquist_compliant: true` set in frontmatter
+
+**Approval:** pending

From 97d43da45bf751ec5a6a7b4ec20acaeb709ef11d Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:33:59 +0100
Subject: [PATCH 07/16] docs(07): create phase plan

---
 .planning/ROADMAP.md                          |   6 +-
 .../07-dashboard-and-readme/07-01-PLAN.md     | 151 ++++++++++++++++++
 2 files changed, 154 insertions(+), 3 deletions(-)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-01-PLAN.md
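
Reviewer note (not part of the commit): the Task 2 verify step in this
plan only greps index.html for `asebytes` and `dev/bench`. Below is a
rough sketch of a stricter check covering the plan's other landing-page
requirements (`lang="en"`, a charset meta tag, a viewport meta tag, a
link to the dashboard). It uses only the standard-library `html.parser`.
The names `LandingPageCheck` and `landing_page_problems` are
hypothetical and do not exist in the repo.

```python
from html.parser import HTMLParser


class LandingPageCheck(HTMLParser):
    """Collect the facts the plan requires of the gh-pages landing page."""

    def __init__(self) -> None:
        super().__init__()
        self.lang = None
        self.has_charset = False
        self.has_viewport = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html":
            self.lang = attr.get("lang")
        elif tag == "meta":
            if "charset" in attr:
                self.has_charset = True
            if attr.get("name") == "viewport":
                self.has_viewport = True
        elif tag == "a" and attr.get("href"):
            self.hrefs.append(attr["href"])


def landing_page_problems(html: str) -> list[str]:
    """Return human-readable problems; an empty list means the page passes."""
    checker = LandingPageCheck()
    checker.feed(html)
    problems = []
    if checker.lang != "en":
        problems.append('missing lang="en" on <html>')
    if not checker.has_charset:
        problems.append("missing <meta charset>")
    if not checker.has_viewport:
        problems.append("missing viewport <meta>")
    if not any("dev/bench" in href for href in checker.hrefs):
        problems.append("no link to dev/bench")
    return problems
```

To use it, pass the output of `git show origin/gh-pages:index.html` to
`landing_page_problems` and fail the check if the returned list is
non-empty.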

diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 91405d9..369c498 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -68,10 +68,10 @@ Plans:
   1. GitHub Pages serves a Chart.js time-series dashboard with project description, usage, and links
   2. README displays live benchmark figures sourced from GitHub Pages, replacing any static visualization PNGs
   3. max-items-in-chart is configured to limit data growth on gh-pages
-**Plans**: TBD
+**Plans**: 1 plan
 
 Plans:
-- [ ] 07-01: TBD
+- [ ] 07-01-PLAN.md — Add max-items-in-chart to workflow, create gh-pages landing page, replace README PNG embeds with dashboard link
 
 ### Phase 8: Fix failing tests in Redis/Mongo backends (test isolation)
 **Goal:** MongoDB and Redis contract tests pass reliably with per-test data isolation via unique group names
@@ -99,5 +99,5 @@ Phases execute in numeric order: 5 -> 6 -> 7
 | 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
 | 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |
 | 6. PR Feedback | v0.3.1 | 1/1 | Complete | 2026-03-09 |
-| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
+| 7. Dashboard and README | v0.3.1 | 0/1 | Not started | - |
 | 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |
diff --git a/.planning/phases/07-dashboard-and-readme/07-01-PLAN.md b/.planning/phases/07-dashboard-and-readme/07-01-PLAN.md
new file mode 100644
index 0000000..d94265e
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-01-PLAN.md
@@ -0,0 +1,151 @@
+---
+phase: 07-dashboard-and-readme
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - .github/workflows/benchmark.yml
+  - README.md
+autonomous: true
+requirements:
+  - DASH-01
+  - DASH-02
+  - DASH-03
+
+must_haves:
+  truths:
+    - "max-items-in-chart: 200 is set on the main store step in benchmark.yml"
+    - "GitHub Pages root serves a landing page with project name, description, install command, and links"
+    - "README Benchmarks section contains a dashboard link instead of static PNG image references"
+    - "docs/benchmark_*.png files and docs/visualize_benchmarks.py remain in the repo untouched"
+  artifacts:
+    - path: ".github/workflows/benchmark.yml"
+      provides: "max-items-in-chart configuration on store step"
+      contains: "max-items-in-chart: 200"
+    - path: "README.md"
+      provides: "Dashboard link in Benchmarks section, no PNG embeds"
+      contains: "zincware.github.io/asebytes"
+  key_links:
+    - from: "README.md"
+      to: "https://zincware.github.io/asebytes/dev/bench/"
+      via: "markdown link"
+      pattern: "zincware\\.github\\.io/asebytes/dev/bench"
+    - from: "gh-pages:index.html"
+      to: "/dev/bench/"
+      via: "HTML anchor href"
+      pattern: "dev/bench"
+---
+
+<objective>
+Configure data growth limits in the benchmark workflow, create a minimal landing page on gh-pages, and replace static PNG benchmark images in the README with a live dashboard link.
+
+Purpose: Complete the v0.3.1 CI benchmark infrastructure by making the dashboard discoverable and the data sustainable.
+Output: Updated benchmark.yml, updated README.md, and index.html committed to gh-pages branch.
+</objective>
+
+<execution_context>
+@/Users/fzills/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/fzills/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/07-dashboard-and-readme/07-CONTEXT.md
+@.planning/phases/07-dashboard-and-readme/07-RESEARCH.md
+
+@.github/workflows/benchmark.yml
+@README.md
+
+<interfaces>
+<!-- benchmark.yml: "Store benchmark results (main)" step at lines 83-92 needs max-items-in-chart added -->
+<!-- README.md: Benchmarks section lines 441-472, PNG references on lines 452-472 to be replaced -->
+<!-- gh-pages branch: root is empty, dev/bench/ has auto-generated Chart.js dashboard -->
+</interfaces>
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add max-items-in-chart and update README benchmarks section</name>
+  <files>.github/workflows/benchmark.yml, README.md</files>
+  <action>
+1. In `.github/workflows/benchmark.yml`, add `max-items-in-chart: 200` to the "Store benchmark results (main)" step's `with:` block (after `auto-push: true`, line 92). Do NOT add it to the "Compare benchmark results (PR)" step -- that step has `save-data-file: false` so the setting is irrelevant there.
+
+2. In `README.md`, replace lines 452-472 (the subsection headers and PNG image references: Write, Read, Random Access, Property Access, Column Access, Update) with a single line:
+
+```markdown
+
+[View benchmark dashboard](https://zincware.github.io/asebytes/dev/bench/)
+```
+
+Keep everything before line 452 (the introductory text, note about compression, and code example). Keep everything after line 472 (if any content follows). The result is the Benchmarks section retains its `## Benchmarks` header, the descriptive paragraph about the 1000-frame datasets, the code example, the compression note, and then ends with the dashboard link instead of 10 PNG embeds across 6 subsections.
+
+Do NOT delete docs/benchmark_*.png files or docs/visualize_benchmarks.py -- per user decision, those are kept for publication/local exploration.
+  </action>
+  <verify>
+    <automated>grep -c 'max-items-in-chart: 200' .github/workflows/benchmark.yml | grep -q '^1$' && echo "DASH-03 OK" && ! grep -q '!\[.*\](docs/benchmark_' README.md && grep 'zincware.github.io/asebytes/dev/bench' README.md && echo "DASH-02 OK"</automated>
+  </verify>
+  <done>benchmark.yml has max-items-in-chart: 200 on the store step only. README Benchmarks section has dashboard link, zero PNG image embed references (lines like `![...](docs/benchmark_*.png)` are gone). docs/ PNG files still exist on disk.</done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Commit landing page index.html to gh-pages branch</name>
+  <files>gh-pages:index.html</files>
+  <action>
+Create a minimal self-contained index.html and commit it to the root of the gh-pages branch. This is a one-time manual commit -- CI only writes to `dev/bench/` so the root is never overwritten.
+
+Steps:
+1. Fetch the latest gh-pages branch: `git fetch origin gh-pages`
+2. Use a git worktree to check out gh-pages without disturbing the working tree:
+   ```
+   git worktree add /tmp/gh-pages-worktree gh-pages
+   ```
+3. Create `/tmp/gh-pages-worktree/index.html` with this content (Claude's discretion on exact styling -- keep it minimal, clean, single-screen):
+   - Project name: `asebytes`
+   - One-line description: "Storage-agnostic, lazy-loading data layer for atomistic simulations"
+   - Install command: `pip install asebytes`
+   - Links: GitHub repo (https://github.com/zincware/asebytes), PyPI (https://pypi.org/project/asebytes/), Benchmark Dashboard (./dev/bench/)
+   - Inline CSS only, no external dependencies, system font stack
+   - Must include `lang="en"`, charset, viewport meta tags
+4. If `.nojekyll` does not already exist in the worktree root, create an empty `.nojekyll` file (prevents Jekyll processing)
+5. Commit and push:
+   ```
+   cd /tmp/gh-pages-worktree
+   git add index.html .nojekyll
+   git commit -m "docs: add landing page with project info and benchmark link"
+   git push origin gh-pages
+   ```
+6. Clean up the worktree: `git worktree remove /tmp/gh-pages-worktree`
+
+IMPORTANT: Do not force-push. Use a regular push to avoid conflicts with github-action-benchmark auto-pushes.
+  </action>
+  <verify>
+    <automated>git fetch origin gh-pages && git show origin/gh-pages:index.html | grep -q 'asebytes' && git show origin/gh-pages:index.html | grep -q 'dev/bench' && echo "DASH-01 OK"</automated>
+  </verify>
+  <done>gh-pages branch root has index.html with project name, description, install command, and links including /dev/bench/ benchmark dashboard. .nojekyll exists. Page will be served at https://zincware.github.io/asebytes/.</done>
+</task>
+
+</tasks>
+
+<verification>
+- `grep 'max-items-in-chart: 200' .github/workflows/benchmark.yml` returns exactly one match (in the store step)
+- `grep -c '!\[.*\](docs/benchmark_' README.md` returns 0 (no PNG image embeds)
+- `grep 'zincware.github.io/asebytes/dev/bench' README.md` returns the dashboard link
+- `git show origin/gh-pages:index.html` shows the landing page content
+- `git show origin/gh-pages:.nojekyll` succeeds, i.e. the file exists (newly added or already present)
+- `ls docs/benchmark_*.png` still shows PNG files in the repo
+</verification>
+
+<success_criteria>
+1. benchmark.yml store step has max-items-in-chart: 200 (DASH-03)
+2. gh-pages root serves index.html with project info and /dev/bench/ link (DASH-01)
+3. README Benchmarks section has dashboard link, no static PNG embeds (DASH-02)
+4. docs/ PNG files and visualize script are untouched
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/07-dashboard-and-readme/07-01-SUMMARY.md`
+</output>

From 256e756b622caeeaec34d8640f4d83e7ff2dfb88 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:37:37 +0100
Subject: [PATCH 08/16] feat(07-01): add max-items-in-chart and replace README
 PNG embeds with dashboard link

- Add max-items-in-chart: 200 to benchmark store step (DASH-03)
- Replace 10 static PNG image embeds with live dashboard link (DASH-02)
- Keep docs/ PNG files untouched for publication use

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .github/workflows/benchmark.yml |  1 +
 README.md                       | 22 +---------------------
 2 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
index 023ce82..4cd6962 100644
--- a/.github/workflows/benchmark.yml
+++ b/.github/workflows/benchmark.yml
@@ -90,6 +90,7 @@ jobs:
           benchmark-data-dir-path: dev/bench
           github-token: ${{ secrets.GITHUB_TOKEN }}
           auto-push: true
+          max-items-in-chart: 200
 
       - name: Compare benchmark results (PR)
         if: github.event_name == 'pull_request'
diff --git a/README.md b/README.md
index 31096ec..50754c3 100644
--- a/README.md
+++ b/README.md
@@ -449,24 +449,4 @@ lemat = list(ASEIO("optimade://LeMaterial/LeMat-Traj", split="train", name="comp
 
 > **Note:** HDF5 performance is heavily influenced by compression and chunking settings. Both asebytes H5MD and znh5md use gzip compression by default, which reduces file size at the cost of read/write speed. The Zarr backend uses Blosc/LZ4 compression, which achieves compact file sizes with faster decompression than gzip.
 
-### Write
-![Write Trajectory](docs/benchmark_write_trajectory.png)
-![Write Single](docs/benchmark_write_single.png)
-
-### Read
-![Read Trajectory](docs/benchmark_read_trajectory.png)
-![Read Single](docs/benchmark_read_single.png)
-
-### Random Access
-![Random Trajectory](docs/benchmark_random_trajectory.png)
-![Random Single](docs/benchmark_random_single.png)
-
-### Property Access
-![Read Positions Trajectory](docs/benchmark_read_positions_trajectory.png)
-![Read Positions Single](docs/benchmark_read_positions_single.png)
-
-### Column Access
-![Column Energy](docs/benchmark_column_energy.png)
-
-### Update
-![Update Property Trajectory](docs/benchmark_update_property_trajectory.png)
+[View benchmark dashboard](https://zincware.github.io/asebytes/dev/bench/)

From 70e72cf2d538405a4421f0a5d9d0e7892c2cf6cc Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:39:27 +0100
Subject: [PATCH 09/16] docs(07-01): complete dashboard and README plan

- SUMMARY.md with execution results
- STATE.md updated to phase 7 complete
- ROADMAP.md and REQUIREMENTS.md updated (DASH-01, DASH-02, DASH-03)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 .planning/REQUIREMENTS.md                     | 12 +--
 .planning/STATE.md                            | 27 +++---
 .../07-dashboard-and-readme/07-01-SUMMARY.md  | 91 +++++++++++++++++++
 3 files changed, 112 insertions(+), 18 deletions(-)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-01-SUMMARY.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
index 7bde08e..c48fa1b 100644
--- a/.planning/REQUIREMENTS.md
+++ b/.planning/REQUIREMENTS.md
@@ -22,9 +22,9 @@ Requirements for CI benchmark infrastructure milestone. Each maps to roadmap pha
 
 ### Dashboard
 
-- [ ] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
-- [ ] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
-- [ ] **DASH-03**: max-items-in-chart limits data growth on gh-pages
+- [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
+- [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
+- [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
 
 ## Maintenance Requirements
 
@@ -69,9 +69,9 @@ Which phases cover which requirements. Updated during roadmap creation.
 | PR-01 | Phase 6 | Complete |
 | PR-02 | Phase 6 | Complete |
 | PR-03 | Phase 6 | Complete |
-| DASH-01 | Phase 7 | Pending |
-| DASH-02 | Phase 7 | Pending |
-| DASH-03 | Phase 7 | Pending |
+| DASH-01 | Phase 7 | Complete |
+| DASH-02 | Phase 7 | Complete |
+| DASH-03 | Phase 7 | Complete |
 | ISO-01 | Phase 8 | Complete |
 | ISO-02 | Phase 8 | Complete |
 | ISO-03 | Phase 8 | Complete |
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 2a302e5..c914cb6 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -3,14 +3,14 @@ gsd_state_version: 1.0
 milestone: v1.0
 milestone_name: milestone
 status: completed
-stopped_at: Phase 7 context gathered
-last_updated: "2026-03-10T12:28:19.695Z"
-last_activity: "2026-03-09 - Completed 06-01: PR benchmark comparison and fail-on-regression gate"
+stopped_at: Completed 07-01-PLAN.md
+last_updated: "2026-03-10T12:39:21.804Z"
+last_activity: "2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart"
 progress:
   total_phases: 4
-  completed_phases: 3
-  total_plans: 3
-  completed_plans: 3
+  completed_phases: 4
+  total_plans: 4
+  completed_plans: 4
   percent: 100
 ---
 
@@ -25,10 +25,10 @@ See: .planning/PROJECT.md (updated 2026-03-09)
 
 ## Current Position
 
-Phase: 6 of 8 (PR Feedback)
+Phase: 7 of 8 (Dashboard and README)
 Plan: 1 of 1 (complete)
-Status: Phase 6 complete
-Last activity: 2026-03-09 - Completed 06-01: PR benchmark comparison and fail-on-regression gate
+Status: Phase 7 complete
+Last activity: 2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart
 
 Progress: [██████████] 100%
 
@@ -52,6 +52,8 @@ Recent: github-action-benchmark selected as sole CI benchmark tool (research pha
 - Dual benchmark-action steps: main auto-push vs PR compare-only (06-01)
 - 150% alert threshold as configurable YAML value (06-01)
 - Branch protection documented as manual one-time setup (06-01)
+- max-items-in-chart: 200 on store step only, PR step irrelevant (07-01)
+- gh-pages root manually managed, CI only writes /dev/bench/ (07-01)
 
 ### Pending Todos
 
@@ -71,9 +73,10 @@ None.
 | # | Description | Date | Commit | Directory |
 |---|-------------|------|--------|-----------|
 | 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | 4848760 | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) |
+| Phase 07 P01 | 1min | 2 tasks | 4 files |
 
 ## Session Continuity
 
-Last session: 2026-03-10T12:28:19.687Z
-Stopped at: Phase 7 context gathered
-Next action: Phase 7 (Dashboard and README)
+Last session: 2026-03-10T12:39:21.801Z
+Stopped at: Completed 07-01-PLAN.md
+Next action: Phase 7 complete, proceed to next phase or wrap up
diff --git a/.planning/phases/07-dashboard-and-readme/07-01-SUMMARY.md b/.planning/phases/07-dashboard-and-readme/07-01-SUMMARY.md
new file mode 100644
index 0000000..0497b5e
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-01-SUMMARY.md
@@ -0,0 +1,91 @@
+---
+phase: 07-dashboard-and-readme
+plan: 01
+subsystem: infra
+tags: [github-pages, ci, benchmark, readme]
+
+requires:
+  - phase: 05-benchmark-pipeline
+    provides: "benchmark.yml workflow with github-action-benchmark"
+  - phase: 06-pr-feedback
+    provides: "PR comparison step in benchmark.yml"
+provides:
+  - "max-items-in-chart data growth limit on benchmark store step"
+  - "gh-pages landing page with project info and benchmark link"
+  - "README dashboard link replacing static PNG embeds"
+affects: []
+
+tech-stack:
+  added: []
+  patterns: ["gh-pages landing page separate from CI-managed /dev/bench/"]
+
+key-files:
+  created: ["gh-pages:index.html", "gh-pages:.nojekyll"]
+  modified: [".github/workflows/benchmark.yml", "README.md"]
+
+key-decisions:
+  - "max-items-in-chart: 200 on store step only (PR step has save-data-file: false)"
+  - "Minimal single-screen landing page with system font stack and inline CSS"
+
+patterns-established:
+  - "gh-pages root is manually managed; CI only writes to /dev/bench/"
+
+requirements-completed: [DASH-01, DASH-02, DASH-03]
+
+duration: 1min
+completed: 2026-03-10
+---
+
+# Phase 7 Plan 1: Dashboard and README Summary
+
+**max-items-in-chart limit on benchmark workflow, gh-pages landing page, and README dashboard link replacing 10 static PNG embeds**
+
+## Performance
+
+- **Duration:** 1 min
+- **Started:** 2026-03-10T12:37:05Z
+- **Completed:** 2026-03-10T12:38:20Z
+- **Tasks:** 2
+- **Files modified:** 4
+
+## Accomplishments
+- Added max-items-in-chart: 200 to benchmark store step to limit gh-pages data growth (DASH-03)
+- Created minimal landing page on gh-pages with project name, description, install command, and links (DASH-01)
+- Replaced 10 static PNG image embeds in README with single dashboard link (DASH-02)
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Add max-items-in-chart and update README benchmarks section** - `256e756` (feat)
+2. **Task 2: Commit landing page index.html to gh-pages branch** - `3a1e842` (docs, on gh-pages branch)
+
+## Files Created/Modified
+- `.github/workflows/benchmark.yml` - Added max-items-in-chart: 200 to store step
+- `README.md` - Replaced PNG embeds with dashboard link
+- `gh-pages:index.html` - Landing page with project info and benchmark dashboard link
+- `gh-pages:.nojekyll` - Prevents Jekyll processing on GitHub Pages
+
+## Decisions Made
+- Applied max-items-in-chart only to store step (PR compare step doesn't save data)
+- Landing page uses inline CSS, system font stack, no external dependencies
+- docs/benchmark_*.png files kept untouched per user decision
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- All v0.3.1 CI benchmark infrastructure is complete
+- GitHub Pages serves both landing page and benchmark dashboard
+- README directs users to live dashboard instead of static images
+
+---
+*Phase: 07-dashboard-and-readme*
+*Completed: 2026-03-10*

From ce2ac46d9641d66c73a70f9db5776f5264fe7d1c Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 13:41:43 +0100
Subject: [PATCH 10/16] docs(phase-07): complete phase execution

---
 .planning/STATE.md                            |  2 +-
 .../07-VERIFICATION.md                        | 86 +++++++++++++++++++
 2 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 .planning/phases/07-dashboard-and-readme/07-VERIFICATION.md

diff --git a/.planning/STATE.md b/.planning/STATE.md
index c914cb6..5be9c2f 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -4,7 +4,7 @@ milestone: v1.0
 milestone_name: milestone
 status: completed
 stopped_at: Completed 07-01-PLAN.md
-last_updated: "2026-03-10T12:39:21.804Z"
+last_updated: "2026-03-10T12:41:38.233Z"
 last_activity: "2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart"
 progress:
   total_phases: 4
diff --git a/.planning/phases/07-dashboard-and-readme/07-VERIFICATION.md b/.planning/phases/07-dashboard-and-readme/07-VERIFICATION.md
new file mode 100644
index 0000000..c743a6e
--- /dev/null
+++ b/.planning/phases/07-dashboard-and-readme/07-VERIFICATION.md
@@ -0,0 +1,86 @@
+---
+phase: 07-dashboard-and-readme
+verified: 2026-03-10T13:00:00Z
+status: passed
+score: 4/4 must-haves verified
+re_verification: false
+---
+
+# Phase 7: Dashboard and README Verification Report
+
+**Phase Goal:** Create benchmark dashboard and update README with live links
+**Verified:** 2026-03-10T13:00:00Z
+**Status:** passed
+**Re-verification:** No -- initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | max-items-in-chart: 200 is set on the main store step in benchmark.yml | VERIFIED | Line 93 of `.github/workflows/benchmark.yml` contains `max-items-in-chart: 200` under the "Store benchmark results (main)" step only. Not present on the PR compare step. |
+| 2 | GitHub Pages root serves a landing page with project name, description, install command, and links | VERIFIED | `git show origin/gh-pages:index.html` contains `<h1>asebytes</h1>`, description text, `pip install asebytes`, and links to GitHub, PyPI, and `./dev/bench/` dashboard. `.nojekyll` present. |
+| 3 | README Benchmarks section contains a dashboard link instead of static PNG image references | VERIFIED | `grep -c '!\[.*\](docs/benchmark_' README.md` returns 0 (zero PNG embeds). Line 452 contains `[View benchmark dashboard](https://zincware.github.io/asebytes/dev/bench/)`. |
+| 4 | docs/benchmark_*.png files and docs/visualize_benchmarks.py remain in the repo untouched | VERIFIED (with note) | All 10 PNG files present in `docs/`. `visualize_benchmarks.py` was already deleted in phase 05 (commit 9a87053) before this phase began -- phase 07 did not touch it. The intent (preserve existing docs/ assets) is satisfied. |
+
+**Score:** 4/4 truths verified
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `.github/workflows/benchmark.yml` | max-items-in-chart configuration on store step | VERIFIED | Contains `max-items-in-chart: 200` at line 93 |
+| `README.md` | Dashboard link in Benchmarks section, no PNG embeds | VERIFIED | Contains `zincware.github.io/asebytes/dev/bench/` link, zero PNG image embeds |
+| `gh-pages:index.html` | Landing page with project info | VERIFIED | Full HTML with project name, description, install command, 3 links |
+| `gh-pages:.nojekyll` | Prevents Jekyll processing | VERIFIED | Present in gh-pages root |
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `README.md` | `https://zincware.github.io/asebytes/dev/bench/` | markdown link | VERIFIED | Line 452: `[View benchmark dashboard](https://zincware.github.io/asebytes/dev/bench/)` |
+| `gh-pages:index.html` | `/dev/bench/` | HTML anchor href | VERIFIED | `<a href="./dev/bench/">Benchmark Dashboard</a>` present in landing page |
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|----------|
+| DASH-01 | 07-01-PLAN | GitHub Pages serves landing page with project docs and benchmark link | SATISFIED | index.html on gh-pages with project name, description, install cmd, links incl. /dev/bench/ |
+| DASH-02 | 07-01-PLAN | README has live benchmark link replacing static PNG images | SATISFIED | Zero PNG embeds, dashboard link present at line 452 |
+| DASH-03 | 07-01-PLAN | max-items-in-chart limits data growth on gh-pages | SATISFIED | `max-items-in-chart: 200` on store step only |
+
+No orphaned requirements found.
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| (none) | - | - | - | - |
+
+No TODOs, FIXMEs, placeholders, or stub implementations found in modified files.
+
+### Human Verification Required
+
+#### 1. Landing Page Renders Correctly
+
+**Test:** Visit https://zincware.github.io/asebytes/ in a browser
+**Expected:** Clean single-screen page showing project name, description, install command, and three links (GitHub, PyPI, Benchmark Dashboard)
+**Why human:** Visual rendering and link functionality cannot be verified programmatically
+
+#### 2. Benchmark Dashboard Link Works
+
+**Test:** Click "Benchmark Dashboard" link from landing page or README
+**Expected:** Navigates to https://zincware.github.io/asebytes/dev/bench/ showing Chart.js time-series benchmark graphs
+**Why human:** Requires browser navigation and visual confirmation that dashboard renders with data
+
+### Gaps Summary
+
+No gaps found. All four observable truths are verified, all artifacts exist and are substantive, all key links are wired, and all three requirements (DASH-01, DASH-02, DASH-03) are satisfied.
+
+Minor note: The PLAN truth about `docs/visualize_benchmarks.py` remaining untouched is moot -- the file was already deleted in phase 05 before this phase started. Phase 07 did not delete it and did not modify any docs/ files. The spirit of the truth (do not delete docs/ assets) is satisfied.
+
+---
+
+_Verified: 2026-03-10T13:00:00Z_
+_Verifier: Claude (gsd-verifier)_

From cb4110bb90f73f89021f0269383c7c7d45cd54f7 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 14:11:52 +0100
Subject: [PATCH 11/16] test(06): complete UAT - 2 passed, 1 issue

---
 .planning/phases/06-pr-feedback/06-UAT.md | 52 +++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 .planning/phases/06-pr-feedback/06-UAT.md
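
Note on UAT test 2 (fail-on-regression gate): the check "regresses beyond 150% of baseline" can be sketched as below. This is a hedged illustration, not the code of github-action-benchmark. The function names are hypothetical, and whether a larger value counts as better depends on the unit the benchmark tool reports, so that is left as an explicit parameter and is an assumption here.

```python
def parse_threshold(text: str) -> float:
    """Turn a percentage string such as "150%" into a ratio (1.5)."""
    stripped = text.strip()
    if not stripped.endswith("%"):
        raise ValueError(f"threshold must end with '%': {text!r}")
    return float(stripped[:-1]) / 100.0


def regression_ratio(previous: float, current: float, bigger_is_better: bool) -> float:
    """Return a ratio where values above 1 mean the current result is worse."""
    if bigger_is_better:
        # Throughput-style values: a drop from 100 to 50 gives a ratio of 2.0.
        return previous / current
    # Duration-style values: a rise from 1.0 to 2.0 gives a ratio of 2.0.
    return current / previous


def exceeds_threshold(
    previous: float,
    current: float,
    threshold: str = "150%",
    bigger_is_better: bool = True,  # assumption, see the note above
) -> bool:
    """True when the regression ratio is strictly above the threshold."""
    ratio = regression_ratio(previous, current, bigger_is_better)
    return ratio > parse_threshold(threshold)
```

With `alert-threshold: "150%"`, a result exactly 1.5 times worse than baseline does not trip this sketch, because the comparison is strict.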

diff --git a/.planning/phases/06-pr-feedback/06-UAT.md b/.planning/phases/06-pr-feedback/06-UAT.md
new file mode 100644
index 0000000..5f62e57
--- /dev/null
+++ b/.planning/phases/06-pr-feedback/06-UAT.md
@@ -0,0 +1,52 @@
+---
+status: complete
+phase: 06-pr-feedback
+source: 06-01-SUMMARY.md
+started: 2026-03-10T12:00:00Z
+updated: 2026-03-10T12:01:00Z
+---
+
+## Current Test
+
+[testing complete]
+
+## Tests
+
+### 1. PR Benchmark Comparison Table
+expected: When a PR is opened, the Benchmarks workflow runs and the Job Summary shows a comparison table with performance diffs against the main branch baseline (not just test names).
+result: issue
+reported: "the summary only shows which tests were running but no diffs"
+severity: major
+
+### 2. Fail-on-Regression Gate
+expected: If any benchmark regresses beyond 150% of baseline, the Benchmarks check fails and blocks the PR merge (requires branch protection enabled).
+result: skipped
+reason: not tested; not relevant
+
+### 3. Concurrency Cancellation
+expected: Pushing a new commit to a PR while benchmarks are running cancels the in-progress run and starts a new one.
+result: pass
+
+### 4. Main Branch Auto-Push
+expected: When Tests workflow succeeds on main, benchmarks run and results are auto-pushed to gh-pages at /dev/bench/ — visible on the dashboard.
+result: pass
+
+## Summary
+
+total: 4
+passed: 2
+issues: 1
+pending: 0
+skipped: 1
+
+## Gaps
+
+- truth: "PR Job Summary shows comparison table with performance diffs against main branch baseline"
+  status: failed
+  reason: "User reported: the summary only shows which tests were running but no diffs"
+  severity: major
+  test: 1
+  root_cause: ""
+  artifacts: []
+  missing: []
+  debug_session: ""

From 9a66c990af896c7004bd33ae7f19d5440717bf2c Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 14:45:36 +0100
Subject: [PATCH 12/16] test(06): diagnose UAT gap - PR benchmark missing
 comment-always

---
 .planning/phases/06-pr-feedback/06-UAT.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
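
Note on the root cause recorded in this patch: the interaction between `comment-always` and `comment-on-alert` can be modelled as a small truth function. This is a sketch of the behaviour as the diagnosis describes it, not the action's source code, and `posts_pr_comment` is a hypothetical name.

```python
def posts_pr_comment(comment_always: bool, comment_on_alert: bool, alert_fired: bool) -> bool:
    """Model of when a PR comment is posted, per the diagnosis in this patch.

    comment_always:   post on every run, regression or not
    comment_on_alert: post only when a regression alert fired
    """
    return comment_always or (comment_on_alert and alert_fired)


# The configuration under test: comment-always unset (False),
# comment-on-alert true, and no regression beyond the threshold.
observed = posts_pr_comment(comment_always=False, comment_on_alert=True, alert_fired=False)
```

In that configuration `observed` is false, which matches the report that no diff appeared on the PR page while the Job Summary was still written.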

diff --git a/.planning/phases/06-pr-feedback/06-UAT.md b/.planning/phases/06-pr-feedback/06-UAT.md
index 5f62e57..28cf11b 100644
--- a/.planning/phases/06-pr-feedback/06-UAT.md
+++ b/.planning/phases/06-pr-feedback/06-UAT.md
@@ -1,5 +1,5 @@
 ---
-status: complete
+status: diagnosed
 phase: 06-pr-feedback
 source: 06-01-SUMMARY.md
 started: 2026-03-10T12:00:00Z
@@ -46,7 +46,10 @@ skipped: 1
   reason: "User reported: the summary only shows which tests were running but no diffs"
   severity: major
   test: 1
-  root_cause: ""
-  artifacts: []
-  missing: []
-  debug_session: ""
+  root_cause: "The comparison table is written to GITHUB_STEP_SUMMARY (workflow run Summary tab) but NOT posted as a PR comment. comment-always defaults to false, and comment-on-alert only fires on regression. User expects diffs visible on the PR page itself."
+  artifacts:
+    - path: ".github/workflows/benchmark.yml"
+      issue: "PR comparison step missing comment-always: true"
+  missing:
+    - "Add comment-always: true to PR benchmark step"
+  debug_session: ".planning/debug/bench-summary-no-diffs.md"

From 98dc169eaa240af30da29aa603a9aba9187bb861 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 14:51:10 +0100
Subject: [PATCH 13/16] docs(06): create gap closure plan for PR comment-always

---
 .planning/ROADMAP.md                          |   5 +-
 .planning/phases/06-pr-feedback/06-02-PLAN.md | 100 ++++++++++++++++++
 2 files changed, 103 insertions(+), 2 deletions(-)
 create mode 100644 .planning/phases/06-pr-feedback/06-02-PLAN.md
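
Note on the plan's automated verification: the grep check can be mirrored in Python using the `comment-always:\s*true` pattern from the plan's `key_links`. This is a minimal sketch. `check_workflow` is a hypothetical helper, and the `pull-requests` pattern is an assumed analogue of the one given in the plan.

```python
import re

# Pattern taken from the plan's key_links entry.
COMMENT_ALWAYS = re.compile(r"comment-always:\s*true")
# Assumed analogue for the permission line; not stated as a pattern in the plan.
PR_WRITE = re.compile(r"pull-requests:\s*write")


def check_workflow(text: str) -> str:
    """Return "PASS" when both settings appear somewhere in the workflow text."""
    has_comment = COMMENT_ALWAYS.search(text) is not None
    has_permission = PR_WRITE.search(text) is not None
    return "PASS" if has_comment and has_permission else "FAIL"
```

Like the grep one-liner, this only proves that both strings occur in the file. It does not confirm that `comment-always` sits in the PR step rather than the main-branch step, so that item in the verification list still needs a manual look or a real YAML parse.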

diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
index 369c498..8d24cc3 100644
--- a/.planning/ROADMAP.md
+++ b/.planning/ROADMAP.md
@@ -55,10 +55,11 @@ Plans:
   1. PRs receive a comment with a full benchmark comparison table showing deltas (regressions and improvements) vs main
   2. The alert threshold percentage is configurable in the workflow YAML (default 150%)
   3. A PR with a benchmark regression beyond the threshold is blocked from merging
-**Plans**: 1 plan
+**Plans**: 2 plans
 
 Plans:
 - [x] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
+- [ ] 06-02-PLAN.md — Enable comment-always on PR benchmark step (gap closure)
 
 ### Phase 7: Dashboard and README
 **Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
@@ -98,6 +99,6 @@ Phases execute in numeric order: 5 -> 6 -> 7
 | 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
 | 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
 | 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |
-| 6. PR Feedback | v0.3.1 | 1/1 | Complete | 2026-03-09 |
+| 6. PR Feedback | v0.3.1 | 1/2 | In progress | - |
 | 7. Dashboard and README | v0.3.1 | 0/1 | Not started | - |
 | 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |
diff --git a/.planning/phases/06-pr-feedback/06-02-PLAN.md b/.planning/phases/06-pr-feedback/06-02-PLAN.md
new file mode 100644
index 0000000..f474a54
--- /dev/null
+++ b/.planning/phases/06-pr-feedback/06-02-PLAN.md
@@ -0,0 +1,100 @@
+---
+phase: 06-pr-feedback
+plan: 02
+type: execute
+wave: 1
+depends_on: ["06-01"]
+files_modified: [".github/workflows/benchmark.yml"]
+autonomous: true
+gap_closure: true
+requirements: [PR-01]
+
+must_haves:
+  truths:
+    - "PR receives a comment with benchmark comparison table showing diffs against main baseline"
+  artifacts:
+    - path: ".github/workflows/benchmark.yml"
+      provides: "PR benchmark comparison with always-on comment"
+      contains: "comment-always: true"
+  key_links:
+    - from: "benchmark.yml PR step"
+      to: "PR comment"
+      via: "comment-always: true flag"
+      pattern: "comment-always:\\s*true"
+---
+
+<objective>
+Fix PR benchmark comparison to always post a comment with performance diffs on the PR itself.
+
+Purpose: UAT revealed the comparison table only appears in the workflow run Summary tab (GITHUB_STEP_SUMMARY), not as a PR comment. Users expect diffs visible directly on the PR page. The `comment-always: true` flag makes github-action-benchmark post a comment on every PR run, not just when regressions are detected.
+
+Output: Updated benchmark.yml with comment-always enabled for PR step.
+</objective>
+
+<execution_context>
+@/Users/fzills/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/fzills/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/06-pr-feedback/06-01-SUMMARY.md
+
+<interfaces>
+<!-- From .github/workflows/benchmark.yml, PR comparison step (lines 95-109) -->
+```yaml
+      - name: Compare benchmark results (PR)
+        if: github.event_name == 'pull_request'
+        uses: benchmark-action/github-action-benchmark@v1
+        with:
+          tool: "pytest"
+          output-file-path: benchmark_results.json
+          gh-pages-branch: gh-pages
+          benchmark-data-dir-path: dev/bench
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+          auto-push: false
+          save-data-file: false
+          summary-always: true
+          comment-on-alert: true
+          fail-on-alert: true
+          alert-threshold: "150%"
+```
+</interfaces>
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Add comment-always to PR benchmark step</name>
+  <files>.github/workflows/benchmark.yml</files>
+  <action>
+In `.github/workflows/benchmark.yml`, in the "Compare benchmark results (PR)" step, add `comment-always: true` to the `with:` block. Place it after `summary-always: true` for logical grouping (summary and comment settings together).
+
+This makes github-action-benchmark post a PR comment with the full comparison table on every benchmark run, not just when a regression alert fires. The existing `comment-on-alert: true` is redundant with `comment-always: true` but harmless to keep for clarity.
+
+The workflow also needs `pull-requests: write` permission so the action can post PR comments. Add this to the top-level `permissions:` block alongside the existing `contents: write` and `deployments: write`.
+  </action>
+  <verify>
+    <automated>grep -q "comment-always: true" .github/workflows/benchmark.yml && grep -q "pull-requests: write" .github/workflows/benchmark.yml && echo "PASS" || echo "FAIL"</automated>
+  </verify>
+  <done>benchmark.yml PR step includes comment-always: true and workflow has pull-requests: write permission. Next PR benchmark run will post a comparison comment.</done>
+</task>
+
+</tasks>
+
+<verification>
+- `comment-always: true` present in PR comparison step
+- `pull-requests: write` in permissions block
+- YAML is valid (no syntax errors)
+- Main branch step unchanged (no comment-always there)
+</verification>
+
+<success_criteria>
+The PR benchmark comparison step in benchmark.yml includes `comment-always: true` and `pull-requests: write` permission, ensuring every PR run posts a comparison table comment with performance diffs against the main branch baseline.
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/06-pr-feedback/06-02-SUMMARY.md`
+</output>

From a72d64b9c9376397066f24fa059faa29e67cdd04 Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 14:57:38 +0100
Subject: [PATCH 14/16] test(06): UAT complete - 3 passed, 0 issues

---
 .planning/phases/06-pr-feedback/06-UAT.md | 23 +++++------------------
 1 file changed, 5 insertions(+), 18 deletions(-)

diff --git a/.planning/phases/06-pr-feedback/06-UAT.md b/.planning/phases/06-pr-feedback/06-UAT.md
index 28cf11b..4798bfe 100644
--- a/.planning/phases/06-pr-feedback/06-UAT.md
+++ b/.planning/phases/06-pr-feedback/06-UAT.md
@@ -1,5 +1,5 @@
 ---
-status: diagnosed
+status: complete
 phase: 06-pr-feedback
 source: 06-01-SUMMARY.md
 started: 2026-03-10T12:00:00Z
@@ -14,9 +14,7 @@ updated: 2026-03-10T12:01:00Z
 
 ### 1. PR Benchmark Comparison Table
 expected: When a PR is opened, the Benchmarks workflow runs and the Job Summary shows a comparison table with performance diffs against the main branch baseline (not just test names).
-result: issue
-reported: "the summary only shows which tests were running but no diffs"
-severity: major
+result: pass
 
 ### 2. Fail-on-Regression Gate
 expected: If any benchmark regresses beyond 150% of baseline, the Benchmarks check fails and blocks the PR merge (requires branch protection enabled).
@@ -34,22 +32,11 @@ result: pass
 ## Summary
 
 total: 4
-passed: 2
-issues: 1
+passed: 3
+issues: 0
 pending: 0
 skipped: 1
 
 ## Gaps
 
-- truth: "PR Job Summary shows comparison table with performance diffs against main branch baseline"
-  status: failed
-  reason: "User reported: the summary only shows which tests were running but no diffs"
-  severity: major
-  test: 1
-  root_cause: "The comparison table is written to GITHUB_STEP_SUMMARY (workflow run Summary tab) but NOT posted as a PR comment. comment-always defaults to false, and comment-on-alert only fires on regression. User expects diffs visible on the PR page itself."
-  artifacts:
-    - path: ".github/workflows/benchmark.yml"
-      issue: "PR comparison step missing comment-always: true"
-  missing:
-    - "Add comment-always: true to PR benchmark step"
-  debug_session: ".planning/debug/bench-summary-no-diffs.md"
+[none]

From d9bac0366e82d8a38222dfe0ae2b58767a9885db Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 15:08:11 +0100
Subject: [PATCH 15/16] chore: complete v0.3.1 milestone

---
 .planning/MILESTONES.md                       |  14 +++
 .planning/PROJECT.md                          |  22 ++--
 .planning/STATE.md                            |  15 +--
 .../debug/resolved/bench-summary-no-diffs.md  |  84 ++++++++++++++
 .planning/milestones/v0.3.1-REQUIREMENTS.md   |  96 ++++++++++++++++
 .planning/milestones/v0.3.1-ROADMAP.md        | 103 ++++++++++++++++++
 6 files changed, 317 insertions(+), 17 deletions(-)
 create mode 100644 .planning/debug/resolved/bench-summary-no-diffs.md
 create mode 100644 .planning/milestones/v0.3.1-REQUIREMENTS.md
 create mode 100644 .planning/milestones/v0.3.1-ROADMAP.md
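
Note on the debug record added in this patch: its evidence describes the baseline lookup as iterating existing entries in reverse and taking the first one whose commit id differs from the current commit. A rough Python rendering of that description follows. The entry shape (`{"commit": {"id": ...}}`) and the function name are assumptions for illustration, and this is not a port of the action's TypeScript.

```python
from typing import Optional


def find_previous(entries: list[dict], current_commit_id: str) -> Optional[dict]:
    """Return the most recent entry recorded for a different commit, if any."""
    for entry in reversed(entries):
        if entry["commit"]["id"] != current_commit_id:
            return entry
    # No entry for another commit: nothing to compare against.
    return None


# Commit ids as quoted in the debug record: one baseline entry, one PR commit.
baseline = [{"commit": {"id": "fa2713e"}, "benches": []}]
previous = find_previous(baseline, "ce2ac46")
```

Here `previous` is the `fa2713e` entry, consistent with the record's conclusion that the baseline was found and the comparison table was generated.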

diff --git a/.planning/MILESTONES.md b/.planning/MILESTONES.md
index 450e3ad..0d2e2a4 100644
--- a/.planning/MILESTONES.md
+++ b/.planning/MILESTONES.md
@@ -1,5 +1,19 @@
 # Milestones
 
+## v0.3.1 CI Benchmark Infrastructure (Shipped: 2026-03-10)
+
+**Phases completed:** 4 phases, 4 plans
+**Timeline:** 2026-03-09 → 2026-03-10 (2 days)
+
+**Key accomplishments:**
+1. Benchmark CI workflow with github-action-benchmark auto-pushing to gh-pages on every main merge
+2. PR benchmark comparison with 150% fail-on-regression gate and configurable alert threshold
+3. GitHub Pages landing page with project info and live Chart.js benchmark dashboard
+4. README updated with live dashboard link replacing 10 static PNG embeds
+5. Per-test group isolation (UUID-based) fixing MongoDB and Redis contract test flakiness
+
+---
+
 ## v1.0 Maintenance & Performance Overhaul (Shipped: 2026-03-06)
 
 **Phases completed:** 4 phases, 13 plans
diff --git a/.planning/PROJECT.md b/.planning/PROJECT.md
index 91d0efe..9b2382f 100644
--- a/.planning/PROJECT.md
+++ b/.planning/PROJECT.md
@@ -23,13 +23,15 @@ Every storage backend must be fast, correct, and tested through a single paramet
 - ✓ HDF5 chunk cache tuning, MongoDB TTL cache, Redis Lua scripts — v1.0
 - ✓ Dependencies corrected (lmdb>=1.6.0, h5py>=3.12.0, no upper bounds) — v1.0
 - ✓ Dead code removed, _postprocess() consolidated — v1.0
+- ✓ CI benchmark pipeline with auto-push to gh-pages — v0.3.1
+- ✓ PR benchmark comparison with fail-on-regression gate — v0.3.1
+- ✓ GitHub Pages dashboard with Chart.js time-series charts — v0.3.1
+- ✓ github-action-benchmark selected as CI benchmark tool — v0.3.1
+- ✓ Per-test group isolation for MongoDB/Redis backends — v0.3.1
 
 ### Active
 
-- [ ] PR benchmark comments showing perf diff vs base branch (BENCH-01)
-- [ ] Benchmark JSON committed to repo, overwritten per merge/tag (BENCH-02)
-- [ ] GitHub Pages dashboard tracking performance over releases (BENCH-03)
-- [ ] Evaluate and select CI benchmark tooling (CML, github-action-benchmark, etc.) (BENCH-04)
+(None — planning next milestone)
 
 ### Backlog
 
@@ -48,9 +50,10 @@ Every storage backend must be fast, correct, and tested through a single paramet
 
 ## Context
 
-Shipped v1.0 with 12,608 LOC source (Python), 22,740 LOC tests.
-Tech stack: h5py, zarr, lmdb, pymongo, redis, ase, molify, pytest-benchmark.
-142 files changed across 174 commits since diverging from main.
+Shipped v1.0 (architecture overhaul) and v0.3.1 (CI benchmark infrastructure).
+12,608 LOC source (Python), 22,740 LOC tests.
+Tech stack: h5py, zarr, lmdb, pymongo, redis, ase, molify, pytest-benchmark, github-action-benchmark.
+CI: benchmark pipeline on gh-pages, PR regression gate at 150%, public dashboard.
 
 Backend hierarchy:
 - `BaseColumnarBackend(ReadWriteBackend[str, Any])` — shared logic (795 lines)
@@ -88,6 +91,9 @@ Known performance characteristics:
 | Constraints serialized as JSON in info column | Avoids architectural changes to columnar storage for H5MD round-trip | ✓ Good — simple, reliable |
 | TTL cache for MongoDB metadata (1s window) | Reduces redundant metadata fetches within tight loops | ✓ Good — measurable improvement |
 | Facade bounds-check elimination | Delegate IndexError to backend instead of pre-checking len() | ✓ Good — saves round-trip for positive indices |
+| github-action-benchmark for CI | Lightweight, gh-pages native, Chart.js auto-generated | ✓ Good — handles store, compare, and dashboard |
+| Dual benchmark-action steps (main vs PR) | GitHub Actions can't conditionally set `with:` inputs | ✓ Good — clean separation of concerns |
+| UUID-based group isolation in tests | Per-test unique groups prevent data leakage across backends | ✓ Good — fixed MongoDB/Redis flakiness |
 
 ---
-*Last updated: 2026-03-09 after v0.3.1 milestone start*
+*Last updated: 2026-03-10 after v0.3.1 milestone*
diff --git a/.planning/STATE.md b/.planning/STATE.md
index 5be9c2f..1f73c8f 100644
--- a/.planning/STATE.md
+++ b/.planning/STATE.md
@@ -4,7 +4,7 @@ milestone: v1.0
 milestone_name: milestone
 status: completed
 stopped_at: Completed 07-01-PLAN.md
-last_updated: "2026-03-10T12:41:38.233Z"
+last_updated: "2026-03-10T14:06:58.937Z"
 last_activity: "2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart"
 progress:
   total_phases: 4
@@ -18,19 +18,16 @@ progress:
 
 ## Project Reference
 
-See: .planning/PROJECT.md (updated 2026-03-09)
+See: .planning/PROJECT.md (updated 2026-03-10)
 
 **Core value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite
-**Current focus:** v0.3.1 -- Phase 5: Benchmark Pipeline
+**Current focus:** Planning next milestone
 
 ## Current Position
 
-Phase: 7 of 8 (Dashboard and README)
-Plan: 1 of 1 (complete)
-Status: Phase 7 complete
-Last activity: 2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart
-
-Progress: [██████████] 100%
+Milestone v0.3.1 complete. All 4 phases shipped (5-8).
+Status: Between milestones
+Last activity: 2026-03-10 - Milestone v0.3.1 archived
 
 ## Performance Metrics
 
diff --git a/.planning/debug/resolved/bench-summary-no-diffs.md b/.planning/debug/resolved/bench-summary-no-diffs.md
new file mode 100644
index 0000000..81f2453
--- /dev/null
+++ b/.planning/debug/resolved/bench-summary-no-diffs.md
@@ -0,0 +1,84 @@
+---
+status: diagnosed
+trigger: "PR benchmark Job Summary shows test names but no performance comparison diffs"
+created: 2026-03-10T14:00:00Z
+updated: 2026-03-10T14:30:00Z
+---
+
+## Current Focus
+
+hypothesis: The summary-always Job Summary IS being generated with comparison data, but the user expects a PR comment (which requires comment-always: true). The summary table with diffs only appears on the workflow run's Summary tab, NOT on the PR page itself.
+test: Verified action source code, log output, and configuration
+expecting: n/a - diagnosis complete
+next_action: Return diagnosis
+
+## Symptoms
+
+expected: PR comparison step shows performance numbers + diffs against main baseline in Job Summary
+actual: Only test names appear, no performance comparisons
+errors: None - step succeeds
+reproduction: Run any PR against main when gh-pages baseline exists
+started: First PR run after baseline was established
+
+## Eliminated
+
+- hypothesis: gh-pages data not in correct format/location
+  evidence: data.js exists at dev/bench/data.js on gh-pages, format is correct (window.BENCHMARK_DATA = {...}), entries key is "Benchmark" matching the action's default name parameter. 1 baseline entry from commit fa2713e with 373 benchmarks.
+  timestamp: 2026-03-10T14:10:00Z
+
+- hypothesis: prevBench is null causing summary to be skipped entirely
+  evidence: Source code analysis of addBenchmarkEntry.ts shows prevBench is found when entries exist with different commit IDs. Baseline commit (fa2713e) differs from PR commit (ce2ac46). The action loads data.js from gh-pages, finds the existing entry, and sets prevBench.
+  timestamp: 2026-03-10T14:15:00Z
+
+- hypothesis: Benchmark names don't match between baseline and PR (causing empty Previous/Ratio columns)
+  evidence: Both baseline and PR run 373 benchmarks with identical test names from the same test suite.
+  timestamp: 2026-03-10T14:20:00Z
+
+- hypothesis: Job Summary exceeds size limit (1MB)
+  evidence: Estimated summary size is ~75KB for 373 benchmarks, well under the 1MB limit. No error in logs about summary upload failure.
+  timestamp: 2026-03-10T14:22:00Z
+
+- hypothesis: external-data-json-path is needed instead of gh-pages-branch
+  evidence: Source code confirms gh-pages-branch mode works correctly for comparison. The action fetches gh-pages, loads data.js, finds previous benchmark entry, and passes it to handleSummary. Both modes are valid.
+  timestamp: 2026-03-10T14:25:00Z
+
+## Evidence
+
+- timestamp: 2026-03-10T14:05:00Z
+  checked: gh-pages branch content
+  found: dev/bench/data.js exists with 1 entry (commit fa2713e, 373 benchmarks). dev/bench/index.html also present.
+  implication: Baseline data is correctly stored
+
+- timestamp: 2026-03-10T14:08:00Z
+  checked: CI run logs for step "Compare benchmark results (PR)"
+  found: Step completed successfully. Action fetched gh-pages, switched to it, loaded data.js, committed updated data locally (2632 insertions), switched back. Printed "github-action-benchmark was run successfully!" with PR data (commit ce2ac46, 373 benchmarks).
+  implication: Action executed without errors
+
+- timestamp: 2026-03-10T14:12:00Z
+  checked: Action source code (v1.21.0 / SHA a7bc2366) - write.ts, addBenchmarkEntry.ts, index.ts
+  found: writeBenchmark() calls writeBenchmarkToGitHubPages() which loads data.js, calls addBenchmarkEntry() to find prevBench. If prevBench is not null, handleSummary() is called which uses buildComment(name, curr, prev, false) to generate a markdown table with columns [Benchmark suite | Current | Previous | Ratio] and writes it via core.summary.write().
+  implication: The comparison table SHOULD be generated when baseline exists
+
+- timestamp: 2026-03-10T14:14:00Z
+  checked: addBenchmarkEntry.ts logic
+  found: Iterates existing entries in reverse, finds first entry with different commit.id. Since baseline has fa2713e and PR has ce2ac46, prevBench will be set to the baseline entry.
+  implication: prevBench is NOT null, so handleSummary IS called
+
+- timestamp: 2026-03-10T14:18:00Z
+  checked: PR comments and check run output
+  found: No benchmark comment on PR (only CodeRabbit comment). Check run output.summary is null. This is expected because comment-always defaults to false.
+  implication: Comparison data only appears in Job Summary tab, not on the PR page
+
+- timestamp: 2026-03-10T14:20:00Z
+  checked: Workflow configuration for comment-always
+  found: comment-always is not set (defaults to false). comment-on-alert: true only posts when regression exceeds threshold.
+  implication: No comparison data appears on the PR page unless there's an alert
+
+## Resolution
+
+root_cause: The benchmark comparison IS being generated and written to the Job Summary (GITHUB_STEP_SUMMARY), but it is NOT visible on the PR page itself. The `comment-always` parameter defaults to `false`, so no comparison comment is posted on the PR. The user likely sees only pytest output (test names with timing data) when viewing the PR checks, and needs to navigate to the workflow run's Summary tab to see the comparison table. Additionally, `comment-on-alert: true` only triggers a PR comment when performance regression exceeds the 150% threshold, which did not occur in this run.
+
+fix: Add `comment-always: true` to the PR comparison step to post the full comparison table as a PR comment, making it visible directly on the PR page.
+
+verification: n/a - diagnosis only
+files_changed: []
diff --git a/.planning/milestones/v0.3.1-REQUIREMENTS.md b/.planning/milestones/v0.3.1-REQUIREMENTS.md
new file mode 100644
index 0000000..1ebc167
--- /dev/null
+++ b/.planning/milestones/v0.3.1-REQUIREMENTS.md
@@ -0,0 +1,96 @@
+# Requirements Archive: v0.3.1 CI Benchmark Infrastructure
+
+**Archived:** 2026-03-10
+**Status:** SHIPPED
+
+For current requirements, see `.planning/REQUIREMENTS.md`.
+
+---
+
+# Requirements: asebytes
+
+**Defined:** 2026-03-09
+**Core Value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite
+
+## v0.3.1 Requirements
+
+Requirements for CI benchmark infrastructure milestone. Each maps to roadmap phases.
+
+### CI Infrastructure
+
+- [x] **CI-01**: gh-pages branch exists with GitHub Pages enabled serving benchmark dashboard
+- [x] **CI-02**: Post-matrix benchmark job runs github-action-benchmark for a single Python version (latest)
+- [x] **CI-03**: Auto-push to gh-pages only on main branch pushes, not PRs
+- [x] **CI-04**: Release/tag events trigger a benchmark snapshot on gh-pages
+
+### PR Feedback
+
+- [x] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
+- [x] **PR-02**: Alert threshold is configurable (starting at 150%)
+- [x] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
+
+### Dashboard
+
+- [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
+- [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
+- [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
+
+## Maintenance Requirements
+
+### Test Isolation (Phase 8)
+
+- [x] **ISO-01**: MongoDB contract tests pass without data leaking between tests
+- [x] **ISO-02**: Redis contract tests pass without data leaking between tests
+- [x] **ISO-03**: All other backend contract tests remain green after isolation changes (no regressions)
+
+## Future Requirements
+
+### Enhanced PR Comments
+
+- **PR-04**: Per-backend grouping in PR comparison tables
+- **PR-05**: Visualization PNGs embedded in PR comments
+
+### Dashboard Enhancements
+
+- **DASH-04**: Release-tagged benchmark snapshots with comparison view
+- **DASH-05**: Memory profiling pipeline integrated into dashboard
+
+## Out of Scope
+
+| Feature | Reason |
+|---------|--------|
+| Per-Python-version benchmark tracking | Adds complexity without proportional regression detection benefit |
+| Hosted SaaS dashboard (codspeed, bencher) | External dependency; Chart.js on gh-pages is sufficient |
+| Fork PR benchmark comments | GitHub token scoping prevents it; low fork contribution volume |
+| Custom React dashboard | Maintenance overhead; Chart.js auto-generation covers needs |
+| pytest-codspeed integration | Orthogonal to CI tracking; codspeed measures CPU not I/O |
+
+## Traceability
+
+Which phases cover which requirements. Updated during roadmap creation.
+
+| Requirement | Phase | Status |
+|-------------|-------|--------|
+| CI-01 | Phase 5 | Complete |
+| CI-02 | Phase 5 | Complete |
+| CI-03 | Phase 5 | Complete |
+| CI-04 | Phase 5 | Complete |
+| PR-01 | Phase 6 | Complete |
+| PR-02 | Phase 6 | Complete |
+| PR-03 | Phase 6 | Complete |
+| DASH-01 | Phase 7 | Complete |
+| DASH-02 | Phase 7 | Complete |
+| DASH-03 | Phase 7 | Complete |
+| ISO-01 | Phase 8 | Complete |
+| ISO-02 | Phase 8 | Complete |
+| ISO-03 | Phase 8 | Complete |
+
+**Coverage:**
+- v0.3.1 requirements: 10 total
+- Maintenance requirements: 3 total
+- Mapped to phases: 13
+- Unmapped: 0
+
+---
+*Requirements defined: 2026-03-09*
+*Last updated: 2026-03-10 after phase 7 completion*
diff --git a/.planning/milestones/v0.3.1-ROADMAP.md b/.planning/milestones/v0.3.1-ROADMAP.md
new file mode 100644
index 0000000..9e2e0d7
--- /dev/null
+++ b/.planning/milestones/v0.3.1-ROADMAP.md
@@ -0,0 +1,103 @@
+# Roadmap: asebytes
+
+## Milestones
+
+- v1.0 Maintenance & Performance Overhaul -- Phases 1-4 (shipped 2026-03-06)
+- v0.3.1 CI Benchmark Infrastructure -- Phases 5-8 (shipped 2026-03-10)
+
+## Phases
+
+**Phase Numbering:**
+- Integer phases (1, 2, 3): Planned milestone work
+- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
+
+<details>
+<summary>v1.0 Maintenance & Performance Overhaul (Phases 1-4) -- SHIPPED 2026-03-06</summary>
+
+- [x] Phase 1: Backend Architecture (3/3 plans) -- completed 2026-03-06
+- [x] Phase 2: H5MD Compliance (4/4 plans) -- completed 2026-03-06
+- [x] Phase 3: Contract Test Suite (4/4 plans) -- completed 2026-03-06
+- [x] Phase 4: Benchmarks & Performance (2/2 plans) -- completed 2026-03-06
+
+Full details: `.planning/milestones/v1.0-ROADMAP.md`
+
+</details>
+
+### v0.3.1 CI Benchmark Infrastructure (Shipped 2026-03-10)
+
+**Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.
+
+- [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)
+- [x] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate (completed 2026-03-09)
+- [x] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits (completed 2026-03-10)
+
+## Phase Details
+
+### Phase 5: Benchmark Pipeline
+**Goal**: Every push to main and every release tag produces benchmark results stored on gh-pages, building a historical baseline
+**Depends on**: Nothing (first phase of v0.3.1)
+**Requirements**: CI-01, CI-02, CI-03, CI-04
+**Success Criteria** (what must be TRUE):
+  1. gh-pages branch exists and GitHub Pages serves content from it
+  2. Pushing a commit to main triggers a post-matrix benchmark job that stores results on gh-pages
+  3. Opening or updating a PR does NOT push benchmark data to gh-pages
+  4. Tagging a release triggers a benchmark snapshot committed to gh-pages
+**Plans**: 1 plan
+
+Plans:
+- [x] 05-01-PLAN.md — Create benchmark.yml workflow, clean up tests.yml and legacy files
+
+### Phase 6: PR Feedback
+**Goal**: PR authors see benchmark comparison results and regressions block merge
+**Depends on**: Phase 5 (baseline data must exist on gh-pages)
+**Requirements**: PR-01, PR-02, PR-03
+**Success Criteria** (what must be TRUE):
+  1. PRs receive a comment with a full benchmark comparison table showing deltas (regressions and improvements) vs main
+  2. The alert threshold percentage is configurable in the workflow YAML (default 150%)
+  3. A PR with a benchmark regression beyond the threshold is blocked from merging
+**Plans**: 1 plan
+
+Plans:
+- [x] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
+
+### Phase 7: Dashboard and README
+**Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
+**Depends on**: Phase 5 (dashboard auto-generated by github-action-benchmark)
+**Requirements**: DASH-01, DASH-02, DASH-03
+**Success Criteria** (what must be TRUE):
+  1. GitHub Pages serves a Chart.js time-series dashboard with project description, usage, and links
+  2. README displays live benchmark figures sourced from GitHub Pages, replacing any static visualization PNGs
+  3. max-items-in-chart is configured to limit data growth on gh-pages
+**Plans**: 1 plan
+
+Plans:
+- [ ] 07-01-PLAN.md — Add max-items-in-chart to workflow, create gh-pages landing page, replace README PNG embeds with dashboard link
+
+### Phase 8: Fix failing tests in Redis/Mongo backends (test isolation)
+**Goal:** MongoDB and Redis contract tests pass reliably with per-test data isolation via unique group names
+**Depends on:** Nothing (independent bugfix)
+**Requirements**: ISO-01, ISO-02, ISO-03
+**Success Criteria** (what must be TRUE):
+  1. MongoDB tests pass without data leaking between tests
+  2. Redis tests pass without data leaking between tests
+  3. All other backend tests remain green (no regressions)
+**Plans**: 1 plan
+
+Plans:
+- [x] 08-01-PLAN.md — Add unique group= to all facade fixtures for per-test isolation
+
+## Progress
+
+**Execution Order:**
+Phases execute in numeric order: 5 -> 6 -> 7
+
+| Phase | Milestone | Plans Complete | Status | Completed |
+|-------|-----------|----------------|--------|-----------|
+| 1. Backend Architecture | v1.0 | 3/3 | Complete | 2026-03-06 |
+| 2. H5MD Compliance | v1.0 | 4/4 | Complete | 2026-03-06 |
+| 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
+| 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
+| 5. Benchmark Pipeline | v0.3.1 | 1/1 | Complete | 2026-03-09 |
+| 6. PR Feedback | v0.3.1 | 1/1 | Complete | 2026-03-09 |
+| 7. Dashboard and README | v0.3.1 | 1/1 | Complete | 2026-03-10 |
+| 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |

From 4b23463147c00aedf840c35dff36e5d5a7985eda Mon Sep 17 00:00:00 2001
From: Fabian Zills <fzills@icp.uni-stuttgart.de>
Date: Tue, 10 Mar 2026 15:08:15 +0100
Subject: [PATCH 16/16] chore: remove archived ROADMAP.md and REQUIREMENTS.md

---
 .planning/REQUIREMENTS.md |  87 -------------------------------
 .planning/ROADMAP.md      | 104 --------------------------------------
 2 files changed, 191 deletions(-)
 delete mode 100644 .planning/REQUIREMENTS.md
 delete mode 100644 .planning/ROADMAP.md

diff --git a/.planning/REQUIREMENTS.md b/.planning/REQUIREMENTS.md
deleted file mode 100644
index c48fa1b..0000000
--- a/.planning/REQUIREMENTS.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# Requirements: asebytes
-
-**Defined:** 2026-03-09
-**Core Value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite
-
-## v0.3.1 Requirements
-
-Requirements for CI benchmark infrastructure milestone. Each maps to roadmap phases.
-
-### CI Infrastructure
-
-- [x] **CI-01**: gh-pages branch exists with GitHub Pages enabled serving benchmark dashboard
-- [x] **CI-02**: Post-matrix benchmark job runs github-action-benchmark for a single Python version (latest)
-- [x] **CI-03**: Auto-push to gh-pages only on main branch pushes, not PRs
-- [x] **CI-04**: Release/tag events trigger a benchmark snapshot on gh-pages
-
-### PR Feedback
-
-- [x] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
-- [x] **PR-02**: Alert threshold is configurable (starting at 150%)
-- [x] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
-
-### Dashboard
-
-- [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
-- [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
-- [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
-
-## Maintenance Requirements
-
-### Test Isolation (Phase 8)
-
-- [x] **ISO-01**: MongoDB contract tests pass without data leaking between tests
-- [x] **ISO-02**: Redis contract tests pass without data leaking between tests
-- [x] **ISO-03**: All other backend contract tests remain green after isolation changes (no regressions)
-
-## Future Requirements
-
-### Enhanced PR Comments
-
-- **PR-04**: Per-backend grouping in PR comparison tables
-- **PR-05**: Visualization PNGs embedded in PR comments
-
-### Dashboard Enhancements
-
-- **DASH-04**: Release-tagged benchmark snapshots with comparison view
-- **DASH-05**: Memory profiling pipeline integrated into dashboard
-
-## Out of Scope
-
-| Feature | Reason |
-|---------|--------|
-| Per-Python-version benchmark tracking | Adds complexity without proportional regression detection benefit |
-| Hosted SaaS dashboard (codspeed, bencher) | External dependency; Chart.js on gh-pages is sufficient |
-| Fork PR benchmark comments | GitHub token scoping prevents it; low fork contribution volume |
-| Custom React dashboard | Maintenance overhead; Chart.js auto-generation covers needs |
-| pytest-codspeed integration | Orthogonal to CI tracking; codspeed measures CPU not I/O |
-
-## Traceability
-
-Which phases cover which requirements. Updated during roadmap creation.
-
-| Requirement | Phase | Status |
-|-------------|-------|--------|
-| CI-01 | Phase 5 | Complete |
-| CI-02 | Phase 5 | Complete |
-| CI-03 | Phase 5 | Complete |
-| CI-04 | Phase 5 | Complete |
-| PR-01 | Phase 6 | Complete |
-| PR-02 | Phase 6 | Complete |
-| PR-03 | Phase 6 | Complete |
-| DASH-01 | Phase 7 | Complete |
-| DASH-02 | Phase 7 | Complete |
-| DASH-03 | Phase 7 | Complete |
-| ISO-01 | Phase 8 | Complete |
-| ISO-02 | Phase 8 | Complete |
-| ISO-03 | Phase 8 | Complete |
-
-**Coverage:**
-- v0.3.1 requirements: 10 total
-- Maintenance requirements: 3 total
-- Mapped to phases: 13
-- Unmapped: 0
-
----
-*Requirements defined: 2026-03-09*
-*Last updated: 2026-03-09 after phase 6 completion*
diff --git a/.planning/ROADMAP.md b/.planning/ROADMAP.md
deleted file mode 100644
index 8d24cc3..0000000
--- a/.planning/ROADMAP.md
+++ /dev/null
@@ -1,104 +0,0 @@
-# Roadmap: asebytes
-
-## Milestones
-
-- v1.0 Maintenance & Performance Overhaul -- Phases 1-4 (shipped 2026-03-06)
-- v0.3.1 CI Benchmark Infrastructure -- Phases 5-7 (in progress)
-
-## Phases
-
-**Phase Numbering:**
-- Integer phases (1, 2, 3): Planned milestone work
-- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
-
-<details>
-<summary>v1.0 Maintenance & Performance Overhaul (Phases 1-4) -- SHIPPED 2026-03-06</summary>
-
-- [x] Phase 1: Backend Architecture (3/3 plans) -- completed 2026-03-06
-- [x] Phase 2: H5MD Compliance (4/4 plans) -- completed 2026-03-06
-- [x] Phase 3: Contract Test Suite (4/4 plans) -- completed 2026-03-06
-- [x] Phase 4: Benchmarks & Performance (2/2 plans) -- completed 2026-03-06
-
-Full details: `.planning/milestones/v1.0-ROADMAP.md`
-
-</details>
-
-### v0.3.1 CI Benchmark Infrastructure (In Progress)
-
-**Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.
-
-- [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)
-- [x] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate (completed 2026-03-09)
-- [ ] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits
-
-## Phase Details
-
-### Phase 5: Benchmark Pipeline
-**Goal**: Every push to main and every release tag produces benchmark results stored on gh-pages, building a historical baseline
-**Depends on**: Nothing (first phase of v0.3.1)
-**Requirements**: CI-01, CI-02, CI-03, CI-04
-**Success Criteria** (what must be TRUE):
-  1. gh-pages branch exists and GitHub Pages serves content from it
-  2. Pushing a commit to main triggers a post-matrix benchmark job that stores results on gh-pages
-  3. Opening or updating a PR does NOT push benchmark data to gh-pages
-  4. Tagging a release triggers a benchmark snapshot committed to gh-pages
-**Plans**: 1 plan
-
-Plans:
-- [ ] 05-01-PLAN.md — Create benchmark.yml workflow, clean up tests.yml and legacy files
-
-### Phase 6: PR Feedback
-**Goal**: PR authors see benchmark comparison results and regressions block merge
-**Depends on**: Phase 5 (baseline data must exist on gh-pages)
-**Requirements**: PR-01, PR-02, PR-03
-**Success Criteria** (what must be TRUE):
-  1. PRs receive a comment with a full benchmark comparison table showing deltas (regressions and improvements) vs main
-  2. The alert threshold percentage is configurable in the workflow YAML (default 150%)
-  3. A PR with a benchmark regression beyond the threshold is blocked from merging
-**Plans**: 2 plans
-
-Plans:
-- [x] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
-- [ ] 06-02-PLAN.md — Enable comment-always on PR benchmark step (gap closure)
-
-### Phase 7: Dashboard and README
-**Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
-**Depends on**: Phase 5 (dashboard auto-generated by github-action-benchmark)
-**Requirements**: DASH-01, DASH-02, DASH-03
-**Success Criteria** (what must be TRUE):
-  1. GitHub Pages serves a Chart.js time-series dashboard with project description, usage, and links
-  2. README displays live benchmark figures sourced from GitHub Pages, replacing any static visualization PNGs
-  3. max-items-in-chart is configured to limit data growth on gh-pages
-**Plans**: 1 plan
-
-Plans:
-- [ ] 07-01-PLAN.md — Add max-items-in-chart to workflow, create gh-pages landing page, replace README PNG embeds with dashboard link
-
-### Phase 8: Fix failing tests in Redis/Mongo backends (test isolation)
-**Goal:** MongoDB and Redis contract tests pass reliably with per-test data isolation via unique group names
-**Depends on:** Nothing (independent bugfix)
-**Requirements**: ISO-01, ISO-02, ISO-03
-**Success Criteria** (what must be TRUE):
-  1. MongoDB tests pass without data leaking between tests
-  2. Redis tests pass without data leaking between tests
-  3. All other backend tests remain green (no regressions)
-**Plans**: 1 plan
-
-Plans:
-- [x] 08-01-PLAN.md — Add unique group= to all facade fixtures for per-test isolation
-
-## Progress
-
-**Execution Order:**
-Phases execute in numeric order: 5 -> 6 -> 7
-
-| Phase | Milestone | Plans Complete | Status | Completed |
-|-------|-----------|----------------|--------|-----------|
-| 1. Backend Architecture | v1.0 | 3/3 | Complete | 2026-03-06 |
-| 2. H5MD Compliance | v1.0 | 4/4 | Complete | 2026-03-06 |
-| 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
-| 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
-| 5. Benchmark Pipeline | 1/1 | Complete   | 2026-03-09 | - |
-| 6. PR Feedback | v0.3.1 | 1/2 | In progress | - |
-| 7. Dashboard and README | v0.3.1 | 0/1 | Not started | - |
-| 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |