1 change: 1 addition & 0 deletions .github/workflows/benchmark.yml
@@ -90,6 +90,7 @@ jobs:
benchmark-data-dir-path: dev/bench
github-token: ${{ secrets.GITHUB_TOKEN }}
auto-push: true
max-items-in-chart: 200

- name: Compare benchmark results (PR)
if: github.event_name == 'pull_request'
14 changes: 14 additions & 0 deletions .planning/MILESTONES.md
@@ -1,5 +1,19 @@
# Milestones

## v0.3.1 CI Benchmark Infrastructure (Shipped: 2026-03-10)

**Phases completed:** 4 phases, 4 plans
**Timeline:** 2026-03-09 → 2026-03-10 (2 days)

**Key accomplishments:**
1. Benchmark CI workflow with github-action-benchmark auto-pushing to gh-pages on every main merge
2. PR benchmark comparison with 150% fail-on-regression gate and configurable alert threshold
3. GitHub Pages landing page with project info and live Chart.js benchmark dashboard
4. README updated with live dashboard link replacing 10 static PNG embeds
5. Per-test group isolation (UUID-based) fixing MongoDB and Redis contract test flakiness

---

## v1.0 Maintenance & Performance Overhaul (Shipped: 2026-03-06)

**Phases completed:** 4 phases, 13 plans
22 changes: 14 additions & 8 deletions .planning/PROJECT.md
@@ -23,13 +23,15 @@ Every storage backend must be fast, correct, and tested through a single paramet
- ✓ HDF5 chunk cache tuning, MongoDB TTL cache, Redis Lua scripts — v1.0
- ✓ Dependencies corrected (lmdb>=1.6.0, h5py>=3.12.0, no upper bounds) — v1.0
- ✓ Dead code removed, _postprocess() consolidated — v1.0
- ✓ CI benchmark pipeline with auto-push to gh-pages — v0.3.1
- ✓ PR benchmark comparison with fail-on-regression gate — v0.3.1
- ✓ GitHub Pages dashboard with Chart.js time-series charts — v0.3.1
- ✓ github-action-benchmark selected as CI benchmark tool — v0.3.1
- ✓ Per-test group isolation for MongoDB/Redis backends — v0.3.1

### Active

- [ ] PR benchmark comments showing perf diff vs base branch (BENCH-01)
- [ ] Benchmark JSON committed to repo, overwritten per merge/tag (BENCH-02)
- [ ] GitHub Pages dashboard tracking performance over releases (BENCH-03)
- [ ] Evaluate and select CI benchmark tooling (CML, github-action-benchmark, etc.) (BENCH-04)
(None — planning next milestone)

### Backlog

@@ -48,9 +50,10 @@ Every storage backend must be fast, correct, and tested through a single paramet

## Context

Shipped v1.0 with 12,608 LOC source (Python), 22,740 LOC tests.
Tech stack: h5py, zarr, lmdb, pymongo, redis, ase, molify, pytest-benchmark.
142 files changed across 174 commits since diverging from main.
Shipped v1.0 (architecture overhaul) and v0.3.1 (CI benchmark infrastructure).
12,608 LOC source (Python), 22,740 LOC tests.
Tech stack: h5py, zarr, lmdb, pymongo, redis, ase, molify, pytest-benchmark, github-action-benchmark.
CI: benchmark pipeline on gh-pages, PR regression gate at 150%, public dashboard.

Backend hierarchy:
- `BaseColumnarBackend(ReadWriteBackend[str, Any])` — shared logic (795 lines)
@@ -88,6 +91,9 @@ Known performance characteristics:
| Constraints serialized as JSON in info column | Avoids architectural changes to columnar storage for H5MD round-trip | ✓ Good — simple, reliable |
| TTL cache for MongoDB metadata (1s window) | Reduces redundant metadata fetches within tight loops | ✓ Good — measurable improvement |
| Facade bounds-check elimination | Delegate IndexError to backend instead of pre-checking len() | ✓ Good — saves round-trip for positive indices |
| github-action-benchmark for CI | Lightweight, gh-pages native, Chart.js auto-generated | ✓ Good — handles store, compare, and dashboard |
| Dual benchmark-action steps (main vs PR) | GitHub Actions can't conditionally set `with:` inputs | ✓ Good — clean separation of concerns |
| UUID-based group isolation in tests | Per-test unique groups prevent data leakage across backends | ✓ Good — fixed MongoDB/Redis flakiness |
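
The UUID-based isolation decision in the last row can be illustrated with a minimal sketch. The helper name `unique_group` and the `test` prefix are hypothetical and do not come from the project. Only the idea of giving each test its own group, passed to the backend as `group=`, is taken from the notes above.

```python
import uuid


def unique_group(prefix: str = "test") -> str:
    """Return a group name that is unique per call (hypothetical helper)."""
    # uuid4() is random, so two tests practically never share a group name.
    # Data written under one group is then not visible to a test using another.
    return f"{prefix}_{uuid.uuid4().hex}"


# Each test would request its own group and hand it to the backend as group=...
group_a = unique_group()
group_b = unique_group()
```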

---
*Last updated: 2026-03-09 after v0.3.1 milestone start*
*Last updated: 2026-03-10 after v0.3.1 milestone*
39 changes: 21 additions & 18 deletions .planning/STATE.md
@@ -3,34 +3,31 @@ gsd_state_version: 1.0
milestone: v1.0
milestone_name: milestone
status: completed
stopped_at: Phase 8 context gathered
last_updated: "2026-03-09T20:46:43.140Z"
last_activity: 2026-03-09 -- Completed 05-01 benchmark pipeline
stopped_at: Completed 07-01-PLAN.md
last_updated: "2026-03-10T14:06:58.937Z"
last_activity: "2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart"
progress:
total_phases: 4
completed_phases: 1
total_plans: 2
completed_plans: 1
completed_phases: 4
total_plans: 4
completed_plans: 4
Comment on lines +11 to +13

⚠️ Potential issue | 🟠 Major

Keep the phase counters and current-position text consistent.

progress now reports 4/4 phases and 100% completion, while the same file says Phase: 7 of 8. Those states cannot both be true, so this document no longer provides a reliable source of project status.

Also applies to: 28-31

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/STATE.md around lines 11 - 13, The phase counters in STATE.md are
inconsistent: update the numeric fields (completed_phases, total_plans,
completed_plans) to match the human-readable phase text ("Phase: 7 of 8") or
vice‑versa so the file reports a single consistent project state; locate and
reconcile the entries completed_phases, total_plans, completed_plans and the
"Phase: X of Y" line (also referenced around lines 28–31) so they all reflect
the same phase and completion percentage.

percent: 100
---

# Project State

## Project Reference

See: .planning/PROJECT.md (updated 2026-03-09)
See: .planning/PROJECT.md (updated 2026-03-10)

**Core value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite
**Current focus:** v0.3.1 -- Phase 5: Benchmark Pipeline
**Current focus:** Planning next milestone

## Current Position

Phase: 8 of 8 (Test Isolation Fix) -- maintenance
Plan: 1 of 1 (complete)
Status: Phase 8 complete
Last activity: 2026-03-09 - Completed quick task 1: Make MongoDB backend cache_ttl configurable

Progress: [██████████] 100%
Milestone v0.3.1 complete. All 4 phases shipped (5-8).
Status: Between milestones
Last activity: 2026-03-10 - Milestone v0.3.1 archived

## Performance Metrics

@@ -49,6 +46,11 @@ Recent: github-action-benchmark selected as sole CI benchmark tool (research pha
- Single Python 3.13 for benchmarks -- consistent baseline (CI-02, 05-01)
- No separate release/tag trigger -- main pushes cover it (CI-04, 05-01)
- Uniform group= on all backends, no conditional logic per backend type (08-01)
- Dual benchmark-action steps: main auto-push vs PR compare-only (06-01)
- 150% alert threshold as configurable YAML value (06-01)
- Branch protection documented as manual one-time setup (06-01)
- max-items-in-chart: 200 on store step only, PR step irrelevant (07-01)
- gh-pages root manually managed, CI only writes /dev/bench/ (07-01)
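
As a rough illustration of the 150% alert threshold decision above, the gate can be thought of as a ratio check between the baseline and the current run. This is a sketch under assumptions, not the action's actual code: the function name, the `bigger_is_better` flag, and the sample values are illustrative only.

```python
def exceeds_threshold(
    previous: float,
    current: float,
    threshold_percent: float = 150.0,
    bigger_is_better: bool = True,
) -> bool:
    """Return True when the current value is worse than the threshold allows."""
    if previous <= 0 or current <= 0:
        raise ValueError("benchmark values must be positive")
    # Express the change so that a ratio above 1.0 always means "got worse".
    # For a throughput-style metric (higher is better), worse means smaller.
    ratio = previous / current if bigger_is_better else current / previous
    return ratio * 100.0 > threshold_percent


# Throughput dropped from 100 to 60 ops/s: ratio is about 167%, above the gate.
regressed = exceeds_threshold(100.0, 60.0)
# Throughput dropped from 100 to 80 ops/s: ratio is 125%, below the gate.
tolerated = exceeds_threshold(100.0, 80.0)
```

Keeping the threshold as a parameter mirrors the decision to make it a configurable value rather than a hard-coded constant.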

### Pending Todos

@@ -67,10 +69,11 @@ None.

| # | Description | Date | Commit | Directory |
|---|-------------|------|--------|-----------|
| 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | pending | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) |
| 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | 4848760 | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) |
| Phase 07 P01 | 1min | 2 tasks | 4 files |
Comment on lines +72 to +73

⚠️ Potential issue | 🟡 Minor

Fix the malformed Quick Tasks table row.

Line 76 has fewer cells than the 5-column header, so the Markdown table will render incorrectly and shift columns.
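
A check of this kind can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes a simple pipe table without escaped `\|` characters inside cells.

```python
def split_cells(row: str) -> list[str]:
    """Split one Markdown table row into its cell texts."""
    stripped = row.strip()
    # Drop the optional leading and trailing pipe before splitting.
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return [cell.strip() for cell in stripped.split("|")]


def malformed_rows(table_lines: list[str]) -> list[tuple[int, int]]:
    """Return (row index, cell count) for rows that differ from the header."""
    expected = len(split_cells(table_lines[0]))
    problems = []
    for index, line in enumerate(table_lines):
        count = len(split_cells(line))
        if count != expected:
            problems.append((index, count))
    return problems


table = [
    "| # | Description | Date | Commit | Directory |",
    "|---|-------------|------|--------|-----------|",
    "| Phase 07 P01 | 1min | 2 tasks | 4 files |",
]
problems = malformed_rows(table)
```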

🧰 Tools
🪛 LanguageTool

[grammar] ~76-~76: Ensure spelling is correct
Context: ...ache-ttl-configura/) | | Phase 07 P01 | 1min | 2 tasks | 4 files | ## Session Conti...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/STATE.md around lines 75 - 76, The table row containing "| 1 |
Make MongoDB backend cache_ttl configurable with None meaning no caching |
2026-03-09 | 4848760 |
[1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/)
|" is malformed because the following metadata line "Phase 07 P01 | 1min | 2
tasks | 4 files" has fewer cells than the 5-column header; fix by ensuring every
Markdown table row has the same number of pipe-separated cells as the header:
either split the combined content into two proper rows (one for the task entry
and one for metadata) or add an extra empty cell(s) to the metadata row so it
has five columns, locating the fix near the row that starts with "| 1 | Make
MongoDB backend cache_ttl..." in .planning/STATE.md.


## Session Continuity

Last session: 2026-03-09T20:55:36Z
Stopped at: Completed 08-01-PLAN.md
Next action: Next phase or plan
Last session: 2026-03-10T12:39:21.801Z
Stopped at: Completed 07-01-PLAN.md
Next action: Phase 7 complete, proceed to next phase or wrap up
84 changes: 84 additions & 0 deletions .planning/debug/resolved/bench-summary-no-diffs.md
@@ -0,0 +1,84 @@
---
status: diagnosed
trigger: "PR benchmark Job Summary shows test names but no performance comparison diffs"
created: 2026-03-10T14:00:00Z
updated: 2026-03-10T14:30:00Z
---

## Current Focus

hypothesis: The summary-always Job Summary IS being generated with comparison data, but the user expects a PR comment (which requires comment-always: true). The summary table with diffs only appears on the workflow run's Summary tab, NOT on the PR page itself.
test: Verified action source code, log output, and configuration
expecting: n/a - diagnosis complete
next_action: Return diagnosis

## Symptoms

expected: PR comparison step shows performance numbers + diffs against main baseline in Job Summary
actual: Only test names appear, no performance comparisons
errors: None - step succeeds
reproduction: Run any PR against main when gh-pages baseline exists
started: First PR run after baseline was established

## Eliminated

- hypothesis: gh-pages data not in correct format/location
evidence: data.js exists at dev/bench/data.js on gh-pages, format is correct (window.BENCHMARK_DATA = {...}), entries key is "Benchmark" matching the action's default name parameter. 1 baseline entry from commit fa2713e with 373 benchmarks.
timestamp: 2026-03-10T14:10:00Z

- hypothesis: prevBench is null causing summary to be skipped entirely
evidence: Source code analysis of addBenchmarkEntry.ts shows prevBench is found when entries exist with different commit IDs. Baseline commit (fa2713e) differs from PR commit (ce2ac46). The action loads data.js from gh-pages, finds the existing entry, and sets prevBench.
timestamp: 2026-03-10T14:15:00Z

- hypothesis: Benchmark names don't match between baseline and PR (causing empty Previous/Ratio columns)
evidence: Both baseline and PR run 373 benchmarks with identical test names from the same test suite.
timestamp: 2026-03-10T14:20:00Z

- hypothesis: Job Summary exceeds size limit (1MB)
evidence: Estimated summary size is ~75KB for 373 benchmarks, well under the 1MB limit. No error in logs about summary upload failure.
timestamp: 2026-03-10T14:22:00Z

- hypothesis: external-data-json-path is needed instead of gh-pages-branch
evidence: Source code confirms gh-pages-branch mode works correctly for comparison. The action fetches gh-pages, loads data.js, finds previous benchmark entry, and passes it to handleSummary. Both modes are valid.
timestamp: 2026-03-10T14:25:00Z
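
For anyone wanting to repeat the format check from the first eliminated hypothesis, a minimal sketch follows. It assumes the file is a single JavaScript assignment whose right-hand side is plain JSON, as the evidence above describes. The function name and the trimmed sample payload are illustrative and are not copied from the real `data.js`.

```python
import json

PREFIX = "window.BENCHMARK_DATA = "


def load_benchmark_data(text: str) -> dict:
    """Parse the JSON payload out of a data.js style assignment."""
    text = text.strip()
    if not text.startswith(PREFIX):
        raise ValueError("unexpected data.js format")
    # Remove the assignment prefix and tolerate an optional trailing semicolon.
    payload = text[len(PREFIX):].rstrip(";").strip()
    return json.loads(payload)


# Trimmed, made-up sample with the same top-level shape as described above.
SAMPLE = (
    'window.BENCHMARK_DATA = {"entries": {"Benchmark": '
    '[{"commit": {"id": "fa2713e"}, "benches": []}]}}'
)
data = load_benchmark_data(SAMPLE)
baseline_entries = data["entries"]["Benchmark"]
```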

## Evidence

- timestamp: 2026-03-10T14:05:00Z
checked: gh-pages branch content
found: dev/bench/data.js exists with 1 entry (commit fa2713e, 373 benchmarks). dev/bench/index.html also present.
implication: Baseline data is correctly stored

- timestamp: 2026-03-10T14:08:00Z
checked: CI run logs for step "Compare benchmark results (PR)"
found: Step completed successfully. Action fetched gh-pages, switched to it, loaded data.js, committed updated data locally (2632 insertions), switched back. Printed "github-action-benchmark was run successfully!" with PR data (commit ce2ac46, 373 benchmarks).
implication: Action executed without errors

- timestamp: 2026-03-10T14:12:00Z
checked: Action source code (v1.21.0 / SHA a7bc2366) - write.ts, addBenchmarkEntry.ts, index.ts
found: writeBenchmark() calls writeBenchmarkToGitHubPages() which loads data.js, calls addBenchmarkEntry() to find prevBench. If prevBench is not null, handleSummary() is called which uses buildComment(name, curr, prev, false) to generate a markdown table with columns [Benchmark suite | Current | Previous | Ratio] and writes it via core.summary.write().
implication: The comparison table SHOULD be generated when baseline exists

- timestamp: 2026-03-10T14:14:00Z
checked: addBenchmarkEntry.ts logic
found: Iterates existing entries in reverse, finds first entry with different commit.id. Since baseline has fa2713e and PR has ce2ac46, prevBench will be set to the baseline entry.
implication: prevBench is NOT null, so handleSummary IS called

- timestamp: 2026-03-10T14:18:00Z
checked: PR comments and check run output
found: No benchmark comment on PR (only CodeRabbit comment). Check run output.summary is null. This is expected because comment-always defaults to false.
implication: Comparison data only appears in Job Summary tab, not on the PR page

- timestamp: 2026-03-10T14:20:00Z
checked: Workflow configuration for comment-always
found: comment-always is not set (defaults to false). comment-on-alert: true only posts when regression exceeds threshold.
implication: No comparison data appears on the PR page unless there's an alert
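
The reverse lookup described in the `addBenchmarkEntry.ts` evidence item can be paraphrased in Python as follows. This is a sketch of the logic as summarized above, not a translation of the action's TypeScript source; the function name and the minimal entry dictionaries are assumptions made for illustration.

```python
from typing import Optional


def find_previous_entry(entries: list[dict], current_commit_id: str) -> Optional[dict]:
    """Return the newest stored entry whose commit differs from the current one."""
    # Walk from newest to oldest and stop at the first different commit id.
    for entry in reversed(entries):
        if entry["commit"]["id"] != current_commit_id:
            return entry
    # No stored entry from another commit: there is nothing to compare against.
    return None


baseline = {"commit": {"id": "fa2713e"}}
pr_run = {"commit": {"id": "ce2ac46"}}

# With the baseline stored and the PR commit as current, the baseline is found.
prev = find_previous_entry([baseline], "ce2ac46")
```

With the commit ids from this investigation the lookup returns the baseline entry, which matches the conclusion that the comparison table is generated.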

## Resolution

root_cause: The benchmark comparison IS being generated and written to the Job Summary (GITHUB_STEP_SUMMARY), but it is NOT visible on the PR page itself. The `comment-always` parameter defaults to `false`, so no comparison comment is posted on the PR. The user likely sees only pytest output (test names with timing data) when viewing the PR checks, and needs to navigate to the workflow run's Summary tab to see the comparison table. Additionally, `comment-on-alert: true` only triggers a PR comment when performance regression exceeds the 150% threshold, which did not occur in this run.

fix: Add `comment-always: true` to the PR comparison step to post the full comparison table as a PR comment, making it visible directly on the PR page.

verification: n/a - diagnosis only
files_changed: []
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
# Requirements Archive: v0.3.1 CI Benchmark Infrastructure

**Archived:** 2026-03-10
**Status:** SHIPPED

For current requirements, see `.planning/REQUIREMENTS.md`.

---

# Requirements: asebytes

**Defined:** 2026-03-09
@@ -16,15 +25,15 @@ Requirements for CI benchmark infrastructure milestone. Each maps to roadmap pha

### PR Feedback

- [ ] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
- [ ] **PR-02**: Alert threshold is configurable (starting at 150%)
- [ ] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression
- [x] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements
- [x] **PR-02**: Alert threshold is configurable (starting at 150%)
- [x] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression

### Dashboard

- [ ] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
- [ ] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
- [ ] **DASH-03**: max-items-in-chart limits data growth on gh-pages
- [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
- [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
- [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
Comment on lines +34 to +36

⚠️ Potential issue | 🟠 Major

Don't mark the dashboard requirements complete yet.

These updates record all DASH-* items as complete, but the rest of this PR still treats Phase 7 as unfinished: .planning/ROADMAP.md Line 102 says "Not started", the Phase 7 validation doc is still draft, and DASH-01 depends on the manual gh-pages landing-page work described in the Phase 7 context docs. DASH-02 is also worded as "embeds live benchmark figures", while README.md now links to the dashboard instead of embedding it. Right now the requirements tracker is ahead of the implementation and the other planning artifacts.

Suggested direction
- - [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
- - [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
- - [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
+ - [ ] **DASH-01**: GitHub Pages serves a landing page with project docs and links to the auto-generated Chart.js dashboard
+ - [x] **DASH-02**: README links to the live benchmark dashboard on GitHub Pages, replacing static visualization PNGs
+ - [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages

- | DASH-01 | Phase 7 | Complete |
- | DASH-02 | Phase 7 | Complete |
- | DASH-03 | Phase 7 | Complete |
+ | DASH-01 | Phase 7 | In progress |
+ | DASH-02 | Phase 7 | Complete |
+ | DASH-03 | Phase 7 | Complete |

-*Last updated: 2026-03-09 after phase 6 completion*
+*Last updated: 2026-03-10 after Phase 7 partial implementation*

Also applies to: 72-74, 87-87

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.planning/REQUIREMENTS.md around lines 25 - 27, The checklist prematurely
marks DASH-01, DASH-02, and DASH-03 complete while Phase 7 artifacts are still
unfinished; revert those DASH-* checklist ticks to unchecked and make the
requirements status consistent with the Phase 7 documents: update the Phase 7
entry in ROADMAP (the Phase 7 "Not started" state and the draft Phase 7
validation doc) to reflect current progress, and either restore README.md to
embed the live benchmark figures or change the DASH-02 wording to “links to”
instead of “embeds” so the requirement matches the README; ensure DASH-01
remains dependent on completing the manual gh-pages landing-page work before
marking complete.


## Maintenance Requirements

@@ -66,12 +75,12 @@ Which phases cover which requirements. Updated during roadmap creation.
| CI-02 | Phase 5 | Complete |
| CI-03 | Phase 5 | Complete |
| CI-04 | Phase 5 | Complete |
| PR-01 | Phase 6 | Pending |
| PR-02 | Phase 6 | Pending |
| PR-03 | Phase 6 | Pending |
| DASH-01 | Phase 7 | Pending |
| DASH-02 | Phase 7 | Pending |
| DASH-03 | Phase 7 | Pending |
| PR-01 | Phase 6 | Complete |
| PR-02 | Phase 6 | Complete |
| PR-03 | Phase 6 | Complete |
| DASH-01 | Phase 7 | Complete |
| DASH-02 | Phase 7 | Complete |
| DASH-03 | Phase 7 | Complete |
| ISO-01 | Phase 8 | Complete |
| ISO-02 | Phase 8 | Complete |
| ISO-03 | Phase 8 | Complete |
@@ -84,4 +93,4 @@ Which phases cover which requirements. Updated during roadmap creation.

---
*Requirements defined: 2026-03-09*
*Last updated: 2026-03-09 after phase 8 planning*
*Last updated: 2026-03-09 after phase 6 completion*
14 changes: 7 additions & 7 deletions .planning/ROADMAP.md → .planning/milestones/v0.3.1-ROADMAP.md
@@ -28,8 +28,8 @@ Full details: `.planning/milestones/v1.0-ROADMAP.md`
**Milestone Goal:** Automated benchmark tracking in CI with PR regression feedback and a public GitHub Pages dashboard.

- [x] **Phase 5: Benchmark Pipeline** - gh-pages branch, benchmark workflow job, auto-push on main, release snapshots (completed 2026-03-09)
- [ ] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate
- [ ] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits
- [x] **Phase 6: PR Feedback** - PR comparison comments, configurable alert threshold, fail-on-regression gate (completed 2026-03-09)
- [x] **Phase 7: Dashboard and README** - Chart.js dashboard with project docs, README live figures, data growth limits (completed 2026-03-10)

## Phase Details

@@ -58,7 +58,7 @@ Plans:
**Plans**: 1 plan

Plans:
- [ ] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml
- [x] 06-01-PLAN.md — Add PR trigger, comparison step, and fail-on-regression gate to benchmark.yml

### Phase 7: Dashboard and README
**Goal**: Users can view benchmark trends over time on a public dashboard and see live figures in the README
@@ -68,10 +68,10 @@ Plans:
1. GitHub Pages serves a Chart.js time-series dashboard with project description, usage, and links
2. README displays live benchmark figures sourced from GitHub Pages, replacing any static visualization PNGs
3. max-items-in-chart is configured to limit data growth on gh-pages
**Plans**: TBD
**Plans**: 1 plan

Plans:
- [ ] 07-01: TBD
- [ ] 07-01-PLAN.md — Add max-items-in-chart to workflow, create gh-pages landing page, replace README PNG embeds with dashboard link

### Phase 8: Fix failing tests in Redis/Mongo backends (test isolation)
**Goal:** MongoDB and Redis contract tests pass reliably with per-test data isolation via unique group names
@@ -98,6 +98,6 @@ Phases execute in numeric order: 5 -> 6 -> 7
| 3. Contract Test Suite | v1.0 | 4/4 | Complete | 2026-03-06 |
| 4. Benchmarks & Performance | v1.0 | 2/2 | Complete | 2026-03-06 |
| 5. Benchmark Pipeline | 1/1 | Complete | 2026-03-09 | - |
| 6. PR Feedback | v0.3.1 | 0/1 | Not started | - |
| 7. Dashboard and README | v0.3.1 | 0/? | Not started | - |
| 6. PR Feedback | v0.3.1 | 1/1 | Complete | 2026-03-09 |
| 7. Dashboard and README | v0.3.1 | 1/1 | Complete | 2026-03-10 |
| 8. Test Isolation Fix | Maintenance | 1/1 | Complete | 2026-03-09 |