-
Notifications
You must be signed in to change notification settings - Fork 0
Gsd/bench-dash #14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gsd/bench-dash #14
Changes from all commits
f3b9cbb
3c00017
4d402dc
6435d15
84fa645
edf75e8
97d43da
256e756
70e72cf
ce2ac46
cb4110b
9a66c99
98dc169
a72d64b
d9bac03
4b23463
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,34 +3,31 @@ gsd_state_version: 1.0 | |
| milestone: v1.0 | ||
| milestone_name: milestone | ||
| status: completed | ||
| stopped_at: Phase 8 context gathered | ||
| last_updated: "2026-03-09T20:46:43.140Z" | ||
| last_activity: 2026-03-09 -- Completed 05-01 benchmark pipeline | ||
| stopped_at: Completed 07-01-PLAN.md | ||
| last_updated: "2026-03-10T14:06:58.937Z" | ||
| last_activity: "2026-03-10 - Completed 07-01: Dashboard landing page, README update, max-items-in-chart" | ||
| progress: | ||
| total_phases: 4 | ||
| completed_phases: 1 | ||
| total_plans: 2 | ||
| completed_plans: 1 | ||
| completed_phases: 4 | ||
| total_plans: 4 | ||
| completed_plans: 4 | ||
| percent: 100 | ||
| --- | ||
|
|
||
| # Project State | ||
|
|
||
| ## Project Reference | ||
|
|
||
| See: .planning/PROJECT.md (updated 2026-03-09) | ||
| See: .planning/PROJECT.md (updated 2026-03-10) | ||
|
|
||
| **Core value:** Every storage backend must be fast, correct, and tested through a single parametrized test suite | ||
| **Current focus:** v0.3.1 -- Phase 5: Benchmark Pipeline | ||
| **Current focus:** Planning next milestone | ||
|
|
||
| ## Current Position | ||
|
|
||
| Phase: 8 of 8 (Test Isolation Fix) -- maintenance | ||
| Plan: 1 of 1 (complete) | ||
| Status: Phase 8 complete | ||
| Last activity: 2026-03-09 - Completed quick task 1: Make MongoDB backend cache_ttl configurable | ||
|
|
||
| Progress: [██████████] 100% | ||
| Milestone v0.3.1 complete. All 4 phases shipped (5-8). | ||
| Status: Between milestones | ||
| Last activity: 2026-03-10 - Milestone v0.3.1 archived | ||
|
|
||
| ## Performance Metrics | ||
|
|
||
|
|
@@ -49,6 +46,11 @@ Recent: github-action-benchmark selected as sole CI benchmark tool (research pha | |
| - Single Python 3.13 for benchmarks -- consistent baseline (CI-02, 05-01) | ||
| - No separate release/tag trigger -- main pushes cover it (CI-04, 05-01) | ||
| - Uniform group= on all backends, no conditional logic per backend type (08-01) | ||
| - Dual benchmark-action steps: main auto-push vs PR compare-only (06-01) | ||
| - 150% alert threshold as configurable YAML value (06-01) | ||
| - Branch protection documented as manual one-time setup (06-01) | ||
| - max-items-in-chart: 200 on store step only, PR step irrelevant (07-01) | ||
| - gh-pages root manually managed, CI only writes /dev/bench/ (07-01) | ||
|
|
||
| ### Pending Todos | ||
|
|
||
|
|
@@ -67,10 +69,11 @@ None. | |
|
|
||
| | # | Description | Date | Commit | Directory | | ||
| |---|-------------|------|--------|-----------| | ||
| | 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | pending | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) | | ||
| | 1 | Make MongoDB backend cache_ttl configurable with None meaning no caching | 2026-03-09 | 4848760 | [1-make-mongodb-backend-cache-ttl-configura](./quick/1-make-mongodb-backend-cache-ttl-configura/) | | ||
| | Phase 07 P01 | 1min | 2 tasks | 4 files | | ||
|
Comment on lines
+72
to
+73
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fix the malformed Quick Tasks table row. Line 76 has fewer cells than the 5-column header, so the Markdown table will render incorrectly and shift columns. 🧰 Tools🪛 LanguageTool[grammar] ~76-~76: Ensure spelling is correct (QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1) 🤖 Prompt for AI Agents |
||
|
|
||
| ## Session Continuity | ||
|
|
||
| Last session: 2026-03-09T20:55:36Z | ||
| Stopped at: Completed 08-01-PLAN.md | ||
| Next action: Next phase or plan | ||
| Last session: 2026-03-10T12:39:21.801Z | ||
| Stopped at: Completed 07-01-PLAN.md | ||
| Next action: Phase 7 complete, proceed to next phase or wrap up | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| --- | ||
| status: diagnosed | ||
| trigger: "PR benchmark Job Summary shows test names but no performance comparison diffs" | ||
| created: 2026-03-10T14:00:00Z | ||
| updated: 2026-03-10T14:30:00Z | ||
| --- | ||
|
|
||
| ## Current Focus | ||
|
|
||
| hypothesis: The summary-always Job Summary IS being generated with comparison data, but the user expects a PR comment (which requires comment-always: true). The summary table with diffs only appears on the workflow run's Summary tab, NOT on the PR page itself. | ||
| test: Verified action source code, log output, and configuration | ||
| expecting: n/a - diagnosis complete | ||
| next_action: Return diagnosis | ||
|
|
||
| ## Symptoms | ||
|
|
||
| expected: PR comparison step shows performance numbers + diffs against main baseline in Job Summary | ||
| actual: Only test names appear, no performance comparisons | ||
| errors: None - step succeeds | ||
| reproduction: Run any PR against main when gh-pages baseline exists | ||
| started: First PR run after baseline was established | ||
|
|
||
| ## Eliminated | ||
|
|
||
| - hypothesis: gh-pages data not in correct format/location | ||
| evidence: data.js exists at dev/bench/data.js on gh-pages, format is correct (window.BENCHMARK_DATA = {...}), entries key is "Benchmark" matching the action's default name parameter. 1 baseline entry from commit fa2713e with 373 benchmarks. | ||
| timestamp: 2026-03-10T14:10:00Z | ||
|
|
||
| - hypothesis: prevBench is null causing summary to be skipped entirely | ||
| evidence: Source code analysis of addBenchmarkEntry.ts shows prevBench is found when entries exist with different commit IDs. Baseline commit (fa2713e) differs from PR commit (ce2ac46). The action loads data.js from gh-pages, finds the existing entry, and sets prevBench. | ||
| timestamp: 2026-03-10T14:15:00Z | ||
|
|
||
| - hypothesis: Benchmark names don't match between baseline and PR (causing empty Previous/Ratio columns) | ||
| evidence: Both baseline and PR run 373 benchmarks with identical test names from the same test suite. | ||
| timestamp: 2026-03-10T14:20:00Z | ||
|
|
||
| - hypothesis: Job Summary exceeds size limit (1MB) | ||
| evidence: Estimated summary size is ~75KB for 373 benchmarks, well under the 1MB limit. No error in logs about summary upload failure. | ||
| timestamp: 2026-03-10T14:22:00Z | ||
|
|
||
| - hypothesis: external-data-json-path is needed instead of gh-pages-branch | ||
| evidence: Source code confirms gh-pages-branch mode works correctly for comparison. The action fetches gh-pages, loads data.js, finds previous benchmark entry, and passes it to handleSummary. Both modes are valid. | ||
| timestamp: 2026-03-10T14:25:00Z | ||
|
|
||
| ## Evidence | ||
|
|
||
| - timestamp: 2026-03-10T14:05:00Z | ||
| checked: gh-pages branch content | ||
| found: dev/bench/data.js exists with 1 entry (commit fa2713e, 373 benchmarks). dev/bench/index.html also present. | ||
| implication: Baseline data is correctly stored | ||
|
|
||
| - timestamp: 2026-03-10T14:08:00Z | ||
| checked: CI run logs for step "Compare benchmark results (PR)" | ||
| found: Step completed successfully. Action fetched gh-pages, switched to it, loaded data.js, committed updated data locally (2632 insertions), switched back. Printed "github-action-benchmark was run successfully!" with PR data (commit ce2ac46, 373 benchmarks). | ||
| implication: Action executed without errors | ||
|
|
||
| - timestamp: 2026-03-10T14:12:00Z | ||
| checked: Action source code (v1.21.0 / SHA a7bc2366) - write.ts, addBenchmarkEntry.ts, index.ts | ||
| found: writeBenchmark() calls writeBenchmarkToGitHubPages() which loads data.js, calls addBenchmarkEntry() to find prevBench. If prevBench is not null, handleSummary() is called which uses buildComment(name, curr, prev, false) to generate a markdown table with columns [Benchmark suite | Current | Previous | Ratio] and writes it via core.summary.write(). | ||
| implication: The comparison table SHOULD be generated when baseline exists | ||
|
|
||
| - timestamp: 2026-03-10T14:14:00Z | ||
| checked: addBenchmarkEntry.ts logic | ||
| found: Iterates existing entries in reverse, finds first entry with different commit.id. Since baseline has fa2713e and PR has ce2ac46, prevBench will be set to the baseline entry. | ||
| implication: prevBench is NOT null, so handleSummary IS called | ||
|
|
||
| - timestamp: 2026-03-10T14:18:00Z | ||
| checked: PR comments and check run output | ||
| found: No benchmark comment on PR (only CodeRabbit comment). Check run output.summary is null. This is expected because comment-always defaults to false. | ||
| implication: Comparison data only appears in Job Summary tab, not on the PR page | ||
|
|
||
| - timestamp: 2026-03-10T14:20:00Z | ||
| checked: Workflow configuration for comment-always | ||
| found: comment-always is not set (defaults to false). comment-on-alert: true only posts when regression exceeds threshold. | ||
| implication: No comparison data appears on the PR page unless there's an alert | ||
|
|
||
| ## Resolution | ||
|
|
||
| root_cause: The benchmark comparison IS being generated and written to the Job Summary (GITHUB_STEP_SUMMARY), but it is NOT visible on the PR page itself. The `comment-always` parameter defaults to `false`, so no comparison comment is posted on the PR. The user likely sees only pytest output (test names with timing data) when viewing the PR checks, and needs to navigate to the workflow run's Summary tab to see the comparison table. Additionally, `comment-on-alert: true` only triggers a PR comment when performance regression exceeds the 150% threshold, which did not occur in this run. | ||
|
|
||
| fix: Add `comment-always: true` to the PR comparison step to post the full comparison table as a PR comment, making it visible directly on the PR page. | ||
|
|
||
| verification: n/a - diagnosis only | ||
| files_changed: [] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,12 @@ | ||
| # Requirements Archive: v0.3.1 CI Benchmark Infrastructure | ||
|
|
||
| **Archived:** 2026-03-10 | ||
| **Status:** SHIPPED | ||
|
|
||
| For current requirements, see `.planning/REQUIREMENTS.md`. | ||
|
|
||
| --- | ||
|
|
||
| # Requirements: asebytes | ||
|
|
||
| **Defined:** 2026-03-09 | ||
|
|
@@ -16,15 +25,15 @@ Requirements for CI benchmark infrastructure milestone. Each maps to roadmap pha | |
|
|
||
| ### PR Feedback | ||
|
|
||
| - [ ] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements | ||
| - [ ] **PR-02**: Alert threshold is configurable (starting at 150%) | ||
| - [ ] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression | ||
| - [x] **PR-01**: PRs receive a full benchmark comparison summary (tables with deltas for all benchmarks) vs main -- showing both regressions and improvements | ||
| - [x] **PR-02**: Alert threshold is configurable (starting at 150%) | ||
| - [x] **PR-03**: Fail-on-regression gate blocks PR merge on benchmark regression | ||
|
|
||
| ### Dashboard | ||
|
|
||
| - [ ] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links) | ||
| - [ ] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs | ||
| - [ ] **DASH-03**: max-items-in-chart limits data growth on gh-pages | ||
| - [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links) | ||
| - [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs | ||
| - [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages | ||
|
Comment on lines
+34
to
+36
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't mark the dashboard requirements complete yet. These updates record all Suggested direction- - [x] **DASH-01**: GitHub Pages serves auto-generated Chart.js time-series dashboard with minimal project docs (description, usage, links)
- - [x] **DASH-02**: README embeds live benchmark figures from GitHub Pages, replacing static visualization PNGs
- - [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
+ - [ ] **DASH-01**: GitHub Pages serves a landing page with project docs and links to the auto-generated Chart.js dashboard
+ - [x] **DASH-02**: README links to the live benchmark dashboard on GitHub Pages, replacing static visualization PNGs
+ - [x] **DASH-03**: max-items-in-chart limits data growth on gh-pages
- | DASH-01 | Phase 7 | Complete |
- | DASH-02 | Phase 7 | Complete |
- | DASH-03 | Phase 7 | Complete |
+ | DASH-01 | Phase 7 | In progress |
+ | DASH-02 | Phase 7 | Complete |
+ | DASH-03 | Phase 7 | Complete |
-*Last updated: 2026-03-09 after phase 6 completion*
+*Last updated: 2026-03-10 after Phase 7 partial implementation*Also applies to: 72-74, 87-87 🤖 Prompt for AI Agents |
||
|
|
||
| ## Maintenance Requirements | ||
|
|
||
|
|
@@ -66,12 +75,12 @@ Which phases cover which requirements. Updated during roadmap creation. | |
| | CI-02 | Phase 5 | Complete | | ||
| | CI-03 | Phase 5 | Complete | | ||
| | CI-04 | Phase 5 | Complete | | ||
| | PR-01 | Phase 6 | Pending | | ||
| | PR-02 | Phase 6 | Pending | | ||
| | PR-03 | Phase 6 | Pending | | ||
| | DASH-01 | Phase 7 | Pending | | ||
| | DASH-02 | Phase 7 | Pending | | ||
| | DASH-03 | Phase 7 | Pending | | ||
| | PR-01 | Phase 6 | Complete | | ||
| | PR-02 | Phase 6 | Complete | | ||
| | PR-03 | Phase 6 | Complete | | ||
| | DASH-01 | Phase 7 | Complete | | ||
| | DASH-02 | Phase 7 | Complete | | ||
| | DASH-03 | Phase 7 | Complete | | ||
| | ISO-01 | Phase 8 | Complete | | ||
| | ISO-02 | Phase 8 | Complete | | ||
| | ISO-03 | Phase 8 | Complete | | ||
|
|
@@ -84,4 +93,4 @@ Which phases cover which requirements. Updated during roadmap creation. | |
|
|
||
| --- | ||
| *Requirements defined: 2026-03-09* | ||
| *Last updated: 2026-03-09 after phase 8 planning* | ||
| *Last updated: 2026-03-09 after phase 6 completion* | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the phase counters and current-position text consistent.
progressnow reports 4/4 phases and 100% completion, while the same file saysPhase: 7 of 8. Those states cannot both be true, so this document no longer provides a reliable source of project status.Also applies to: 28-31
🤖 Prompt for AI Agents