From a2647be2a6dd910271ac453fa47ff4cba30808d0 Mon Sep 17 00:00:00 2001 From: Paddy Mullen Date: Tue, 24 Feb 2026 07:54:54 -0500 Subject: [PATCH] docs: make retro test plan concrete and windows-runtime aware --- docs/retroactive-pr-review-2026-02.md | 152 ++++++++++++++++++++++++++ 1 file changed, 152 insertions(+) create mode 100644 docs/retroactive-pr-review-2026-02.md diff --git a/docs/retroactive-pr-review-2026-02.md b/docs/retroactive-pr-review-2026-02.md new file mode 100644 index 000000000..f746ebc2f --- /dev/null +++ b/docs/retroactive-pr-review-2026-02.md @@ -0,0 +1,152 @@ +# Retroactive Code Review Plan (Last 15 Merged PRs) + +## Scope reviewed +The most recent 15 PR-associated commits (by `git log --oneline` with `(#NNN)` tags): +#570, #564, #547, #551, #562, #563, #567, #566, #565, #568, #550, #545, #542, #537, #533. + +## Draft Issue + +**Title:** Retroactive QA findings: `no_browser` request parsing bug + cross-platform test blind spots + +**Problem summary** +A retroactive review of the last 15 merged PRs found one likely behavioral bug and multiple test coverage gaps: + +1. **Likely bug in `/load` request parsing for `no_browser`:** + - `no_browser` is parsed as `bool(body.get("no_browser", False))`. + - Any non-empty string (including `"false"`) becomes `True`, so API clients sending JSON strings can unintentionally suppress browser launch. + +2. **Windows coverage is intentionally non-blocking and reduced (runtime tradeoff):** + - CI marks Windows tests as `continue-on-error: true` because those runs are currently ~8–12 minutes versus ~4 minutes for the rest of the suite. + - Windows runs only `-m "not slow"` and multiple server/cache test modules are skipped on Windows. + - This is a reasonable throughput optimization, but it leaves a platform-risk blind spot for file-locking, process handling, and temp-file lifecycle behavior. + +3. **Regression coverage gaps for recent fixes:** + - The pandas 3.0 compatibility fix in `get_mode()` changed to `is_numeric_dtype(ser)` but no dedicated regression test was added. + - New `/load` `no_browser` behavior has no explicit unit/integration tests for accepted input types (`true/false`, missing key, malformed types). + +**Why this matters** +- The `no_browser` bug can silently invert behavior for API clients. +- Windows stability may drift because failures do not block merges. +- Recent compatibility fixes can regress without targeted tests. + +**Acceptance criteria** +- [ ] Fix `no_browser` parsing to support strict booleans and reject ambiguous values. +- [ ] Add `/load` tests covering `no_browser` permutations and response behavior. +- [ ] Add pandas 3.0+ regression tests for `get_mode()`. +- [ ] Introduce a Windows hardening plan that preserves CI throughput. + +--- + +## Related PR Draft (planning-only PR) + +**Title:** docs: add retroactive review findings and QA hardening plan for recent merged PRs + +**Body:** + +### What this planning PR contains +- Documents retroactive findings from the last 15 merged PRs. +- Proposes an implementation sequence for bug fixes and test hardening. +- Defines testing methodology and quality gates for follow-up fix PRs. + +### Key findings +1. `/load` `no_browser` parsing likely treats string values like `"false"` as truthy. +2. Windows CI is intentionally non-blocking due to runtime cost, but currently excludes multiple risk-heavy paths. +3. Recent compatibility fixes need durable regression tests. + +### Proposed implementation plan (for follow-up fix PRs) +1. **Hotfix-level:** normalize/validate `no_browser` input and add tests. +2. **Short-term:** add pandas 3.0 regression tests in analysis unit tests. +3. **Mid-term:** reduce Windows skips by refactoring temp-file/process cleanup and SQLite lifecycle handling. +4. **Policy:** when a bugfix PR merges, require at least one reproducer test in the same PR. + +## Specific test plan (what to write and what to assert) + +### A) `/load` and `no_browser` handler contract tests +**File target:** `tests/unit/server/test_server.py` (or split into a new `test_load_contract.py`). + +**Implementation guidance** +- Use table-driven tests (`pytest.mark.parametrize`) for request payload variants. +- In tests, monkeypatch `LoadHandler._handle_browser_window` to a spy/mock so assertions are deterministic. +- Assert both HTTP status and JSON `browser_action` in the `/load` response. + +**Test cases to add** +1. `no_browser` omitted → `browser_action != "skipped"` (normal behavior). +2. `no_browser: true` (JSON bool) → `browser_action == "skipped"`. +3. `no_browser: false` (JSON bool) → browser handler invoked; not skipped. +4. `no_browser: "false"` (string) → **either** 400 validation error **or** explicitly coerced false by contract (choose one and lock with test). +5. `no_browser: "true"` (string) → same contract choice as above. +6. `no_browser: 0/1` (numbers) → rejected or explicitly mapped by contract. +7. `no_browser: null` → defaults to false or rejected (decide contract + test). +8. `no_browser: {}` / `[]` → rejected with 400. +9. malformed JSON body → 400 (already exists, keep coverage). +10. missing required fields (`session`, `path`) → 400 (already exists, keep coverage). + +**Regression objective** +- Prevent future truthy-coercion bugs from silently changing API behavior. + +### B) pandas 3.0 `get_mode()` regression tests +**File target:** create `tests/unit/customizations/analysis_test.py` (or nearest existing analysis test module). + +**Implementation guidance** +- Keep tests focused on behavior, not internal pandas API names. +- Include both numeric and non-numeric series, plus null-heavy edge cases. + +**Test cases to add** +1. Numeric series with clear mode `[1,1,2]` → returns `1`. +2. Numeric series all-unique `[1,2,3]` → returns first mode semantics (document expected behavior). +3. Float series with NaNs `[1.0, 1.0, np.nan]` → returns `1.0`. +4. Non-numeric series `['a','a','b']` → returns `np.nan` per current contract. +5. Datetime/categorical series → confirm current intended behavior (`np.nan` unless design says otherwise). +6. Empty series → returns `np.nan`. +7. Ensure function does not raise on pandas 3.x types that used to trigger `is_numeric` breakage. + +**Regression objective** +- Guarantee the pandas 3.0 compatibility path remains stable. + +### C) Windows-risk tests without making main CI slower +**Goal:** improve signal while preserving current non-blocking runtime strategy. + +**Implementation guidance** +- Keep current non-blocking Windows job for PR latency. +- Add a **scheduled/nightly blocking Windows job** with broader coverage (`slow` + currently skipped candidates where feasible). +- Add a lightweight PR Windows smoke subset focused on file-lock/process-lifecycle hotspots. + +**Concrete additions** +1. New marker `@pytest.mark.windows_smoke` for fast, deterministic Windows-critical tests. +2. PR Windows job runs `-m "windows_smoke or not slow"` (or separate step for clarity). +3. Nightly Windows job runs fuller suite and fails loudly. +4. Add timing budget tracking in CI summary (report per-job duration) to monitor regressions. + +**Regression objective** +- Catch Windows-only breakages earlier without turning every PR into a 10+ minute bottleneck. + +### D) Extras smoke tests: expand from import checks to behavior checks +**File target:** `scripts/smoke_test.py`. + +**Implementation guidance** +- Keep quick import checks, but add one representative behavior assertion per extra. + +**Examples** +1. `mcp` extra: instantiate app via `make_app()` and assert routes include `/load` + `/ws`. +2. `marimo` extra: instantiate a minimal `BuckarooDataFrame` and verify expected attribute/method availability. +3. `jupyterlab/notebook` extras: verify minimum major versions + one lightweight runtime call. +4. `polars` extra: serialize + deserialize roundtrip shape/value check (not just byte-length > 0). + +--- + +## Prioritized execution plan + +1. **P0 — Fix and test the `no_browser` request contract.** +2. **P1 — Add targeted pandas `get_mode()` regression tests.** +3. **P1 — Add Windows nightly/expanded coverage while keeping PR Windows non-blocking for speed.** +4. **P2 — Expand extras smoke tests from import-level checks to behavior-level checks.** + +## Should we look at anything else in this process? +Yes—add these checkpoints to future retro reviews: + +- **Rollback readiness:** ensure every risky behavior change has an easy revert path or feature flag. +- **Observability checks:** verify new code paths emit logs/metrics that make regressions detectable. +- **Contract drift audits:** check API handler type expectations against real client payload examples. +- **CI gate integrity:** ensure required jobs actually cover the changed subsystem (avoid all-green but irrelevant checks). +- **Flake accounting:** maintain a known-flaky registry with owners and target dates, not ad-hoc skips. +- **Perf-vs-signal review:** document why a job is non-blocking and what alternate gate (nightly/release) carries the risk.