ci: extend quality-gate matrix to Windows; fix codex backend test isolation #44

Merged: szjanikowski merged 1 commit into main from ci/quality-gate-windows on May 8, 2026

@szjanikowski

Summary

Follow-up to #43. Extends the quality-gate to actually catch Windows-only breakage and fixes the two remaining Windows-only test isolation bugs on main.

  • .github/workflows/quality-gate.yml — pytest job now runs on [ubuntu-latest, windows-latest] × [3.12, 3.13] with fail-fast: false. Lint/typecheck/audit stay single-OS (pure-Python checks, no platform variance).
  • tests/test_evaluator_backends.py — replace monkeypatch.setenv("HOME", str(tmp_path)) with monkeypatch.setattr("pathlib.Path.home", lambda: tmp_path) in the five codex-oauth tests. HOME doesn't drive Path.home() on Windows (which reads USERPROFILE), so _has_chatgpt_oauth() was leaking into the dev's real ~/.codex/auth.json and flipping the strip/promote branches in _build_env. The new pattern matches what tests/test_runner.py:315,398 already uses and is cross-platform.
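A minimal sketch of the isolation pattern described above, not the actual test code. `has_codex_auth` is a hypothetical stand-in for `_has_chatgpt_oauth()` (its real body is not shown in this PR), and `unittest.mock.patch.object` is used here in place of pytest's `monkeypatch.setattr` only so the snippet runs without pytest:

```python
import tempfile
from pathlib import Path
from unittest import mock


def has_codex_auth() -> bool:
    """Hypothetical stand-in for _has_chatgpt_oauth(): looks for ~/.codex/auth.json."""
    return (Path.home() / ".codex" / "auth.json").is_file()


def check_with_fake_home(create_auth: bool) -> bool:
    """Run has_codex_auth() against a throwaway home directory."""
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp)
        if create_auth:
            (fake_home / ".codex").mkdir()
            (fake_home / ".codex" / "auth.json").write_text("{}")
        # Patch the lookup itself instead of HOME or USERPROFILE, so the
        # result does not depend on which variable the platform consults.
        with mock.patch.object(Path, "home", lambda: fake_home):
            return has_codex_auth()
```

Because the patch replaces `Path.home` directly, `check_with_fake_home(True)` and `check_with_fake_home(False)` should behave the same on Linux and Windows, regardless of whether the machine running the tests has a real `~/.codex/auth.json`.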

The ticket flagged a missing monkeypatch.delenv("CODEX_API_KEY") for test_codex_backend_env_strips_api_keys_when_oauth_present, but the test already does monkeypatch.setenv("CODEX_API_KEY", ...) which overrides the dev shell. The actual cause was the same Path.home() issue — _has_chatgpt_oauth() returned False (real ~/.codex not visible), so the OAuth-strip branch never fired and CODEX_API_KEY survived in env.
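To illustrate why the result of `_has_chatgpt_oauth()` decides whether `CODEX_API_KEY` survives, here is a rough sketch of the strip branch only. `build_env`, its signature, and `API_KEY_VARS` are assumptions made for illustration; the real `_build_env` is not shown in this PR, and its "promote" branch is left out because its behavior is not described here:

```python
# Assumption: only CODEX_API_KEY is named in this PR; the real list may be longer.
API_KEY_VARS = ("CODEX_API_KEY",)


def build_env(base_env: dict[str, str], oauth_present: bool) -> dict[str, str]:
    """Hypothetical sketch: drop API keys from the env when OAuth credentials exist."""
    env = dict(base_env)  # copy so the caller's mapping is left untouched
    if oauth_present:
        for name in API_KEY_VARS:
            env.pop(name, None)
    return env
```

Under this sketch, a test that expects the key to be stripped passes only when the OAuth check returns True. If the check reads the wrong home directory and returns False, the key stays in the environment and the assertion fails, which matches the symptom described above.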

No production-code change.

Test plan

  • uv run pytest tests/ -v on local Windows — 147 passed, including the three previously failing tests.
  • uv run ruff check tests/test_evaluator_backends.py — clean.
  • uv run ruff format --check tests/test_evaluator_backends.py — clean.
  • CI: all four matrix cells ({ubuntu, windows} × {3.12, 3.13}) green on this PR.

Out of scope

  • macOS coverage — no realistic Windows-style platform bugs hiding there; can be added separately if a need arises.

🤖 Generated with Claude Code

…lation

Linux-only quality-gate hid three Windows-only test failures on main. The
runner-test bug was fixed in #43; the remaining two were `Path.home()`
isolation gaps in `tests/test_evaluator_backends.py`: setting `HOME` does not
redirect `Path.home()` on Windows (which reads `USERPROFILE`), so
`_has_chatgpt_oauth()` leaked into the dev's real `~/.codex/auth.json` and
flipped the strip/promote branches in `_build_env`.

- `tests/test_evaluator_backends.py`: replace `monkeypatch.setenv("HOME", ...)`
  with `monkeypatch.setattr("pathlib.Path.home", lambda: tmp_path)` in the
  five codex-oauth tests. Matches the pattern already used in
  `tests/test_runner.py:315,398`. Cross-platform; tests no longer touch
  real env state.
- `.github/workflows/quality-gate.yml`: pytest job now runs on
  `[ubuntu-latest, windows-latest] × ["3.12", "3.13"]` with
  `fail-fast: false`. Lint/typecheck/audit stay single-OS — pure-Python
  checks with no platform variance.

Verified locally on Windows: all 147 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@szjanikowski szjanikowski merged commit a31164a into main May 8, 2026
9 checks passed
szjanikowski added a commit that referenced this pull request May 9, 2026
* docs(changelog): record Windows compat + skill bundling under [Unreleased]

Captures everything that landed since v0.3.2 (PRs #42, #43, #44, #45) so
the release notes are queued up. Keeps the section under [Unreleased]
deliberately — more fixes are still expected before cutting v0.3.3, and
release docs say the version header should be added at release-cut time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): record .gitattributes LF enforcement (#47)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(changelog): record harbor[cloud] dependency switch (#48)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: release v0.3.3

Cuts v0.3.3 from the accumulated [Unreleased] section. Patch bump
(pre-1.0 policy): no breaking changes to user-facing CLI, nasde.toml
schema, or benchmark project layout — Windows compat, OAuth-script
bundling, harbor[cloud] dep switch are all transparent to users.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Szymon Janikowski <szymon.janikowski@itlibrium.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@szjanikowski szjanikowski deleted the ci/quality-gate-windows branch May 9, 2026 18:04
