chore(release): operationalize PyPI publishing — bump harbor, smoke gates, docs#38

Merged: szjanikowski merged 7 commits into main from chore/bump-harbor-0.6 on May 7, 2026

szjanikowski (Contributor) commented on May 6, 2026

Summary

What started as a routine harbor 0.4 → 0.6 bump uncovered a chain of issues in the just-merged PyPI publish pipeline (#36, #37) that were only visible when actually trying to install nasde-toolkit from a clean environment. This PR fixes them all and hardens the release flow so we don't repeat the experience.

What's in this PR

| Commit | Why |
| --- | --- |
| chore(deps): bump harbor 0.4 → 0.6, refresh opik to 2.0.22 | Latest harbor (0.6.4) and opik (2.0.22). Side benefit: pulls in supabase 2.29.0 instead of the broken 2.28.3 (see below). |
| ci(publish): make TestPyPI uploads idempotent (skip-existing) | TestPyPI rejects re-uploads of the same filename. Two consecutive `workflow_dispatch` runs that produce the same devN version fail with `400 File already exists`. `skip-existing: true` makes re-runs harmless. |
| fix(runner): adapt to JobStats field rename in harbor 0.6 | Harbor 0.6 renamed `n_trials` → `n_completed_trials` and `n_errors` → `n_errored_trials` on top-level `JobStats`. Our `_print_job_summary` crashed after a successful trial. Caught only by an end-to-end smoke test (claude-vanilla on python-gilded-rose with `--with-opik`). |
| fix(deps): pin supabase>=2.29.0,<3 to avoid broken transitive wheels | Two upstream releases (supabase 2.28.3 and 3.0.0a1) ship wheels missing the `_async/` and `_sync/` subpackages, which causes `nasde --version` to fail at import. The constraint keeps us within healthy 2.x releases. |
| ci(publish): add fresh-install smoke tests + weekly canary | After every TestPyPI publish (and every PyPI publish on tag push), spin up a clean venv, install the just-published version, and run `nasde --version` and `nasde --help`. A weekly Monday cron catches transitive-dep breakage between releases. |
| ci(publish): linear release flow with TestPyPI gate before PyPI | Restructured the workflow so `publish-pypi` is gated by a successful smoke test on TestPyPI. Previously the PyPI smoke test ran after publish, too late to prevent a broken release. Renamed dispatch input `test_pypi` → `publish_to_prod` (default false). Added a "tag must be on main" guard on PyPI publish. |
| docs(releasing): rewrite for automated publish.yml flow | RELEASING.md and ci-setup.md still described a manual `gh release create` flow with "do not publish to PyPI" warnings. Rewritten for the automated flow, with a recovery table for the failure modes hit during this PR's verification. |

The release flow after this PR

git tag vX.Y.Z && git push origin vX.Y.Z
  └─ publish.yml automatically:
       quality-gate → build → publish-testpypi → smoke-test-testpypi
                                                       │ ← GATE
                                                       ▼
                                                  publish-pypi
                                                  (verify tag on main, create GH Release)
                                                       │
                                                       ▼
                                                  smoke-test-pypi

Manual TestPyPI dry-run from any branch (no tag needed):

gh workflow run publish.yml --ref <branch>

Disaster recovery (PyPI publish without a tag):

gh workflow run publish.yml --ref main --field publish_to_prod=true

Weekly canary on Monday 09:00 UTC re-runs the TestPyPI flow — catches transitive-dep breakage between releases.
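
As a sanity check on the canary schedule, here is a minimal sketch that hand-rolls the single cron expression `0 9 * * 1` (minute 0, hour 9, any day of month, any month, day-of-week 1 = Monday). It does not parse cron in general, and `is_canary_slot` is a hypothetical helper, not something that exists in the workflow.

```python
from datetime import datetime, timezone


def is_canary_slot(moment: datetime) -> bool:
    """Return True when `moment` falls on the `0 9 * * 1` slot (Monday 09:00 UTC)."""
    utc = moment.astimezone(timezone.utc)
    # datetime.weekday() counts Monday as 0, while cron counts Monday as 1.
    return utc.weekday() == 0 and utc.hour == 9 and utc.minute == 0


# 2026-05-04 is a Monday; 2026-05-07 (the merge date) is a Thursday.
print(is_canary_slot(datetime(2026, 5, 4, 9, 0, tzinfo=timezone.utc)))  # True
print(is_canary_slot(datetime(2026, 5, 7, 9, 0, tzinfo=timezone.utc)))  # False
```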

Test plan

  • uv run pytest: 147 passed (24 of them in test_update_check.py, added in #36 "feat: PyPI publishing + in-CLI update notifier")
  • uv run ruff check / ruff format --check / mypy — all clean
  • uv run nasde --version and CI smoke variants work locally
  • uv build produces wheel + sdist with correct version (0.3.1.devN)
  • End-to-end benchmark (nasde run --variant claude-vanilla --tasks python-gilded-rose-polymorphism --with-opik -C examples/refactoring-skill) completed successfully: trial ran, assessment eval scored 71/100, 7 feedback scores uploaded to Opik
  • TestPyPI clean install — verified 0.3.1.dev5 resolves correctly with harbor 0.6.5, opik 2.0.23, supabase 2.30.0; both nasde --version and nasde --help work
  • Linear flow verified — manual dispatch with publish_to_prod=false (default) correctly skips publish-pypi and smoke-test-pypi
  • After merge: tag the next release (likely v0.3.1) to verify the full tag-push → PyPI flow end-to-end

Related

chore(deps): bump harbor 0.4 → 0.6, refresh opik to 2.0.22

Harbor 0.6.x has been out for a week with adapter fixes, Modal/Windows
support, and gemini-cli v0.40+ session improvements. No breaking
changes for our usage — Job.create() / await job.run() pattern is
identical to 0.4. Bumped pin to >=0.6,<0.7.

Side benefit: this resolves a transitive dep nightmare. supabase 2.28.3
(pulled in via harbor 0.4) shipped a broken wheel without the
`supabase_auth._async.storage` module, breaking `nasde --version` for
anyone installing from PyPI in a fresh environment. supabase 2.29.0
(via harbor 0.6.4) ships a healthy wheel.

opik refreshed to 2.0.22 within the existing >=2,<3 range — minor
bugfix release from today.

Verified locally:
- 147 tests pass
- ruff, mypy, format clean
- `nasde --version` and `nasde --help` work in installed venv
- supabase-auth 2.29.0 wheel exposes _async.storage correctly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci(publish): make TestPyPI uploads idempotent (skip-existing)

PyPI/TestPyPI enforce no-file-reuse: a wheel uploaded once cannot be
overwritten. With hatch-vcs producing the same dev version (e.g.
'0.3.1.dev2') for any commit at the same git distance from the last
tag, two consecutive workflow_dispatch runs from different branches
produce wheels with identical filenames. The second run was failing
with HTTP 400 'File already exists'.

skip-existing: true makes the TestPyPI upload no-op when the file is
already present, so re-running the dispatch from a feature branch is
harmless. Production PyPI publish still rejects re-uploads (intended —
we never want a silent overwrite there).
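
The collision and the effect of `skip-existing` can be illustrated with a toy model. This is only a sketch: `FakeIndex` is a hypothetical stand-in for a package index, not the real upload API, and the `py3-none-any` wheel tag is an assumption for a pure-Python package.

```python
def wheel_filename(version: str, name: str = "nasde_toolkit") -> str:
    """Build a wheel filename; the `py3-none-any` tag is an assumption."""
    return f"{name}-{version}-py3-none-any.whl"


class FakeIndex:
    """Toy package index that never lets a filename be uploaded twice."""

    def __init__(self) -> None:
        self.files: set[str] = set()

    def upload(self, filename: str, skip_existing: bool = False) -> str:
        if filename in self.files:
            if skip_existing:
                # Mirrors the idea of skip-existing: treat the duplicate as a no-op.
                return "skipped"
            raise RuntimeError("400 File already exists")
        self.files.add(filename)
        return "uploaded"


index = FakeIndex()
# Two dispatch runs at the same git distance from the last tag get the same dev version.
first = wheel_filename("0.3.1.dev2")
second = wheel_filename("0.3.1.dev2")
print(index.upload(first, skip_existing=True))   # uploaded
print(index.upload(second, skip_existing=True))  # skipped
```

Calling `index.upload(second)` without `skip_existing` raises instead, which corresponds to the behavior kept for production PyPI.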

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(runner): adapt to JobStats field rename in harbor 0.6

Harbor 0.6 renamed JobStats top-level counters:
- n_trials → n_completed_trials
- n_errors → n_errored_trials

Caught by an e2e smoke test (claude-vanilla on python-gilded-rose):
the trial completed successfully and Opik scores uploaded, but
_print_job_summary crashed with AttributeError on result.stats.n_trials.

Per-eval stats (AgentDatasetStats.n_trials / .n_errors) still use the
old names in 0.6 — those references are unchanged.
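
For illustration, a version-tolerant way to read the counters might look like the sketch below. The two dataclasses are stand-ins for the 0.4 and 0.6 shapes, not harbor's real classes, and `job_counters` is a hypothetical helper. With the pin at >=0.6,<0.7 the fallback is not strictly needed; it only shows how the rename surfaces as an AttributeError and how it could be absorbed.

```python
from dataclasses import dataclass


@dataclass
class OldJobStats:
    """Stand-in for the harbor 0.4 field names."""
    n_trials: int
    n_errors: int


@dataclass
class NewJobStats:
    """Stand-in for the harbor 0.6 field names."""
    n_completed_trials: int
    n_errored_trials: int


def job_counters(stats: object) -> tuple[int, int]:
    """Return (completed, errored), preferring the 0.6 names and falling back to 0.4."""
    completed = getattr(stats, "n_completed_trials", None)
    if completed is None:
        completed = getattr(stats, "n_trials")
    errored = getattr(stats, "n_errored_trials", None)
    if errored is None:
        errored = getattr(stats, "n_errors")
    return completed, errored


print(job_counters(OldJobStats(n_trials=1, n_errors=0)))  # (1, 0)
print(job_counters(NewJobStats(n_completed_trials=1, n_errored_trials=0)))  # (1, 0)
```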

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(deps): pin supabase>=2.29.0,<3 to avoid broken transitive wheels

Two upstream supabase-py releases ship wheels missing the _async/ and
_sync/ submodules (sdists are fine, but pip/uv prefer wheels):

- supabase 2.28.3 (released 2026-03-20)
- supabase 3.0.0a1 (released today, 2026-05-07, pre-release)

Both cause `nasde --version` to fail at import with
"ModuleNotFoundError: No module named 'supabase_auth._async'" — caught
during clean-install verification of nasde-toolkit==0.3.1.dev4 from
TestPyPI.

The dev environment doesn't see this because uv.lock pinned 2.29.0
(a healthy build). Fresh installs by users have no such guard, and
`--pre` (which we'll likely document for installing dev builds)
propagates to all dependencies, pulling in 3.0.0a1.

Constraint: keep within 2.x for now. Remove when supabase publishes
a healthy 3.x wheel.
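
To make the effect of the constraint concrete, here is a rough sketch that checks the versions mentioned in this PR against `>=2.29.0,<3`. It compares only the numeric release segment and is an approximation, not a PEP 440 implementation; a real resolver handles pre-release, post-release, and dev suffixes properly. `release_tuple` and `within_pin` are hypothetical helpers.

```python
import re


def release_tuple(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '3.0.0a1' -> (3, 0, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def within_pin(version: str) -> bool:
    """Approximate `supabase>=2.29.0,<3` by comparing release tuples only."""
    release = release_tuple(version)
    return (2, 29, 0) <= release < (3,)


for candidate in ("2.28.3", "2.29.0", "2.30.0", "3.0.0a1"):
    print(candidate, within_pin(candidate))
# 2.28.3 False
# 2.29.0 True
# 2.30.0 True
# 3.0.0a1 False
```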

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci(publish): add fresh-install smoke tests + weekly canary

After every TestPyPI publish (manual dispatch or scheduled cron) and
every PyPI publish (tag push), spin up a clean venv, install the just-
published nasde-toolkit, and run `nasde --version` + `nasde --help`.
This is the gate that would have caught the supabase 2.28.3
broken-wheel issue before it reached users: the exact scenario we hit
during verification of #36.

Mechanics:
- build job exposes `version` as output so smoke jobs can pin to the
  exact dev/release version that was published.
- smoke-test-testpypi runs after publish-testpypi; uses both PyPI and
  TestPyPI indices (TestPyPI for nasde-toolkit, PyPI for transitives
  that aren't on TestPyPI).
- smoke-test-pypi runs after publish-pypi (only on tag push); uses
  default PyPI index. Failure of this job marks the workflow run as
  failed in GitHub Actions even though the publish itself succeeded —
  surfaces the issue immediately.
- Weekly cron `0 9 * * 1` does a TestPyPI publish + smoke test, so an
  upstream wheel going bad between our releases (e.g. supabase 3.x
  alpha appearing) shows up within a week instead of at our next
  release.

The `--refresh` flag forces uv to re-resolve from the index instead of
using a stale cache, ensuring the test exercises what a brand-new user
would see.
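
The core of such a smoke step can be sketched as a small runner that fails when any command exits non-zero. This is an illustration only, not the workflow's actual script: `run_smoke_checks` is a hypothetical helper, and the Python interpreter stands in for the `nasde` executable so the sketch runs without the package installed.

```python
import subprocess
import sys


def run_smoke_checks(commands: list[list[str]]) -> list[str]:
    """Run each command and describe every one that exited non-zero."""
    failures = []
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"{' '.join(command)} exited with {result.returncode}")
    return failures


# In the workflow the commands would be `nasde --version` and `nasde --help`,
# run from the freshly created venv. The second command here fails on purpose.
checks = [
    [sys.executable, "--version"],
    [sys.executable, "-c", "import sys; sys.exit(3)"],
]
for failure in run_smoke_checks(checks):
    print(failure)
```

An import-time failure such as the missing `supabase_auth._async` module would show up here as a non-zero exit from the very first command.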

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci(publish): linear release flow with TestPyPI gate before PyPI

Restructured publish.yml so PyPI publication is gated by a successful
fresh-install smoke test on TestPyPI. Previously, smoke-test-pypi ran
*after* publish-pypi — too late to prevent a broken release from
reaching users (PyPI doesn't allow file deletion, only yank).

New flow on tag push (e.g. `git push origin v0.3.1`):

  build
    └─ publish-testpypi
         └─ smoke-test-testpypi    ← gate: must pass before PyPI
              └─ publish-pypi
                   ├─ verify tag is on main (rejects rogue tags from
                   │  other branches — security guard)
                   ├─ create GitHub Release with notes
                   └─ smoke-test-pypi (final sanity check)

Renamed dispatch input `test_pypi` → `publish_to_prod` (default false).
Default behavior of manual dispatch is now "TestPyPI dry-run only" —
safer and matches the weekly cron canary. To force a production publish
without a tag (disaster recovery), use `publish_to_prod=true`.

Tag push always reaches publish-pypi (intent is unambiguous), but only
through the TestPyPI gate. The new "tag must be on main" check prevents
publishing tags pushed from arbitrary branches.
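
The trigger rules above can be summarized in a short sketch. It models only the publish-related jobs named in this message; the event labels (`push_tag`, `workflow_dispatch`, `schedule`) and the `planned_jobs` function are hypothetical names for illustration, not the workflow's real conditions.

```python
def planned_jobs(
    event: str,
    publish_to_prod: bool = False,
    testpypi_smoke_ok: bool = True,
) -> list[str]:
    """List the publish-related jobs expected to run for a given trigger."""
    if event not in {"push_tag", "workflow_dispatch", "schedule"}:
        raise ValueError(f"unknown event: {event!r}")
    jobs = ["build", "publish-testpypi", "smoke-test-testpypi"]
    # A tag push always targets PyPI; a manual dispatch only when asked explicitly.
    wants_prod = event == "push_tag" or (
        event == "workflow_dispatch" and publish_to_prod
    )
    # The gate: PyPI stages run only after the TestPyPI smoke test passes.
    if wants_prod and testpypi_smoke_ok:
        jobs += ["publish-pypi", "smoke-test-pypi"]
    return jobs


print(planned_jobs("workflow_dispatch"))                      # TestPyPI dry-run only
print(planned_jobs("push_tag"))                               # full flow
print(planned_jobs("push_tag", testpypi_smoke_ok=False))      # stopped at the gate
```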

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(releasing): rewrite for automated publish.yml flow

RELEASING.md and ci-setup.md were written before #36/#37/#38 set up the
publish workflow. They still described a manual `gh release create`
flow with `git tag` doing nothing and "do not publish to PyPI" warnings.
Reality is now the opposite: tag push automates everything end-to-end.

Updated:
- TL;DR — drops the obsolete "bump pinned version in docs" step (README
  no longer pins a tag) and the manual `gh release create` step (the
  workflow does it).
- Step 2 (Tag and push) — explains what publish.yml does after the tag
  push, instead of "no workflow today".
- Step 3 (Verify the release) — replaces the old "Create the GitHub
  Release" step with post-publish verification (PyPI page, fresh install,
  GH Release exists).
- New "How publish.yml works" section — full pipeline diagram, trigger
  matrix, TestPyPI dry-run instructions, recovery table for the
  scenarios we hit during PR #38 verification (broken transitive wheels,
  smoke test flakes, tag-not-on-main rejection, file-already-exists).
- Pre-flight checklist gains an optional TestPyPI dry-run step for
  non-trivial releases.
- Hotfixing — adapted to the "tag must be on main" guard. Forward-port
  to main first, then cherry-pick onto release branch and tag.
- "What NOT to do" — drops the obsolete "do not publish to PyPI" item;
  adds new traps (do not create GH Release manually, do not use
  publish_to_prod=true for routine verification).
- "Future work" — drops PyPI publishing item (done). Adds a forward-
  looking item about exercising `nasde run` in smoke tests.

ci-setup.md gets a fourth row in the workflow table (Publish) and a
short pointer to RELEASING.md for details, so people landing there
first don't miss the publish flow exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

szjanikowski changed the title from "chore(deps): bump harbor 0.4 → 0.6, refresh opik to 2.0.22" to the current title on May 7, 2026
szjanikowski merged commit c88e24b into main on May 7, 2026
6 checks passed
szjanikowski deleted the chore/bump-harbor-0.6 branch on May 8, 2026 at 08:14