chore(release): operationalize PyPI publishing — bump harbor, smoke gates, docs#38
Merged
Harbor 0.6.x has been out for a week with adapter fixes, Modal/Windows support, and gemini-cli v0.40+ session improvements. No breaking changes for our usage: the Job.create() / await job.run() pattern is identical to 0.4. Bumped pin to >=0.6,<0.7.

Side benefit: this resolves a transitive dependency problem. supabase 2.28.3 (pulled in via harbor 0.4) shipped a broken wheel without the `supabase_auth._async.storage` module, breaking `nasde --version` for anyone installing from PyPI in a fresh environment. supabase 2.29.0 (via harbor 0.6.4) ships a healthy wheel.

opik refreshed to 2.0.22 within the existing >=2,<3 range (minor bugfix release from today).

Verified locally:
- 147 tests pass
- ruff, mypy, format clean
- `nasde --version` and `nasde --help` work in installed venv
- supabase-auth 2.29.0 wheel exposes _async.storage correctly

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PyPI/TestPyPI enforce no-file-reuse: a wheel uploaded once cannot be overwritten. With hatch-vcs producing the same dev version (e.g. '0.3.1.dev2') for any commit at the same git distance from the last tag, two consecutive workflow_dispatch runs from different branches produce wheels with identical filenames. The second run was failing with HTTP 400 'File already exists'.

skip-existing: true makes the TestPyPI upload a no-op when the file is already present, so re-running the dispatch from a feature branch is harmless. Production PyPI publish still rejects re-uploads (intended: we never want a silent overwrite there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
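The filename collision described above can be reproduced with a small sketch. It assumes a "bump the patch number, append `.devN`" version scheme with no local version suffix; the last tag `v0.3.0` and both helper functions are hypothetical and are not hatch-vcs APIs.

```python
def dev_version(last_tag: str, distance: int) -> str:
    """Derive a version from the last tag and the commit distance to it.

    Assumed scheme: on the tag itself, use the tag version; otherwise
    bump the patch number and append .dev<distance>.
    """
    major, minor, patch = (int(part) for part in last_tag.lstrip("v").split("."))
    if distance == 0:
        return f"{major}.{minor}.{patch}"
    return f"{major}.{minor}.{patch + 1}.dev{distance}"


def wheel_filename(dist_name: str, version: str) -> str:
    # Wheel filenames replace "-" with "_" in the distribution name.
    return f"{dist_name.replace('-', '_')}-{version}-py3-none-any.whl"


# Two different branches, each two commits past the (hypothetical) v0.3.0 tag.
branch_a = wheel_filename("nasde-toolkit", dev_version("v0.3.0", 2))
branch_b = wheel_filename("nasde-toolkit", dev_version("v0.3.0", 2))

print(branch_a)
print(branch_a == branch_b)
```

Because the branch name never enters the version, the second upload targets a filename that already exists on the index, which is the case `skip-existing: true` turns into a no-op.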
Harbor 0.6 renamed JobStats top-level counters:
- n_trials → n_completed_trials
- n_errors → n_errored_trials

Caught by an e2e smoke test (claude-vanilla on python-gilded-rose): the trial completed successfully and Opik scores uploaded, but _print_job_summary crashed with AttributeError on result.stats.n_trials.

Per-eval stats (AgentDatasetStats.n_trials / .n_errors) still use the old names in 0.6, so those references are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
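For illustration only, here is one defensive way to read counters across such a rename. This is a sketch, not necessarily what the fix in this PR does (with the pin at >=0.6,<0.7 a plain rename is enough). `read_counter` and `summarize` are hypothetical helpers, and the `SimpleNamespace` objects are stand-ins for harbor's `JobStats`, not the real class.

```python
from types import SimpleNamespace


def read_counter(stats, new_name: str, old_name: str) -> int:
    """Return stats.<new_name>, falling back to stats.<old_name>."""
    if hasattr(stats, new_name):
        return getattr(stats, new_name)
    return getattr(stats, old_name)


def summarize(stats) -> str:
    completed = read_counter(stats, "n_completed_trials", "n_trials")
    errored = read_counter(stats, "n_errored_trials", "n_errors")
    return f"{completed} trials completed, {errored} errored"


# Stand-in objects carrying the pre-0.6 and 0.6 field names respectively.
old_style = SimpleNamespace(n_trials=3, n_errors=1)
new_style = SimpleNamespace(n_completed_trials=3, n_errored_trials=1)

print(summarize(old_style))
print(summarize(new_style))
```

Both calls produce the same summary line, so a summary printer written this way would not raise AttributeError on either field layout.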
Two upstream supabase-py releases ship wheels missing the _async/ and _sync/ submodules (sdists are fine, but pip/uv prefer wheels):
- supabase 2.28.3 (released 2026-03-20)
- supabase 3.0.0a1 (released today, 2026-05-07, pre-release)

Both cause `nasde --version` to fail at import with "ModuleNotFoundError: No module named 'supabase_auth._async'". This was caught during clean-install verification of nasde-toolkit==0.3.1.dev4 from TestPyPI.

The dev environment doesn't see this because uv.lock pinned 2.29.0 (a healthy build). Fresh installs by users have no such guard, and `--pre` (which we'll likely document for installing dev builds) propagates to all dependencies, pulling in 3.0.0a1.

Constraint: keep within 2.x for now. Remove when supabase publishes a healthy 3.x wheel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
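A quick way to confirm whether an installed wheel actually contains the expected subpackages is to probe for them with the standard library. The sketch below is an assumption about how one might check, not part of the PR; `module_available` and `missing_modules` are hypothetical helpers.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Report whether a module can be located in the current environment.

    find_spec imports parent packages but not the target module itself.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package is missing or fails to import.
        return False


def missing_modules(required: list[str]) -> list[str]:
    """Return the subset of `required` that cannot be located."""
    return [name for name in required if not module_available(name)]


# Stdlib names are used here so the sketch runs anywhere.
print(missing_modules(["json.decoder", "json.no_such_submodule"]))
```

In a fresh venv with nasde-toolkit installed, calling `missing_modules(["supabase_auth._async", "supabase_auth._sync"])` should return an empty list for a healthy wheel. The `_sync` module path is inferred from the commit message above rather than verified.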
After every TestPyPI publish (manual dispatch or scheduled cron) and every PyPI publish (tag push), spin up a clean venv, install the just-published nasde-toolkit, and run `nasde --version` + `nasde --help`.

This is the gate that would have caught the supabase 2.28.3 broken-wheel issue before it reached users, the exact scenario we hit during verification of #36.

Mechanics:
- build job exposes `version` as output so smoke jobs can pin to the exact dev/release version that was published.
- smoke-test-testpypi runs after publish-testpypi; uses both PyPI and TestPyPI indices (TestPyPI for nasde-toolkit, PyPI for transitives that aren't on TestPyPI).
- smoke-test-pypi runs after publish-pypi (only on tag push); uses default PyPI index. Failure of this job marks the workflow run as failed in GitHub Actions even though the publish itself succeeded, which surfaces the issue immediately.
- Weekly cron `0 9 * * 1` does a TestPyPI publish + smoke test, so an upstream wheel going bad between our releases (e.g. supabase 3.x alpha appearing) shows up within a week instead of at our next release.

The `--refresh` flag forces uv to re-resolve from the index instead of using a stale cache, ensuring the test exercises what a brand-new user would see.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
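The smoke-test mechanics can be approximated locally with the sketch below. It swaps in stdlib `venv` plus `pip` for the workflow's uv commands, using pip's `--no-cache-dir` as a rough analogue of uv's `--refresh`. `smoke_commands` and `run_smoke` are hypothetical helpers, and the real job steps in publish.yml may differ.

```python
import subprocess
import sys
from pathlib import Path

TESTPYPI = "https://test.pypi.org/simple/"
PYPI = "https://pypi.org/simple/"


def smoke_commands(venv_dir: Path, version: str, from_testpypi: bool) -> list[list[str]]:
    """Build the commands for a clean-venv install followed by CLI checks."""
    bin_dir = venv_dir / ("Scripts" if sys.platform == "win32" else "bin")
    install = [str(bin_dir / "python"), "-m", "pip", "install", "--no-cache-dir"]
    if from_testpypi:
        # TestPyPI serves nasde-toolkit; PyPI serves the transitive dependencies.
        install += ["--index-url", TESTPYPI, "--extra-index-url", PYPI]
    install.append(f"nasde-toolkit=={version}")
    nasde = str(bin_dir / "nasde")
    return [
        [sys.executable, "-m", "venv", str(venv_dir)],
        install,
        [nasde, "--version"],
        [nasde, "--help"],
    ]


def run_smoke(venv_dir: Path, version: str, from_testpypi: bool) -> None:
    """Run each command in order; any non-zero exit raises CalledProcessError."""
    for command in smoke_commands(venv_dir, version, from_testpypi):
        subprocess.run(command, check=True)


for command in smoke_commands(Path("smoke-venv"), "0.3.1.dev5", from_testpypi=True):
    print(" ".join(command))
```

The loop only prints the planned commands. Calling `run_smoke(Path("smoke-venv"), "0.3.1.dev5", from_testpypi=True)` would execute them and needs network access.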
Restructured publish.yml so PyPI publication is gated by a successful
fresh-install smoke test on TestPyPI. Previously, smoke-test-pypi ran
*after* publish-pypi — too late to prevent a broken release from
reaching users (PyPI doesn't allow file deletion, only yank).
New flow on tag push (e.g. `git push origin v0.3.1`):
    build
      └─ publish-testpypi
           └─ smoke-test-testpypi   ← gate: must pass before PyPI
                └─ publish-pypi
                     ├─ verify tag is on main (rejects rogue tags from
                     │  other branches; security guard)
                     ├─ create GitHub Release with notes
                     └─ smoke-test-pypi (final sanity check)
Renamed dispatch input `test_pypi` → `publish_to_prod` (default false).
Default behavior of manual dispatch is now "TestPyPI dry-run only" —
safer and matches the weekly cron canary. To force a production publish
without a tag (disaster recovery), use `publish_to_prod=true`.
Tag push always reaches publish-pypi (intent is unambiguous), but only
through the TestPyPI gate. The new "tag must be on main" check prevents
publishing tags pushed from arbitrary branches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
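The trigger rules and the gate described in this commit can be modelled as a small sketch. The event labels (`tag_push`, `workflow_dispatch`, `schedule`) and both functions are hypothetical. The sketch also assumes that a dispatch with `publish_to_prod=true` still passes through the TestPyPI gate, which the commit message does not state explicitly.

```python
PIPELINE = [
    "build",
    "publish-testpypi",
    "smoke-test-testpypi",
    "publish-pypi",
    "smoke-test-pypi",
]


def jobs_to_run(event: str, publish_to_prod: bool = False) -> list[str]:
    """Select the jobs a trigger is allowed to reach."""
    testpypi_only = PIPELINE[:3]
    if event == "tag_push":
        return list(PIPELINE)
    if event == "workflow_dispatch":
        return list(PIPELINE) if publish_to_prod else testpypi_only
    if event == "schedule":
        return testpypi_only
    raise ValueError(f"unknown event: {event}")


def execute(plan: list[str], outcomes: dict[str, bool]) -> list[str]:
    """Return the jobs that succeeded; the chain stops at the first failure."""
    succeeded = []
    for job in plan:
        if not outcomes.get(job, True):
            break
        succeeded.append(job)
    return succeeded


print(jobs_to_run("workflow_dispatch"))
print(execute(jobs_to_run("tag_push"), {"smoke-test-testpypi": False}))
```

With the TestPyPI smoke test failing, `publish-pypi` never appears in the executed list, which is the property the restructuring is meant to guarantee.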
RELEASING.md and ci-setup.md were written before #36/#37/#38 set up the publish workflow. They still described a manual `gh release create` flow with `git tag` doing nothing and "do not publish to PyPI" warnings. Reality is now the opposite: tag push automates everything end-to-end.

Updated:
- TL;DR: drops the obsolete "bump pinned version in docs" step (README no longer pins a tag) and the manual `gh release create` step (the workflow does it).
- Step 2 (Tag and push): explains what publish.yml does after the tag push, instead of "no workflow today".
- Step 3 (Verify the release): replaces the old "Create the GitHub Release" step with post-publish verification (PyPI page, fresh install, GH Release exists).
- New "How publish.yml works" section: full pipeline diagram, trigger matrix, TestPyPI dry-run instructions, recovery table for the scenarios we hit during PR #38 verification (broken transitive wheels, smoke test flakes, tag-not-on-main rejection, file-already-exists).
- Pre-flight checklist gains an optional TestPyPI dry-run step for non-trivial releases.
- Hotfixing: adapted to the "tag must be on main" guard. Forward-port to main first, then cherry-pick onto release branch and tag.
- "What NOT to do": drops the obsolete "do not publish to PyPI" item; adds new traps (do not create GH Release manually, do not use publish_to_prod=true for routine verification).
- "Future work": drops PyPI publishing item (done). Adds a forward-looking item about exercising `nasde run` in smoke tests.

ci-setup.md gets a fourth row in the workflow table (Publish) and a short pointer to RELEASING.md for details, so people landing there first don't miss that the publish flow exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 7, 2026
Summary
What started as a routine `harbor 0.4 → 0.6` bump uncovered a chain of issues in the just-merged PyPI publish pipeline (#36, #37) that were only visible when actually trying to install nasde-toolkit from a clean environment. This PR fixes them all and hardens the release flow so we don't repeat the experience.

What's in this PR
- `chore(deps): bump harbor 0.4 → 0.6, refresh opik to 2.0.22`: `supabase 2.29.0` instead of broken `2.28.3` (see below).
- `ci(publish): make TestPyPI uploads idempotent (skip-existing)`: `workflow_dispatch` runs producing the same `devN` hit `400 File already exists`. `skip-existing: true` makes re-runs harmless.
- `fix(runner): adapt to JobStats field rename in harbor 0.6`: Harbor 0.6 renamed `n_trials → n_completed_trials` and `n_errors → n_errored_trials` on top-level `JobStats`. Our `_print_job_summary` crashed after a successful trial. Caught only by an end-to-end smoke test (claude-vanilla on python-gilded-rose with `--with-opik`).
- `fix(deps): pin supabase>=2.29.0,<3 to avoid broken transitive wheels`: upstream releases (`supabase 2.28.3` and `3.0.0a1`) ship wheels missing the `_async/` and `_sync/` subpackages, which causes `nasde --version` to fail at import. Constraint pins us within healthy 2.x.
- `ci(publish): add fresh-install smoke tests + weekly canary`: fresh-install smoke test runs `nasde --version` + `--help`. Weekly Monday cron catches transitive-dep breakage between releases.
- `ci(publish): linear release flow with TestPyPI gate before PyPI`: `publish-pypi` is gated by a successful smoke test on TestPyPI. Previously the PyPI smoke ran after publish, too late to prevent a broken release. Renamed dispatch input `test_pypi → publish_to_prod` (default `false`). Added "tag must be on main" guard on PyPI publish.
- `docs(releasing): rewrite for automated publish.yml flow`: the docs described a manual `gh release create` flow with "do not publish to PyPI" warnings. Rewritten for the automated flow with a recovery table for the failure modes hit during this PR's verification.

The release flow after this PR
Manual TestPyPI dry-run from any branch (no tag needed):
Disaster recovery (PyPI publish without a tag):
Weekly canary on Monday 09:00 UTC re-runs the TestPyPI flow — catches transitive-dep breakage between releases.
Test plan
- `uv run pytest`: 147 passed (24 in `test_update_check.py` from feat: PyPI publishing + in-CLI update notifier #36)
- `uv run ruff check` / `ruff format --check` / `mypy`: all clean
- `uv run nasde --version` and CI smoke variants work locally
- `uv build` produces wheel + sdist with correct version (`0.3.1.devN`)
- `nasde run --variant claude-vanilla --tasks python-gilded-rose-polymorphism --with-opik -C examples/refactoring-skill`) completed successfully: trial ran, assessment eval scored 71/100, 7 feedback scores uploaded to Opik
- `0.3.1.dev5` resolves correctly with `harbor 0.6.5`, `opik 2.0.23`, `supabase 2.30.0`; both `nasde --version` and `nasde --help` work
- `publish_to_prod=false` (default) correctly skips `publish-pypi` and `smoke-test-pypi`
- `v0.3.1`) to verify the full tag-push → PyPI flow end-to-end

Related
- `local_scheme` fix).
- `uv tool install nasde-toolkit` instead of git-tag pinning) remains intentionally out of scope. It lands in PR2 after the first real PyPI release exists, to avoid pointing users at a name that 404s.