refactor: use aleph CLI for operations that were ad-hoc by odesenfans · Pull Request #19 · aleph-im/aleph-testnets

odesenfans · 2026-05-28T14:48:09Z

The v0.10.3 CLI now subsumes plumbing that we'd previously hand-rolled. This PR replaces it where the CLI gives the same answer in fewer moving parts.

Migrations

Before	After
`scripts/crn-up.sh`: 3× `curl /api/v0/aggregates/<addr>.json + jq` to count CCNs by owner / find a CRN hash by URL	`aleph node list --type {ccn,crn} --address \| --all --corechannel-address X --json`
`scripts/fund-test-accounts.sh`: 2× `curl /api/v0/addresses/X/balance + jq .credit_balance`	`aleph account balance X --json` (`.credits`)
`conftest.ccn_aggregates`: urllib + manual `data.X.Y` parsing	`aleph aggregate get <key> --address X --json`
`conftest.crn_nodes`: scrape corechannel.resource_nodes from the aggregate dict	`aleph node list --type crn --all --corechannel-address X --json`
`test_instances`/`test_migration`: poll scheduler-api `/api/v0/allocation/{hash}`, then poll CRN `/v2/about/executions/list` to find `mapped_ports.22.host`, then ssh subprocess with `-o StrictHostKeyChecking=no -p <port> -i <key>`	One `aleph instance show <hash> --verbose --json` (rolls up CCN message + scheduler placement + CRN networking + mapped ports), then `aleph instance ssh <hash> --identity <key> -- <cmd>` for SSH

Infrastructure in `conftest.py`

Two new session-scoped fixtures support the instance-test migration:

aleph_cli_config — creates a tmp XDG_CONFIG_HOME and registers a testnet network whose scheduler-url points at $ALEPH_TESTNET_SCHEDULER_API_URL. Required because aleph instance show/ssh resolve the scheduler from the current network's config; without isolation we'd either hit the CLI's builtin mainnet scheduler or mutate the user's ~/.config/aleph/config.toml.
_ssh_home — minimal $HOME with a permissive ~/.ssh/config (StrictHostKeyChecking no, ephemeral known_hosts). The CLI doesn't expose ssh -o passthrough, and aleph instance ssh runs ssh inheriting parent env, so this is the cleanest knob. $HOME is overridden only for instance ssh invocations.

The aleph_cli fixture also now returns None (instead of raising JSONDecodeError) when parse_json=True meets empty stdout — that's the documented contract for aleph aggregate get on a missing key, and the new ccn_aggregates relies on it to keep "returns None on miss" semantics.

Caveat

aleph instance ssh connects to the VM's IPv6 (the CRN's /124 pool). The previous code went over the CRN's IPv4 + a host-side mapped port. If the CI runner can't reach the CRN droplet's IPv6 we'll see it on the first run; falling back to keeping just the manual ssh subprocess for that one call is straightforward.

Net diff

194 insertions / 263 deletions (–69 net). test_migration.py shrinks ~45%. The test bodies now read like "show this instance / ssh into it" rather than URL+field plumbing.

Verified offline

aleph aggregate get on missing key → exit 0, empty stdout (matches fixture semantics).
aleph node list --all --type crn --json against mainnet → array of {type, hash, name, owner, address, ...} — keys match what crn_nodes / _find_crn_hash-replacement read.
aleph node list --address X --type ccn ... works without ALEPH_PRIVATE_KEY in env (no --chain required; the script's curl-replaced calls had no signing env).
bash -n / py_compile pass.

Replaces hand-rolled CCN/scheduler/CRN plumbing with the v0.10.3 CLI where it now subsumes the same work end-to-end. - crn-up.sh: 3 corechannel `curl + jq` queries → `aleph node list` (filter by owner / type / corechannel-address). - fund-test-accounts.sh: 2 balance `curl + jq` queries → `aleph account balance --json` (.credits). - conftest: `ccn_aggregates` and `crn_nodes` fixtures now run through `aleph aggregate get` and `aleph node list` instead of urllib + manual aggregate parsing. - test_instances / test_migration: drop the scheduler-api allocation poll and the CRN `/about/executions/list` poll in favor of a single `aleph instance show --verbose --json` (placement.allocated_node for CRN identity, mapped_ports for SSH readiness). The marker-write/read and "echo hello" SSH go through `aleph instance ssh` (IPv6). To make `instance show`/`instance ssh` reach the local scheduler, a new session-scoped `aleph_cli_config` fixture seeds an isolated CLI config (XDG_CONFIG_HOME) with a `testnet` network whose scheduler URL points at $ALEPH_TESTNET_SCHEDULER_API_URL. A `_ssh_home` fixture provides a permissive `~/.ssh/config` ($HOME override scoped to `instance ssh` calls) because the CLI does not expose `ssh -o` passthrough. Net: -69 lines in the tests/scripts touched (test_migration.py shrinks by ~45%) and the test bodies now express intent ("show this instance") rather than URL/field plumbing. Caveat: `aleph instance ssh` connects to the VM's IPv6, not the IPv4+port-forward path used previously. If the CI runner can't reach the CRN's IPv6 pool we'll see it on the first run.

foxpatch-aleph

Clean refactor replacing ad-hoc HTTP/curl calls with the aleph CLI across shell scripts and pytest fixtures. The new aleph_cli_config and _ssh_home fixtures correctly isolate CLI state and SSH config for test environments. Test bodies shrink ~45% and read much more declaratively. The only note is the duplicated _poll() helper in both test files, which is pre-existing and out of scope for this PR. The IPv6 caveat for SSH connections is honestly documented. No correctness or security issues found.

tests/test_instances.py (line 6): The _poll function here is identical to the one in test_migration.py:22. Consider extracting to a shared helper (e.g. tests/conftest.py) in a follow-up.

tests/test_migration.py (line 22): Same _poll function as in test_instances.py:6 — worth deduplicating in a follow-up.

Two CI-revealed bugs from the previous commit: 1. `ccn_aggregates` returned the wrong shape. `aleph aggregate get <key>` emits `{"<key>": <value>}` (matching the SDK's AggregateResponse.data layout); the urllib fixture had unwrapped that. Drop the wrapping in the new fixture too — `test_corechannel_aggregate_exists` and `test_create_node` both rely on `agg.get("nodes"/"resource_nodes")`. 2. `aleph instance ssh` connects over the VM's /124 IPv6, and the CI runner has no route to the CRN droplet's IPv6 pool — both `test_instance_create_and_ssh` and `test_instance_migration` timed out at 60s with no SSH progress. Revert just the SSH transport to a manual ssh subprocess over the CRN's IPv4 + the host-side mapped port. Discovery still goes through the CLI: `aleph instance show --verbose` yields placement.allocated_node + mapped_ports["22"] in one call, and `aleph node list --type crn --all` resolves the allocated node hash to its registered URL. Drop the now-unused `_ssh_home` fixture and HOME override on `aleph_cli`.

foxpatch-aleph

Well-structured refactor that replaces ad-hoc HTTP+JSON-plumbing with the aleph CLI. The new conftest fixtures (aleph_cli_config, aleph_cli) correctly isolate the CLI configuration and prevent cross-test pollution. Test logic in both test_instances.py and test_migration.py is significantly simpler — instance show --verbose collapses what was previously 3-4 polling loops into one call, and SSH goes through the CRN IPv4+mapped-port path with the same reliability as before. The only notable code quality issue is the duplication of _resolve_crn_host across both test files.

tests/test_instances.py (line 26): This helper is identical to test_migration.py:43. Consider extracting into conftest.py as a shared utility (or making it a fixture) to avoid maintenance duplication.

The combined `instance show` poll has to wait for the scheduler to reallocate AND for the new CRN to boot the VM + map SSH. The original test split these into 300s + 180s + 60s polls; my single poll was set to 300s, which let migration time out in CI. Bump to 540s to match the original headroom.

foxpatch-aleph

Clean, well-structured refactoring. The mapped_ports format assumption needs verification against actual CLI output to avoid a runtime TypeError. Also address the missing running state check (minor regression), and deduplicate _resolve_crn_host/NODESTATUS_ADDR into conftest.py. The SSH auth failure is environment-related, not a PR regression.

tests/test_migration.py (line 113): mapped_ports format risk: the old code polled the raw CRN API which returned {"22": {"host": 12345}} — mapped.get("22") returned a dict, not a port number. If aleph instance show --verbose passes this through without flattening, int(data["mapped_ports"]["22"]) will raise TypeError. Verify the actual CLI output format and add handling for the nested case, or test against the real CLI.

tests/test_migration.py (line 96): Missing running state check: old _wait_for_vm_ssh_port required execution.get("running") before returning. The replacement only checks allocated_node + mapped_ports. If these get reported before the VM is booted, SSH connections arrive before sshd is ready. In practice _wait_for_ssh's polling masks this, but it's a minor behavioral regression.

tests/test_instances.py (line 26): Code duplication: _resolve_crn_host and NODESTATUS_ADDR are identical in both test files (test_instances.py:26-38, test_migration.py:43-53). Extract to conftest.py as a shared helper function.

Root cause of the migration timeout (found via CI scheduler-rs logs): the VM never migrated because the unlink was a silent no-op, so scheduler-rs kept seeing the CRN as linked ("Plan updated: 2 nodes", never re-planned) and never re-dispatched. Node operations embed the *current network's name* as the corechannel `--network-tag`. crn-up.sh links the CRNs with the CLI's builtin default network → tag `mainnet`. The aleph_cli_config fixture had run `network add testnet` + `network use testnet`, so every `node unlink` issued via aleph_cli was tagged `testnet` and no longer matched the `mainnet`-tagged link — the network accepted the POST (exit 0) but it referenced a non-existent link, so nothing changed downstream. This is why bumping the timeout didn't help: migration was never going to happen. Fix: set the scheduler URL on the builtin `mainnet` network in place (`network set --scheduler-url`) instead of creating/switching to a differently-named network. instance show/ssh still resolve the local scheduler, and node-op tags stay `mainnet`, matching crn-up.sh. Verified by dry-run: under the new fixture config, `node unlink` embeds tags ["unlink","mainnet"] (was ["unlink","testnet"]).

foxpatch-aleph

Solid refactor replacing ad-hoc HTTP/curl/urllib calls with the aleph CLI across shell scripts and Python tests. The key architectural win is the aleph_cli_config fixture that correctly isolates CLI config while preserving network-tag parity with crn-up.sh — fixing the silent migration breakage that was the root cause of CI timeouts. Code is clean, well-commented, and the net -69 line diff confirms genuine simplification.

tests/test_instances.py (line 81): mapped.get("22") only checks string keys, but the CRN API may return integer keys ({22: port}). The old code had mapped_ports.get("22") or mapped_ports.get(22). If the CLI doesn't normalize keys to strings, this poll will silently time out. Same issue at test_migration.py:114.

tests/test_migration.py (line 43): _resolve_crn_host is an exact duplicate of the function in test_instances.py:26. Consider extracting to conftest.py as a module-level helper or a small fixture.

scripts/fund-test-accounts.sh (line 103): Uses bare aleph (resolved from PATH) while crn-up.sh uses $aleph_cli resolved from $BIN_DIR/aleph. If aleph isn't on PATH in the CI runner's compose-exec context, this could fail. Consider using the same indirection pattern as crn-up.sh.

Real root cause of the migration failure (confirmed from CI scheduler-rs logs: it stayed at "2 nodes" and never re-dispatched): nodestatus only processes corechannel operations whose tag matches its FILTER_TAG, which was `mainnet`. The CRNs were therefore linked under the `mainnet` tag, and any unlink had to carry the same tag to take effect. Rather than keep masquerading as mainnet, make the whole testnet use the `testnet` tag consistently, and pin it explicitly so we can never fall back to mainnet by accident: - nodestatus: FILTER_TAG mainnet -> testnet (both the corechannel and balances services, deploy/docker-compose.yml). This is the setting that selects which network's corechannel ops nodestatus ingests (aleph_nodestatus Settings.filter_tag, env FILTER_TAG). - The CLI derives the corechannel --network-tag from the *current network's name*, so every invocation now passes `--network testnet` against a `testnet` network: - tests/conftest.py: aleph_cli_config registers a `testnet` network (scheduler URL -> local) and aleph_cli passes `--network testnet` on every call. - scripts/crn-up.sh: registers `testnet` and passes `--network testnet` on create-ccn / create-crn / link / node list. - scripts/fund-test-accounts.sh: passes `--network testnet` on the account-balance reads (raw --ccn, so the network is never contacted, but explicit for safety). The raw `--ccn <url>` always wins over the network's configured CCN (resolve_ccn_url early-returns on a URL), so pointing at the local CCN is unaffected. Balance/credit tests query by post type + sender, not by tag, so flipping the balances service is safe. Verified by dry-run: create-crn/link/unlink now embed the `testnet` tag under this config (were `mainnet`).

foxpatch-aleph

Clean refactor replacing ad-hoc HTTP/shell plumbing with aleph CLI calls. Reduces test_migration.py by ~45% while maintaining correctness. The corechannel network-tag invariant (testnet) is consistently applied across scripts, docker-compose, and test fixtures. Well-documented design decisions and clear reasoning throughout.

tests/conftest.py (line 54): This TESTNET_NETWORK constant is the linchpin tying together scripts, nodestatus FILTER_TAG, and test operations. Consider adding a comment noting that if this is ever changed, the matching FILTER_TAG in deploy/docker-compose.yml and --network in scripts/*.sh must also change — currently a silent no-op failure mode exists if they drift.

tests/test_instances.py (line 26): _resolve_crn_host and _poll are duplicated verbatim in test_migration.py. Consider extracting to conftest.py for a future cleanup — non-blocking.

tests/test_migration.py (line 43): _resolve_crn_host is duplicated from test_instances.py. Same suggestion as above.

odesenfans had a problem deploying to digitalocean May 28, 2026 14:48 — with GitHub Actions Failure

foxpatch-aleph approved these changes May 28, 2026

View reviewed changes

odesenfans had a problem deploying to digitalocean May 28, 2026 15:16 — with GitHub Actions Failure

foxpatch-aleph approved these changes May 28, 2026

View reviewed changes

odesenfans had a problem deploying to digitalocean May 28, 2026 15:48 — with GitHub Actions Failure

foxpatch-aleph reviewed May 28, 2026

View reviewed changes

odesenfans had a problem deploying to digitalocean May 28, 2026 21:06 — with GitHub Actions Error

foxpatch-aleph approved these changes May 28, 2026

View reviewed changes

odesenfans deployed to digitalocean May 28, 2026 21:33 — with GitHub Actions Active

foxpatch-aleph approved these changes May 28, 2026

View reviewed changes

odesenfans merged commit 18d9b39 into main May 28, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: use aleph CLI for operations that were ad-hoc#19

refactor: use aleph CLI for operations that were ad-hoc#19
odesenfans merged 5 commits into
mainfrom
od/refactor-cli-migrations

odesenfans commented May 28, 2026

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

foxpatch-aleph left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

odesenfans commented May 28, 2026

Migrations

Infrastructure in conftest.py

Caveat

Net diff

Verified offline

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

foxpatch-aleph left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Infrastructure in `conftest.py`