Skip to content

fix(cli): bound upgrade watch with timeout and progress (RND-568)#161

Open
mpjunior92 wants to merge 1 commit into
release/v1.0.0from
matheuspereirajunior/rnd-568-upgrade-watch-timeout
Open

fix(cli): bound upgrade watch with timeout and progress (RND-568)#161
mpjunior92 wants to merge 1 commit into
release/v1.0.0from
matheuspereirajunior/rnd-568-upgrade-watch-timeout

Conversation

@mpjunior92
Copy link
Copy Markdown
Contributor

@mpjunior92 mpjunior92 commented May 20, 2026

Summary

ecloud compute app upgrade would hang indefinitely after submitting the on-chain transaction whenever the orchestrator was silent (15+ minutes observed in the wild) — there was no deadline, no progress between status transitions, and no recovery hint.

  • Bound watchUntilUpgradeComplete with a deadline (default 600s, override via --watch-timeout flag or ECLOUD_WATCH_TIMEOUT_SECONDS env var; precedence: explicit option > env var > default).
  • Log every status transition on its own line with elapsed seconds, mirroring watchUntilRunning.
  • On timeout, throw a typed WatchTimeoutError carrying appId, lastStatus, elapsedSeconds, timeoutSeconds.
  • CLI catches the timeout and prints a recovery hint pointing at ecloud compute app info <id> along with txHash and appId, then exits non-zero. Success path is unchanged.

WatchTimeoutError and WATCH_DEFAULT_TIMEOUT_SECONDS are intentionally generic — RND-569 reuses the same helpers for the deploy watcher.

Resolves RND-568.

Test plan

  • pnpm --filter @layr-labs/ecloud-sdk run build:dev clean
  • pnpm --filter @layr-labs/ecloud-cli run build:dev clean
  • typecheck clean for the touched files (only pre-existing unrelated test-globals errors)
  • eslint clean on touched files
  • ecloud compute app upgrade --help lists --watch-timeout with [env: ECLOUD_WATCH_TIMEOUT_SECONDS]
  • Live verification skipped (10-min wall-clock + dev allowlist gating); logic is exercised by build/typecheck and is mechanically simple (Date.now() >= deadline + instanceof catch). RND-569 includes a live timeout reproduction that exercises the same helper code paths.

The 'ecloud compute app upgrade' command would hang indefinitely after
submitting the on-chain transaction whenever the orchestrator was silent
(15+ minutes observed in the wild) — there was no deadline, no progress
between status transitions, and no recovery hint.

- Bound watchUntilUpgradeComplete with a deadline (default 10 minutes,
  overridable via ECLOUD_WATCH_TIMEOUT_SECONDS env var or an explicit
  timeoutSeconds option).
- Log every status transition on its own line with elapsed seconds,
  mirroring watchUntilRunning.
- On timeout, throw a typed WatchUpgradeTimeoutError carrying appId,
  lastStatus, elapsedSeconds, and timeoutSeconds.
- CLI catches the timeout, prints a recovery hint pointing the user to
  'ecloud compute app info <id>' along with txHash and appId, and exits
  non-zero. Success path is unchanged.
@mpjunior92 mpjunior92 self-assigned this May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant