fix(cli): bound deploy watch with timeout and heartbeat (RND-569) #162

Open
mpjunior92 wants to merge 1 commit into release/v1.0.0 from matheuspereirajunior/rnd-569-deploy-watch-timeout

mpjunior92 commented May 20, 2026

Summary

watchUntilRunning previously logged only on status transitions, so when the orchestrator silently kept the app in Unknown, the user saw a single Status: Unknown (1s) line forever (most noticeable on non-TTY stdout, where carriage-return overwrites are invisible). The loop also had no timeout, so the CLI would hang indefinitely.

  • Add a 30s heartbeat that re-emits the current status with elapsed time, plus a configurable timeout (default 600s, override via --watch-timeout flag or ECLOUD_WATCH_TIMEOUT_SECONDS env var; precedence: explicit option > env var > default).
  • Throw a typed WatchTimeoutError (carrying appId, elapsedSeconds, lastStatus, timeoutSeconds) on deadline.
  • CLI catches the timeout and prints a recovery hint pointing at ecloud compute app info <id> along with appId and txHash, then exits non-zero.
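
The loop behaviour described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `WatchDeps` interface, the 5s poll interval, and the `"Running"` terminal status string are assumptions made for the example. Only the error's field names and the 30s heartbeat come from the description.

```typescript
// Hypothetical sketch; only the WatchTimeoutError field names and the 30s
// heartbeat are taken from the PR description. Everything else is assumed.
class WatchTimeoutError extends Error {
  appId: string;
  elapsedSeconds: number;
  lastStatus: string;
  timeoutSeconds: number;

  constructor(appId: string, elapsedSeconds: number, lastStatus: string, timeoutSeconds: number) {
    super(`Watch timed out after ${elapsedSeconds}s (limit ${timeoutSeconds}s); last status: ${lastStatus}`);
    this.name = "WatchTimeoutError";
    this.appId = appId;
    this.elapsedSeconds = elapsedSeconds;
    this.lastStatus = lastStatus;
    this.timeoutSeconds = timeoutSeconds;
  }
}

// Injected dependencies (assumed) so the loop can be driven by a fake clock.
interface WatchDeps {
  getStatus: (appId: string) => Promise<string>; // assumed status lookup
  now: () => number;                             // epoch milliseconds
  sleep: (ms: number) => Promise<void>;
  log: (line: string) => void;
}

const HEARTBEAT_SECONDS = 30;
const POLL_INTERVAL_MS = 5000; // assumed; the PR does not state a poll interval

async function watchUntilRunning(appId: string, timeoutSeconds: number, deps: WatchDeps): Promise<void> {
  const start = deps.now();
  let lastLoggedStatus: string | undefined;
  let lastLogAt = start;

  for (;;) {
    const status = await deps.getStatus(appId);
    const now = deps.now();
    const elapsedSeconds = Math.floor((now - start) / 1000);

    // Log on a status transition, or as a heartbeat when nothing was logged for 30s.
    const changed = status !== lastLoggedStatus;
    const heartbeatDue = now - lastLogAt >= HEARTBEAT_SECONDS * 1000;
    if (changed || heartbeatDue) {
      deps.log(`Status: ${status} (${elapsedSeconds}s)`);
      lastLoggedStatus = status;
      lastLogAt = now;
    }

    if (status === "Running") return; // assumed terminal status name
    if (elapsedSeconds >= timeoutSeconds) {
      throw new WatchTimeoutError(appId, elapsedSeconds, status, timeoutSeconds);
    }
    await deps.sleep(POLL_INTERVAL_MS);
  }
}
```

Injecting the clock and sleep function is a design choice for the sketch only. It lets the heartbeat and deadline be exercised without real waiting, and a caller could catch the error by checking its `name` and then print the recovery hint with `appId` and `txHash`.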

WatchTimeoutError and WATCH_DEFAULT_TIMEOUT_SECONDS are intentionally generic — RND-568 reuses the same helpers for the upgrade watcher (the two PRs share symbol shape so whichever lands first defines the helper and the other reuses it).
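
As an illustration of the precedence stated in the summary (explicit option > env var > default), a shared helper could look something like the sketch below. The helper names `resolveWatchTimeoutSeconds` and `parsePositiveSeconds` are hypothetical, and silently falling through on an invalid value is an assumption; the actual implementation may reject bad input instead.

```typescript
// Default from the PR description: 600 seconds (10 minutes).
const WATCH_DEFAULT_TIMEOUT_SECONDS = 600;
const WATCH_TIMEOUT_ENV_VAR = "ECLOUD_WATCH_TIMEOUT_SECONDS";

// Hypothetical helper: returns a positive whole number of seconds,
// or undefined when the input is missing or not a positive number.
function parsePositiveSeconds(raw: string | number | undefined): number | undefined {
  if (raw === undefined || raw === "") return undefined;
  const n = typeof raw === "number" ? raw : Number(raw);
  return Number.isFinite(n) && n > 0 ? Math.floor(n) : undefined;
}

// Hypothetical helper: explicit option wins, then the env var, then the default.
// Assumption: invalid values fall through to the next source rather than erroring.
function resolveWatchTimeoutSeconds(
  explicit: string | number | undefined,
  env: Record<string, string | undefined>,
): number {
  return (
    parsePositiveSeconds(explicit) ??
    parsePositiveSeconds(env[WATCH_TIMEOUT_ENV_VAR]) ??
    WATCH_DEFAULT_TIMEOUT_SECONDS
  );
}
```

Under these assumptions, a deploy or upgrade command would pass its parsed `--watch-timeout` value and `process.env` to the helper, so both watchers resolve the deadline the same way.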

Resolves RND-569.

Test plan

  • pnpm --filter @layr-labs/ecloud-sdk run build:dev clean
  • pnpm --filter @layr-labs/ecloud-cli run build:dev clean
  • typecheck clean for the touched files (only pre-existing unrelated errors)
  • eslint clean on touched files
  • ecloud compute app deploy --help lists --watch-timeout with [env: ECLOUD_WATCH_TIMEOUT_SECONDS]
  • Live verification on sepolia-dev: existing stuck app 0xB198C8b046d66f98aE70d5D57d53dcBb1f4B8D53 (left in Unknown from a prior failed quota attempt) provided a fixture. With ECLOUD_WATCH_TIMEOUT_SECONDS=70:
    Status: Unknown (0s) → heartbeat Status: Unknown (35s) → heartbeat Status: Unknown (68s) → WatchTimeoutError elapsed=74s lastStatus=Unknown. Heartbeat and timeout both confirmed.

watchUntilRunning previously only logged on status transitions, so when
the orchestrator silently kept the app in Unknown the user saw a single
"Status: Unknown (1s)" line forever (visible especially over non-TTY
stdout where carriage-return overwrites are invisible). The loop also
had no timeout, so the CLI would hang indefinitely.

Add a 30s heartbeat that re-emits the current status with elapsed time,
plus a configurable timeout (default 10 minutes, override via
ECLOUD_WATCH_TIMEOUT_SECONDS) that throws a typed WatchTimeoutError.
The CLI deploy command catches it and prints a hint pointing at
'ecloud compute app info <id>' before exiting non-zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>