Skip to content

Latest commit

 

History

History
344 lines (236 loc) · 14 KB

File metadata and controls

344 lines (236 loc) · 14 KB

Megaplan Cloud

megaplan cloud keeps cloud orchestration thin. Core megaplan owns plan, auto, and chain behavior; cloud subcommands stage files, pick a transport, and run the core commands remotely.

Examples below use megaplan ...; in this repo run ./.venv/bin/python -m megaplan .... Add --cloud-yaml /path/to/cloud.yaml when cloud.yaml is not at the project root.

Providers

Provider Use case Notes
railway Hosted runner with Railway SSH/logs/volume primitives Good default for shared remote runs.
local Fast local iteration and CI-friendly smoke tests Uses docker compose from a persistent deploy dir under ~/.megaplan/cloud/<compose_project>/.
ssh Any reachable Docker host over SSH Syncs the deploy dir to ssh.remote_dir with rsync, or scp -r when rsync is unavailable.

provider: fly remains reserved for a future release.

Quick Start

  1. Scaffold:
megaplan cloud init
  1. Edit cloud.yaml for repo, provider, mode, secrets, and optional toolchains.

  2. Export the local secrets named under secrets::

export OPENAI_API_KEY=...
export GITHUB_TOKEN=...        # optional, but recommended for push/private clone
export ANTHROPIC_API_KEY=...   # optional
  1. Build and deploy:
megaplan cloud build
megaplan cloud deploy
  1. Start work remotely:
megaplan cloud bootstrap ideas/tiny-plan.txt
megaplan cloud chain chain.yaml --idea-dir ideas
  1. Inspect and connect:
megaplan cloud status
megaplan cloud status --chain
megaplan cloud logs
megaplan cloud attach

cloud.yaml Reference

Top-level fields

Field Required Default Meaning
provider no railway One of railway, local, or ssh.
mode no idle Runner mode: auto, chain, or idle.
secrets no [] Local env var names uploaded during megaplan cloud deploy and redacted from cloud log output where possible.
toolchains no [] Extra language toolchains layered into the image. Use aliases rust, go, java, or {name, install} mappings.

repo

Field Required Default Meaning
repo.url yes none Git URL cloned into the remote workspace.
repo.branch no main Branch checked out on clone.
repo.workspace no /workspace/app Absolute repo path used for remote cd, tmux, file uploads, and wrapper commands.

agents

Field Required Default Meaning
agents.default no codex Default megaplan agent for routed steps.
agents.<step> no inherits default Optional per-step override such as plan, review, execute, or loop_execute.

codex

Field Required Default Meaning
codex.model no gpt-5.4 Written into /root/.codex/config.toml on boot.
codex.reasoning no high Reasoning level written into /root/.codex/config.toml on boot.

auto

Used only when mode: auto.

Field Required in auto mode Default Meaning
auto.plan_name yes none Remote plan name for boot-time megaplan auto --plan ....
auto.idea_file yes none Absolute remote path to the idea file already staged on the workspace volume.
auto.robustness no standard Robustness for the boot-time init fallback.

chain

Used only when mode: chain.

Field Required in chain mode Default Meaning
chain.spec yes none Absolute remote path to the already-staged chain spec.

megaplan

Field Required Default Meaning
megaplan.ref no main Branch, tag, or SHA installed on boot via pip install --upgrade git+...@<ref>.

resources

Field Required Default Meaning
resources.volume no none Provider-specific persistent volume name. destroy deletes it only when set.
resources.port no 8080 Health server port exposed by the container.

railway

Field Required Default Meaning
railway.service no agent Railway service name used by deploy, logs, and down.
railway.session no agent Railway SSH session name used for interactive attaches.
railway.project no unset Optional project passed to Railway commands.
railway.environment no unset Optional environment passed to Railway commands.

local

Field Required when provider: local Default Meaning
local.compose_project no megaplan-cloud Docker Compose project name used for build, logs, exec, and teardown.
local.workdir no workspace Bind-mounted directory inside the persistent local deploy dir.

ssh

Field Required when provider: ssh Default Meaning
ssh.host yes none Remote SSH host.
ssh.user no unset Optional SSH username.
ssh.port no 22 SSH port.
ssh.identity_file no unset Optional identity file passed to ssh, scp, and rsync.
ssh.remote_dir no /tmp/megaplan-cloud Remote directory used for synced Docker build context and .env.
ssh.container no megaplan-cloud-agent Remote container name and image tag.

Toolchains

Without toolchains:, the image is Python/Node only. Add built-in aliases or a custom install snippet:

toolchains:
  - rust
  - go
  - name: custom
    install: |
      RUN curl -fsSL https://example.com/tool/install.sh | bash

Wrapper Workflows

megaplan cloud bootstrap <idea-file>

cloud bootstrap uploads a local idea file to <repo.workspace>/idea.txt, then runs:

megaplan init --project-dir <workspace> --idea-file <workspace>/idea.txt --auto-start --robustness <level>

--plan-name is optional. If omitted, cloud does not pass --name; core megaplan chooses the default slug from the idea text.

megaplan cloud chain <spec> [--idea-dir <dir>]

cloud chain is the preferred path for remote chain runs. It:

  1. Parses the local chain spec with core megaplan.chain.load_spec(...).
  2. Resolves each milestone idea file from --idea-dir or, by default, the local spec's parent directory.
  3. Uploads each idea file to the remote path named in the chain spec.
  4. Uploads the chain spec to <repo.workspace>/chain.yaml.
  5. Starts remote megaplan chain start --spec <repo.workspace>/chain.yaml in tmux session megaplan-chain, logging to <repo.workspace>/.megaplan/cloud-chain.log.

After upload + dispatch, cloud writes a provider-independent marker:

~/.megaplan/cloud/markers/<sha256(abs_path_of_cloud.yaml)[:16]>/last_chain.json

That marker survives Railway's ephemeral deploy dir and is used by cloud status --chain.

megaplan cloud status --chain

cloud status --chain fetches remote chain_state.json, then reuses core chain status formatting. Remote spec resolution precedence is:

  1. --remote-spec <path>
  2. ~/.megaplan/cloud/markers/<sha>/last_chain.json
  3. spec.chain.spec from cloud.yaml when mode: chain
  4. Otherwise missing_remote_spec

The command prints the structured payload on stdout and the same human-readable chain summary block that local megaplan chain status --spec ... prints on stderr.

megaplan cloud status

Without --chain, cloud status still runs remote megaplan status and prints that JSON payload unchanged.

megaplan cloud supervise --chain

cloud supervise --chain runs a one-shot supervisor tick against the remote chain. It observes the chain, refreshes branch/PR sync state, and makes safe progress decisions. It never invents approvals, bypasses quality gates, or runs destructive git operations.

One-shot tick behavior

Each invocation is a single observation + decision cycle:

  1. Read remote chain status via the same path as cloud status --chain.
  2. Refresh branch/PR sync by running _capture_sync_state remotely.
  3. Re-read chain status after the refresh.
  4. Map the refreshed effective_status to a safe action.
  5. Execute at most one safe mutation (tmux restart, one-shot chain tick).
  6. Emit a structured JSON report on stdout and a human-readable summary on stderr.

JSON stdout

The tick report on stdout includes these fields:

Field Type Meaning
success bool Whether the tick completed without error.
event string Event label: supervisor_tick, supervisor_blocked, supervisor_advanced, supervisor_restarted, or supervisor_error.
spec string Resolved remote chain spec path.
effective_status string Classified chain status after sync refresh.
next_action string Decision: noop, done, blocked, advance, restart, or none.
acted bool Whether the supervisor executed a mutation this tick.
refused_reason string|null Human-readable explanation when the supervisor declined to act.
runner object Runner liveness and session info.
sync object Branch/PR sync state fields.
pr object PR number, state, and head.
logs object Remote log paths and best-effort mtime/size.

Stderr summary

A single line is written to stderr:

supervisor tick: <event> | acted=<bool> | next_action=<action> [| refused_reason=<reason>]

Remote spec precedence

Same resolution order as cloud status --chain:

  1. --remote-spec <path>
  2. ~/.megaplan/cloud/markers/<sha>/last_chain.json
  3. spec.chain.spec from cloud.yaml when mode: chain
  4. Otherwise missing_remote_spec

Canonical session

The supervisor targets the same megaplan-chain tmux session used by cloud chain. All mutations use the canonical MEGAPLAN_TRUSTED_CONTAINER=1 megaplan chain start --spec <path> --one command, appending to <workspace>/.megaplan/cloud-chain.log.

Safe actions

The supervisor will only perform these mutations:

  • Restart a dead runner: When effective_status is stale_bookkeeping and the megaplan-chain tmux session is dead or missing, the supervisor kills any stale session and starts a fresh one-shot tick.
  • Advance past a merged PR: When effective_status is awaiting_pr_merge and the PR has been merged (confirmed via gh pr view --json state), the supervisor advances the chain with a one-shot tick.

Refusal cases (no mutation)

The supervisor refuses to act and returns acted: false with a refused_reason for:

effective_status Behavior
running Chain is running; nothing to do.
complete All milestones processed; chain is done.
human_prerequisite Prerequisite policy is required and unmet; requires human operator resolution via megaplan user-action resolve or megaplan chain override.
quality_gate Validation policy is required and quality gate is failing; requires human operator resolution.
awaiting_pr_merge (PR unmerged) PR is still open; supervisor will not advance until merged.
stale_bookkeeping (runner alive) Bookkeeping is stale but runner is alive; supervisor will not force-restart a live runner.
Provider lacks ssh_exec Cannot probe or mutate the remote runner.

Not a destructive repair tool

The supervisor is not a destructive repair tool and does not replace:

  • Human approval of prerequisites — use megaplan user-action resolve or megaplan chain override.
  • PR review — the supervisor only advances when the PR is already merged.
  • Quality-gate resolution — failing gates must be resolved by a human operator.

The supervisor never produces force-push, reset, branch-deletion, or any other destructive git commands. Its only mutations are tmux session management and chain start --one.

Boot-Time Runner Modes

mode: auto and mode: chain still control what the long-running remote agent session launches on boot. Those boot paths expect the referenced remote files to already exist on the workspace volume.

Use cloud bootstrap and cloud chain when you want cloud to stage the local input files for you. If you set mode: auto or mode: chain directly in cloud.yaml, make sure the referenced remote files already exist before restart.

Logs, Redaction, And Attach

megaplan cloud logs redacts:

  • literal values for secret names listed under secrets: when those values are present locally
  • NAME=value and NAME: value patterns for those secret names
  • known token shapes such as sk-..., ghp_..., and xoxb-...

This redaction applies to:

  • megaplan cloud logs
  • megaplan cloud exec
  • wrapper-dispatched output from cloud bootstrap, cloud chain, and cloud resume

megaplan cloud attach is different. It opens a raw interactive PTY, so line-buffered redaction is not applied there. Treat attach sessions as trusted terminals.

Provider Notes

provider: local

  • Uses docker compose -p <compose_project> -f ~/.megaplan/cloud/<compose_project>/docker-compose.yaml ...
  • Materializes a persistent deploy dir under ~/.megaplan/cloud/<compose_project>/
  • Bind-mounts ./<local.workdir> into <repo.workspace>
  • Best for local iteration and CI smoke tests

provider: ssh

  • Uses plain ssh for exec/logs/attach/status
  • Syncs the materialized deploy dir to ssh.remote_dir
  • Prefers rsync; falls back to scp -r with a warning when rsync is unavailable
  • Runs a single long-lived Docker container named ssh.container

provider: railway

  • Uses Railway SSH/logs/down/volume primitives
  • Markers are stored outside the Railway deploy dir so chain status survives redeploys
  • --session remains Railway-only

Runtime Requirements

Migration From reigh-megaplan-dev

Use the human-executed runbook at docs/cloud-migration-from-reigh.md. The important rule is: write MIGRATED.md first, then remove siblings while preserving that pointer file.