Tracked work items, grouped by area. Each entry should be specific enough to start without re-research.
- Split files over 275 lines back under the original limit. The multi-node rollout pushed 12 files over 275 lines. The CI limit was temporarily relaxed to 420 to unblock v0.2.0-rc.2; it needs to come back down. Offenders:
  - `crates/orca-proxy/src/lib.rs` (406) — move `RouteTarget` to its own module
  - `crates/orca-agent/src/docker/runtime.rs` (384) — split out `LocalRoute` + `registry_credentials` helpers
  - `crates/orca-tui/src/state.rs` (370) — extract `MetricHistory` + `parse_human_bytes` into a `metrics` submodule
  - `crates/orca-agent/src/grpc/client.rs` (347) — split heartbeat loop and re-register logic
  - `crates/orca-control/src/reconciler.rs` (338) — move remote placement + placeholder instance into a separate file
  - `crates/orca-control/src/lib.rs` (314), `webhook.rs` (311), `health.rs` (305), `api/handlers/ops.rs`, `ui/nodes.rs`, `ui.rs`, `handlers/server.rs` — each has one module-sized chunk that can be moved cleanly.
- Secure websocket log stream from joined nodes to master. Today
the master can only serve logs for its locally-managed containers.
For remote-scheduled services the TUI shows a placeholder message.
Implementation:
- Each joined node runs a small agent HTTP/WS listener on 6881.
- Authentication via the cluster token (same as heartbeat).
- Endpoints:
    - `GET /api/v1/logs/<container>?tail=&follow=` returns a chunked text body.
    - `WS /api/v1/exec/<container>` for an interactive shell with line-discipline forwarding.
  - The master's `logs` handler proxies to the target node's listener using the stored agent address (`RegisteredNode.address`).
  - The TUI's `client.logs(...)` continues to hit `/api/v1/services/{name}/logs` on the master — all remote detail is hidden.
  - `orca exec` and the TUI's `:sh` should ride the same WS exec channel. Ratatui suspends to run an interactive pty on the socket.
  - Hard rule: the master must NOT ssh into joined nodes — agent-to-master communication is strictly HTTP/WS with the cluster token, so the trust boundary is a single shared secret.
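As a minimal sketch of the proxying step, assuming `RegisteredNode.address` stores `host` or `host:port` (the helper name `agent_logs_url` and the port-defaulting rule are illustrative assumptions, not the real implementation):

```rust
/// Build the agent-side logs URL the master proxies to.
/// Hypothetical helper: assumes the stored address is "host" or "host:port",
/// defaulting to the agent listener port 6881 from the plan above.
fn agent_logs_url(address: &str, container: &str, tail: Option<u32>, follow: bool) -> String {
    let host = if address.contains(':') {
        address.to_string()
    } else {
        format!("{address}:6881")
    };
    let mut url = format!("http://{host}/api/v1/logs/{container}?follow={follow}");
    if let Some(n) = tail {
        url.push_str(&format!("&tail={n}"));
    }
    url
}
```

The master would stream the agent's chunked response body back to the TUI client unchanged, so the cluster-token check only happens once on the agent side.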
- Networks view showing the full routing graph for the cluster, in
order of external-to-internal depth:
  - Public edge — the domains served by each node's proxy, grouped by node. Each row includes the A record target IP so a mismatch (e.g. DNS pointing to the wrong box) jumps out visually.
  - Docker networks — one block per `orca-<network>` bridge, listing the services attached and their aliases. Cross-network container links (a service with `internal = true` plus aliases referenced from another network) should be drawn as connecting edges.
  - Inter-node links — if a service on node A calls a service on node B by public domain, draw that as a dashed edge so it's obvious traffic is hair-pinning through the edge proxy.
  - Backend: `GET /api/v1/cluster/networks` that returns, per node, the docker networks, their attached services + aliases, and the set of route-table entries (domain → service). The TUI renders this as an ASCII graph using ratatui's canvas widget.
  - Useful for debugging the kind of issue we hit today where `compliance-dashboard` couldn't resolve `compliance-agent` because the alias was missing — a networks tab would have shown the `orca-certifai` bridge with only one name in the alias list.
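The alias check above can be sketched against a simplified in-memory shape (the `Networks` alias and `missing_aliases` helper are illustrative stand-ins for whatever the cluster/networks response deserializes into):

```rust
use std::collections::BTreeMap;

/// Per-network attachments: network name -> [(service, aliases)].
/// Simplified stand-in for the cluster/networks payload; types are illustrative.
type Networks = BTreeMap<String, Vec<(String, Vec<String>)>>;

/// Flag (network, service) pairs whose alias list is empty: the failure
/// mode described above, where a peer can't resolve the service by name.
fn missing_aliases(networks: &Networks) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for (net, services) in networks {
        for (svc, aliases) in services {
            if aliases.is_empty() {
                out.push((net.clone(), svc.clone()));
            }
        }
    }
    out
}
```

The TUI could render these pairs in a warning color on the graph instead of (or in addition to) drawing the edge.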
- First-class `dev`/`stage`/`prod` environments per project. Today every service is single-environment. We need:
  - Default `dev`. Existing service.toml definitions stay as-is and are implicitly the dev environment of their project.
  - Per-environment image tags. A service can pin different tags per environment (e.g. `:latest` in dev, `:sha-...` in stage, `:v1.2.0` in prod). The `image` field becomes a map keyed by environment, or a sibling `[image.<env>]` block.
  - Per-environment secrets and domains. `${secrets.X}` resolves to the env-scoped secret first (e.g. `prod.LITELLM_API_KEY`), then falls back to the unscoped one. Domains can be templated (`auth-{env}.meghsakha.com`) or fully overridden per env.
  - `orca env promote <project> <from> <to>` CLI. Copies the entire service definition from the source env to the destination env, then runs an interactive checklist that the operator must walk through before the new env is activated:
    - Required secrets exist in the destination env (lists missing).
    - Domains resolve and TLS certs can be issued for them.
    - External dependencies (databases, registries) are reachable.
    - Image tags are present in the registry.
    - Resource quota for the destination env is sufficient.
  - `orca env list <project>` and `orca env diff <project> <a> <b>` — show what's deployed where and what would change on promotion.
  - TUI environment switcher. A top-level pill/tab in the services view that filters by environment, with a "Promote..." action that walks the same checklist interactively.
  - State storage. Environment lives in cluster.db / service.toml as a first-class field on `ServiceConfig`.
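The env-scoped secret fallback can be sketched as a plain lookup chain (the flat `HashMap` store and the `resolve_secret` helper are assumptions for illustration; the real store lives behind the secrets backend):

```rust
use std::collections::HashMap;

/// Resolve `${secrets.X}` for an environment: try the env-scoped key
/// (e.g. "prod.LITELLM_API_KEY") first, then fall back to the unscoped one.
/// Illustrative only; the real store is not a flat HashMap.
fn resolve_secret<'a>(store: &'a HashMap<String, String>, env: &str, key: &str) -> Option<&'a str> {
    store
        .get(&format!("{env}.{key}"))
        .or_else(|| store.get(key))
        .map(String::as_str)
}
```

Keeping the unscoped key as the last step means existing single-environment definitions keep working after the env field lands.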
- Replace bind-mount workaround for joined-node config files. Today config files mounted into containers (`librechat.yaml`, `logo.svg`, `settings.yml`, etc.) live on a single host. On a joined node the service.toml's mount path won't exist. Either ship config files to the agent before deploy or move to ConfigMap-style API objects.
- `orca volume copy <src> <dst>` CLI command. Currently we shell out to `docker run --rm -v src:/s -v dst:/d alpine tar...`. Wrap that in a first-class subcommand so migrations don't need raw docker.
- Single-binary install. PATH conflict between `/usr/local/bin/orca` (system) and `~/.local/bin/orca` (user) caused state loss this session. `orca update` should know which path it's installed at and replace in-place; `orca install` should default to `/usr/local/bin/orca`.
- setcap survives binary updates. `mv` across filesystems creates a new inode and clears `cap_net_bind_service`. Either: (a) `orca update` runs `setcap` after replacing the binary, OR (b) ship a systemd unit with `AmbientCapabilities=CAP_NET_BIND_SERVICE`.
- Hot reload of cluster.toml. Backup config, ACME email, and other cluster-level settings only load at startup. Watch the file (or SIGHUP) to apply without `orca shutdown && orca server -d`.
- Reconciler: detect spec changes beyond `same_image`. Today the skip-path only re-deploys when image/module/env/cmd change. `extra_ports`, `mounts`, `volume`, `domain`, and `aliases` should also trigger a recreate.
- `orca redeploy <service>` CLI subcommand. Today the only way to force a fresh image pull + recreate is via the webhook endpoint.
- `orca deploy` should resolve `services/` upward. Errors with "services.toml not found" if invoked from the wrong cwd. Walk up to find `cluster.toml` like git finds `.git`.
- Manifest of mounted files in service.toml gets pushed to the remote agent on deploy. Right now bind-mount paths must already exist on the target node — fine for the master, broken for joined nodes.
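A minimal sketch of the widened change detection (the `Spec` struct is an illustrative subset, not the real `ServiceConfig`):

```rust
/// Illustrative subset of a service spec; field names mirror the
/// service.toml keys listed above, not the real ServiceConfig.
#[derive(Clone, PartialEq)]
struct Spec {
    image: String,
    extra_ports: Vec<u16>,
    mounts: Vec<String>,
    volume: Option<String>,
    domain: Option<String>,
    aliases: Vec<String>,
}

/// Recreate when any tracked field differs, not just the image.
fn needs_recreate(current: &Spec, desired: &Spec) -> bool {
    current != desired
}
```

Deriving `PartialEq` over the whole struct avoids the current failure mode where each newly-added field must be remembered in a hand-written comparison.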
- Per-service `pre_hook` actually runs. `ServiceBackupConfig` defines `pre_hook` (e.g. `pg_dump`) but the scheduler doesn't invoke it yet.
- `orca backup all` should support an `--exclude` filter. Not every volume needs to roll up to S3 (e.g. cache/temp).
- Restore from S3. `s3_backend::restore` is unimplemented; the CLI prints "S3 restore not yet supported."
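One way the scheduler could invoke `pre_hook` before snapshotting is to run it inside the service's container via `docker exec` (the `pre_hook_argv` helper is hypothetical; only the `pre_hook` field itself comes from `ServiceBackupConfig`):

```rust
/// Build the argv to run a backup pre_hook inside the service's container,
/// e.g. pre_hook = "pg_dump -U app app > /backup/db.sql".
/// Hypothetical helper; the scheduler would hand this argv to the runtime.
fn pre_hook_argv(container: &str, pre_hook: &str) -> Vec<String> {
    ["docker", "exec", container, "sh", "-c", pre_hook]
        .into_iter()
        .map(|s| s.to_string())
        .collect()
}
```

Passing the hook as a single `sh -c` argument keeps quoting inside the hook intact instead of re-splitting it on the master.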
- Single-project view filter. Let the user scope the TUI to one project at a time instead of a flat list of all services.
- Remember last-opened project. On TUI launch, reopen whichever project was active last session. If none (first run or project deleted), start with no project selected.
- Backups per node. Show backup status grouped by node — last run time, volume count, total size, any failures. Needs `GET /api/v1/cluster/backups` aggregating results from all nodes.
- Webhook management. View, add, edit, and delete webhooks from the TUI. Show last trigger time, status, and matched repo/branch. Today webhooks can only be managed via curl to the REST API.
- Backup dashboard. Per-node backup status: last run, volume count, total size, failures, retention. Trigger manual backup. View/restore individual volume snapshots. Needs `GET /api/v1/cluster/backups`.
- Secrets organizer. Group secrets by project, show which services reference each secret, add/edit/delete from TUI. Today secrets are a flat list managed via `orca secrets set`.
- AI chat interface as TUI landing page. Open the TUI to an `orca ask`-style chat pane by default. The user types questions, the AI responds with cluster context (services, health, stats). Previous conversations persist in the session. Services/logs/etc are secondary tabs. This makes the TUI the primary ops interface.
- Log viewer. Stream logs from any service (local or remote) in a TUI pane. Depends on WS log streaming (#12).
- Alert delivery: Slack, webhook, email (#24). Config exists but delivery is unimplemented. Conversational alerts only visible via `orca alerts list` today.
- `orca ask` should work locally without a running server. Today it sends the question to the API, which reads AI config from the running server's cluster.toml. Should fall back to reading cluster.toml from CWD or `~/.orca/` and calling the LLM directly from the CLI.
- Resolve `cluster.toml` and `services/` upward. Today all CLI commands only work from the orca working directory. Should walk up to find `cluster.toml` like git finds `.git/`, or fall back to `~/.orca/cluster.toml` for global config.
- Wire up `orca logs --summarize`. Currently prints a stub. Should fetch logs and send to the AI backend for analysis. (#23)
- Secrets resolution in `cluster.toml`. Today `${secrets.X}` only works in service.toml env vars. AI api_key and other cluster config values should also resolve from the secrets store.
- Multi-argument commands. `orca redeploy`, `orca stop`, `orca deploy`, and `orca logs` should accept multiple service names in a single invocation, e.g. `orca redeploy api web worker`. Today each command takes only one service name, requiring separate calls.
- Shell auto-completion. Generate completions for bash/zsh/fish via `orca completions bash > /etc/bash_completion.d/orca`. Use clap's `clap_complete` crate. Should complete subcommands, flags, and dynamically complete service names by querying the API.
- `orca redeploy` must route to the correct node. Today `redeploy` runs on master and tries to create the container locally even when the service is placed on a remote agent. Should check placement and dispatch via WS/heartbeat to the target node.
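The upward walk can be sketched with std only (the `find_cluster_root` name is assumed; the `~/.orca/cluster.toml` fallback is left out):

```rust
use std::path::{Path, PathBuf};

/// Walk up from `start` until a directory containing cluster.toml is found,
/// the way git locates `.git/`. Returns None at the filesystem root.
/// Sketch only; the real CLI would also fall back to ~/.orca/cluster.toml.
fn find_cluster_root(start: &Path) -> Option<PathBuf> {
    let mut dir = start;
    loop {
        if dir.join("cluster.toml").is_file() {
            return Some(dir.to_path_buf());
        }
        dir = dir.parent()?;
    }
}
```

Centralizing this in one helper means `orca deploy`, `orca ask`, and the rest all gain the behavior at once instead of each command re-resolving the cwd.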
- Tags on nodes, services, and projects. Free-form key-value labels (e.g. `env=prod`, `team=compliance`, `tier=frontend`) on nodes, services, and projects. Stored in cluster.db, surfaced in TUI and `orca status`. Enables filtering, batch ops, and placement affinity beyond the current `placement.node` field.
- Project-level environment variables (secrets). Today secrets are global. Add per-project scoping so `${secrets.X}` resolves project scope first, then falls back to global. CLI: `orca secrets set --project <name> KEY VALUE`. Stored alongside global secrets with a project prefix in the secrets store.