CambrianTech · joelteply · May 13, 2026
diff --git a/docs/grid/AIRC-CONTINUUM-BRIDGE.md b/docs/grid/AIRC-CONTINUUM-BRIDGE.md
@@ -56,6 +56,13 @@ Heavy data should stay out of AIRC. Use AIRC for manifests, handles, room
 markers, artifact hashes, and job ids; use Continuum/Grid data paths for model
 weights, LoRA artifacts, voice/video, and high-volume streams.
 
+Secrets stay out of AIRC completely. API keys, HF tokens, SSH keys, cookies,
+provider credentials, and encrypted secret payloads are not bridge messages.
+AIRC can carry `secretRef` names, fingerprints, lease ids, request ids, PR SHAs,
+and acknowledgements so humans and agents can coordinate, but actual credential
+material must move only through the secret/capability command path described in
+[GRID-ARCHITECTURE.md](GRID-ARCHITECTURE.md).
+
 ## Harness
 
 For deterministic tests without a live AIRC monitor:
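
The "secrets stay out of AIRC" rule added to AIRC-CONTINUUM-BRIDGE.md above could be enforced with a small pre-send guard. The following is a minimal sketch, not an existing Continuum API: `checkAircPayload`, the allowed field names, and the credential heuristics are all assumptions made for illustration, and the patterns are deliberately incomplete.

```typescript
// Hypothetical guard; none of these names exist in Continuum or AIRC.

// Reference-style fields the bridge doc says AIRC may carry (names assumed).
const ALLOWED_FIELDS = new Set<string>([
  'secretRef', 'fingerprint', 'leaseId', 'requestId', 'prSha', 'ack',
]);

// Heuristics for credential-shaped values (assumed, not exhaustive).
const CREDENTIAL_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{16,}/,            // OpenAI-style key prefix
  /\bhf_[A-Za-z0-9]{16,}/,              // Hugging Face token prefix
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
];

interface GuardResult {
  ok: boolean;
  violations: string[];
}

function checkAircPayload(payload: Record<string, string>): GuardResult {
  const violations: string[] = [];
  for (const field of Object.keys(payload)) {
    const value = payload[field];
    // Reject any field that is not a known reference-style field.
    if (!ALLOWED_FIELDS.has(field)) {
      violations.push(`field not allowed: ${field}`);
    }
    // Reject values that look like raw credential material.
    if (CREDENTIAL_PATTERNS.some((pattern) => pattern.test(value))) {
      violations.push(`credential-shaped value in: ${field}`);
    }
  }
  return { ok: violations.length === 0, violations };
}
```

A guard like this only catches obvious mistakes. The actual protection is that credential material never enters the bridge path in the first place.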

diff --git a/docs/grid/GRID-ARCHITECTURE.md b/docs/grid/GRID-ARCHITECTURE.md
@@ -184,6 +184,180 @@ Entities already serialize/deserialize cleanly, carry UUIDs, have CRUD events, a
 
 No new serialization format. No new ID scheme. No new event system. The Grid protocol IS the existing protocol, routed over a mesh.
 
+### 3.5 Secrets, API Keys, And Capability Leases
+
+The AIRC workflow is the right mental model: agents coordinate by sending
+stable identifiers, immutable SHAs, handles, and acknowledgements. They do not
+send the thing itself when the thing is large, private, or operationally
+sensitive. Grid secrets follow the same rule.
+
+**Default rule:** no raw API key, HF token, SSH key, cookie, model license token,
+or provider credential is ever sent through AIRC, Grid events, chat transcripts,
+logs, replay captures, RAG, or persona memory.
+
+Every node owns its local secret store under `$HOME/.continuum`. The grid moves
+capability facts and encrypted grants:
+
+```typescript
+interface GridSecretCapability {
+  secretRef: string;              // e.g. provider/openai/default
+  provider: string;               // openai, anthropic, huggingface, etc.
+  scopes: string[];               // chat, embeddings, upload, factory
+  ownerNodeId: UUID;
+  version: number;
+  fingerprint: string;            // hash/HMAC of normalized metadata, never value
+  available: boolean;             // non-empty + health check passed
+  expiresAt?: string;             // for leases, not local owner secrets
+}
+
+interface GridSecretLease {
+  leaseId: UUID;
+  secretRef: string;
+  granteeNodeId: UUID;
+  scopes: string[];
+  expiresAt: string;
+  auditHandle: UUID;
+}
+
+interface GridSecretRevision {
+  nodeId: UUID;
+  secretRef: string;
+  version: number;
+  fingerprint: string;
+  scopes: string[];
+  source: 'env-file' | 'settings-ui' | 'persona-command' | 'factory-import';
+  updatedAt: string;
+}
+```
+
+The Settings page, setup flow, persona helper, and JTAG commands all write to
+the same local authority. Personas may help the user enter a key or run a
+command, but they receive a `secretRef`/lease handle, not the raw value. The
+same handle can then be used by Rust workers, TypeScript adapters, factory
+jobs, and grid commands without each layer inventing its own credential path.
+
+Most real setup starts on the lowest-power machine in front of the user:
+
+- edit `$HOME/.continuum/config.env` directly;
+- use the Settings/API Providers widget;
+- ask a persona to call existing `ai/key/save`, `ai/key/remove`, or future
+  `ai/key/*` merge commands;
+- import a factory/upload credential for a specific workflow.
+
+All four entry points produce the same redacted `GridSecretRevision`. Grid sync
+then behaves like a small, secret-aware git merge: advertise revisions, compute
+a redacted diff, ask for approval if the same `secretRef` changed on more than
+one node, then apply only approved encrypted writes through `SecretManager`.
+The merge object contains names, versions, fingerprints, scopes, source, and
+timestamps. It never contains the secret value.
+
+```typescript
+interface GridSecretMergePlan {
+  baseRevision?: GridSecretRevision;
+  localRevision?: GridSecretRevision;
+  remoteRevision?: GridSecretRevision;
+  action: 'keep-local' | 'import-remote' | 'export-local' | 'rotate' | 'manual';
+  conflict: boolean;
+  reason: string;
+}
+```
+
+Git can be the implementation substrate for revision history if it is useful,
+but it must be a redacted secret ledger, not a repository of `.env` values. A
+commit may contain `secretRef`, fingerprint, version, and merge decision; it
+must never contain an API key or encrypted credential blob intended for another
+node.
+
+The process that keeps nodes reconciled should be a normal Continuum
+daemon/process, not a one-off sync script. It watches local secret/config
+revisions and periodically runs the same `ai/key/*` command composition a user
+action would run. For explicit user mutations, `sync` is a parameter on the
+existing command shape, not a new top-level transport noun: `ai/key/save --sync`
+and `ai/key/remove --sync`.
+
+```text
+local edit/widget/persona command
+  -> SecretManager writes local state
+  -> GridReconcilerDaemon notices or receives the change event
+  -> GridReconcilerDaemon runs a bounded ai/key command program for selected peers:
+       - ai/key/status
+       - ai/key/diff
+       - optional owner/persona approval on conflicts
+       - ai/key/apply-merge
+  -> audit/replay records command handles, fingerprints, timings, outcomes
+```
+
+This is the same pattern as an intra-environment call like screenshot capture,
+but the target environment is another Continuum node. One node asks another node
+to execute a typed command, or a small bounded program of typed commands, against
+the target's own `$HOME/.continuum`. The caller receives typed redacted results;
+both sides can replay the decision without exposing the secret.
+
+The substrate already exists in the command system:
+
+- `grid/send` is the explicit routed command envelope: target node, command
+  name, params, typed result.
+- `GridInterceptor` is the transparent path: normal `Commands.execute()` can be
+  routed remotely when the router chooses a peer.
+- `grid/route` is the dry-run/debug primitive for "where would this command
+  execute?"
+- `model/forge` already delegates to `grid/job-submit`; forge jobs are therefore
+  another consumer of the same substrate, not a separate agent-managed lane.
+
+The missing abstraction is a bounded command program shape: a small ordered set
+of existing typed commands with limits, redaction policy, timeout, approval
+rules, and audit handles. It should be boring TypeScript data, not arbitrary
+shell. Secrets need it for status/diff/apply; forge needs it for preflight,
+credential availability, artifact/cache checks, job submit, and status follow-up.
+Grid should run those programs itself. It must not require a coding agent on
+each machine to manually align environment variables or forge setup.
+
+The first deployment target is the user's local grid: a trusted subnet/intranet
+over Tailscale. The same command envelope later extends to trusted WAN peers and
+eventually other users on the P2P mesh, with tighter limits, explicit approval,
+and stronger validation as trust decreases. The same shape later applies to
+model registry sync, LoRA availability, settings templates, and other low-volume
+grid state.
+
+**API-key slice for the first PR:**
+
+- Existing `ai/key/save`: write one key into `$HOME/.continuum/config.env` or
+  the platform vault through `SecretManager`; redact value from logs and command
+  echo. Add `sync?: boolean | 'trusted-grid'` to request immediate propagation
+  after the local write.
+- Existing `ai/key/remove`: remove one key through `SecretManager`. Add
+  `sync?: boolean | 'trusted-grid'` to propagate deletion/revocation metadata
+  after the local remove.
+- Existing `ai/key/test`: validate a candidate or stored provider key.
+- Existing `ai/providers/status`: provider-facing availability view.
+- `ai/key/status`: report configured key names, source path, empty
+  placeholders, fingerprints, and health without values.
+- `ai/key/diff`: compare local redacted revisions with one or more peers and
+  produce a merge plan without values.
+- `ai/key/apply-merge`: apply an approved merge plan through `SecretManager`.
+- `ai/key/request-lease`: request a scoped, expiring grant from an owner node;
+  default response is deny unless the owner or policy approves.
+- `ai/key/revoke-lease`: revoke a lease and emit an audit event.
+
+**Encrypted sharing is explicit.** If the owner chooses to copy a key to another
+trusted node, the export is an envelope encrypted to the target node identity
+and imported through `SecretManager`; loose file copy is not a grid protocol.
+The audit trail records requester, approver, `secretRef`, fingerprint, version,
+scope, and outcome. It never records the secret value.
+
+**No-token onboarding is a gate.** Fresh installs must work with public models
+and local inference without `HF_TOKEN` or any cloud key. `HF_TOKEN` is only for
+private/gated downloads, uploads, factory publishing, or user-selected provider
+workflows. A missing key produces a typed unavailable/degraded result; it must
+not silently route to a cloud fallback, stale credential, or CPU-shaped
+workaround.
+
+**Replay and introspection stay useful because they are redacted.** Record the
+command, `secretRef`, fingerprint/version, lease id, timing, target node, and
+result. That gives VDD/JTAG replay enough information to reproduce routing and
+authorization behavior without poisoning logs, RAG, or persona memory with
+credentials.
+
 ---
 
 ## 4. Transport Layer
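
The "secret-aware git merge" in section 3.5 above compares a base, local, and remote `GridSecretRevision` and produces a `GridSecretMergePlan`. The following is a minimal sketch of that decision, assuming fingerprints alone indicate change. The trimmed `Revision` and `MergePlan` types and the `planMerge` function are hypothetical stand-ins, and deletion/revocation and the `rotate` action are not modeled.

```typescript
// Trimmed stand-ins for GridSecretRevision / GridSecretMergePlan (assumed shapes).
type Revision = { secretRef: string; version: number; fingerprint: string };
type MergeAction = 'keep-local' | 'import-remote' | 'export-local' | 'rotate' | 'manual';

interface MergePlan {
  action: MergeAction;
  conflict: boolean;
  reason: string;
}

// Three-way comparison over redacted metadata only; no secret value is ever read.
function planMerge(
  base: Revision | undefined,
  local: Revision | undefined,
  remote: Revision | undefined
): MergePlan {
  if (!local && !remote) {
    return { action: 'keep-local', conflict: false, reason: 'no revision on either node' };
  }
  if (!remote) {
    return { action: 'export-local', conflict: false, reason: 'peer has no revision' };
  }
  if (!local) {
    return { action: 'import-remote', conflict: false, reason: 'local node has no revision' };
  }
  if (local.fingerprint === remote.fingerprint) {
    return { action: 'keep-local', conflict: false, reason: 'already in sync' };
  }
  // Without a common base, any difference counts as a change on both sides.
  const localChanged = !base || local.fingerprint !== base.fingerprint;
  const remoteChanged = !base || remote.fingerprint !== base.fingerprint;
  if (localChanged && remoteChanged) {
    return { action: 'manual', conflict: true, reason: 'secretRef changed on both nodes' };
  }
  if (remoteChanged) {
    return { action: 'import-remote', conflict: false, reason: 'only the peer changed' };
  }
  return { action: 'export-local', conflict: false, reason: 'only the local node changed' };
}
```

Under these assumptions, a divergent edit on both nodes always yields `manual` with `conflict: true`, which matches the rule that conflicts require approval rather than an automatic overwrite.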

diff --git a/docs/planning/ALPHA-GAP-ANALYSIS.md b/docs/planning/ALPHA-GAP-ANALYSIS.md
@@ -2,14 +2,20 @@
 
 <!-- markdownlint-disable MD013 MD060 -->
 
-**Updated**: 2026-05-11
+**Updated**: 2026-05-13
 **Branch policy**: every change lands as `PR -> canary -> validation -> PR -> main`
 **Status**: active planning document, shared by humans and agents
 **Operating rule**: Rust owns runtime logic. TypeScript is UI, schema, generated types, and thin command/transport glue.
+**Template-first rule**: new commands must start from `src/generator/specs/*.json` and Continuum's command generator. Manual command scaffolds are not acceptable; hand edits are for post-generation behavior only.
 **Architectural mandate**: Rust-first, GPU-first, replay-tested. No patchwork substitutes for the target architecture.
 **Sensory model plan**: [Sensory Model And Experiential Plasticity Plan](../architecture/SENSORY-MODEL-AND-EXPERIENTIAL-PLASTICITY-PLAN.md)
 
-This document is the alpha source of truth. Work should not proceed as disconnected chat threads or private agent branches. Each implementation PR must name the issue it advances, land in `canary`, publish validation evidence, and only then be considered for promotion to `main`.
+This document is the alpha/gap source of truth. Work should not proceed as disconnected chat threads, private agent branches, or parallel "gap" documents. Each implementation PR must name the issue it advances, land in `canary`, publish validation evidence, and only then be considered for promotion to `main`.
+
+As of 2026-05-13 there is exactly one alpha/gap planning file:
+`docs/planning/ALPHA-GAP-ANALYSIS.md`. New alpha/gap notes are merged here or
+deleted. Architecture references may point here, but they must not become
+parallel status ledgers.
 
 The previous 2026-05-01 alpha snapshot was useful but had become a historical log. This revision turns it into an execution plan for the current goal: **stable, GPU-first, Rust-centric Continuum with modular Docker and fast tests that do not depend on the Node/UI stack for core correctness.**
 
@@ -520,22 +526,41 @@ Implementation posture:
 | Issue | Priority | Direction | Test gate |
 |---|---:|---|---|
 | file: config single-source issue | P0 | `SecretManager` and Rust `secrets.rs` must treat only non-empty values as configured and must lazy-load `$HOME/.continuum/config.env` before any provider check | provider status shows cloud unavailable for empty placeholders; local chat still works |
-| file: `grid/config/sync` command issue | P0 | create a command pair for encrypted config sharing over trusted grid/Tailscale nodes; no loose file copying and no browser exposure | two-node test shares selected keys, decrypts only on trusted target, and never logs values |
+| [#1097](https://github.com/CambrianTech/continuum/issues/1097) API-key merge commands | P0 | extend the existing `ai/key/*` command surface for encrypted config sharing over trusted grid/Tailscale nodes; no loose file copying and no browser exposure | two-node test shares selected keys, decrypts only on trusted target, and never logs values |
+| [#1098](https://github.com/CambrianTech/continuum/issues/1098) routed command program substrate | P0 | consolidate bounded multi-command execution on top of `grid/send`, `GridInterceptor`, and `grid/route` so secrets and forge use the same path | one local-grid test runs a redacted `ai/key/*` program; one forge preflight routes through the same envelope |
 | #860 config.env as directory | P1 | keep setup file/dir creation idempotent and typed | setup test catches file-vs-dir mismatch |
 
+Implementation status:
+
+- Shared `ai/key` base types now exist for provider identity, sync intent,
+  target nodes, dry-run, synced state, and merge-plan id.
+- Existing `ai/key/save`, `ai/key/remove`, and `ai/key/test` shared types
+  inherit the base. Runtime sync behavior is intentionally not claimed until the
+  routed reconciliation path exists.
+- `ai/key/status` is generated from `src/generator/specs/ai-key-status.json`
+  and returns only redacted provider/key/source/configured/fingerprint metadata.
+- `grid/send` is the explicit routed command envelope; `GridInterceptor` is the
+  transparent `Commands.execute()` remote path; `grid/route` is the dry-run
+  routing/debug primitive.
+
 Command shape:
 
-- `grid/config/status`: list configured key names, source path, empty placeholders, and target-node drift without values.
-- `grid/config/export`: encrypt selected config keys for a specific trusted node identity.
-- `grid/config/import`: decrypt and merge selected keys into the target node's `$HOME/.continuum/config.env`.
-- `grid/config/sync`: orchestrate export/import across trusted grid nodes and report per-node success.
+- Existing `ai/key/save`: write one key through `SecretManager` to `$HOME/.continuum/config.env` or the platform vault; command echo and logs must redact values.
+- Existing `ai/key/remove`: remove one key through `SecretManager`.
+- Existing `ai/key/test`: validate a candidate or stored provider key.
+- Existing `ai/providers/status`: provider-facing availability view.
+- `ai/key/status`: list configured key names, source path, empty placeholders, fingerprints, and provider health without values.
+- `ai/key/diff`: compare redacted key revisions across selected target nodes and produce a merge plan without values.
+- `ai/key/apply-merge`: apply an approved merge plan through `SecretManager`; conflicts require owner/persona approval and never auto-overwrite a newer local key.
 
 Rules:
 
 - Empty placeholders such as `DEEPSEEK_API_KEY=` are documentation, not availability.
 - Local mode must work with zero API keys.
 - Cloud personas are eligible only when their required key is non-empty and the provider health check is not expired/failed.
 - Config sharing is an owner/trusted-node command. It should use grid identity plus transport encryption, then persist through `SecretManager` so all runtimes see one source.
+- Remote/grid execution is command routing context, not a namespace. The capability name stays stable while target environment changes.
+- Fresh install and Carl smoke must pass with public model downloads and no `HF_TOKEN`; token-dependent private/gated/factory upload paths are optional later setup.
 
 ### 2. GPU Runtime Stability
 

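
The rule above that empty placeholders such as `DEEPSEEK_API_KEY=` are "documentation, not availability" can be illustrated with a small parser. This is a sketch under stated assumptions: `readKeyStatus` is a hypothetical helper, not `SecretManager` or `secrets.rs`, and it assumes a plain `KEY=value` file with no `export` prefixes or inline comments. It reports only names and a configured flag, never values.

```typescript
// Hypothetical helper; the real check lives in SecretManager / Rust secrets.rs.
interface KeyStatus {
  name: string;
  configured: boolean; // true only when the value is non-empty
}

function readKeyStatus(envText: string): KeyStatus[] {
  const statuses: KeyStatus[] = [];
  for (const rawLine of envText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    // Skip lines with no key name before '='.
    if (eq <= 0) continue;
    const name = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Treat a quoted empty string ("" or '') the same as an empty placeholder.
    const doubleQuoted = value.startsWith('"') && value.endsWith('"');
    const singleQuoted = value.startsWith("'") && value.endsWith("'");
    if (value.length >= 2 && (doubleQuoted || singleQuoted)) {
      value = value.slice(1, -1).trim();
    }
    // Only the boolean leaves this function; the value is dropped here.
    statuses.push({ name, configured: value.length > 0 });
  }
  return statuses;
}
```

A provider status view built on output like this would show cloud providers as unavailable for placeholder entries while local mode keeps working with zero keys.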
diff --git a/src/commands/ai/key/common/AiKeyBase.ts b/src/commands/ai/key/common/AiKeyBase.ts
@@ -0,0 +1,55 @@
+/**
+ * Shared AI key command types.
+ *
+ * The ai/key/* commands stay modular by verb, while shared params keep
+ * provider identity, sync intent, and redacted merge metadata consistent.
+ */
+
+import type { CommandParams, CommandResult, JTAGContext } from '@system/core/types/JTAGTypes';
+import { createPayload } from '@system/core/types/JTAGTypes';
+import { SYSTEM_SCOPES } from '@system/core/types/SystemScopes';
+import type { JTAGError } from '@system/core/types/ErrorTypes';
+import type { UUID } from '@system/core/types/CrossPlatformUUID';
+
+export type AiKeySyncMode = boolean | 'trusted-grid';
+
+export interface AiKeyParams extends CommandParams {
+  /** Provider config key or provider alias, e.g. OPENAI_API_KEY or openai. */
+  provider?: string;
+  /** Request sync after local mutation. Remote execution stays routing context. */
+  sync?: AiKeySyncMode;
+  /** Optional target node ids for explicit sync/diff/apply flows. */
+  targetNodes?: string[];
+  /** Build a merge plan without writing. */
+  dryRun?: boolean;
+}
+
+export interface AiKeyResult extends CommandResult {
+  success: boolean;
+  provider?: string;
+  synced?: boolean;
+  syncMode?: AiKeySyncMode;
+  targetNodes?: string[];
+  mergePlanId?: string;
+  error?: JTAGError;
+}
+
+export const createAiKeyParams = <T extends Partial<AiKeyParams> = Partial<AiKeyParams>>(
+  context: JTAGContext,
+  sessionId: UUID,
+  data: T & { provider?: string }
+): AiKeyParams & T => createPayload(context, sessionId, {
+  userId: SYSTEM_SCOPES.SYSTEM,
+  ...data,
+  provider: data.provider ?? '' // after the spread, so an explicit undefined still defaults to ''
+} as AiKeyParams & T);
+
+export const createAiKeyResult = <T extends Partial<AiKeyResult> = Partial<AiKeyResult>>(
+  context: JTAGContext,
+  sessionId: UUID,
+  data: T & { success: boolean; provider?: string }
+): AiKeyResult & T => createPayload(context, sessionId, {
+  userId: SYSTEM_SCOPES.SYSTEM,
+  ...data,
+  provider: data.provider ?? '' // after the spread, so an explicit undefined still defaults to ''
+} as AiKeyResult & T);
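
The `sync`, `targetNodes`, and `dryRun` fields on `AiKeyParams` are the inputs a reconciler would need to build the bounded `ai/key/*` program described in GRID-ARCHITECTURE.md (status, diff, then apply-merge). The following is a rough sketch of such a builder. `SyncRequest`, `BoundedProgram`, `buildKeySyncProgram`, and the step/timeout limits are all hypothetical; the routed program substrate (#1098) does not define this shape yet.

```typescript
// Local stand-in for the sync-related fields of AiKeyParams (assumed subset).
interface SyncRequest {
  provider: string;
  targetNodes: string[];
  dryRun?: boolean;
}

type KeyCommand = 'ai/key/status' | 'ai/key/diff' | 'ai/key/apply-merge';

interface ProgramStep {
  command: KeyCommand;
  targetNode: string;
  params: { provider: string };
}

// "Boring TypeScript data": an ordered list of typed commands plus limits.
interface BoundedProgram {
  steps: ProgramStep[];
  maxSteps: number;
  timeoutMs: number;
  requiresApprovalOnConflict: boolean;
}

const MAX_STEPS = 12; // assumed bound, chosen for illustration only

function buildKeySyncProgram(req: SyncRequest, timeoutMs: number = 30000): BoundedProgram {
  // A dry run stops after the diff, so nothing is written on any peer.
  const commands: KeyCommand[] = req.dryRun
    ? ['ai/key/status', 'ai/key/diff']
    : ['ai/key/status', 'ai/key/diff', 'ai/key/apply-merge'];
  const steps: ProgramStep[] = [];
  for (const targetNode of req.targetNodes) {
    for (const command of commands) {
      steps.push({ command, targetNode, params: { provider: req.provider } });
    }
  }
  // Refuse to build a program that exceeds the bound instead of truncating it.
  if (steps.length > MAX_STEPS) {
    throw new Error(`program exceeds ${MAX_STEPS} steps: ${steps.length}`);
  }
  return { steps, maxSteps: MAX_STEPS, timeoutMs, requiresApprovalOnConflict: true };
}
```

Keeping the program as plain data means it can be logged, replayed, and audited by handle, and it carries only provider names and node ids, never a key value.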