Merged
7 changes: 7 additions & 0 deletions docs/grid/AIRC-CONTINUUM-BRIDGE.md
@@ -56,6 +56,13 @@ Heavy data should stay out of AIRC. Use AIRC for manifests, handles, room
markers, artifact hashes, and job ids; use Continuum/Grid data paths for model
weights, LoRA artifacts, voice/video, and high-volume streams.

Secrets stay out of AIRC completely. API keys, HF tokens, SSH keys, cookies,
provider credentials, and encrypted secret payloads are not bridge messages.
AIRC can carry `secretRef` names, fingerprints, lease ids, request ids, PR SHAs,
and acknowledgements so humans and agents can coordinate, but actual credential
material must move only through the secret/capability command path described in
[GRID-ARCHITECTURE.md](GRID-ARCHITECTURE.md).
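
As a rough illustration of the rule above, a bridge could refuse to emit any notice whose fields resemble raw key material. This is only a sketch: `AircSecretNotice`, `looksLikeCredential`, and `assertSafeForAirc` are hypothetical names, and the prefix patterns are simple heuristics rather than a complete detector.

```typescript
// Hypothetical notice shape: only identifiers and redacted metadata.
interface AircSecretNotice {
  secretRef: string;
  fingerprint: string;
  leaseId?: string;
  requestId?: string;
}

// Assumed heuristics for well-known credential formats; not exhaustive.
const CREDENTIAL_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9_-]{20,}/,            // "sk-" prefixed API keys
  /\bhf_[A-Za-z0-9]{20,}/,              // "hf_" prefixed tokens
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private keys
];

function looksLikeCredential(text: string): boolean {
  return CREDENTIAL_PATTERNS.some((pattern) => pattern.test(text));
}

// Throws instead of sending when any field resembles raw key material.
function assertSafeForAirc(notice: AircSecretNotice): AircSecretNotice {
  const fields: Array<[string, string | undefined]> = [
    ['secretRef', notice.secretRef],
    ['fingerprint', notice.fingerprint],
    ['leaseId', notice.leaseId],
    ['requestId', notice.requestId],
  ];
  for (const [field, value] of fields) {
    if (value !== undefined && looksLikeCredential(value)) {
      throw new Error(`refusing to send "${field}": value resembles credential material`);
    }
  }
  return notice;
}
```

A guard like this is a backstop, not the policy itself: the primary control is still that credential values never enter the message-building path.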

## Harness

For deterministic tests without a live AIRC monitor:
174 changes: 174 additions & 0 deletions docs/grid/GRID-ARCHITECTURE.md
@@ -184,6 +184,180 @@ Entities already serialize/deserialize cleanly, carry UUIDs, have CRUD events, a

No new serialization format. No new ID scheme. No new event system. The Grid protocol IS the existing protocol, routed over a mesh.

### 3.5 Secrets, API Keys, And Capability Leases

The AIRC workflow is the right mental model: agents coordinate by sending
stable identifiers, immutable SHAs, handles, and acknowledgements. They do not
send the thing itself when the thing is large, private, or operationally
sensitive. Grid secrets follow the same rule.

**Default rule:** no raw API key, HF token, SSH key, cookie, model license token,
or provider credential is ever sent through AIRC, Grid events, chat transcripts,
logs, replay captures, RAG, or persona memory.

Every node owns its local secret store under `$HOME/.continuum`. The grid moves
capability facts and encrypted grants:

```typescript
interface GridSecretCapability {
  secretRef: string;        // e.g. provider/openai/default
  provider: string;         // openai, anthropic, huggingface, etc.
  scopes: string[];         // chat, embeddings, upload, factory
  ownerNodeId: UUID;
  version: number;
  fingerprint: string;      // hash/HMAC of normalized metadata, never value
  available: boolean;       // non-empty + health check passed
  expiresAt?: string;       // for leases, not local owner secrets
}

interface GridSecretLease {
  leaseId: UUID;
  secretRef: string;
  granteeNodeId: UUID;
  scopes: string[];
  expiresAt: string;
  auditHandle: UUID;
}

interface GridSecretRevision {
  nodeId: UUID;
  secretRef: string;
  version: number;
  fingerprint: string;
  scopes: string[];
  source: 'env-file' | 'settings-ui' | 'persona-command' | 'factory-import';
  updatedAt: string;
}
```
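
One way the `fingerprint` field could be derived is sketched below, assuming an HMAC-SHA256 over normalized metadata using Node's built-in `crypto` module. `FingerprintInput`, `normalizeMetadata`, `fingerprintMetadata`, and the `gridKey` parameter are illustrative assumptions, not existing Continuum APIs. The secret value is never an input.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical subset of revision metadata that feeds the fingerprint.
interface FingerprintInput {
  secretRef: string;
  provider: string;
  scopes: string[];
  version: number;
}

// Normalize so that equivalent metadata always serializes identically.
function normalizeMetadata(input: FingerprintInput): string {
  return JSON.stringify({
    secretRef: input.secretRef.trim(),
    provider: input.provider.trim().toLowerCase(),
    scopes: [...input.scopes].sort(),
    version: input.version,
  });
}

// gridKey is assumed to be shared by the trusted nodes that compare fingerprints.
function fingerprintMetadata(input: FingerprintInput, gridKey: string): string {
  return createHmac('sha256', gridKey).update(normalizeMetadata(input)).digest('hex');
}
```

Sorting scopes and lower-casing the provider means two nodes that describe the same revision in a different order still agree on the fingerprint.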

The Settings page, setup flow, persona helper, and JTAG commands all write to
the same local authority. Personas may help the user enter a key or run a
command, but they receive a `secretRef`/lease handle, not the raw value. The
same handle can then be used by Rust workers, TypeScript adapters, factory
jobs, and grid commands without each layer inventing its own credential path.

Most real setup starts on the lowest-power machine in front of the user:

- edit `$HOME/.continuum/config.env` directly;
- use the Settings/API Providers widget;
- ask a persona to call existing `ai/key/save`, `ai/key/remove`, or future
`ai/key/*` merge commands;
- import a factory/upload credential for a specific workflow.

All four entry points produce the same redacted `GridSecretRevision`. Grid sync
then behaves like a small, secret-aware git merge: advertise revisions, compute
a redacted diff, ask for approval if the same `secretRef` changed on more than
one node, then apply only approved encrypted writes through `SecretManager`.
The merge object contains names, versions, fingerprints, scopes, source, and
timestamps. It never contains the secret value.

```typescript
interface GridSecretMergePlan {
  baseRevision?: GridSecretRevision;
  localRevision?: GridSecretRevision;
  remoteRevision?: GridSecretRevision;
  action: 'keep-local' | 'import-remote' | 'export-local' | 'rotate' | 'manual';
  conflict: boolean;
  reason: string;
}
```
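
The three-way comparison behind such a plan might look like the sketch below. It assumes that "changed" means "fingerprint differs from the base revision" and that a missing base counts as a change on both sides. `RevisionSummary`, `MergePlan`, and `planMerge` are hypothetical stand-ins that mirror the interfaces above; the real `ai/key/diff` logic may differ.

```typescript
// Redacted revision subset; mirrors GridSecretRevision, never holds a value.
interface RevisionSummary {
  nodeId: string;
  secretRef: string;
  version: number;
  fingerprint: string;
}

type MergeAction = 'keep-local' | 'import-remote' | 'export-local' | 'rotate' | 'manual';

interface MergePlan {
  baseRevision?: RevisionSummary;
  localRevision?: RevisionSummary;
  remoteRevision?: RevisionSummary;
  action: MergeAction;
  conflict: boolean;
  reason: string;
}

function planMerge(
  base: RevisionSummary | undefined,
  local: RevisionSummary | undefined,
  remote: RevisionSummary | undefined
): MergePlan {
  const revisions = { baseRevision: base, localRevision: local, remoteRevision: remote };
  const plan = (action: MergeAction, conflict: boolean, reason: string): MergePlan =>
    ({ ...revisions, action, conflict, reason });

  if (!local && !remote) return plan('keep-local', false, 'no revision on either node');
  if (!remote) return plan('export-local', false, 'peer has no revision');
  if (!local) return plan('import-remote', false, 'local node has no revision');
  if (local.fingerprint === remote.fingerprint) return plan('keep-local', false, 'already in sync');

  const localChanged = !base || base.fingerprint !== local.fingerprint;
  const remoteChanged = !base || base.fingerprint !== remote.fingerprint;

  // Changed on more than one node: never auto-resolve, ask for approval.
  if (localChanged && remoteChanged) return plan('manual', true, 'changed on both nodes');
  if (remoteChanged) return plan('import-remote', false, 'only remote changed since base');
  return plan('export-local', false, 'only local changed since base');
}
```

The function only ever sees fingerprints and versions, so the resulting plan can be logged or replayed without exposing credential material.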

Git can be the implementation substrate for revision history if it is useful,
but it must be a redacted secret ledger, not a repository of `.env` values. A
commit may contain `secretRef`, fingerprint, version, and merge decision; it
must never contain an API key or encrypted credential blob intended for another
node.

Reconciliation should be owned by a normal Continuum daemon/process,
not a one-off sync script. It watches local secret/config revisions and
periodically runs the same `ai/key/*` command composition a user action would
run. For explicit user mutations, `sync` is a parameter on the existing command
shape, not a new top-level transport noun: `ai/key/save --sync` and
`ai/key/remove --sync`.

```text
local edit/widget/persona command
  -> SecretManager writes local state
  -> GridReconcilerDaemon notices or receives the change event
  -> GridReconcilerDaemon runs a bounded ai/key command program for selected peers:
       - ai/key/status
       - ai/key/diff
       - optional owner/persona approval on conflicts
       - ai/key/apply-merge
  -> audit/replay records command handles, fingerprints, timings, outcomes
```

This is the same pattern as an intra-environment call like screenshot capture,
but the target environment is another Continuum node. One node asks another node
to execute a typed command, or a small bounded program of typed commands, against
the target's own `$HOME/.continuum`. The caller receives typed redacted results;
both sides can replay the decision without exposing the secret.

The substrate already exists in the command system:

- `grid/send` is the explicit routed command envelope: target node, command
name, params, typed result.
- `GridInterceptor` is the transparent path: normal `Commands.execute()` can be
routed remotely when the router chooses a peer.
- `grid/route` is the dry-run/debug primitive for "where would this command
execute?"
- `model/forge` already delegates to `grid/job-submit`; forge jobs are therefore
another consumer of the same substrate, not a separate agent-managed lane.

The missing abstraction is a bounded command program shape: a small ordered set
of existing typed commands with limits, redaction policy, timeout, approval
rules, and audit handles. It should be boring TypeScript data, not arbitrary
shell. Secrets need it for status/diff/apply; forge needs it for preflight,
credential availability, artifact/cache checks, job submission, and status follow-up.
Grid should run those programs itself. It must not require a coding agent on
each machine to manually align environment variables or forge setup.
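
A bounded command program could be expressed as plain data plus a validator, as sketched below. Every name here (`CommandStep`, `BoundedCommandProgram`, `validateProgram`, and all field names) is an assumption made for illustration; the doc defines only the idea, not the shape.

```typescript
// One typed command invocation; params are data, never shell strings.
interface CommandStep {
  command: string;
  params: Record<string, unknown>;
  requiresApproval?: boolean;
}

// Hypothetical program shape: ordered steps plus explicit limits and policy.
interface BoundedCommandProgram {
  targetNodeId: string;
  steps: CommandStep[];
  maxSteps: number;
  timeoutMs: number;
  allowedPrefixes: string[]; // command namespaces the program may touch
  redactFields: string[];    // result fields stripped before audit/replay
  auditHandle: string;
}

// Returns a list of violations; an empty list means the program is acceptable.
function validateProgram(program: BoundedCommandProgram): string[] {
  const errors: string[] = [];
  if (program.steps.length === 0) errors.push('program has no steps');
  if (program.steps.length > program.maxSteps) errors.push('program exceeds maxSteps');
  if (program.timeoutMs <= 0) errors.push('timeoutMs must be positive');
  for (const step of program.steps) {
    const allowed = program.allowedPrefixes.some((prefix) => step.command.startsWith(prefix));
    if (!allowed) errors.push(`command not allowed: ${step.command}`);
  }
  return errors;
}

// Example: the status/diff/apply sequence for API keys.
const keySyncProgram: BoundedCommandProgram = {
  targetNodeId: 'peer-node-id',
  steps: [
    { command: 'ai/key/status', params: {} },
    { command: 'ai/key/diff', params: { dryRun: true } },
    { command: 'ai/key/apply-merge', params: {}, requiresApproval: true },
  ],
  maxSteps: 8,
  timeoutMs: 30_000,
  allowedPrefixes: ['ai/key/'],
  redactFields: ['value'],
  auditHandle: 'audit-handle-id',
};
```

Keeping the program as data means the same validator can gate secrets and forge preflight alike, and the whole program can be recorded for replay.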

The first deployment target is the user's local grid: a trusted subnet/intranet
over Tailscale. The same command envelope later extends to trusted WAN peers and
eventually other users on the P2P mesh, with tighter limits, explicit approval,
and stronger validation as trust decreases. The same shape later applies to
model registry sync, LoRA availability, settings templates, and other low-volume
grid state.

**API-key slice for the first PR:**

- Existing `ai/key/save`: write one key into `$HOME/.continuum/config.env` or
the platform vault through `SecretManager`; redact value from logs and command
echo. Add `sync?: boolean | 'trusted-grid'` to request immediate propagation
after the local write.
- Existing `ai/key/remove`: remove one key through `SecretManager`. Add
`sync?: boolean | 'trusted-grid'` to propagate deletion/revocation metadata
after the local remove.
- Existing `ai/key/test`: validate a candidate or stored provider key.
- Existing `ai/providers/status`: provider-facing availability view.
- `ai/key/status`: report configured key names, source path, empty
placeholders, fingerprints, and health without values.
- `ai/key/diff`: compare local redacted revisions with one or more peers and
produce a merge plan without values.
- `ai/key/apply-merge`: apply an approved merge plan through `SecretManager`.
- `ai/key/request-lease`: request a scoped, expiring grant from an owner node;
default response is deny unless the owner or policy approves.
- `ai/key/revoke-lease`: revoke a lease and emit an audit event.
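
The default-deny behaviour of `ai/key/request-lease` can be sketched as a pure decision function. This is a sketch under assumptions: `LeaseRequest`, `LeasePolicy`, `LeaseDecision`, and `decideLease` are hypothetical, and a real owner node would likely also involve interactive approval and audit events.

```typescript
interface LeaseRequest {
  secretRef: string;
  granteeNodeId: string;
  scopes: string[];
  ttlMs: number;
}

// Hypothetical owner-side policy; without a matching policy every request is denied.
interface LeasePolicy {
  secretRef: string;
  allowedGrantees: string[];
  allowedScopes: string[];
  maxTtlMs: number;
}

type LeaseDecision =
  | { granted: true; scopes: string[]; expiresAt: string }
  | { granted: false; reason: string };

function decideLease(request: LeaseRequest, policies: LeasePolicy[], now: Date): LeaseDecision {
  const policy = policies.find((p) => p.secretRef === request.secretRef);
  if (!policy) return { granted: false, reason: 'no policy for secretRef' };
  if (!policy.allowedGrantees.includes(request.granteeNodeId)) {
    return { granted: false, reason: 'grantee not trusted for this secretRef' };
  }
  const outOfScope = request.scopes.filter((scope) => !policy.allowedScopes.includes(scope));
  if (request.scopes.length === 0 || outOfScope.length > 0) {
    return { granted: false, reason: 'requested scopes not permitted' };
  }
  // Clamp the lease lifetime to the policy maximum.
  const ttlMs = Math.min(request.ttlMs, policy.maxTtlMs);
  if (ttlMs <= 0) return { granted: false, reason: 'non-positive lease duration' };
  return {
    granted: true,
    scopes: [...request.scopes],
    expiresAt: new Date(now.getTime() + ttlMs).toISOString(),
  };
}
```

The decision carries only scopes and an expiry; the credential itself would still move through the encrypted grant path, not through this result.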

**Encrypted sharing is explicit.** If the owner chooses to copy a key to another
trusted node, the export is an envelope encrypted to the target node identity
and imported through `SecretManager`; loose file copy is not a grid protocol.
The audit trail records requester, approver, `secretRef`, fingerprint, version,
scope, and outcome. It never records the secret value.

**No-token onboarding is a gate.** Fresh installs must work with public models
and local inference without `HF_TOKEN` or any cloud key. `HF_TOKEN` is only for
private/gated downloads, uploads, factory publishing, or user-selected provider
workflows. A missing key produces a typed unavailable/degraded result; it must
not silently route to a cloud fallback, stale credential, or CPU-shaped
workaround.

**Replay and introspection stay useful because they are redacted.** Record the
command, `secretRef`, fingerprint/version, lease id, timing, target node, and
result. That gives VDD/JTAG replay enough information to reproduce routing and
authorization behavior without poisoning logs, RAG, or persona memory with
credentials.

---

## 4. Transport Layer
39 changes: 32 additions & 7 deletions docs/planning/ALPHA-GAP-ANALYSIS.md
@@ -2,14 +2,20 @@

<!-- markdownlint-disable MD013 MD060 -->

**Updated**: 2026-05-11
**Updated**: 2026-05-13
**Branch policy**: every change lands as `PR -> canary -> validation -> PR -> main`
**Status**: active planning document, shared by humans and agents
**Operating rule**: Rust owns runtime logic. TypeScript is UI, schema, generated types, and thin command/transport glue.
**Template-first rule**: new commands must start from `src/generator/specs/*.json` and Continuum's command generator. Manual command scaffolds are not acceptable; hand edits are for post-generation behavior only.
**Architectural mandate**: Rust-first, GPU-first, replay-tested. No patchwork substitutes for the target architecture.
**Sensory model plan**: [Sensory Model And Experiential Plasticity Plan](../architecture/SENSORY-MODEL-AND-EXPERIENTIAL-PLASTICITY-PLAN.md)

This document is the alpha source of truth. Work should not proceed as disconnected chat threads or private agent branches. Each implementation PR must name the issue it advances, land in `canary`, publish validation evidence, and only then be considered for promotion to `main`.
This document is the alpha/gap source of truth. Work should not proceed as disconnected chat threads, private agent branches, or parallel "gap" documents. Each implementation PR must name the issue it advances, land in `canary`, publish validation evidence, and only then be considered for promotion to `main`.

As of 2026-05-13 there is exactly one alpha/gap planning file:
`docs/planning/ALPHA-GAP-ANALYSIS.md`. New alpha/gap notes are merged here or
deleted. Architecture references may point here, but they must not become
parallel status ledgers.

The previous 2026-05-01 alpha snapshot was useful but had become a historical log. This revision turns it into an execution plan for the current goal: **stable, GPU-first, Rust-centric Continuum with modular Docker and fast tests that do not depend on the Node/UI stack for core correctness.**

@@ -520,22 +526,41 @@ Implementation posture:
| Issue | Priority | Direction | Test gate |
|---|---:|---|---|
| file: config single-source issue | P0 | `SecretManager` and Rust `secrets.rs` must treat only non-empty values as configured and must lazy-load `$HOME/.continuum/config.env` before any provider check | provider status shows cloud unavailable for empty placeholders; local chat still works |
| file: `grid/config/sync` command issue | P0 | create a command pair for encrypted config sharing over trusted grid/Tailscale nodes; no loose file copying and no browser exposure | two-node test shares selected keys, decrypts only on trusted target, and never logs values |
| [#1097](https://github.com/CambrianTech/continuum/issues/1097) API-key merge commands | P0 | extend the existing `ai/key/*` command surface for encrypted config sharing over trusted grid/Tailscale nodes; no loose file copying and no browser exposure | two-node test shares selected keys, decrypts only on trusted target, and never logs values |
| [#1098](https://github.com/CambrianTech/continuum/issues/1098) routed command program substrate | P0 | consolidate bounded multi-command execution on top of `grid/send`, `GridInterceptor`, and `grid/route` so secrets and forge use the same path | one local-grid test runs a redacted `ai/key/*` program; one forge preflight routes through the same envelope |
| #860 config.env as directory | P1 | keep setup file/dir creation idempotent and typed | setup test catches file-vs-dir mismatch |

Implementation status:

- Shared `ai/key` base types now exist for provider identity, sync intent,
target nodes, dry-run, synced state, and merge-plan id.
- Existing `ai/key/save`, `ai/key/remove`, and `ai/key/test` shared types
inherit the base. Runtime sync behavior is intentionally not claimed until the
routed reconciliation path exists.
- `ai/key/status` is generated from `src/generator/specs/ai-key-status.json`
and returns only redacted provider/key/source/configured/fingerprint metadata.
- `grid/send` is the explicit routed command envelope; `GridInterceptor` is the
transparent `Commands.execute()` remote path; `grid/route` is the dry-run
routing/debug primitive.

Command shape:

- `grid/config/status`: list configured key names, source path, empty placeholders, and target-node drift without values.
- `grid/config/export`: encrypt selected config keys for a specific trusted node identity.
- `grid/config/import`: decrypt and merge selected keys into the target node's `$HOME/.continuum/config.env`.
- `grid/config/sync`: orchestrate export/import across trusted grid nodes and report per-node success.
- Existing `ai/key/save`: write one key through `SecretManager` to `$HOME/.continuum/config.env` or the platform vault; command echo and logs must redact values.
- Existing `ai/key/remove`: remove one key through `SecretManager`.
- Existing `ai/key/test`: validate a candidate or stored provider key.
- Existing `ai/providers/status`: provider-facing availability view.
- `ai/key/status`: list configured key names, source path, empty placeholders, fingerprints, and provider health without values.
- `ai/key/diff`: compare redacted key revisions across selected target nodes and produce a merge plan without values.
- `ai/key/apply-merge`: apply an approved merge plan through `SecretManager`; conflicts require owner/persona approval and never auto-overwrite a newer local key.

Rules:

- Empty placeholders such as `DEEPSEEK_API_KEY=` are documentation, not availability.
- Local mode must work with zero API keys.
- Cloud personas are eligible only when their required key is non-empty and the provider health check is not expired/failed.
- Config sharing is an owner/trusted-node command. It should use grid identity plus transport encryption, then persist through `SecretManager` so all runtimes see one source.
- Remote/grid execution is command routing context, not a namespace. The capability name stays stable while target environment changes.
- Fresh install and Carl smoke must pass with public model downloads and no `HF_TOKEN`; token-dependent private/gated/factory upload paths are optional later setup.
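
The "empty placeholders are not availability" rule can be illustrated with a small parser. This is a minimal sketch, assuming a simple `KEY=value` file format with `#` comments and optional matching quotes; `parseConfigEnv` and `configuredKeys` are hypothetical helpers, not the actual `SecretManager` or `secrets.rs` behavior.

```typescript
// Parse KEY=value lines; blank lines, comments, and malformed lines are skipped.
function parseConfigEnv(text: string): Map<string, string> {
  const entries = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '' || line.startsWith('#')) continue;
    const eq = line.indexOf('=');
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.charAt(value.length - 1) === first) {
      value = value.slice(1, -1);
    }
    entries.set(key, value);
  }
  return entries;
}

// Only keys with a non-empty value count as configured.
function configuredKeys(text: string): string[] {
  return Array.from(parseConfigEnv(text).entries())
    .filter(([, value]) => value.trim() !== '')
    .map(([key]) => key);
}
```

With this rule, a file containing `DEEPSEEK_API_KEY=` and `OPENAI_API_KEY=abc123` reports only `OPENAI_API_KEY` as configured, so the empty placeholder cannot make a cloud persona eligible.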

### 2. GPU Runtime Stability

55 changes: 55 additions & 0 deletions src/commands/ai/key/common/AiKeyBase.ts
@@ -0,0 +1,55 @@
/**
 * Shared AI key command types.
 *
 * The ai/key/* commands stay modular by verb, while shared params keep
 * provider identity, sync intent, and redacted merge metadata consistent.
 */

import type { CommandParams, CommandResult, JTAGContext } from '@system/core/types/JTAGTypes';
import { createPayload } from '@system/core/types/JTAGTypes';
import { SYSTEM_SCOPES } from '@system/core/types/SystemScopes';
import type { JTAGError } from '@system/core/types/ErrorTypes';
import type { UUID } from '@system/core/types/CrossPlatformUUID';

export type AiKeySyncMode = boolean | 'trusted-grid';

export interface AiKeyParams extends CommandParams {
  /** Provider config key or provider alias, e.g. OPENAI_API_KEY or openai. */
  provider?: string;
  /** Request sync after local mutation. Remote execution stays routing context. */
  sync?: AiKeySyncMode;
  /** Optional target node ids for explicit sync/diff/apply flows. */
  targetNodes?: string[];
  /** Build a merge plan without writing. */
  dryRun?: boolean;
}

export interface AiKeyResult extends CommandResult {
  success: boolean;
  provider?: string;
  synced?: boolean;
  syncMode?: AiKeySyncMode;
  targetNodes?: string[];
  mergePlanId?: string;
  error?: JTAGError;
}

export const createAiKeyParams = <T extends Partial<AiKeyParams> = Partial<AiKeyParams>>(
  context: JTAGContext,
  sessionId: UUID,
  data: T & { provider?: string }
): AiKeyParams & T => createPayload(context, sessionId, {
  userId: SYSTEM_SCOPES.SYSTEM,
  provider: data.provider ?? '',
  ...data
} as AiKeyParams & T);

export const createAiKeyResult = <T extends Partial<AiKeyResult> = Partial<AiKeyResult>>(
  context: JTAGContext,
  sessionId: UUID,
  data: T & { success: boolean; provider?: string }
): AiKeyResult & T => createPayload(context, sessionId, {
  userId: SYSTEM_SCOPES.SYSTEM,
  provider: data.provider ?? '',
  ...data
} as AiKeyResult & T);