feat: add Keycard provider for sandbox identity credential management #748

## Problem Statement

Sandboxes need per-instance Keycard identities for service-to-service authentication. Today, providers are passive credential stores — no provider performs API calls during the sandbox lifecycle. We need a new Keycard provider that creates an APPLICATION with a unique SPIFFE ID before sandbox provisioning, generates ephemeral password credentials injected as `KEYCARD_CLIENT_ID` and `KEYCARD_CLIENT_SECRET` env vars, and cleans up the APPLICATION when the sandbox is decommissioned. Credentials must never be stored long-term — they are read once from the Keycard API and scoped to the sandbox lifetime.

## Technical Context

The current provider system is designed around passive credential storage: a provider record holds static credentials and config in key-value maps, and the sandbox supervisor fetches them at boot via `GetSandboxProviderEnvironment`. No existing provider makes external API calls or participates in sandbox lifecycle events. The Keycard integration introduces a fundamentally new pattern — an "active" or "lifecycle-aware" provider that must:
1. Call the Keycard API to create an APPLICATION before the sandbox starts
2. Generate per-sandbox credentials dynamically
3. Clean up the APPLICATION when the sandbox is deleted

This is architecturally novel for the provider system and requires new hook points in the server-side sandbox lifecycle.

## Affected Components

| Component | Key Files | Role |
|-----------|-----------|------|
| Provider system | `crates/openshell-providers/src/lib.rs`, `providers/mod.rs` | Provider plugin trait, registry, discovery |
| Gateway server (sandbox lifecycle) | `crates/openshell-server/src/grpc.rs` | `create_sandbox()`, `delete_sandbox()`, `resolve_provider_environment()` |
| Sandbox supervisor | `crates/openshell-sandbox/src/lib.rs`, `grpc_client.rs`, `secrets.rs`, `process.rs` | Fetches provider env, injects into child processes |
| Proto definitions | `proto/datamodel.proto`, `proto/openshell.proto` | Provider message, sandbox spec, gRPC services |
| Architecture docs | `architecture/sandbox-providers.md` | Provider architecture documentation |

## Technical Investigation

### Architecture Overview

**Current provider flow:**
1. Providers are created via CLI/gRPC with a `type`, `credentials` map, and `config` map — persisted in the server's object store.
2. At sandbox creation (`create_sandbox()` in `grpc.rs:178-315`), `SandboxSpec.providers` lists provider names. The server validates existence (fail-fast) but does NOT inject credentials into the pod spec.
3. At sandbox boot, the supervisor calls `GetSandboxProviderEnvironment` (`grpc.rs:914-945`), which resolves provider names → credential env vars via `resolve_provider_environment()` (`grpc.rs:3641-3672`).
4. The supervisor creates placeholder values (`openshell:resolve:env:KEY`) and holds real secrets in memory for proxy-time resolution via `SecretResolver`.
5. At sandbox deletion (`delete_sandbox()` in `grpc.rs:601-701`), the server deletes K8s resources, SSH sessions, and settings. **There is no provider cleanup hook.**
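
To illustrate step 4, here is a minimal sketch of a placeholder/resolver split. It is not the actual `SecretResolver` implementation: `PlaceholderResolver` and `split_provider_env` are hypothetical names, and only the `openshell:resolve:env:KEY` placeholder format is taken from the flow above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the supervisor's resolver (not the real `SecretResolver`).
struct PlaceholderResolver {
    /// Maps each placeholder string to the real secret it stands for.
    secrets: HashMap<String, String>,
}

impl PlaceholderResolver {
    /// Returns the real secret for a placeholder, if one is known.
    fn resolve(&self, placeholder: &str) -> Option<&str> {
        self.secrets.get(placeholder).map(String::as_str)
    }
}

/// Splits a provider env map into the env handed to child processes
/// (placeholders only) and a resolver that keeps the real values in memory.
fn split_provider_env(
    env: &HashMap<String, String>,
) -> (HashMap<String, String>, PlaceholderResolver) {
    let mut child_env = HashMap::new();
    let mut secrets = HashMap::new();
    for (key, value) in env {
        let placeholder = format!("openshell:resolve:env:{key}");
        child_env.insert(key.clone(), placeholder.clone());
        secrets.insert(placeholder, value.clone());
    }
    (child_env, PlaceholderResolver { secrets })
}
```

Under this model the Keycard-generated values would reach child processes only as placeholders, the same as any other provider credential.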

**Key observation:** The `ProviderPlugin` trait has an `apply_to_sandbox()` method with a default no-op implementation. It takes `&Provider` and returns `Result<(), ProviderError>`. It has no access to the sandbox ID, has no async support, and is not called anywhere in the current codebase, so it cannot serve as a lifecycle hook as-is.

### Code References

| Location | Description |
|----------|-------------|
| `crates/openshell-providers/src/lib.rs` | `ProviderPlugin` trait — `discover()`, `apply_to_sandbox()` (no-op), `environment_variables()` |
| `crates/openshell-providers/src/providers/mod.rs` | Provider module registry — where `keycard` module would be added |
| `crates/openshell-server/src/grpc.rs:178-315` | `create_sandbox()` — sandbox ID generated at line 229, K8s creation at ~line 276. Pre-provision hook window is between these |
| `crates/openshell-server/src/grpc.rs:601-701` | `delete_sandbox()` — sandbox cleanup. Keycard APPLICATION deletion would go here |
| `crates/openshell-server/src/grpc.rs:3641-3672` | `resolve_provider_environment()` — iterates providers, builds env map. Needs Keycard-specific credential resolution logic |
| `crates/openshell-server/src/grpc.rs:914-945` | `get_sandbox_provider_environment()` gRPC handler |
| `crates/openshell-server/src/grpc.rs:4116-4268` | `create_provider_record()` — validation and persistence for new providers |
| `crates/openshell-sandbox/src/lib.rs:187-205` | Supervisor fetches provider env at startup via gRPC |
| `crates/openshell-sandbox/src/secrets.rs` | `SecretResolver::from_provider_env()` builds placeholder/resolver pair |
| `crates/openshell-sandbox/src/process.rs:27-28` | `inject_provider_env()` into child processes |
| `proto/datamodel.proto:79-88` | `Provider` message — id, name, type, credentials, config maps |
| `proto/datamodel.proto:26-36` | `SandboxSpec` with `providers` field |
| `architecture/sandbox-providers.md` | Provider architecture documentation |

### Current Behavior

When `create_sandbox()` runs:
1. Request is validated, sandbox ID is generated (`uuid::Uuid::new_v4()`)
2. Listed providers are checked for existence in the store (fail-fast)
3. Sandbox is persisted to object store, then created as a K8s resource
4. No provider-specific actions are triggered during creation

When `delete_sandbox()` runs:
1. Sandbox is fetched from store, SSH sessions and settings cleaned up
2. K8s resource is deleted
3. No provider-specific cleanup occurs

When `resolve_provider_environment()` runs:
1. Iterates provider names from the sandbox spec
2. Fetches each provider record from the store
3. Concatenates all `credentials` map entries into a flat env map
4. All credentials are blindly injected — no per-provider filtering
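
The resolution steps above can be approximated as follows. This is a simplified sketch, not the code in `grpc.rs`: `ProviderRecord` and `resolve_env` are hypothetical, the store is modelled as a slice, and the behaviour on duplicate keys (later providers overwrite earlier ones) is an assumption.

```rust
use std::collections::HashMap;

/// Simplified provider record; the real `Provider` message has more fields.
struct ProviderRecord {
    name: String,
    credentials: HashMap<String, String>,
}

/// Flattens every `credentials` entry of the named providers into one env map.
/// No per-provider filtering is applied, mirroring the current behaviour.
fn resolve_env(
    names: &[&str],
    store: &[ProviderRecord],
) -> Result<HashMap<String, String>, String> {
    let mut env = HashMap::new();
    for name in names {
        let provider = store
            .iter()
            .find(|p| p.name == *name)
            .ok_or_else(|| format!("provider not found: {name}"))?;
        for (key, value) in &provider.credentials {
            // Assumption: a later provider silently overwrites an earlier key.
            env.insert(key.clone(), value.clone());
        }
    }
    Ok(env)
}
```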

### What Would Need to Change

**New provider plugin (`keycard.rs`):**
- Implements `ProviderPlugin` for discovery and type registration
- Defines expected config keys: `base_url`, `zone_id`, `client_id`, `client_secret`
- Defines output env vars: `KEYCARD_CLIENT_ID`, `KEYCARD_CLIENT_SECRET`
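
A possible shape for validating the expected config keys when a Keycard provider is created is sketched below. `validate_keycard_config` is a hypothetical helper, and treating all four keys (including `base_url`) as required and non-empty is an assumption.

```rust
use std::collections::HashMap;

/// Config keys listed for the Keycard provider.
/// Assumption: all four are required and must be non-empty.
const KEYCARD_CONFIG_KEYS: [&str; 4] = ["base_url", "zone_id", "client_id", "client_secret"];

/// Returns the missing or blank keys, or `Ok(())` when the config is complete.
fn validate_keycard_config(config: &HashMap<String, String>) -> Result<(), Vec<&'static str>> {
    let missing: Vec<&'static str> = KEYCARD_CONFIG_KEYS
        .iter()
        .copied()
        .filter(|key| config.get(*key).map_or(true, |value| value.trim().is_empty()))
        .collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(missing)
    }
}
```

Reporting every missing key at once, rather than failing on the first, gives the CLI user one complete error message.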

**New Keycard HTTP client module (server-side):**
- `POST /zones/{zoneId}/applications` — create APPLICATION with SPIFFE ID as identifier
- `POST /zones/{zoneId}/application-credentials` — create password credential, read `identifier` (client ID) and `password` (client secret)
- `DELETE /zones/{zoneId}/applications/{id}` — delete APPLICATION
- Basic auth using admin `client_id`/`client_secret` from provider config
- Error handling with retries for transient failures
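
The sketch below shows one way to build the three request paths and to retry only transient failures. It deliberately leaves out the HTTP layer (which would use `reqwest`) so that it stays self-contained. `KeycardError`, `with_retries`, and the linear backoff are illustrative assumptions, not part of any existing module.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error split: only `Transient` failures are worth retrying.
#[derive(Debug, PartialEq)]
enum KeycardError {
    Transient(String), // e.g. timeouts or 5xx responses
    Permanent(String), // e.g. 4xx responses
}

fn applications_path(zone_id: &str) -> String {
    format!("/zones/{zone_id}/applications")
}

fn application_credentials_path(zone_id: &str) -> String {
    format!("/zones/{zone_id}/application-credentials")
}

fn application_path(zone_id: &str, application_id: &str) -> String {
    format!("/zones/{zone_id}/applications/{application_id}")
}

/// Runs `op` at least once and retries transient failures up to `max_attempts`
/// total attempts, sleeping `backoff * attempt` between tries (linear backoff).
fn with_retries<T>(
    max_attempts: u32,
    backoff: Duration,
    mut op: impl FnMut() -> Result<T, KeycardError>,
) -> Result<T, KeycardError> {
    let mut attempt = 1;
    loop {
        match op() {
            Err(KeycardError::Transient(_)) if attempt < max_attempts => {
                sleep(backoff * attempt);
                attempt += 1;
            }
            // Success, a permanent error, or the final transient error.
            other => return other,
        }
    }
}
```

In the server the closure would wrap an async `reqwest` call and the sleep would be an async timer; the retry decision itself would be the same.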

**Modified `create_sandbox()` in `grpc.rs`:**
- After sandbox ID generation (line 229), before K8s creation (~line 276):
  - Check if any listed provider is type `keycard`
  - Call Keycard API to create APPLICATION with SPIFFE ID `spiffe://openshell/sandbox/{sandbox_id}`
  - Call Keycard API to create password credential
  - Store the credential response (`identifier` + `password`) for injection
- Handle partial failures: if Keycard call fails, abort sandbox creation; if K8s creation fails after Keycard succeeds, clean up the Keycard APPLICATION
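
The ordering and rollback rules above could look roughly like this. Everything here is a synchronous sketch under stated assumptions: `KeycardApi`, `FakeKeycard`, `SandboxCredentials`, and `provision_with_keycard` are hypothetical names, errors are plain strings instead of `tonic::Status`, and the K8s step is passed in as a closure.

```rust
/// Hypothetical abstraction over the three Keycard calls.
trait KeycardApi {
    /// Creates an APPLICATION and returns its ID.
    fn create_application(&mut self, spiffe_id: &str) -> Result<String, String>;
    /// Creates a password credential and returns (`identifier`, `password`).
    fn create_credential(&mut self, app_id: &str) -> Result<(String, String), String>;
    fn delete_application(&mut self, app_id: &str) -> Result<(), String>;
}

struct SandboxCredentials {
    app_id: String,
    client_id: String,
    client_secret: String,
}

/// Creates the Keycard APPLICATION and credential, then runs the K8s step.
/// Any failure after the APPLICATION exists triggers a best-effort delete.
fn provision_with_keycard<K: KeycardApi>(
    keycard: &mut K,
    sandbox_id: &str,
    create_k8s_resource: impl FnOnce() -> Result<(), String>,
) -> Result<SandboxCredentials, String> {
    let spiffe_id = format!("spiffe://openshell/sandbox/{sandbox_id}");
    let app_id = keycard.create_application(&spiffe_id)?;

    let (client_id, client_secret) = match keycard.create_credential(&app_id) {
        Ok(pair) => pair,
        Err(e) => {
            let _ = keycard.delete_application(&app_id); // best-effort rollback
            return Err(e);
        }
    };

    if let Err(e) = create_k8s_resource() {
        let _ = keycard.delete_application(&app_id); // best-effort rollback
        return Err(e);
    }
    Ok(SandboxCredentials { app_id, client_id, client_secret })
}

/// In-memory fake used only to exercise the flow.
#[derive(Default)]
struct FakeKeycard {
    created: Vec<String>,
    deleted: Vec<String>,
    fail_credential: bool,
}

impl KeycardApi for FakeKeycard {
    fn create_application(&mut self, spiffe_id: &str) -> Result<String, String> {
        self.created.push(spiffe_id.to_string());
        Ok(format!("app-{}", self.created.len()))
    }
    fn create_credential(&mut self, app_id: &str) -> Result<(String, String), String> {
        if self.fail_credential {
            return Err("credential creation failed".to_string());
        }
        Ok((format!("{app_id}-client"), "generated-password".to_string()))
    }
    fn delete_application(&mut self, app_id: &str) -> Result<(), String> {
        self.deleted.push(app_id.to_string());
        Ok(())
    }
}
```

A rollback that itself fails is ignored here, which is exactly the case where an orphaned APPLICATION can remain.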

**Modified `delete_sandbox()` in `grpc.rs`:**
- Before or after K8s resource deletion:
  - Check if sandbox has a Keycard provider
  - Retrieve the Keycard APPLICATION ID (stored in sandbox metadata or a mapping)
  - Call Keycard API to delete the APPLICATION

**Modified `resolve_provider_environment()` in `grpc.rs`:**
- For Keycard providers, distinguish between admin credentials (used by server, NOT injected) and sandbox credentials (injected as `KEYCARD_CLIENT_ID`, `KEYCARD_CLIENT_SECRET`)
- Admin credentials in `config` map should not leak into sandbox env
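
One way to enforce this separation is an allow-list applied per provider type, sketched below. `ProviderRecord` and `sandbox_env_for` are hypothetical, and the sketch assumes the per-sandbox values live in the `credentials` map (storage Option A); `config` is never read, so admin values kept there cannot reach the sandbox env.

```rust
use std::collections::HashMap;

/// Simplified provider record for illustration.
struct ProviderRecord {
    provider_type: String,
    credentials: HashMap<String, String>,
    config: HashMap<String, String>, // admin settings; never injected
}

/// The only keys a Keycard provider may contribute to the sandbox env.
const KEYCARD_SANDBOX_KEYS: [&str; 2] = ["KEYCARD_CLIENT_ID", "KEYCARD_CLIENT_SECRET"];

/// Builds the env contribution of one provider.
/// Non-Keycard providers keep today's behaviour (all credentials injected).
fn sandbox_env_for(provider: &ProviderRecord) -> HashMap<String, String> {
    provider
        .credentials
        .iter()
        .filter(|(key, _)| {
            provider.provider_type != "keycard"
                || KEYCARD_SANDBOX_KEYS.contains(&key.as_str())
        })
        .map(|(key, value)| (key.clone(), value.clone()))
        .collect()
}
```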

**Provider registration:**
- Add `keycard` module to `providers/mod.rs`
- Register in `ProviderRegistry::new()` in `lib.rs`
- Add `"keycard"` to `normalize_provider_type()`

### Alternative Approaches Considered

**1. Where does Keycard lifecycle logic live?**
- **Option A: Inline in `grpc.rs`** — Add Keycard API calls directly in `create_sandbox()` and `delete_sandbox()`. Simplest, but couples server to a specific provider.
- **Option B: Provider lifecycle trait** — Extend `ProviderPlugin` or create `LifecycleProvider` trait with `on_sandbox_create()`/`on_sandbox_delete()` async methods. Cleaner abstraction but more engineering upfront.
- **Option C: Generic provider hook system** — Event-based dispatch in the server for all provider lifecycle events. Most flexible, most complex, likely premature.
- **Trade-off:** Option A is pragmatic for a first implementation. Option B should be considered if a second lifecycle provider emerges. This is a decision for human review.
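
For comparison, Option B might be shaped like the sketch below. It is synchronous to stay self-contained, whereas the option as described calls for async methods. The trait's `provider_type()` method, `run_create_hooks`, and `FakeProvider` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical lifecycle trait (Option B), shown synchronously.
trait LifecycleProvider {
    fn provider_type(&self) -> &str;
    /// Returns env vars to inject into the sandbox.
    fn on_sandbox_create(&mut self, sandbox_id: &str) -> Result<HashMap<String, String>, String>;
    fn on_sandbox_delete(&mut self, sandbox_id: &str) -> Result<(), String>;
}

/// Runs every create hook in order. If one fails, the hooks that already
/// succeeded are undone in reverse order and the error is returned.
fn run_create_hooks(
    providers: &mut [&mut dyn LifecycleProvider],
    sandbox_id: &str,
) -> Result<HashMap<String, String>, String> {
    let mut env = HashMap::new();
    for index in 0..providers.len() {
        match providers[index].on_sandbox_create(sandbox_id) {
            Ok(vars) => env.extend(vars),
            Err(e) => {
                for done in providers[..index].iter_mut().rev() {
                    let _ = done.on_sandbox_delete(sandbox_id); // best-effort
                }
                return Err(format!("{} hook failed: {e}", providers[index].provider_type()));
            }
        }
    }
    Ok(env)
}

/// Minimal fake used only to exercise the dispatcher.
struct FakeProvider {
    kind: &'static str,
    fail_on_create: bool,
    cleaned_up: bool,
}

impl LifecycleProvider for FakeProvider {
    fn provider_type(&self) -> &str {
        self.kind
    }
    fn on_sandbox_create(&mut self, sandbox_id: &str) -> Result<HashMap<String, String>, String> {
        if self.fail_on_create {
            return Err("unreachable API".to_string());
        }
        Ok(HashMap::from([(self.kind.to_uppercase(), sandbox_id.to_string())]))
    }
    fn on_sandbox_delete(&mut self, _sandbox_id: &str) -> Result<(), String> {
        self.cleaned_up = true;
        Ok(())
    }
}
```

The dispatcher is where the extra engineering cost of Option B sits: the server needs no Keycard-specific branches, but ordering and rollback become generic concerns.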

**2. Per-sandbox credential storage strategy:**
- **Option A: Store in provider credentials map.** Use the existing `credentials` map on a per-sandbox Keycard provider record. This flows through the existing injection path without changes. It contradicts a strict reading of the "never stored" requirement, but the credentials are ephemeral and are deleted with the sandbox.
- **Option B: Ephemeral credential store.** A new data structure scoped to the sandbox lifetime and not persisted to the provider store. Architecturally novel and more complex.
- **Option C: SandboxSpec environment.** Inject directly into the pod env. Simpler, but bypasses the secret placeholder/proxy-resolution system.
- **Trade-off:** Option A is pragmatic. The credentials are stored temporarily, but they are scoped to the sandbox lifetime and cleaned up on decommission. The "never stored" requirement should be interpreted as "never stored long-term."

**3. SPIFFE ID format:**
- No SPIFFE framework exists in the codebase. The SPIFFE ID would be a string convention, not a full SPIFFE runtime.
- **Proposed format:** `spiffe://openshell/sandbox/{sandbox_id}` — needs human input on trust domain and hierarchy.
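
Since the SPIFFE ID is only a string convention here, a small builder with basic validation may be enough. The sketch below assumes the character rules of the SPIFFE ID standard as I understand them (lowercase letters, digits, `.`, `-`, `_` in the trust domain; letters, digits, `.`, `-`, `_` in path segments). `sandbox_spiffe_id` is a hypothetical helper, and the trust domain is a parameter because it is still an open question.

```rust
/// Builds `spiffe://{trust_domain}/sandbox/{sandbox_id}` after basic validation.
/// The character rules are an assumption based on the SPIFFE ID standard.
fn sandbox_spiffe_id(trust_domain: &str, sandbox_id: &str) -> Result<String, String> {
    let domain_ok = !trust_domain.is_empty()
        && trust_domain
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '.' | '-' | '_'));
    if !domain_ok {
        return Err(format!("invalid trust domain: {trust_domain:?}"));
    }

    let segment_ok = !sandbox_id.is_empty()
        && sandbox_id != "."
        && sandbox_id != ".."
        && sandbox_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || matches!(c, '.' | '-' | '_'));
    if !segment_ok {
        return Err(format!("invalid sandbox id: {sandbox_id:?}"));
    }

    Ok(format!("spiffe://{trust_domain}/sandbox/{sandbox_id}"))
}
```

A hyphenated UUID such as the one produced by `uuid::Uuid::new_v4()` passes the segment check unchanged.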

### Patterns to Follow

1. **Provider plugin pattern:** Follow `github.rs` or `claude.rs` for `ProviderPlugin` implementation — same registration, discovery, and environment variable patterns.
2. **HTTP client:** Use `reqwest` with `rustls-tls` (already in workspace `Cargo.toml` at version 0.12).
3. **Error handling:** Use `tonic::Status` for gRPC errors in server code.
4. **Testing:** Use `wiremock` (already a dev-dependency of `openshell-server`) for mocking Keycard HTTP API. Use `Store::connect("sqlite::memory:")` for persistence tests.
5. **Provider docs:** Update `architecture/sandbox-providers.md` following its existing structure (has sections per provider type).

## Proposed Approach

Introduce a new `keycard` provider type that holds admin API credentials (base URL, zone ID, client ID, client secret) in its config map. During sandbox creation, the server detects the Keycard provider, calls the Keycard API to create an APPLICATION with a SPIFFE-formatted identifier and a password credential, then makes the generated `identifier`/`password` available as `KEYCARD_CLIENT_ID`/`KEYCARD_CLIENT_SECRET` for sandbox env injection. During sandbox deletion, the server calls the Keycard API to delete the APPLICATION. The initial implementation places the lifecycle logic inline in `create_sandbox()`/`delete_sandbox()` with a clear path toward a lifecycle trait abstraction if needed later.

## Scope Assessment

- **Complexity:** Medium-High
- **Confidence:** Medium — clear path for core functionality, several design decisions need human input (credential storage strategy, SPIFFE ID format, lifecycle logic location)
- **Estimated files to change:** ~8-10
- **Issue type:** `feat`

## Risks & Open Questions

- **Failure modes during provisioning:** If the Keycard API is unreachable, should sandbox creation be blocked entirely or proceed without credentials? Partial failures (APPLICATION created but credential creation fails) need rollback logic.
- **Orphaned Keycard APPLICATIONs:** If sandbox K8s creation fails after the Keycard APPLICATION was created, or if the server crashes between the two operations, Keycard APPLICATIONs could be orphaned. A reconciliation/cleanup mechanism may be needed.
- **Admin vs sandbox credential separation:** The current `resolve_provider_environment()` injects all credentials from a provider — the Keycard admin credentials must be explicitly excluded from sandbox env injection.
- **SPIFFE ID format and trust domain:** What should the trust domain be? Should it include the gateway namespace or cluster identity? (Proposed: `spiffe://openshell/sandbox/{sandbox_id}`)
- **Credential lifecycle mapping:** Where is the Keycard APPLICATION ID stored so it can be retrieved during sandbox deletion? Options: sandbox metadata, a dedicated mapping table, or derived from the SPIFFE ID.
- **Per-sandbox provider record vs dynamic resolution:** Should a per-sandbox Keycard provider record be created, or should a single shared provider record hold admin creds with per-sandbox credentials resolved dynamically?
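
If a reconciliation mechanism for orphaned APPLICATIONs turns out to be needed, its core could be a set difference between Keycard APPLICATIONs and live sandboxes. The sketch below assumes the proposed SPIFFE format and assumes the Keycard API can list APPLICATIONs with their identifiers, which has not been verified. `KeycardApplication` and `find_orphans` are hypothetical.

```rust
use std::collections::HashSet;

/// Prefix of the proposed SPIFFE ID format.
const SANDBOX_SPIFFE_PREFIX: &str = "spiffe://openshell/sandbox/";

/// Hypothetical view of an APPLICATION as returned by a list call.
struct KeycardApplication {
    id: String,
    identifier: String, // the SPIFFE ID used at creation time
}

/// Returns APPLICATIONs whose SPIFFE ID points at a sandbox that no longer exists.
/// APPLICATIONs outside the sandbox prefix are never treated as orphans.
fn find_orphans<'a>(
    applications: &'a [KeycardApplication],
    live_sandbox_ids: &HashSet<String>,
) -> Vec<&'a KeycardApplication> {
    applications
        .iter()
        .filter(|app| match app.identifier.strip_prefix(SANDBOX_SPIFFE_PREFIX) {
            Some(sandbox_id) => !live_sandbox_ids.contains(sandbox_id),
            None => false,
        })
        .collect()
}
```

Deriving the sandbox ID from the SPIFFE ID in this way also bears on the credential lifecycle mapping question, since it avoids a separate mapping table at the cost of depending on the ID format staying stable.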

## Test Considerations

- **Unit tests for Keycard HTTP client:** Use `wiremock` to mock all three Keycard API endpoints (create application, create credential, delete application). Test success paths, error responses, and timeout handling.
- **Unit tests for modified `resolve_provider_environment()`:** Extend the existing 7 tests to cover Keycard-specific credential resolution and admin credential filtering.
- **Integration tests for sandbox lifecycle:** Test that `create_sandbox()` calls Keycard API and injects credentials, and `delete_sandbox()` calls Keycard API to clean up.
- **Failure/rollback tests:** Test partial failure scenarios — Keycard success then K8s failure, Keycard credential creation failure after APPLICATION creation.
- **Provider validation tests:** Test that Keycard provider creation validates required config keys (`zone_id`, `client_id`, `client_secret`).
- **E2E tests** may be needed if a test Keycard environment is available, but unit tests with `wiremock` should provide sufficient coverage for the initial implementation.

---
*Created by spike investigation. Use `build-from-issue` to plan and implement.*
