SecAI-Hub
diff --git a/README.md b/README.md
Lines changed: 3 additions & 2 deletions
diff --git a/docs/security-status.md b/docs/security-status.md
Lines changed: 2 additions & 1 deletion
diff --git a/files/system/etc/secure-ai/policy/incident-containment.yaml b/files/system/etc/secure-ai/policy/incident-containment.yaml
Lines changed: 14 additions & 0 deletions
diff --git a/files/system/usr/libexec/secure-ai/secai-forensic.sh b/files/system/usr/libexec/secure-ai/secai-forensic.sh
Lines changed: 189 additions & 0 deletions
diff --git a/services/incident-recorder/alerting.go b/services/incident-recorder/alerting.go
Lines changed: 131 additions & 0 deletions
@@ -158,7 +158,7 @@ Every model passes through the same fully automatic pipeline:
 | **Updates** | Cosign-verified rpm-ostree, staged workflow, greenboot auto-rollback |
 | **Supply Chain** | Per-service CycloneDX SBOMs, SLSA3 provenance attestation, cosign-signed checksums |
 
-See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 50 milestones.
+See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 51 milestones.
 
 ### Verify Image Signatures
 
@@ -378,7 +378,7 @@ See [docs/test-matrix.md](docs/test-matrix.md) for full breakdown.
 ## Roadmap
 
 <details>
-<summary>All 50 project milestones (click to expand)</summary>
+<summary>All 51 project milestones (click to expand)</summary>
 
 - [x] **Milestone 0** -- Threat model, dataflow, invariants, policy files
 - [x] **Milestone 1** -- Bootable OS, encrypted vault, GPU drivers
@@ -431,6 +431,7 @@ See [docs/test-matrix.md](docs/test-matrix.md) for full breakdown.
 - [x] **Milestone 48** -- Production hardening: build script fail-closed (fatal errors for 12 required services + binary verification gate), incident store fsync (crash-safe persistence), GPU backend metadata recording, llama-server watchdog (Type=notify + WatchdogSec=30), model catalog externalization (YAML with fallback), circuit breaker for inter-service HTTP calls, post-upgrade model verification in Greenboot, cosign key rotation documentation (full lifecycle)
 - [x] **Milestone 49** -- Signed-first install path: bootstrap script configures signing policy before first rebase (eliminates unverified transport), digest-pinned install flow (CI publishes digests in build summary + release assets), first-boot setup wizard (interactive integrity verification + vault + TPM2 + health check), recovery/dev path separated into dedicated doc
 - [x] **Milestone 50** -- Production operations package: backup/restore scripts (full/config/logs/keys categories, age/gpg encryption, SHA256 manifest, LUKS header backup/restore), rollback decision matrix (Greenboot auto-rollback + manual criteria), 5 break-glass recovery procedures, formal data retention policy (7 data classes, disk capacity thresholds)
+- [x] **Milestone 51** -- Stronger observability: unified appliance health dashboard (trusted/degraded/recovery_required), live SLO compliance monitoring (uptime + P95 latency tracking), webhook alerts for containment events, forensic bundle export via UI + CLI (secai-forensic.sh), recovery ceremony endpoints (ack, reattest, status) registered on the HTTP mux
 
 </details>
 
 
@@ -1,6 +1,6 @@
 # Security Implementation Status
 
-This document is split into two sections. The first section covers **Security Assurance Controls** -- all implemented milestones (M0 through M50) that satisfy the M5 security assurance acceptance criteria. Every control listed there is complete and tested. The second section is the **Product Feature Roadmap**, which tracks planned product capabilities (Agent Mode Phases 2 and 3). These are product enhancements, not security assurance requirements; the M5 security posture is fully met without them.
+This document is split into two sections. The first section covers **Security Assurance Controls** -- all implemented milestones (M0 through M51) that satisfy the M5 security assurance acceptance criteria. Every control listed there is complete and tested. The second section is the **Product Feature Roadmap**, which tracks planned product capabilities (Agent Mode Phases 2 and 3). These are product enhancements, not security assurance requirements; the M5 security posture is fully met without them.
 
 Last updated: 2026-03-14
 
@@ -63,6 +63,7 @@ All M5 security assurance criteria are met. The controls below have been impleme
 | Production hardening | Implemented | M48 | Build script fail-closed (all `|| echo WARNING` fallbacks replaced with fatal errors for 12 required services, final binary verification gate), incident store fsync (f.Sync() before close on both incident persistence and audit log writes), GPU backend metadata recording (`/etc/secure-ai/gpu-backend.json` written at build time with backend/version/timestamp), llama-server watchdog (Type=notify wrapper with startup health gate + WatchdogSec=30 continuous monitoring), model catalog externalization (`/etc/secure-ai/model-catalog.yaml` with YAML loading + hardcoded fallback), circuit breaker for Python services (closed→open→half-open state machine protecting inter-service HTTP calls), post-upgrade model verification in Greenboot (SHA256 manifest check closes 15-min integrity gap), cosign key rotation documentation (full lifecycle: generation, rotation schedule, distribution, emergency revocation, HSM migration path). 402 Go + 739 Python tests (1,141 total). |
 | Signed-first install path | Implemented | M49 | Signed bootstrap script (`secai-bootstrap.sh`) configures container signing policy (policy.json + registries.d + cosign public key) before first rebase — eliminates unverified transport from production install path. Digest-pinned install flow (CI publishes image digest in build summary and release assets). First-boot setup wizard (interactive verification of image integrity, transport, vault setup, TPM2 sealing, health check). Signing policy files baked into OS image (`/etc/pki/containers/secai-cosign.pub`, `/etc/containers/registries.d/secai-os.yaml`, policy.json merge in build script). Recovery/dev bootstrap path separated into dedicated doc with clear warnings. |
 | Production operations package | Implemented | M50 | Backup script (`secai-backup.sh`) with full/config/logs/keys categories, age/gpg encryption, internal SHA256 manifest, LUKS header backup. Restore script (`secai-restore.sh`) with integrity verification, staging extraction, double-confirmation LUKS header restore, post-restore health check. Production operations doc extended with rollback decision matrix (Greenboot auto-rollback triggers + manual criteria), 5 break-glass recovery procedures (token loss, attestation failure, Level 1 panic lockout, signing policy break, Greenboot exhaustion), formal data retention policy (7 data classes with retention periods, disk capacity thresholds at 70/80/90/95%). |
+| Stronger observability | Implemented | M51 | Unified appliance health dashboard (trusted/degraded/recovery_required state derived from the runtime attestor, integrity monitor, and incident recorder). Live SLO compliance monitoring (in-process tracker measuring uptime % and P95 latency against the `docs/slos.md` targets over a 7-day rolling window). Webhook alerting for containment events (fire-and-forget JSON POST, one retry after 1 s, 5 s timeout per attempt; per-event-type filtering configured in `incident-containment.yaml`). Forensic bundle export handler registered on the HTTP mux (previously implemented but unregistered), enriched with actual audit log entries and the policy digest, and accessible via a UI download button, the Flask proxy, and a CLI script (`secai-forensic.sh`). Recovery ceremony endpoints (ack, reattest, status) are now registered as well. |
 
 ---
 
 
@@ -77,3 +77,17 @@ rules:
       - force_vault_relock
       - log_alert
     default_severity: critical
+
+# Alerting — fire-and-forget webhooks on containment events.
+# Configure webhook URLs to receive JSON alert payloads when incidents
+# trigger containment actions.  Each entry specifies a URL and which
+# event types to forward.  Leave events empty to receive all event types.
+#
+# Supported event types: containment, escalation, recovery
+alerting:
+  webhooks: []
+  # Example:
+  # - url: "http://127.0.0.1:9090/api/alerts"
+  #   events: ["containment", "escalation"]
+  # - url: "https://hooks.example.com/secai"
+  #   events: []  # receive all events
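
The filter rule in the `alerting:` block (an empty `events` list matches every event type) can be checked with a small Go sketch. It mirrors `WebhookTarget`, `AlertingConfig`, and `matchesEvent` from `alerting.go`, but decodes the commented-out example as JSON through the structs' `json` tags, because the YAML loader is not part of this diff. `targetsFor` and `loadExample` are hypothetical helpers for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Mirrors of the structs in alerting.go; only the json tags are used here.
type WebhookTarget struct {
	URL    string   `json:"url"`
	Events []string `json:"events"`
}

type AlertingConfig struct {
	Webhooks []WebhookTarget `json:"webhooks"`
}

// matchesEvent copies the rule from alerting.go: an empty list matches all.
func matchesEvent(events []string, event string) bool {
	if len(events) == 0 {
		return true
	}
	for _, e := range events {
		if e == event {
			return true
		}
	}
	return false
}

// targetsFor (hypothetical helper) lists the URLs fireWebhooks would POST to.
func targetsFor(cfg AlertingConfig, event string) []string {
	var urls []string
	for _, wh := range cfg.Webhooks {
		if matchesEvent(wh.Events, event) {
			urls = append(urls, wh.URL)
		}
	}
	return urls
}

// loadExample (hypothetical helper) decodes the commented-out YAML example,
// rewritten as JSON so only the standard library is needed.
func loadExample() (AlertingConfig, error) {
	const example = `{"webhooks": [
		{"url": "http://127.0.0.1:9090/api/alerts", "events": ["containment", "escalation"]},
		{"url": "https://hooks.example.com/secai", "events": []}
	]}`
	var cfg AlertingConfig
	err := json.Unmarshal([]byte(example), &cfg)
	return cfg, err
}

func main() {
	cfg, err := loadExample()
	if err != nil {
		panic(err)
	}
	for _, ev := range []string{"containment", "escalation", "recovery"} {
		fmt.Println(ev, "->", targetsFor(cfg, ev))
	}
}
```

With the example config, `containment` and `escalation` reach both webhooks, while `recovery` reaches only the catch-all entry with an empty `events` list.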
@@ -0,0 +1,189 @@
+#!/usr/bin/env bash
+#
+# SecAI OS — Forensic Bundle Export/Verify (M51)
+#
+# Exports a signed forensic bundle from the incident recorder, or
+# verifies the integrity of a previously exported bundle.
+#
+# Usage:
+#   secai-forensic export  [--output FILE]   Export a signed forensic bundle
+#   secai-forensic verify  <FILE>            Verify bundle hash integrity
+#   secai-forensic --help                    Show help
+#
+set -euo pipefail
+
+INCIDENT_RECORDER_URL="${INCIDENT_RECORDER_URL:-http://127.0.0.1:8515}"
+SERVICE_TOKEN_PATH="${SERVICE_TOKEN_PATH:-/run/secure-ai/service-token}"
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[0;33m'
+NC='\033[0m'
+
+info()  { echo -e "${GREEN}[INFO]${NC}  $*"; }
+warn()  { echo -e "${YELLOW}[WARN]${NC}  $*"; }
+err()   { echo -e "${RED}[ERROR]${NC} $*" >&2; }
+
+usage() {
+    cat <<'EOF'
+secai-forensic — Forensic bundle export and verification
+
+Usage:
+  secai-forensic export [--output FILE]   Export a signed forensic bundle
+  secai-forensic verify <FILE>            Verify bundle hash integrity
+  secai-forensic --help                   Show this help
+
+The export subcommand downloads a signed forensic bundle from the local
+incident recorder service.  The bundle contains all incidents, audit log
+entries, system state, and a policy digest, signed with HMAC-SHA256.
+
+The verify subcommand recomputes the SHA-256 bundle hash and compares it with
+the stored bundle_hash; it does not check the HMAC signature.
+
+Environment:
+  INCIDENT_RECORDER_URL   (default: http://127.0.0.1:8515)
+  SERVICE_TOKEN_PATH      (default: /run/secure-ai/service-token)
+EOF
+    exit 0
+}
+
+# ---------------------------------------------------------------------------
+# Export
+# ---------------------------------------------------------------------------
+cmd_export() {
+    local output="${1:-}"
+    if [[ -z "$output" ]]; then
+        output="forensic-bundle-$(date -u +%Y%m%d-%H%M%S).json"
+    fi
+
+    # Read service token if available
+    local auth_args=()
+    if [[ -f "$SERVICE_TOKEN_PATH" ]]; then
+        local token
+        token=$(cat "$SERVICE_TOKEN_PATH")
+        auth_args=(-H "Authorization: Bearer ${token}")
+    else
+        warn "Service token not found at ${SERVICE_TOKEN_PATH} — trying without auth"
+    fi
+
+    info "Exporting forensic bundle from ${INCIDENT_RECORDER_URL}..."
+
+    local http_code
+    http_code=$(curl -sS -f -w "%{http_code}" \
+        "${auth_args[@]+"${auth_args[@]}"}" \
+        "${INCIDENT_RECORDER_URL}/api/v1/forensic/export" \
+        -o "$output") || http_code="${http_code:-unknown}"
+
+    if [[ "$http_code" != 2* ]] || [[ ! -s "$output" ]]; then
+        err "Export failed (HTTP ${http_code}). Is the incident recorder running?"
+        rm -f "$output"
+        exit 1
+    fi
+
+    # Show summary
+    local size
+    size=$(wc -c < "$output" | tr -d ' ')
+    info "Exported: ${output} (${size} bytes)"
+
+    # Print key bundle fields (skipped if python3 is unavailable)
+    if command -v python3 &>/dev/null; then
+        python3 - "$output" <<'PY'
+import json, sys
+try:
+    b = json.load(open(sys.argv[1]))
+    print('Bundle hash:  ' + b.get('bundle_hash', 'N/A'))
+    print('Exported at:  ' + b.get('exported_at', 'N/A'))
+    print('Incidents:    ' + str(len(b.get('incidents', []))))
+    print('Audit lines:  ' + str(len(b.get('audit_entries', []))))
+    print('Signed:       ' + ('yes' if b.get('signature') else 'no'))
+except Exception as e:
+    print('Could not parse bundle: ' + str(e), file=sys.stderr)
+PY
+    fi
+}
+
+# ---------------------------------------------------------------------------
+# Verify
+# ---------------------------------------------------------------------------
+cmd_verify() {
+    local file="$1"
+    if [[ ! -f "$file" ]]; then
+        err "File not found: ${file}"
+        exit 1
+    fi
+
+    if ! command -v python3 &>/dev/null; then
+        err "python3 is required for bundle verification"
+        exit 1
+    fi
+
+    python3 - "$file" <<'PY'
+import json, hashlib, sys
+
+bundle = json.load(open(sys.argv[1]))
+
+# Recompute the hash over the content fields; the bytes must match Go ExportForensicBundle exactly
+hash_input = json.dumps({
+    'exported_at':   bundle['exported_at'],
+    'incidents':     bundle['incidents'],
+    'audit_entries': bundle['audit_entries'],
+    'system_state':  bundle['system_state'],
+    'policy_digest': bundle['policy_digest'],
+}, separators=(',', ':'), sort_keys=False).encode()
+
+computed = hashlib.sha256(hash_input).hexdigest()
+stored = bundle.get('bundle_hash', '')
+
+if stored == computed:
+    print('VERIFIED: Bundle hash matches.')
+    print('  Hash: ' + stored)
+    print('  Incidents: ' + str(len(bundle.get('incidents', []))))
+    print('  Exported at: ' + bundle.get('exported_at', 'N/A'))
+    sys.exit(0)
+else:
+    print('FAILED: Bundle hash mismatch; the content may have been tampered with.', file=sys.stderr)
+    print('  Expected: ' + stored, file=sys.stderr)
+    print('  Computed: ' + computed, file=sys.stderr)
+    sys.exit(1)
+PY
+}
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+case "${1:-}" in
+    export)
+        shift
+        output=""
+        while [[ $# -gt 0 ]]; do
+            case "$1" in
+                --output)
+                    [[ $# -lt 2 ]] && { err "--output requires a filename"; exit 1; }
+                    output="$2"
+                    shift 2
+                    ;;
+                *)
+                    err "Unknown option: $1 (run 'secai-forensic --help' for usage)"
+                    exit 1
+                    ;;
+            esac
+        done
+        cmd_export "$output"
+        ;;
+    verify)
+        shift
+        [[ $# -lt 1 ]] && { err "verify requires a filename (run 'secai-forensic --help' for usage)"; exit 1; }
+        cmd_verify "$1"
+        ;;
+    --help|-h)
+        usage
+        ;;
+    *)
+        err "Unknown command: ${1:-<none>}"
+        echo "Run 'secai-forensic --help' for usage." >&2
+        exit 1
+        ;;
+esac
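
`secai-forensic verify` recomputes the hash in Python, so it passes only if `json.dumps` reproduces the exporter's bytes exactly. A Go verifier can instead re-serialise with `encoding/json`. The sketch below assumes, as the script's comment states, that the exporter hashes the compact JSON of `exported_at`, `incidents`, `audit_entries`, `system_state`, and `policy_digest` in that order; the Go exporter itself is not in this diff, and `hashedFields`, `bundleEnvelope`, `verifyBundle`, and `demo` are hypothetical names. Like the script, it checks only `bundle_hash`, not the HMAC signature.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// hashedFields lists the fields covered by bundle_hash, in the order used by
// secai-forensic.sh. ASSUMPTION: the exporter hashes exactly these fields.
type hashedFields struct {
	ExportedAt   json.RawMessage `json:"exported_at"`
	Incidents    json.RawMessage `json:"incidents"`
	AuditEntries json.RawMessage `json:"audit_entries"`
	SystemState  json.RawMessage `json:"system_state"`
	PolicyDigest json.RawMessage `json:"policy_digest"`
}

// bundleEnvelope is the exported file: the hashed fields plus the stored hash
// (other fields such as the signature are ignored here).
type bundleEnvelope struct {
	hashedFields
	BundleHash string `json:"bundle_hash"`
}

// computeHash returns the hex SHA-256 of the compact JSON encoding of f.
// json.Marshal re-emits each RawMessage in compact form, so indentation in
// the exported file does not affect the result.
func computeHash(f hashedFields) (string, error) {
	b, err := json.Marshal(f)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// verifyBundle reports whether the stored bundle_hash matches the content.
func verifyBundle(data []byte) (bool, error) {
	var env bundleEnvelope
	if err := json.Unmarshal(data, &env); err != nil {
		return false, err
	}
	computed, err := computeHash(env.hashedFields)
	if err != nil {
		return false, err
	}
	return computed == env.BundleHash, nil
}

// demo builds a sample bundle, verifies it, then verifies a copy whose
// incident list was emptied without updating bundle_hash.
func demo() (original, tampered bool, err error) {
	f := hashedFields{ // sample values for illustration only
		ExportedAt:   json.RawMessage(`"2026-03-14T00:00:00Z"`),
		Incidents:    json.RawMessage(`[{"id":"inc-1","severity":"critical"}]`),
		AuditEntries: json.RawMessage(`[]`),
		SystemState:  json.RawMessage(`{"state":"degraded"}`),
		PolicyDigest: json.RawMessage(`"sha256:placeholder"`),
	}
	h, err := computeHash(f)
	if err != nil {
		return false, false, err
	}
	data, err := json.MarshalIndent(bundleEnvelope{hashedFields: f, BundleHash: h}, "", "  ")
	if err != nil {
		return false, false, err
	}
	if original, err = verifyBundle(data); err != nil {
		return false, false, err
	}
	var env bundleEnvelope
	if err = json.Unmarshal(data, &env); err != nil {
		return false, false, err
	}
	env.Incidents = json.RawMessage(`[]`) // drop the incident, keep the old hash
	forged, err := json.Marshal(env)
	if err != nil {
		return false, false, err
	}
	tampered, err = verifyBundle(forged)
	return original, tampered, err
}

func main() {
	original, tampered, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("original verifies:", original) // true
	fmt.Println("tampered verifies:", tampered) // false
}
```

One reason to prefer this route: by default Go's `json.Marshal` escapes `<`, `>`, and `&` inside strings and writes non-ASCII characters as raw UTF-8, whereas Python's `json.dumps` leaves `<`, `>`, `&` alone and escapes non-ASCII characters (`ensure_ascii=True`). If the exporter uses plain `json.Marshal`, audit entries containing such characters could make the Python check report a false mismatch.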
@@ -0,0 +1,131 @@
+package main
+
+import (
+	"bytes"
+	"encoding/json"
+	"log"
+	"net/http"
+	"sync"
+	"time"
+)
+
+// =========================================================================
+// Alerting — fire-and-forget webhooks on containment/escalation events
+// =========================================================================
+
+// AlertingConfig holds webhook configuration loaded from the containment policy.
+type AlertingConfig struct {
+	Webhooks []WebhookTarget `yaml:"webhooks" json:"webhooks"`
+}
+
+// WebhookTarget defines a single webhook endpoint.
+type WebhookTarget struct {
+	URL    string   `yaml:"url" json:"url"`
+	Events []string `yaml:"events" json:"events"` // "containment", "escalation", "recovery"
+}
+
+// AlertPayload is the JSON body sent to webhook endpoints.
+type AlertPayload struct {
+	Event     string   `json:"event"`
+	Timestamp string   `json:"timestamp"`
+	Incident  Incident `json:"incident"`
+	Actions   []string `json:"actions,omitempty"`
+	Severity  string   `json:"severity"`
+	Source    string   `json:"source"`
+}
+
+var (
+	alertingCfg   AlertingConfig
+	alertingCfgMu sync.RWMutex
+)
+
+func getAlertingConfig() AlertingConfig {
+	alertingCfgMu.RLock()
+	defer alertingCfgMu.RUnlock()
+	return alertingCfg
+}
+
+func setAlertingConfig(cfg AlertingConfig) {
+	alertingCfgMu.Lock()
+	defer alertingCfgMu.Unlock()
+	alertingCfg = cfg
+}
+
+// fireWebhooks dispatches alert payloads to all configured webhook URLs
+// matching the given event type.  Fire-and-forget with one retry.
+func fireWebhooks(event string, inc Incident, actions []string) {
+	cfg := getAlertingConfig()
+	if len(cfg.Webhooks) == 0 {
+		return
+	}
+
+	payload := AlertPayload{
+		Event:     event,
+		Timestamp: time.Now().UTC().Format(time.RFC3339),
+		Incident:  inc,
+		Actions:   actions,
+		Severity:  string(inc.Severity),
+		Source:    "incident-recorder",
+	}
+
+	body, err := json.Marshal(payload)
+	if err != nil {
+		log.Printf("alerting: failed to marshal payload: %v", err)
+		return
+	}
+
+	for _, wh := range cfg.Webhooks {
+		if !matchesEvent(wh.Events, event) {
+			continue
+		}
+		go sendWebhook(wh.URL, body)
+	}
+}
+
+// matchesEvent returns true if the event list is empty (match all) or
+// contains the given event string.
+func matchesEvent(events []string, event string) bool {
+	if len(events) == 0 {
+		return true // empty filter = match all events
+	}
+	for _, e := range events {
+		if e == event {
+			return true
+		}
+	}
+	return false
+}
+
+// sendWebhook POSTs the JSON body to the given URL.
+// Retries once after 1 second on failure.  5-second timeout per attempt.
+func sendWebhook(url string, body []byte) {
+	client := &http.Client{Timeout: 5 * time.Second}
+	for attempt := 0; attempt < 2; attempt++ {
+		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
+		if err != nil {
+			log.Printf("alerting: cannot create request for %s: %v", url, err)
+			return
+		}
+		req.Header.Set("Content-Type", "application/json")
+		req.Header.Set("User-Agent", "SecAI-Incident-Recorder/1.0")
+
+		resp, err := client.Do(req)
+		if err != nil {
+			log.Printf("alerting: POST to %s failed (attempt %d): %v", url, attempt+1, err)
+			if attempt == 0 {
+				time.Sleep(1 * time.Second)
+				continue
+			}
+			return
+		}
+		resp.Body.Close()
+		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
+			log.Printf("alerting: webhook delivered to %s (status %d)", url, resp.StatusCode)
+			return
+		}
+		log.Printf("alerting: webhook to %s returned status %d (attempt %d)", url, resp.StatusCode, attempt+1)
+		if attempt == 0 {
+			time.Sleep(1 * time.Second)
+		}
+	}
+}
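
On the receiving side, a webhook endpoint only has to accept a JSON POST and answer with a 2xx status promptly: `sendWebhook` treats any other status, or a request that exceeds its 5-second timeout, as a failure and retries once after 1 second. The Go sketch below is a minimal receiver under those assumptions; `receivedAlert`, `alertHandler`, and `sendSample` are hypothetical names, and the incident is kept as raw JSON because the `Incident` type is not shown in this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// receivedAlert decodes the AlertPayload fields; the incident stays raw JSON
// because the Incident type is not part of this diff.
type receivedAlert struct {
	Event     string          `json:"event"`
	Timestamp string          `json:"timestamp"`
	Incident  json.RawMessage `json:"incident"`
	Actions   []string        `json:"actions"`
	Severity  string          `json:"severity"`
	Source    string          `json:"source"`
}

// alertHandler accepts JSON POSTs, forwards each decoded alert to out, and
// answers 204 so that sendWebhook counts the delivery as successful.
func alertHandler(out chan<- receivedAlert) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var a receivedAlert
		if err := json.NewDecoder(r.Body).Decode(&a); err != nil {
			http.Error(w, "bad payload", http.StatusBadRequest)
			return
		}
		out <- a // blocks if the buffer is full; keep a consumer draining out
		w.WriteHeader(http.StatusNoContent)
	}
}

// sendSample posts one payload shaped like AlertPayload to a test server and
// returns what the handler decoded together with the HTTP status it sent.
func sendSample() (receivedAlert, int, error) {
	ch := make(chan receivedAlert, 1)
	srv := httptest.NewServer(alertHandler(ch))
	defer srv.Close()

	sample := `{"event":"containment","timestamp":"2026-03-14T00:00:00Z",` +
		`"incident":{"id":"inc-1"},"actions":["force_vault_relock","log_alert"],` +
		`"severity":"critical","source":"incident-recorder"}`
	resp, err := http.Post(srv.URL, "application/json", strings.NewReader(sample))
	if err != nil {
		return receivedAlert{}, 0, err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return receivedAlert{}, resp.StatusCode, nil
	}
	return <-ch, resp.StatusCode, nil
}

func main() {
	a, status, err := sendSample()
	if err != nil {
		panic(err)
	}
	fmt.Printf("status=%d event=%s severity=%s actions=%v\n", status, a.Event, a.Severity, a.Actions)
}
```

To serve the example URL from `incident-containment.yaml`, the handler could be registered with `http.Handle("/api/alerts", alertHandler(ch))` on a server listening on `127.0.0.1:9090`, with a goroutine draining `ch`. Note that `sendWebhook` sets only `Content-Type` and `User-Agent` and does not sign the payload, so a receiver cannot authenticate the sender from the request alone; binding to loopback, as in the example URL, limits who can post fake alerts.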