M48: Production hardening — 8 fixes for operational gaps
Build script fail-closed: replace 10+ `|| echo WARNING` patterns with
fatal errors for all 12 required services, add final binary verification
gate. GPU backend metadata now written to /etc/secure-ai/gpu-backend.json
at build time.
Incident store durability: add f.Sync() between Flush() and Close() in
both persistIncidents() and writeAudit() to survive power loss.
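
The ordering described above (Flush, then Sync, then Close) can be sketched as follows. This is a minimal illustration, not the project's code: `persistDurably` and its signature are hypothetical, and the real `persistIncidents()` and `writeAudit()` may be structured differently.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
)

// persistDurably (hypothetical name) writes data to path and asks the
// kernel to commit it to the storage device before returning.
func persistDurably(path string, data []byte) (err error) {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	// Close on every path; surface the Close error only if nothing failed earlier.
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()

	w := bufio.NewWriter(f)
	if _, err = w.Write(data); err != nil {
		return err
	}
	// Flush moves bytes from the user-space buffer into the kernel page cache.
	if err = w.Flush(); err != nil {
		return err
	}
	// Sync (fsync) asks the kernel to write the page cache to stable storage.
	// Without it, a power loss after Close can still lose the data.
	return f.Sync()
}

func main() {
	dir, err := os.MkdirTemp("", "incident-demo")
	if err != nil {
		fmt.Println("mkdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "incidents.json")
	if err := persistDurably(path, []byte(`{"id":1}`+"\n")); err != nil {
		fmt.Println("persist:", err)
		return
	}
	fmt.Println("persisted", path)
}
```

Syncing the file covers its contents. Whether the parent directory also needs an fsync after creating a new file depends on the filesystem and is not covered here.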
Llama-server crash recovery: Type=notify wrapper with startup health gate
and WatchdogSec=30 for continuous hung-process detection.
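
A rough sketch of the notify side of such a wrapper, using only the Go standard library. The health URL, the 10-second ping interval, and the function names are assumptions for illustration. The actual wrapper may be a shell script or use a different probe, and launching llama-server itself is omitted here.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"os"
	"time"
)

// sdNotify sends one state string (for example "READY=1" or "WATCHDOG=1")
// to the systemd notification socket. An empty socket path means the
// process is not running under a Type=notify unit, so the call is a no-op.
func sdNotify(socket, state string) error {
	if socket == "" {
		return nil
	}
	addr := &net.UnixAddr{Name: socket, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}

// healthy reports whether a GET on url returns 200 within two seconds.
func healthy(url string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	socket := os.Getenv("NOTIFY_SOCKET")
	if socket == "" {
		fmt.Println("NOTIFY_SOCKET not set; not running under a Type=notify unit")
		return
	}
	const healthURL = "http://127.0.0.1:8080/health" // hypothetical endpoint

	// Startup health gate: report readiness only once the server answers.
	// If it never does, systemd's start timeout ends the attempt.
	for !healthy(healthURL) {
		time.Sleep(time.Second)
	}
	if err := sdNotify(socket, "READY=1"); err != nil {
		fmt.Fprintln(os.Stderr, "notify:", err)
		os.Exit(1)
	}

	// Ping well inside WatchdogSec=30. Skipping pings while the probe
	// fails lets systemd detect a hung process and restart the unit.
	ticker := time.NewTicker(10 * time.Second)
	for range ticker.C {
		if healthy(healthURL) {
			_ = sdNotify(socket, "WATCHDOG=1")
		}
	}
}
```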
Model catalog externalization: /etc/secure-ai/model-catalog.yaml loaded
at startup with hardcoded fallback for graceful degradation.
Circuit breaker: closed→open→half-open state machine for inter-service
HTTP calls, integrated into UI /api/status endpoint.
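
The closed→open→half-open state machine can be sketched as below. The security-status entry for M48 describes the project's breaker as protecting Python services, so this Go version is only an illustration of the technique. The failure threshold, cooldown, and all names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

type state int

const (
	closed   state = iota // calls pass through, failures are counted
	open                  // calls are rejected until the cooldown elapses
	halfOpen              // a probe call is allowed through
)

var errOpen = errors.New("circuit open")

type breaker struct {
	mu        sync.Mutex
	st        state
	failures  int
	threshold int           // consecutive failures that trip the breaker
	cooldown  time.Duration // how long to stay open before probing
	openedAt  time.Time
}

func newBreaker(threshold int, cooldown time.Duration) *breaker {
	return &breaker{threshold: threshold, cooldown: cooldown}
}

// call runs fn unless the breaker is open. For brevity this sketch does
// not limit the number of concurrent probes while half-open.
func (b *breaker) call(fn func() error) error {
	b.mu.Lock()
	if b.st == open {
		if time.Since(b.openedAt) < b.cooldown {
			b.mu.Unlock()
			return errOpen
		}
		b.st = halfOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.st, b.failures = closed, 0
		return nil
	}
	b.failures++
	// A failed probe reopens immediately; otherwise open at the threshold.
	if b.st == halfOpen || b.failures >= b.threshold {
		b.st, b.openedAt = open, time.Now()
	}
	return err
}

func main() {
	b := newBreaker(2, 50*time.Millisecond)
	down := func() error { return errors.New("backend unavailable") }
	up := func() error { return nil }

	fmt.Println(b.call(down)) // first failure, breaker stays closed
	fmt.Println(b.call(down)) // second failure trips the breaker open
	fmt.Println(b.call(up))   // rejected while open; up is never invoked
	time.Sleep(60 * time.Millisecond)
	fmt.Println(b.call(up)) // cooldown elapsed: the probe succeeds and the breaker closes
}
```

A status endpoint could report the current state per downstream service, which is one way the breaker might feed `/api/status`.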
Greenboot model verification: SHA256 manifest check at boot closes the
15-minute gap between upgrade and periodic integrity scan.
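
A boot-time manifest check of this kind might look like the sketch below. The manifest format (`sha256sum`-style lines of digest and file name), the file names, and the function names are assumptions; the commit does not show the actual Greenboot script.

```go
package main

import (
	"bufio"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// parseManifestLine splits one "<hex digest>  <file name>" entry.
// File names containing whitespace are not supported in this sketch.
func parseManifestLine(line string) (digest, name string, ok bool) {
	fields := strings.Fields(line)
	if len(fields) != 2 {
		return "", "", false
	}
	raw, err := hex.DecodeString(fields[0])
	if err != nil || len(raw) != sha256.Size {
		return "", "", false
	}
	return strings.ToLower(fields[0]), strings.TrimPrefix(fields[1], "*"), true
}

// fileSHA256 streams the file through SHA-256 so large model files are
// never loaded into memory at once.
func fileSHA256(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// verifyManifest returns an error on the first missing, malformed, or
// mismatching entry.
func verifyManifest(manifestPath, modelDir string) error {
	mf, err := os.Open(manifestPath)
	if err != nil {
		return err
	}
	defer mf.Close()
	sc := bufio.NewScanner(mf)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		want, name, ok := parseManifestLine(line)
		if !ok {
			return fmt.Errorf("malformed manifest line: %q", line)
		}
		got, err := fileSHA256(filepath.Join(modelDir, name))
		if err != nil {
			return err
		}
		if got != want {
			return fmt.Errorf("%s: SHA256 mismatch", name)
		}
	}
	return sc.Err()
}

func main() {
	// Self-contained demo in a temp directory; write errors are ignored for brevity.
	dir, err := os.MkdirTemp("", "model-verify-demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	model := filepath.Join(dir, "model.gguf")
	_ = os.WriteFile(model, []byte("weights"), 0o600)
	digest, _ := fileSHA256(model)
	manifest := filepath.Join(dir, "manifest.sha256")
	_ = os.WriteFile(manifest, []byte(digest+"  model.gguf\n"), 0o600)

	fmt.Println("intact:", verifyManifest(manifest, dir))
	_ = os.WriteFile(model, []byte("tampered"), 0o600)
	fmt.Println("tampered:", verifyManifest(manifest, dir))
}
```

In a real health check the process would exit non-zero when `verifyManifest` returns an error, so that the boot is treated as failed. A production version would also need to reject manifest entries that escape the model directory.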
Key rotation docs: cosign key lifecycle expanded from 4 lines to full
procedure (generation, rotation schedule, distribution, emergency
revocation, audit checklist, HSM migration path).
402 Go + 739 Python = 1,141 total tests (24 new).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
-See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 47 milestones.
+See [docs/threat-model.md](docs/threat-model.md) for threat classes, residual risks, and security invariants. See [docs/security-status.md](docs/security-status.md) for implementation status of all 48 milestones.
### Verify Image Signatures
@@ -218,8 +218,8 @@ All CI jobs are defined in [`.github/workflows/ci.yml`](.github/workflows/ci.yml
| Job | Workflow Link | What It Proves |
|-----|--------------|---------------|
-|`go-build-and-test`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)|399 Go tests across 9 services with `-race` (build, test, vet) |
-|`python-test`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)|718 Python tests (unit/integration + adversarial/acceptance), ruff lint, bandit security scan (enforced on HIGH/HIGH), mypy type checking |
+|`go-build-and-test`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)|402 Go tests across 9 services with `-race` (build, test, vet) |
+|`python-test`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)|739 Python tests (unit/integration + adversarial/acceptance), ruff lint, bandit security scan (enforced on HIGH/HIGH), mypy type checking |
|`supply-chain-verify`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)| SBOM generation via Syft, cosign availability, provenance keywords in release/build workflows |
|`test-count-check`|[View job](https://github.com/SecAI-Hub/SecAI_OS/actions/workflows/ci.yml)| Prevents documented test counts from drifting below actual (source of truth: [test-counts.json](docs/test-counts.json)) |
@@ -239,8 +239,8 @@ All CI jobs are defined in [`.github/workflows/ci.yml`](.github/workflows/ci.yml
|[API Reference](docs/api.md)| HTTP API for all services |
|[Policy Schema](docs/policy-schema.md)| Full policy.yaml schema reference |
-|[Security Status](docs/security-status.md)| Implementation status of all 47 milestones |
-|[Test Matrix](docs/test-matrix.md)| Test coverage: 1,117 tests across Go and Python (see [test-counts.json](docs/test-counts.json)) |
+|[Security Status](docs/security-status.md)| Implementation status of all 48 milestones |
+|[Test Matrix](docs/test-matrix.md)| Test coverage: 1,141 tests across Go and Python (see [test-counts.json](docs/test-counts.json)) |
|[Compatibility Matrix](docs/compatibility-matrix.md)| GPU, VM, and hardware support |
|[Security Test Matrix](docs/security-test-matrix.md)| Security feature test coverage |
|[FAQ](docs/faq.md)| Common questions |
@@ -426,6 +426,7 @@ See [docs/test-matrix.md](docs/test-matrix.md) for full breakdown.
- [x] **Milestone 45** -- Production readiness hardening: incident persistence (file-backed), graceful shutdown for all Go services, HTTP timeouts, systemd production hardening, first-boot validation, audit log rotation, CI vulnerability scanning, production operations guide
- [x] **Milestone 46** -- Operational maturity: bootstrap trust gap fix (cosign verify before rebase), CI runs on all changes (removed paths-ignore for .md), Python quality gates (ruff + bandit + split test suites), docs-validation CI job, production-readiness checklist, SLOs, release channel policy, support lifecycle, sample verification output
- [x] **Milestone 47** -- CI enforcement hardening: enforced vulnerability scanning (govulncheck + pip-audit + bandit fail on HIGH/HIGH) with waiver mechanism, mypy type checking for security-sensitive services, pinned reproducible Python CI dependencies, Go 1.23→1.25 (12 stdlib CVE fixes), verification-first bootstrap docs
+- [x] **Milestone 48** -- Production hardening: build script fail-closed (fatal errors for 12 required services + binary verification gate), incident store fsync (crash-safe persistence), GPU backend metadata recording, llama-server watchdog (Type=notify + WatchdogSec=30), model catalog externalization (YAML with fallback), circuit breaker for inter-service HTTP calls, post-upgrade model verification in Greenboot, cosign key rotation documentation (full lifecycle)
</details>
@@ -457,7 +458,7 @@ services/
search-mediator/ Python -- Tor-routed web search (:8485)
docs/security-status.md (2 additions, 1 deletion)
@@ -1,6 +1,6 @@
# Security Implementation Status
-This document is split into two sections. The first section covers **Security Assurance Controls** -- all implemented milestones (M0 through M47) that satisfy the M5 security assurance acceptance criteria. Every control listed there is complete and tested. The second section is the **Product Feature Roadmap**, which tracks planned product capabilities (Agent Mode Phases 2 and 3). These are product enhancements, not security assurance requirements; the M5 security posture is fully met without them.
+This document is split into two sections. The first section covers **Security Assurance Controls** -- all implemented milestones (M0 through M48) that satisfy the M5 security assurance acceptance criteria. Every control listed there is complete and tested. The second section is the **Product Feature Roadmap**, which tracks planned product capabilities (Agent Mode Phases 2 and 3). These are product enhancements, not security assurance requirements; the M5 security posture is fully met without them.
Last updated: 2026-03-14
@@ -60,6 +60,7 @@ All M5 security assurance criteria are met. The controls below have been impleme
| Production readiness hardening | Implemented | M45 | Incident recorder file-backed persistence (survives restarts), graceful shutdown (SIGTERM/SIGINT with connection draining) for all 9 Go services, HTTP server timeouts for mcp-firewall and gpu-integrity-watch, systemd production hardening (TimeoutStartSec, TimeoutStopSec, StartLimitInterval, StartLimitBurst) for all 12 daemon units, first-boot health validation script, audit log rotation via logrotate, CI dependency vulnerability scanning (govulncheck + pip-audit), production operations guide (upgrade, key rotation, capacity limits, monitoring) |
| Operational maturity | Implemented | M46 | Bootstrap trust gap fix (cosign verify before unverified rebase, documented trust gap rationale), CI runs on all changes (removed blanket paths-ignore for .md files), Python quality gates (ruff lint + bandit security scan + split test suites into unit/integration and adversarial/acceptance), docs-validation CI job (broken link detection, required docs check, test-counts.json validation), production-readiness checklist (formal release gate), SLOs (availability/latency/correctness targets + alerting thresholds), release channel policy (stable/candidate/dev + versioning + upgrade paths + security patch SLA), support lifecycle (hardware matrix, driver versions, support windows, deprecation policy, scope boundaries), CI evidence table with all 10 job descriptions and workflow links, sample verification output for verify-release.sh |
| CI enforcement hardening | Implemented | M47 | Enforced vulnerability scanning: bandit fails CI on HIGH-severity/HIGH-confidence findings, govulncheck fails on unwaived Go vulns, pip-audit fails on unwaived Python vulns. Waiver mechanism (`.github/vuln-waivers.json`) with mandatory expiry dates for reviewed/accepted findings. mypy type checking gate for security-sensitive services (common, agent, quarantine, ui). Pinned reproducible Python CI dependencies (`requirements-ci.txt`). Go 1.23→1.25 upgrade fixing 12 stdlib CVEs (crypto/tls, crypto/x509, encoding/asn1, net/url, os). Flask 3.1.1→3.1.3 (GHSA-68rp-wp8r-4726). Verification-first bootstrap documentation (signed rebase as default quickstart, unverified bootstrap moved to labeled recovery section). |
+| Production hardening | Implemented | M48 | Build script fail-closed (all `|| echo WARNING` fallbacks replaced with fatal errors for 12 required services, final binary verification gate), incident store fsync (f.Sync() before close on both incident persistence and audit log writes), GPU backend metadata recording (`/etc/secure-ai/gpu-backend.json` written at build time with backend/version/timestamp), llama-server watchdog (Type=notify wrapper with startup health gate + WatchdogSec=30 continuous monitoring), model catalog externalization (`/etc/secure-ai/model-catalog.yaml` with YAML loading + hardcoded fallback), circuit breaker for Python services (closed→open→half-open state machine protecting inter-service HTTP calls), post-upgrade model verification in Greenboot (SHA256 manifest check closes 15-min integrity gap), cosign key rotation documentation (full lifecycle: generation, rotation schedule, distribution, emergency revocation, HSM migration path). 402 Go + 739 Python tests (1,141 total). |