Guardian Agent — Full System Analysis

An in-depth analysis of the Guardian Agent codebase in its current state, covering performance, security, compliance, and hardening.

Executive Summary

Area	Status	Summary
Performance	✅ Strong	Async Rust, benchmarks, 10k+ req/s target; some sync locks in middleware.
Security	✅ Good, gaps	Auth (JWT/API key), MFA, headers, sanitization; no native TLS, no per-route RBAC.
Compliance	✅ Good, stubs	Retention, PII, encryption, audit, reporting; forensics, analytics, regulatory mapping, and chain of custody are stubs.
Hardening	✅ Good	Rate limit, lockout, session, backup, CORS; TLS via proxy only.

1. What We Have

1.1 Core Runtime & API

Binary: guardian-server (Rust, Axum, tokio).
Config: YAML (guardian.yaml) + env (GUARDIAN_CONFIG, GUARDIAN_JWT_SECRET, GUARDIAN_DEBUG, GUARDIAN_SIGNING_KEY, PORT, RUST_LOG).
Routes: 50+ endpoints — health, validate, capability, metrics, policies, retention, legal-hold, PII, encryption, RBAC users, compliance, forensics, tenants, chain-of-custody, regulatory, analytics, MCP, admin (audit, backup), MFA (setup/verify/disable).
Graceful shutdown: axum::serve(...).with_graceful_shutdown(shutdown_signal()) on Ctrl+C.

1.2 Performance

Async stack: Tokio, Axum, async/await throughout agent and server.
Benchmarks: benches/throughput.rs, benches/latency.rs, benches/memory.rs, benches/startup.rs (Criterion).
Release profile: opt-level = 3, LTO, codegen-units = 1, panic = "abort".
Caching: Config includes a cache (TTL, max_size); the policy validator uses an in-memory policy bundle.
Logging: Append-only file, optional rotation by size; signing and optional encryption on write.
Observability: /metrics, /health, /health/live, /health/ready; tracing with configurable level and JSON format.
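The size-based rotation mentioned under Logging can be pictured with a minimal std-only sketch. RotatingLog, the .1 suffix, and the single-generation policy are assumptions for illustration, not the Logger's actual design. Signing and encryption are left out.

```rust
use std::ffi::OsString;
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

// Hypothetical size-rotating append-only log; not the project's Logger.
// Per-line signing and encryption are deliberately omitted.
pub struct RotatingLog {
    path: PathBuf,
    max_bytes: u64,
    written: u64,
    writer: Option<BufWriter<File>>,
}

fn open_append(path: &Path) -> io::Result<File> {
    OpenOptions::new().create(true).append(true).open(path)
}

impl RotatingLog {
    pub fn open(path: &Path, max_bytes: u64) -> io::Result<Self> {
        let file = open_append(path)?;
        let written = file.metadata()?.len(); // resume size accounting after a restart
        Ok(Self { path: path.to_path_buf(), max_bytes, written, writer: Some(BufWriter::new(file)) })
    }

    // Appends one line, rotating first if the line would push the file past max_bytes.
    pub fn append_line(&mut self, line: &str) -> io::Result<()> {
        let needed = line.len() as u64 + 1; // +1 for the trailing newline
        if self.written > 0 && self.written + needed > self.max_bytes {
            self.rotate()?;
        }
        let w = self
            .writer
            .as_mut()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Other, "log writer closed"))?;
        w.write_all(line.as_bytes())?;
        w.write_all(b"\n")?;
        self.written += needed;
        Ok(())
    }

    // Keeps one previous generation: <path> is renamed to <path>.1, replacing any older .1.
    fn rotate(&mut self) -> io::Result<()> {
        if let Some(mut w) = self.writer.take() {
            w.flush()?; // flush, then close the file by dropping it before the rename
        }
        let mut rotated: OsString = self.path.clone().into_os_string();
        rotated.push(".1");
        fs::rename(&self.path, &rotated)?;
        self.writer = Some(BufWriter::new(open_append(&self.path)?));
        self.written = 0;
        Ok(())
    }

    pub fn flush(&mut self) -> io::Result<()> {
        match self.writer.as_mut() {
            Some(w) => w.flush(),
            None => Ok(()),
        }
    }
}
```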

1.3 Security

Authentication: Optional; when enabled, accepts a JWT (Bearer) or an API key (x-api-key header). Public paths are configurable (auth.public_paths).
MFA: TOTP (totp-rs) with an MfaStore; routes /auth/mfa/setup, /auth/mfa/verify, /auth/mfa/disable; the auth middleware requires an X-MFA-Code header when user.mfa_enabled is set.
Secrets: Env-based 32-byte keys (secrets::load_key_from_env, load_secret_from_env); signing key and JWT secret from env in production.
Security headers: Middleware sets HSTS, X-Content-Type-Options, X-Frame-Options, CSP, X-XSS-Protection.
Error sanitization: GuardianError::sanitize_for_client(); detailed messages only when GUARDIAN_DEBUG=1|true.
Input validation: sanitize_path (no ..), validate_id, validate_framework, truncate_safe; used on path params and bodies where applicable.
Request size: Middleware enforces a maximum body size (e.g. 10MB) based on the Content-Length header.
CORS: Configurable cors.allowed_origins; empty = same-origin only.
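A path check such as sanitize_path is more robust when it walks path components than when it searches for the substring "..". The functions below are hypothetical stand-ins that reuse the names above. The real signatures may differ, and the 64-character limit and allowed character set in validate_id are assumptions.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical stand-in for sanitize_path: accepts only relative paths built from
// normal components. ".." segments, absolute roots, and Windows prefixes are rejected,
// and "." segments are dropped.
pub fn sanitize_path(input: &str) -> Option<PathBuf> {
    let mut clean = PathBuf::new();
    for component in Path::new(input).components() {
        match component {
            Component::Normal(part) => clean.push(part),
            Component::CurDir => {}
            // Any of these could escape the intended base directory.
            Component::ParentDir | Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    if clean.as_os_str().is_empty() {
        None
    } else {
        Some(clean)
    }
}

// Hypothetical stand-in for validate_id: 1 to 64 ASCII alphanumerics, '-' or '_'.
pub fn validate_id(id: &str) -> bool {
    !id.is_empty()
        && id.len() <= 64
        && id.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}
```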

1.4 Hardening

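Rate limiting: Per-key token bucket (RateLimitStore); the key is X-Forwarded-For / X-Real-IP, falling back to "unknown", so all requests without these headers share a single bucket; limited requests receive 429 with Retry-After and X-RateLimit headers.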
Account lockout: After N failed auth attempts, key locked for M minutes (LockoutStore).
Session management: Optional per-user sessions with TTL and max concurrent (SessionStore); key = user_id, value = (client_key, last_activity).
Backup: BackupConfig (enabled, cron, local_path, optional s3/azure); BackupManager copies the log (and optionally the config) to a timestamped directory; a cron job is scheduled in main; endpoints POST/GET /admin/backup and /admin/backup/status. Cloud upload is stubbed (logs a warning only).
Audit: In-memory ring buffer (AuditLog); admin operations record actor, operation, target, success; GET /admin/audit with filters.
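The per-key token bucket behind rate limiting can be sketched as follows, assuming one bucket per client key with a fixed capacity and refill rate. TokenBucketLimiter and check are hypothetical names, not the RateLimitStore API. Passing now explicitly keeps the logic deterministic. In the server, the map would sit behind a lock.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// One bucket per client key.
struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

// Hypothetical limiter: `capacity` tokens per key, refilled at `refill_per_sec`.
pub struct TokenBucketLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<String, Bucket>,
}

impl TokenBucketLimiter {
    pub fn new(capacity: u32, refill_per_sec: f64) -> Self {
        assert!(refill_per_sec > 0.0, "refill rate must be positive");
        Self { capacity: capacity as f64, refill_per_sec, buckets: HashMap::new() }
    }

    // Ok(()) if the request may proceed; Err(wait) is the time until one token is available.
    pub fn check(&mut self, key: &str, now: Instant) -> Result<(), Duration> {
        let (capacity, rate) = (self.capacity, self.refill_per_sec);
        let bucket = self
            .buckets
            .entry(key.to_string())
            .or_insert(Bucket { tokens: capacity, last_refill: now });
        // Refill in proportion to elapsed time, capped at capacity.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * rate).min(capacity);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            Ok(())
        } else {
            Err(Duration::from_secs_f64((1.0 - bucket.tokens) / rate))
        }
    }
}
```

Retry-After is expressed in whole seconds, so a caller would round the returned Duration up, for example with as_secs_f64().ceil().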

1.5 Compliance & Data Governance

Retention: RetentionManager — cron-driven; apply policy (retention_days, archive_after_days, deletion, legal_hold); archive to local dir; legal hold list; stats and API.
PII: PIIDetector (regex-based SSN, email, credit card, etc.); redaction (mask/hash/remove/replace); agent redacts action/verdict/metadata on log write when enabled.
Encryption: EncryptionManager — AES-256-GCM; key sources: Local file, Env, HashiCorp Vault (HTTP), AWS KMS (feature, stub), Azure KV (feature commented out); agent can encrypt log lines; key rotation stub.
RBAC: AccessControl — users, roles (Admin, Auditor, Compliance, Operator, Viewer), permissions; list/add/update/delete users; access log; agent.check_permission(user_id, permission) exists.
Compliance reporting: ComplianceReporter — SOC2, HIPAA, GDPR, PCI-DSS, ISO27001; requirement definitions; generate report for period; GET requirements by framework.
Legal hold: Retention respects legal hold list; add/remove/list via API.
Multi-tenancy: TenantManager — CRUD tenants; per-tenant retention, PII config, access control (in-memory).
Chain of custody: Types (timestamp token, verification record, custody record); the manager returns these structures, but the implementation is a placeholder.
Regulatory mapping: RegulatoryMapper — gap analysis types; map_to_framework returns empty mappings (TODO).
Analytics: AnalyticsEngine — anomaly and risk score types; detect_anomalies and calculate_risk_score return empty (TODO).
Forensics: ForensicEngine — query, timeline, correlate; all return empty (TODO); index not built from logs.
MCP: Protocol types and parsing; monitor endpoint; config/stats (runtime only, not persisted).
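The precedence implied by the RetentionManager description, in which legal hold overrides archiving and deletion, fits in a small pure function. The field names follow the policy fields listed above. RetentionPolicy, RetentionAction, the day-based age, and the order of the checks are assumptions.

```rust
use std::collections::HashSet;

// Hypothetical policy mirroring retention_days, archive_after_days, and deletion.
pub struct RetentionPolicy {
    pub retention_days: u64,
    pub archive_after_days: u64,
    pub deletion_enabled: bool,
}

#[derive(Debug, PartialEq)]
pub enum RetentionAction {
    Keep,
    Archive,
    Delete,
}

// Decides what to do with one record; a legal hold always wins.
pub fn retention_action(
    policy: &RetentionPolicy,
    record_id: &str,
    age_days: u64,
    legal_holds: &HashSet<String>,
) -> RetentionAction {
    if legal_holds.contains(record_id) {
        return RetentionAction::Keep;
    }
    if policy.deletion_enabled && age_days >= policy.retention_days {
        RetentionAction::Delete
    } else if age_days >= policy.archive_after_days {
        RetentionAction::Archive
    } else {
        RetentionAction::Keep
    }
}
```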

1.6 Infrastructure & CI

CI: rust-ci.yml — test (stable/beta/nightly), build release, coverage (llvm-cov, Codecov), fmt, clippy, multi-target build (linux/musl/darwin/windows), cargo audit, Docker build/push (distroless, multi-arch).
Deploy: Dockerfile(s), Helm chart (helm/guardian-agent), examples (docker-compose, k8s sidecar, systemd).
Docs: Compliance, deployment, monitoring, MCP, sidecar, performance impact.

2. What We Might Not Have (Gaps & Risks)

2.1 Performance

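Sync locks in hot path: RateLimitStore uses std::sync::Mutex; LockoutStore/SessionStore use std::sync::RwLock. Under high contention, a task waiting on one of these locks blocks its Tokio worker thread, stalling other tasks on that thread. A std lock is acceptable in async code when the critical section is short and the guard is never held across an .await; otherwise prefer tokio::sync::RwLock/Mutex, or reduce contention with sharded or lock-free structures.
No HTTP/2: Axum/hyper support it (e.g. via axum's http2 feature), but it is not explicitly enabled; HTTP/2 multiplexing helps under load.
No connection/timeout tuning: Listen backlog, request timeout, and body read timeout are not explicitly configured.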
Benchmarks vs server: Throughput/latency benches exercise GuardianAgent::validate_action in-process, not the full HTTP stack (middleware, auth, rate limit). Add server-level benchmarks for realistic numbers.
Log I/O: The logger uses std::fs::File + BufWriter behind a mutex; under very high write load this could become a bottleneck, which a batched or async write path (e.g. a dedicated writer fed by a channel) would relieve.
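A sharded map is one of the dedicated structures mentioned above. Each shard has its own std::sync::Mutex, so requests for different keys rarely contend. The closure-based API also makes it impossible to hold a guard across an .await. This std-only sketch uses hypothetical names. Crates such as dashmap offer a production-grade version of the same idea.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::Mutex;

// Hypothetical sharded store: one big map split into independently locked shards.
pub struct ShardedMap<V> {
    shards: Vec<Mutex<HashMap<String, V>>>,
}

impl<V> ShardedMap<V> {
    pub fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0);
        Self { shards: (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect() }
    }

    // Picks the shard for a key by hashing it.
    fn shard_for(&self, key: &str) -> &Mutex<HashMap<String, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    // Runs `f` on the entry for `key` (created with `default` if missing) while holding
    // only that shard's lock. The guard is dropped on return, so it never spans an .await.
    pub fn with_entry<R>(&self, key: &str, default: impl FnOnce() -> V, f: impl FnOnce(&mut V) -> R) -> R {
        // Recover from poisoning instead of propagating a panic from another thread.
        let mut shard = self.shard_for(key).lock().unwrap_or_else(|e| e.into_inner());
        let value = shard.entry(key.to_string()).or_insert_with(default);
        f(value)
    }
}
```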

2.2 Security

No native TLS: start_server returns an error if tls_config.enabled is true; TLS is not implemented in-process, so production deployments must terminate HTTPS at a reverse proxy (nginx, Caddy, etc.). mTLS (client_ca_path) is configurable but unused in code.
RBAC not enforced per route: Users and permissions are stored and check_permission exists, but no handler checks permissions before acting, so any authenticated user can call any protected endpoint. Per-route or per-handler permission checks are needed (e.g. Admin-only for backup, ManageRBAC for user management).
API keys in config: auth.api_keys (map key → user_id) can be set in YAML; if the config is committed or world-readable, keys leak. Prefer env or a secrets manager for API keys.
MFA secret storage: TOTP secrets in MfaStore (in-memory HashMap); lost on restart. No persistence or encryption at rest for MFA secrets.
Session storage: In-memory; no shared session store across instances (sticky sessions or Redis needed for multi-instance).
Client key spoofing: The rate-limit/lockout key is taken from X-Forwarded-For / X-Real-IP. Unless a trusted proxy strips or overwrites these headers, clients can set arbitrary values to evade rate limiting or lockout; ensure the proxy does so.
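A common mitigation is to trust forwarding headers only when the TCP peer is a known proxy, and then to take the rightmost address that is not one of those proxies. The function below is a std-only sketch of that rule. The name client_key, the trusted_proxies list, and the fallback behaviour are assumptions; the project currently keys on the headers directly.

```rust
use std::net::IpAddr;

// Hypothetical client-key extraction. `peer` is the TCP peer address and `xff` the raw
// X-Forwarded-For header. Falls back to the peer whenever the header cannot be trusted.
pub fn client_key(peer: IpAddr, xff: Option<&str>, trusted_proxies: &[IpAddr]) -> IpAddr {
    // A direct client cannot choose its own key by sending the header.
    if !trusted_proxies.contains(&peer) {
        return peer;
    }
    let Some(header) = xff else { return peer };
    // Walk right to left: the rightmost entries were appended by our proxies, and the
    // first untrusted address is the real client. Entries further left are client-supplied.
    for part in header.rsplit(',') {
        match part.trim().parse::<IpAddr>() {
            Ok(ip) if trusted_proxies.contains(&ip) => continue,
            Ok(ip) => return ip,
            Err(_) => break, // malformed entry: stop and fall back to the peer
        }
    }
    peer
}
```

In Axum, the TCP peer address can be obtained through the ConnectInfo<SocketAddr> extractor when the app is served with into_make_service_with_connect_info.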

2.3 Compliance & Correctness

Forensics: Query/timeline/correlate return empty; index not built from log files. No real search or timeline for investigations.
Analytics: Anomaly detection and risk scoring return empty; no real algorithms.
Regulatory mapping: Mappings are empty; no feature → requirement evidence.
Chain of custody: Structure only; no RFC 3161 TSA or real verification flow.
Retention archive: Archive is local directory only; no S3/Azure upload in retention (config has archive_location but implementation moves to local archive/).
Audit persistence: Audit log is in-memory ring buffer; lost on restart; no durable audit trail for strict compliance.
Tenant isolation: Tenants are stored in memory, and there is no per-request tenant context: no tenant_id is taken from the request or a header, and log reads/writes are not scoped to a tenant.
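A first step toward a durable audit trail is to append each entry to a JSON-lines file and sync it to disk before acknowledging the operation. The std-only sketch below is hypothetical: append_audit, the field layout, and the file format are assumptions. A production version would more likely use serde_json, add rotation, and could reuse the main log's signing for tamper evidence.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;
use std::time::{SystemTime, UNIX_EPOCH};

// Hypothetical audit entry with the fields the in-memory AuditLog records.
pub struct AuditEntry<'a> {
    pub actor: &'a str,
    pub operation: &'a str,
    pub target: &'a str,
    pub success: bool,
}

// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

// Appends one JSON line and syncs the file data, so the entry survives a restart or crash.
pub fn append_audit(path: &Path, entry: &AuditEntry) -> io::Result<()> {
    let ts = SystemTime::now().duration_since(UNIX_EPOCH).map(|d| d.as_secs()).unwrap_or(0);
    let line = format!(
        "{{\"ts\":{},\"actor\":\"{}\",\"operation\":\"{}\",\"target\":\"{}\",\"success\":{}}}\n",
        ts,
        json_escape(entry.actor),
        json_escape(entry.operation),
        json_escape(entry.target),
        entry.success
    );
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(line.as_bytes())?;
    file.sync_data()
}
```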

2.4 Hardening & Operations

Backup cloud: S3/Azure upload in backup is stubbed (warn only); no real cloud backup.
TLS feature: tls feature exists in Cargo.toml but no rustls/native-tls wiring in server; config exists, code path errors out.
Azure features: azure-kv is referenced in encryption code but not declared as a feature in Cargo.toml (causing compiler warnings); Azure Key Vault and Azure Blob backup are not implemented.
Container scanning: CI has cargo audit; no Trivy (or similar) container image scan in the workflow.
No request timeout: No global or per-route request timeout; slow clients can hold connections.

2.5 Testing & Quality

Integration tests: Cover validation, retention, PII, RBAC, compliance, tenants, etc., but many assertions are permissive. For example, assert!(x || !x) is a tautology that always passes, so such tests cannot catch regressions.
No auth/MFA end-to-end tests: No automated test starts the server with auth and MFA enabled and checks 401/403 responses and the MFA flow.
Benchmarks not in CI: Criterion benches are not run in CI; no regression tracking on throughput/latency.
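Exact assertions become easy when time is passed in as an argument rather than read from the clock. The lockout sketch below uses hypothetical names (Lockout, record_failure, is_locked) rather than the LockoutStore API. Its test module shows the kind of precise checks that could replace tautologies such as assert!(x || !x).

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical lockout tracker: after `max_failures` failures a key is locked for `lock_for`.
// Time is passed in, so tests need no sleeps.
pub struct Lockout {
    max_failures: u32,
    lock_for: Duration,
    state: HashMap<String, (u32, Option<Instant>)>, // (failures, locked_until)
}

impl Lockout {
    pub fn new(max_failures: u32, lock_for: Duration) -> Self {
        Self { max_failures, lock_for, state: HashMap::new() }
    }

    pub fn is_locked(&self, key: &str, now: Instant) -> bool {
        matches!(self.state.get(key), Some((_, Some(until))) if now < *until)
    }

    pub fn record_failure(&mut self, key: &str, now: Instant) {
        let entry = self.state.entry(key.to_string()).or_insert((0, None));
        // An expired lock starts a fresh failure count.
        if matches!(entry.1, Some(until) if now >= until) {
            *entry = (0, None);
        }
        entry.0 += 1;
        if entry.0 >= self.max_failures {
            entry.1 = Some(now + self.lock_for);
        }
    }

    pub fn record_success(&mut self, key: &str) {
        self.state.remove(key);
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn locks_after_max_failures_and_unlocks_after_duration() {
        let t0 = Instant::now();
        let mut lock = Lockout::new(3, Duration::from_secs(600));
        lock.record_failure("k", t0);
        lock.record_failure("k", t0);
        assert!(!lock.is_locked("k", t0)); // two failures: still open
        lock.record_failure("k", t0);
        assert!(lock.is_locked("k", t0 + Duration::from_secs(599)));
        assert!(!lock.is_locked("k", t0 + Duration::from_secs(600)));
        assert!(!lock.is_locked("other", t0)); // keys are independent
    }
}
```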

3. Factor-by-Factor Deep Dive

3.1 Performance

Aspect	Have	Missing / Risk
Async I/O	Tokio, Axum, async agent	—
Concurrency	Multi-threaded runtime	Sync Mutex/RwLock in middleware
Throughput target	10k+ req/s (docs/benches)	Server-level bench not in CI
Latency	Criterion latency bench (in-process)	No p99, no server stack
Memory	Small binary, 5–20MB described	No max heap or RSS guard
Startup	50ms target	No startup bench in CI
Caching	Config cache, policy bundle	No JWT or validation result cache
Backpressure	Rate limit (token bucket)	No explicit connection/request limits

Recommendations: Replace sync primitives in middleware with async ones; add server-level benchmarks and run in CI; consider JWT caching and request/timeout limits.
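If JWT caching is adopted, one constraint matters: a cached validation must never outlive the token itself. The sketch below assumes a cache keyed by the raw token string and bounded by a TTL and a max_size, mirroring the config cache fields. ValidationCache and its methods are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical cache of successful token validations. Each entry expires at the earlier
// of the cache TTL and the token's own expiry, so caching never extends a token's life.
pub struct ValidationCache<V: Clone> {
    ttl: Duration,
    max_size: usize,
    entries: HashMap<String, (V, Instant)>, // (validated claims, expires_at)
}

impl<V: Clone> ValidationCache<V> {
    pub fn new(ttl: Duration, max_size: usize) -> Self {
        Self { ttl, max_size, entries: HashMap::new() }
    }

    // Returns the cached value if present and unexpired; drops it if expired.
    pub fn get(&mut self, token: &str, now: Instant) -> Option<V> {
        let expired = match self.entries.get(token) {
            Some((v, exp)) if now < *exp => return Some(v.clone()),
            Some(_) => true,
            None => false,
        };
        if expired {
            self.entries.remove(token);
        }
        None
    }

    // `token_exp` is the token's exp claim, already converted to an Instant by the caller.
    pub fn insert(&mut self, token: &str, value: V, token_exp: Instant, now: Instant) {
        if self.entries.len() >= self.max_size && !self.entries.contains_key(token) {
            // Evict expired entries first; if still full, skip caching rather than grow.
            self.entries.retain(|_, (_, exp)| now < *exp);
            if self.entries.len() >= self.max_size {
                return;
            }
        }
        let expires_at = std::cmp::min(now + self.ttl, token_exp);
        self.entries.insert(token.to_string(), (value, expires_at));
    }
}
```

Two caveats apply. Cached entries will not observe token revocation until they expire. Keying by a hash of the token instead of the token itself would also avoid keeping bearer credentials in the cache.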

3.2 Security

Aspect	Have	Missing / Risk
Authentication	JWT + API key, optional	API keys in config; no secret store for keys
MFA	TOTP, setup/verify/disable, middleware	MFA secrets volatile; no persistence
Secrets management	Env keys, Vault for encryption key	JWT/API keys not in Vault
TLS	Config + “use proxy” error	No in-process TLS
Headers	HSTS, CSP, X-Frame-Options, etc.	—
Error leakage	Sanitize unless GUARDIAN_DEBUG	—
Input validation	Path, ID, framework, size	Could extend to more JSON schemas
CORS	Configurable origins	—
Rate limiting	Per-IP/key token bucket	Key spoofing if proxy untrusted
Lockout	N failures → lock M min	—
Sessions	TTL, max per user	In-memory only
RBAC	Users, roles, permissions	Not enforced on routes

Recommendations: Enforce RBAC per endpoint; persist or encrypt MFA secrets; move API keys to env/Vault; add native TLS option (e.g. rustls) or document proxy-only TLS clearly.
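Per-route enforcement can be as simple as a table from route to required permission, consulted after authentication and failing closed. The roles below are the five listed for AccessControl. The Permission variants, the role-to-permission grants, and the /rbac/users path are assumptions for illustration.

```rust
// Hypothetical permissions; the real AccessControl permission set may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Permission {
    ReadLogs,
    ManageBackup,
    ManageRbac,
    ViewAudit,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Admin,
    Auditor,
    Compliance,
    Operator,
    Viewer,
}

// Hypothetical role-to-permission grants.
fn role_grants(role: Role, perm: Permission) -> bool {
    use Permission::*;
    match role {
        Role::Admin => true,
        Role::Auditor | Role::Compliance => matches!(perm, ReadLogs | ViewAudit),
        Role::Operator => matches!(perm, ReadLogs | ManageBackup),
        Role::Viewer => matches!(perm, ReadLogs),
    }
}

// Maps a route to the permission it requires. /admin/* paths come from the API above;
// /rbac/users is a hypothetical path for user management.
fn required_permission(_method: &str, path: &str) -> Option<Permission> {
    match path {
        "/admin/backup" | "/admin/backup/status" => Some(Permission::ManageBackup),
        "/admin/audit" => Some(Permission::ViewAudit),
        p if p.starts_with("/rbac/users") => Some(Permission::ManageRbac),
        _ => None,
    }
}

// Deny by default: an unmapped protected route is refused with 403, even for Admin,
// which forces every new route to be added to the table.
pub fn authorize(role: Role, method: &str, path: &str) -> Result<(), u16> {
    match required_permission(method, path) {
        Some(perm) if role_grants(role, perm) => Ok(()),
        _ => Err(403),
    }
}
```

In the server, a check like this could run in a middleware layer after authentication, or each handler could call agent.check_permission before acting.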

3.3 Compliance

Aspect	Have	Missing / Risk
Retention	Policies, cron, delete/archive, legal hold	Archive only local
PII	Detect + redact on write	—
Encryption at rest	AES-256-GCM, multiple key sources	KMS/Azure stubs or unimplemented
Audit	In-memory admin audit	Not durable
Reporting	SOC2, HIPAA, GDPR, PCI-DSS, ISO27001	Report content is template/placeholder
Legal hold	List, add, remove; retention respects	—
Forensics	API and types	No real query/timeline/correlation
Chain of custody	Data structures	No TSA or verification
Regulatory mapping	Types, gap analysis	Empty mappings
Analytics	Anomaly/risk types	Empty results
Multi-tenancy	Tenant CRUD, per-tenant config	No request-scoped tenant enforcement

Recommendations: Implement forensics (read logs, index, query/timeline); persist audit to file or external store; implement retention archive to S3/Azure; add tenant context to requests and scope data access.
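Tenant scoping can be enforced through the type system. If every read and write requires a resolved tenant, no call path can forget the filter. This std-only, in-memory sketch is hypothetical (TenantId, TenantScopedStore). A real store would apply the same scoping to log files or a database, and would resolve the tenant from a verified credential rather than from a client-controlled header alone.

```rust
// Hypothetical tenant identifier, resolved once per request after authentication.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct TenantId(String);

impl TenantId {
    pub fn new(id: &str) -> Self {
        TenantId(id.to_string())
    }
}

pub struct LogRecord {
    pub tenant: TenantId,
    pub message: String,
}

// A store whose read and write methods cannot be called without a tenant.
pub struct TenantScopedStore {
    records: Vec<LogRecord>,
}

impl TenantScopedStore {
    pub fn new() -> Self {
        Self { records: Vec::new() }
    }

    pub fn write(&mut self, tenant: &TenantId, message: &str) {
        self.records.push(LogRecord { tenant: tenant.clone(), message: message.to_string() });
    }

    // Returns only the calling tenant's records, in insertion order.
    pub fn read(&self, tenant: &TenantId) -> Vec<&str> {
        self.records
            .iter()
            .filter(|r| &r.tenant == tenant)
            .map(|r| r.message.as_str())
            .collect()
    }
}
```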

3.4 Hardening

Aspect	Have	Missing / Risk
Rate limiting	Yes, configurable	—
Lockout	Yes	—
Sessions	Yes, TTL + cap	In-memory
Backup	Local backup, cron, API	Cloud upload stubbed
Health	Live + ready	—
Graceful shutdown	Yes	—
CORS	Strict possible	—
Request size	Capped	—
TLS	Proxy only (documented)	No optional in-process rustls build
CI	Tests, audit, multi-platform, Docker	No container scan; no bench in CI

Recommendations: Implement S3/Azure backup or document as future work; add Trivy (or similar) to CI; optionally add rustls and wire tls feature.

4. Summary Tables

4.1 Implemented vs Stub vs Missing

Component	Status	Notes
Policy validation (OPA)	Implemented	HTTP + fallback
Immutable logger	Implemented	Signing, optional encryption
Capability gate	Implemented	JWT capability tokens
Retention	Implemented	Cron, legal hold, local archive
PII	Implemented	Detect + redact on write
Encryption manager	Implemented	Local/Env/Vault; KMS stub
RBAC	Implemented	No per-route enforcement
Compliance reporter	Implemented	Reports + requirements
Audit log	Implemented	Volatile, in-memory
Backup	Implemented	Local + cron; cloud stubbed
MFA	Implemented	TOTP; secrets volatile
Forensics	Stub	Empty query/timeline/correlate
Analytics	Stub	Empty anomalies/risk
Regulatory mapping	Stub	Empty mappings
Chain of custody	Stub	Types only
MCP config/stats	Runtime only	Not persisted
Native TLS	Missing	Config exists; use proxy
Per-route RBAC	Missing	check_permission not used in server
Durable audit	Missing	In-memory only
Tenant-scoped access	Missing	No request tenant context

4.2 Risk Overview

Risk	Severity	Mitigation
No per-route RBAC	High	Add permission checks to admin/sensitive handlers
MFA secrets lost on restart	Medium	Persist (encrypted) or document limitation
Audit not durable	Medium	Write audit to file or external store
Sync locks in middleware	Medium	Switch to async locks or dedicated structures
API keys in config	Medium	Env or secrets manager only
Forensics/analytics stubs	Low	Implement or mark as future in docs
No native TLS	Low	Reverse proxy is acceptable; document clearly

5. Conclusion

Guardian Agent has a solid base: async Rust, broad API surface, auth, MFA, rate limiting, lockout, sessions, security headers, input validation, error sanitization, retention, PII, encryption, RBAC, compliance reporting, backup scheduling, and good CI (including cargo audit). The main gaps are: per-route RBAC enforcement, native TLS (or explicit proxy-only guidance), durable audit, persisted MFA secrets, forensics/analytics/regulatory implementations, and replacing sync locks in hot-path middleware for scalability. Addressing the high/medium items above would materially strengthen production readiness and compliance posture.
