Project: AgentFlow
Document date: 2026-04-18
Repository snapshot reviewed: 2026-04-18
Audience: engineering, security review, enterprise due diligence
AgentFlow exposes a public FastAPI surface for AI agents and tenant-owned integrations. The current repository shows a security posture centered on typed request validation, tenant-scoped access control, API-key authentication, rate limiting, PII masking on responses, SQL safety guards for NL-to-SQL, and CI-based dependency and image scanning.
The codebase is strongest at application-layer controls that can be validated directly in source: auth, authorization, request filtering, security headers, contract evolution, replay safety, and auditability of API usage. The weakest areas are the controls that typically require external infrastructure or third-party attestation. This repository snapshot contains no evidence of an external penetration test, and no evidence of integration with a general-purpose secrets manager. Local DuckDB files can be opened through an optional encrypted attach path when an operator supplies encryption key material, but the default remains backward-compatible and unencrypted.
External pen-test attestation status as of 2026-05-06: not present. Use
docs/operations/external-pen-test-attestation-handoff.md for the checklist
required before any third-party pen-test claim.
Automated posture channel (2026-06-05): the repository now publishes an
OpenSSF Scorecard result (.github/workflows/scorecard.yml) and carries a
prepared OpenSSF Best Practices self-assessment
(docs/operations/openssf-security-posture.md). Both are $0 posture signals —
an automated heuristic score and a maintainer self-certification respectively —
and are kept explicitly distinct from a third-party penetration test, which is
still not present and not claimed.
Threat model assumed by the current implementation:
- untrusted external callers using X-API-Key
- tenant isolation requirements across shared serving infrastructure
- AI agents issuing natural-language queries that must not escape allowed tables or mutate data
- operational abuse such as brute-force authentication attempts and burst traffic
The API uses tenant-bound API keys plus a separate admin secret for /v1/admin/*. API key material can be stored either as plaintext runtime values or as bcrypt hashes. The default security policy sets bcrypt rounds to 12, which is aligned with a modern password-hashing baseline for application secrets.
Rotation support is implemented in the auth layer. Keys have key_id, previous_key_hash, previous_key_active_until, and explicit grace-period behavior. Admin rotation endpoints expose create, rotate, rotation-status, and revoke-old flows. The auth middleware also records endpoint usage per tenant/key, which gives the system a concrete audit trail for key activity and key-slot transitions.
Authorization is layered on top of authentication:
- admin endpoints require X-Admin-Key
- entity access can be restricted per key through allowed_entity_types
- request context binds tenant_id to the authenticated tenant
- serving paths use that tenant context when querying tenant-scoped data
Evidence: src/serving/api/auth/manager.py, src/serving/api/auth/middleware.py, src/serving/api/auth/key_rotation.py, tests/integration/test_rotation.py
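Here is a minimal sketch of the authorization layers, assuming the auth layer has already produced a per-key context. KeyContext, Forbidden, require_admin, and tenant_for_entity_read are hypothetical names. A plain Mapping is case-sensitive, unlike real HTTP headers, which Starlette's Headers object handles.

```python
import hmac
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


class Forbidden(Exception):
    """Raised when an authenticated caller may not perform the requested access."""


@dataclass(frozen=True)
class KeyContext:  # hypothetical; built by the auth layer after the API key is verified
    tenant_id: str
    # None means the key is not restricted to specific entity types.
    allowed_entity_types: Optional[FrozenSet[str]] = None


def require_admin(headers: Mapping[str, str], admin_secret: str) -> None:
    # Constant-time comparison of the X-Admin-Key header against the admin secret.
    supplied = headers.get("X-Admin-Key", "")
    if not hmac.compare_digest(supplied.encode("utf-8"), admin_secret.encode("utf-8")):
        raise Forbidden("admin key required")


def tenant_for_entity_read(ctx: KeyContext, entity_type: str) -> str:
    """Return the tenant to scope the query to, or raise Forbidden."""
    if ctx.allowed_entity_types is not None and entity_type not in ctx.allowed_entity_types:
        raise Forbidden(f"key may not read entity type {entity_type!r}")
    # The tenant comes from the authenticated key, never from request parameters.
    return ctx.tenant_id
```

The key design point is that no request field can override the tenant. Cross-tenant access is therefore not something to check for; the request simply cannot express it.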
Tenant isolation is not only a naming convention in this codebase. The serving layer includes explicit tenant routing through TenantRouter, which maps tenant IDs to Kafka topic prefixes and DuckDB schema names. The SQL builder qualifies known tables with the tenant schema and fails closed when tenant-scoped tables exist but no tenant context is available.
This is stronger than soft application filtering because the query builder rewrites table references before execution. Integration tests show that the same logical order ID can resolve to different rows for different tenants and that cross-tenant lookups return 404 rather than leaking another tenant's data.
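The fail-closed name resolution can be illustrated with the sketch below. It covers only the decision of how a table name resolves. The repository applies that decision through AST-aware rewriting, which is not reproduced here. The table set, the tenant_<id> schema convention, the "<id>." topic prefix, and all function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Hypothetical values: the real mapping lives in TenantRouter and the semantic catalog.
TENANT_SCOPED_TABLES: FrozenSet[str] = frozenset({"orders", "users", "payments"})
_TENANT_ID_RE = re.compile(r"[a-z0-9_]{1,32}")


class TenantContextMissing(Exception):
    """Raised instead of silently querying an unscoped table."""


@dataclass(frozen=True)
class TenantRoute:
    topic_prefix: str
    schema: str


def route_for(tenant_id: str) -> TenantRoute:
    # Validate before the tenant ID is ever used to build a schema name.
    if not _TENANT_ID_RE.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return TenantRoute(topic_prefix=f"{tenant_id}.", schema=f"tenant_{tenant_id}")


def qualify_table(table: str, route: Optional[TenantRoute]) -> str:
    if table not in TENANT_SCOPED_TABLES:
        return table  # shared reference tables stay unqualified
    if route is None:
        # Fail closed: a tenant-scoped table without tenant context is an error.
        raise TenantContextMissing(f"tenant context required for {table!r}")
    return f"{route.schema}.{table}"
```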
The current evidence supports the claim "tenant-scoped DuckDB schemas with fail-closed query resolution." It does not support broader claims such as end-to-end isolation across every external dependency.
Evidence: src/ingestion/tenant_router.py, src/serving/semantic_layer/query/sql_builder.py, src/serving/semantic_layer/query/engine.py, tests/integration/test_tenant_isolation.py
Typed validation is pervasive in the API surface. FastAPI request bodies and query parameters are defined with Pydantic models across agent, batch, alert, webhook, dead-letter, SLO, and contract endpoints. Validation constraints are used for lengths, enums, numeric ranges, and optional structures.
The ingestion schemas add cross-field semantics beyond shape validation. OrderEvent verifies that total_amount matches the sum of line items, payment timestamps are normalized to UTC and rejected if too far in the future, and product pricing rejects negative values. This matters because it prevents upstream data corruption from turning into trusted downstream state.
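The three cross-field rules can be sketched with standard-library dataclasses instead of Pydantic, so no dependencies are needed. The field names and the five-minute future tolerance are assumptions, and the real OrderEvent model is not reproduced here. Decimal avoids the floating-point rounding that would make an exact total check unreliable.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal
from typing import List

MAX_FUTURE_SKEW = timedelta(minutes=5)  # hypothetical tolerance


@dataclass
class LineItem:
    sku: str
    quantity: int
    unit_price: Decimal

    def __post_init__(self) -> None:
        if self.unit_price < 0:
            raise ValueError("unit_price must not be negative")


@dataclass
class OrderEvent:
    order_id: str
    line_items: List[LineItem]
    total_amount: Decimal
    paid_at: datetime

    def __post_init__(self) -> None:
        # Cross-field rule: the declared total must equal the sum of the line items.
        expected = sum((item.unit_price * item.quantity for item in self.line_items), Decimal("0"))
        if self.total_amount != expected:
            raise ValueError(f"total_amount {self.total_amount} != line-item sum {expected}")
        # Normalize to UTC; this sketch treats naive timestamps as already UTC.
        if self.paid_at.tzinfo is None:
            self.paid_at = self.paid_at.replace(tzinfo=timezone.utc)
        else:
            self.paid_at = self.paid_at.astimezone(timezone.utc)
        if self.paid_at > datetime.now(timezone.utc) + MAX_FUTURE_SKEW:
            raise ValueError("paid_at is too far in the future")
```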
Schema contract evolution is implemented through a contract registry plus version-aware validation and diff endpoints. The API versioning layer also exposes deprecation metadata through headers and supports tenant-level version pins, which reduces the blast radius of backward-incompatible changes.
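Version pinning can be sketched as follows, assuming a simple precedence of explicit request, then tenant pin, then latest. The version table and precedence are hypothetical. So are the X-API-Version and X-API-Deprecated header names, since this document does not name the headers the repository emits. Sunset is the standard header from RFC 8594 and carries an HTTP-date.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class ApiVersion:
    name: str
    deprecated: bool = False
    sunset: Optional[str] = None  # HTTP-date, e.g. for a Sunset header


# Hypothetical version table and latest version.
VERSIONS: Dict[str, ApiVersion] = {
    "2025-10-01": ApiVersion("2025-10-01", deprecated=True, sunset="Wed, 30 Sep 2026 00:00:00 GMT"),
    "2026-04-01": ApiVersion("2026-04-01"),
}
LATEST = "2026-04-01"


def resolve_version(tenant_id: str, requested: Optional[str], pins: Mapping[str, str]) -> ApiVersion:
    # Assumed precedence: explicit request > tenant pin > latest.
    name = requested or pins.get(tenant_id) or LATEST
    if name not in VERSIONS:
        raise ValueError(f"unknown API version {name!r}")
    return VERSIONS[name]


def version_headers(version: ApiVersion) -> Dict[str, str]:
    # Header names other than Sunset are hypothetical.
    headers = {"X-API-Version": version.name}
    if version.deprecated:
        headers["X-API-Deprecated"] = "true"
        if version.sunset:
            headers["Sunset"] = version.sunset
    return headers
```

A pinned tenant keeps receiving the old contract, along with a signal that it is deprecated. Unpinned tenants move to the latest version, so a breaking change reaches only the tenants that have not pinned.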
Evidence: src/ingestion/schemas/events.py, tests/unit/test_event_schemas.py, src/serving/semantic_layer/contract_registry.py, src/serving/api/routers/contracts.py, src/serving/api/versioning.py
The serving layer uses two complementary patterns for SQL safety.
First, the hot-path entity and metric lookups pass untrusted values as query parameters rather than interpolating them into SQL text. Injection-focused unit tests assert that payloads such as '; DROP TABLE ... stay in parameter arrays and never appear in the generated SQL.
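The property those injection tests assert can be shown with the standard-library sqlite3 module as a stand-in for DuckDB. Both accept ? placeholders. ENTITY_TABLES and build_entity_lookup are hypothetical names. Identifiers come from a trusted mapping, and the untrusted value travels only in the parameter list.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical catalog entry: identifiers come from trusted config, never from the request.
ENTITY_TABLES = {"order": ("orders", "order_id")}


def build_entity_lookup(entity_type: str, entity_id: str) -> Tuple[str, List[str]]:
    table, key_column = ENTITY_TABLES[entity_type]
    # Only the placeholder appears in the SQL text; the untrusted value travels separately.
    sql = f"SELECT * FROM {table} WHERE {key_column} = ?"
    return sql, [entity_id]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id TEXT, status TEXT)")
    conn.execute("INSERT INTO orders VALUES ('ORD-1', 'paid')")
    payload = "'; DROP TABLE orders; --"
    sql, params = build_entity_lookup("order", payload)
    print(conn.execute(sql, params).fetchall())  # [] : no row matches the literal payload string
```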
Second, the NL-to-SQL surface validates translated SQL with sqlglot. The validator only permits a single SELECT statement, rejects DDL and DML node types, and rejects unknown tables outside the allowlist and CTE names. Tenant scoping is then applied through AST-aware rewriting in _scope_sql, which is materially safer than regex replacement.
The repository also documents intentional # nosec B608 suppressions only on trusted identifier paths where identifiers come from internal catalog/config allowlists rather than user-controlled input.
Audit finding A-4 flagged the dynamic-SQL surface as "one careless edit away from a hole". Every # nosec B608 suppression in src/ was reviewed and is safe by construction:
- Interpolated identifiers come from a fixed in-code allowlist, the semantic catalog, trusted backend config, live schema introspection (PRAGMA table_info), _IDENTIFIER_RE, or sqlglot validation.
- Interpolated values are either bound as ? parameters, fixed literals, integers, or regex-extracted tokens that exclude SQL metacharacters and are additionally quoted via _quote_literal/_sql_str_literal.
No site interpolates unbound, unquoted request data. There are no Class-A (migratable value-interpolation) sites remaining: the hot entity/metric paths already bind values (use_query_params on DuckDB, _quote_literal elsewhere) and the operational routers already pass values through ?. The remaining suppressions are Class-B (identifiers / structural fragments that cannot be parameterized).
The number of suppressions per file is pinned by test_interpolated_sql_nosec_surface_is_pinned — a new site (even inside an already-listed file) or a new file fails CI and forces a review. Each suppression's per-line rationale comment is enforced by test_nosec_comments_carry_reason.
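A pinning check of this kind could look roughly like the sketch below. The function names are hypothetical, and the test names cited above are not reproduced. The sketch also assumes that each rationale follows the marker on the same line. The repository's real convention may differ.

```python
import re
from pathlib import Path
from typing import Dict, Iterator, List, Tuple

# Assumed convention: the reason follows the marker on the same line.
NOSEC_RE = re.compile(r"#\s*nosec\s+B608\b(?P<reason>.*)$")


def _iter_matches(root: Path) -> Iterator[Tuple[str, int, str]]:
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            match = NOSEC_RE.search(line)
            if match:
                yield rel, lineno, match.group("reason").strip(" -:")


def nosec_counts(root: Path) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for rel, _, _ in _iter_matches(root):
        counts[rel] = counts.get(rel, 0) + 1
    return counts


def suppressions_without_reason(root: Path) -> List[str]:
    return [f"{rel}:{lineno}" for rel, lineno, reason in _iter_matches(root) if not reason]


def check_surface(root: Path, pinned: Dict[str, int]) -> None:
    # Any new site, new file, or removed site changes the counts and fails the check.
    actual = nosec_counts(root)
    assert actual == pinned, f"B608 suppression surface changed: {actual} != {pinned}"
    missing = suppressions_without_reason(root)
    assert not missing, f"suppressions without a reason: {missing}"
```

Pinning exact per-file counts, rather than just listing the allowed files, is what catches a second suppression added inside a file that is already on the list.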
| Site | Interpolated | Why safe (Class B: safe by construction) |
|---|---|---|
| semantic_layer/nl_engine.py:110,163 | oid, uid (values) | regex ORD-[\w-]+ / USR-\d+ exclude quotes; additionally quoted via _sql_str_literal; covered by test_nl_engine_injection.py |
| semantic_layer/nl_engine.py:116,126,149 | window (value) | _extract_window numeric allowlist (\d+ <unit> or constant); quoted via _sql_str_literal |
| semantic_layer/nl_engine.py:139 | limit (value) | parsed int() from the question text |
| api/routers/lineage.py:103 | select_columns, time_column (identifiers) | column names are in-code literals gated by PRAGMA table_info; values bound as ? |
| api/routers/slo.py:109,123,143,160 | time_column (identifier) | fixed allowlist processed_at/created_at; window/tenant bound via CAST(? AS INTERVAL) / ? |
| api/routers/stream.py:45 | select_columns (identifiers) | in-code literals gated by PRAGMA table_info; filters bound as ? |
| api/webhook_dispatcher.py:315 | order_by (identifier) | fixed allowlist processed_at/created_at/event_id; tenant bound as ? |
| backends/clickhouse_backend.py:293,306,326,344,359,375 | self._database + fixed table names | database from trusted backend config; table names and VALUES are in-code literals (demo seed) |
| backends/duckdb_backend.py:94 | table_name (identifier) | guarded by _IDENTIFIER_RE.match (bare identifier or schema.identifier) |
| backends/duckdb_backend.py:116 | sql (full statement) | sqlglot.parse must yield exactly one exp.Select before EXPLAIN |
| semantic_layer/query/entity_queries.py:31,133,196 | table / primary_key / entity_type | identifiers via _quote_identifier from catalog; values via _quote_literal or ? (use_query_params on DuckDB) |
| semantic_layer/query/nl_queries.py:133,138 | sql subquery, limit/offset | sql prevalidated by validate_nl_sql (sqlglot single SELECT); limit bounded 1..1000, offset decoded int |
| semantic_layer/search_index.py:152 | entity.table (identifier) | table name from the semantic catalog EntityDefinition, not request data |
| orchestration/dags/daily_batch.py:148 | table (identifier) | iterates a fixed in-code health-check list |
Evidence: src/serving/semantic_layer/sql_guard.py, src/serving/semantic_layer/query/sql_builder.py, src/serving/semantic_layer/nl_engine.py, tests/unit/test_query_engine_injection.py, tests/unit/test_sql_guard.py, tests/unit/test_nl_engine_injection.py, tests/unit/test_security_tooling_policy.py
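Two of the guard patterns named in the suppression table, the identifier regex and the regex-extract-then-quote path, can be sketched as follows. The exact repository regexes and helpers are not shown in this document. The pattern, checked_identifier, sql_str_literal, and order_filter below are hypothetical approximations.

```python
import re
from typing import Optional

# Hypothetical pattern: a bare identifier or schema.identifier.
# \Z is used instead of $ because $ also matches before a trailing newline.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)?\Z")
_ORDER_ID_RE = re.compile(r"ORD-[\w-]+")


def checked_identifier(name: str) -> str:
    if not _IDENTIFIER_RE.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def sql_str_literal(value: str) -> str:
    # Standard SQL escaping: double embedded single quotes, then wrap in quotes.
    return "'" + value.replace("'", "''") + "'"


def order_filter(question: str) -> Optional[str]:
    # Defense in depth: the regex already excludes quotes, and the token is quoted anyway.
    match = _ORDER_ID_RE.search(question)
    if match is None:
        return None
    return f"order_id = {sql_str_literal(match.group(0))}"
```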
Rate limiting exists at two levels:
- per-key request quotas enforced through RateLimiter
- per-IP throttling for repeated failed authentication attempts
RateLimiter uses Redis when available and falls back to an in-memory sliding window when Redis is unavailable or intentionally disabled. The auth middleware returns X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After headers, so clients can adapt to the policy rather than blindly retrying.
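The in-memory fallback can be sketched as a sliding-window log that also produces the four response headers. The class name is hypothetical. Reporting X-RateLimit-Reset as seconds until a slot frees up is an assumption; some APIs report an epoch timestamp instead. The injectable clock exists only to make the sketch testable.

```python
import math
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class InMemorySlidingWindowLimiter:
    """Sliding-window log: at most `limit` requests per `window_s` seconds per key."""

    def __init__(self, limit: int, window_s: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str) -> Tuple[bool, Dict[str, str]]:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        allowed = len(hits) < self.limit
        if allowed:
            hits.append(now)
        # The oldest hit still in the window determines when a slot frees up.
        reset_in = math.ceil(hits[0] + self.window_s - now) if hits else 0
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - len(hits), 0)),
            "X-RateLimit-Reset": str(reset_in),
        }
        if not allowed:
            headers["Retry-After"] = str(max(reset_in, 1))
        return allowed, headers
```

This state is per process, so several API replicas would each enforce their own quota. That limitation is why the shared Redis backend is preferred whenever it is available.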
The SDKs also include resilience primitives, specifically retry policy handling and a circuit breaker. This is not a server-side abuse control by itself, but it does reduce retry storms and repeated hammering of degraded endpoints by well-behaved clients.
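A circuit breaker of the kind described can be sketched in a few lines. This is not the code in sdk/agentflow/client.py. The class, its thresholds, and the half-open policy are hypothetical, and the sketch is single-threaded.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised without calling the endpoint while the circuit is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; allows a trial call after `reset_timeout_s`."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; not calling the endpoint")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call in half-open state, or too many failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, the client fails locally and immediately. That is how a well-behaved SDK avoids adding load to an endpoint that is already degraded.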
Evidence: src/serving/api/rate_limiter.py, src/serving/api/auth/middleware.py, sdk/agentflow/client.py, sdk-ts/src/client.ts, tests/unit/test_sdk_circuit_breaker.py
Response-side PII masking is implemented in PiiMasker and applied on entity responses and NL-query results. Masking behavior is configured through config/pii_fields.yaml, supports multiple strategies (partial, full, hash), and allows explicit tenant exemptions for internal tenants. When masking is applied, the API sets X-PII-Masked: true.
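The three strategies and the tenant exemption can be sketched as follows. The function names are hypothetical, and the sketch's choices are assumptions: partial keeps the last four characters, and hash is a truncated keyed HMAC-SHA256. This document does not show how PiiMasker actually implements these strategies or what config/pii_fields.yaml contains.

```python
import hashlib
import hmac
from typing import Any, Dict, FrozenSet, Mapping, Tuple


def mask_value(value: str, strategy: str, hash_key: bytes) -> str:
    if strategy == "full":
        return "*" * len(value)
    if strategy == "partial":
        # Assumption: keep the last four characters visible.
        visible = value[-4:] if len(value) > 4 else ""
        return "*" * (len(value) - len(visible)) + visible
    if strategy == "hash":
        # Keyed hash (HMAC-SHA256): without the key, low-entropy values resist dictionary reversal.
        return hmac.new(hash_key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    raise ValueError(f"unknown masking strategy {strategy!r}")


def mask_record(
    record: Mapping[str, Any],
    field_strategies: Mapping[str, str],
    tenant_id: str,
    exempt_tenants: FrozenSet[str],
    hash_key: bytes,
) -> Tuple[Dict[str, Any], bool]:
    """Return the masked copy and whether masking was applied (drives X-PII-Masked)."""
    masked = dict(record)
    if tenant_id in exempt_tenants:
        return masked, False
    applied = False
    for field, strategy in field_strategies.items():
        value = masked.get(field)
        if isinstance(value, str):
            masked[field] = mask_value(value, strategy, hash_key)
            applied = True
    return masked, applied
```

A response handler would then set X-PII-Masked: true only when the second return value is true.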
Security headers are applied centrally and include:
- Strict-Transport-Security
- Content-Security-Policy
- X-Frame-Options
- X-Content-Type-Options
- Referrer-Policy
These controls improve baseline browser-facing hardening for docs/admin surfaces. TLS termination is intentionally delegated to an upstream edge or ingress layer; the FastAPI application applies HTTP-layer security controls behind that boundary.
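Central header application can be sketched as a pure ASGI middleware that appends headers to every http.response.start message. The header values are illustrative assumptions, since the repository's actual policy values are not quoted in this document. A pure ASGI class like this should be registrable on a FastAPI app with app.add_middleware(SecurityHeadersMiddleware), but treat that wiring as unverified against this codebase.

```python
from typing import Any, Awaitable, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Dict[str, Any], Receive, Send], Awaitable[None]]

# Illustrative values only; the repository's actual policy values may differ.
SECURITY_HEADERS: List[Tuple[bytes, bytes]] = [
    (b"strict-transport-security", b"max-age=31536000; includeSubDomains"),
    (b"content-security-policy", b"default-src 'self'"),
    (b"x-frame-options", b"DENY"),
    (b"x-content-type-options", b"nosniff"),
    (b"referrer-policy", b"strict-origin-when-cross-origin"),
]


class SecurityHeadersMiddleware:
    def __init__(self, app: ASGIApp) -> None:
        self.app = app

    async def __call__(self, scope: Dict[str, Any], receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        async def send_with_headers(message: Message) -> None:
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                existing = {name.lower() for name, _ in headers}
                # Do not override a header the route already set deliberately.
                headers += [(n, v) for n, v in SECURITY_HEADERS if n not in existing]
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_with_headers)
```

Browsers honor Strict-Transport-Security only when it arrives over HTTPS. With TLS terminated at the edge, the header therefore takes effect on the HTTPS response that the edge forwards to the client.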
Evidence: src/serving/masking.py, src/serving/api/routers/agent_query.py, src/serving/api/security.py, tests/unit/test_masking.py
The repository includes a dedicated security workflow in GitHub Actions:
- Bandit for Python static analysis with a tracked baseline diff
- Safety for dependency vulnerability scanning
- Trivy for container image scanning and CycloneDX SBOM generation
The Bandit baseline currently records a historical B310 finding in src/serving/backends/clickhouse_backend.py. SQL construction findings are not globally suppressed; reviewed identifier construction is handled through narrow suppressions and tests.
Helm defaults no longer embed production-shaped API-key verifier hashes. Operators can render a chart-managed Secret for local use or mount an existing Kubernetes Secret, which is friendlier to External Secrets Operator, Sealed Secrets, or equivalent workflows.
Evidence: .github/workflows/security.yml, .bandit, .bandit-baseline.json, docs/helm-deployment.md
Operationally, the repo shows several useful security-facing controls:
- API usage is written to api_usage with tenant, key name, endpoint, key ID, and key slot
- admin analytics endpoints can inspect usage, anomalies, latency, and top entities/queries
- the runbook includes response procedures for API unavailability, pipeline lag, dead letters, webhook failures, alert storms, and stuck key rotation
This provides a credible audit and incident-response starting point for a small team. It is notably better than a pure demo API with no usage telemetry.
What is not evidenced in this repository snapshot:
- generalized secrets management through AWS Secrets Manager or another external vault
- automated rotation for non-API-key secrets
- externally immutable audit retention or SIEM export
The API usage path can optionally publish hash-chained JSONL records through AGENTFLOW_AUDIT_LOG_PATH in addition to DuckDB analytics. This is useful local evidence that DuckDB analytics are not the only audit path, but object-lock retention, SIEM delivery, and external immutability still need operator evidence outside the repository. Use docs/operations/immutable-retention-evidence-handoff.md before making any external immutable-retention claim.
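A hash-chained JSONL log works roughly as sketched below. The record layout (prev_hash, hash, payload) and the function names are hypothetical. The sketch takes a path argument rather than reading AGENTFLOW_AUDIT_LOG_PATH.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict

GENESIS = "0" * 64


def _record_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON so the same payload always hashes the same way.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_audit(path: Path, payload: Dict[str, Any]) -> str:
    prev = GENESIS
    if path.exists():
        lines = path.read_text(encoding="utf-8").splitlines()
        if lines:
            prev = json.loads(lines[-1])["hash"]
    digest = _record_hash(prev, payload)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"prev_hash": prev, "hash": digest, "payload": payload}, sort_keys=True) + "\n")
    return digest


def verify_chain(path: Path) -> bool:
    prev = GENESIS
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = json.loads(line)
        if entry["prev_hash"] != prev or entry["hash"] != _record_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

The chain detects edits to, or removal of, records that have later records after them. It cannot detect truncation of the tail, or a complete rewrite by anyone who has write access to the file. That is why object-lock or SIEM evidence is still required before claiming external immutability.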
Because the external controls are not provable from the checked-in code, they should not be claimed in customer-facing security questionnaires without additional infrastructure evidence.
Evidence: src/serving/api/auth/middleware.py, src/serving/api/analytics.py, docs/runbook.md
The current implementation has several material limitations:
- No external penetration test evidence is present in the repository.
- DuckDB encryption is optional and operator-configured; the default file path remains unencrypted for backward compatibility, and DuckDB encryption is not a compliance attestation by itself.
- Secrets management appears environment- and chart-driven in this snapshot; a managed secret store is not demonstrated.
- The security pipeline is strong at code scanning and SBOM generation, but a real signed container-release run is still evidence-pending until CI signs a published image digest.
- The demo-data initialization path is convenient for development, but it increases the importance of strict environment separation between demo and production deployments.
- Browser-oriented security headers exist, and request body size enforcement is applied from SecurityPolicy.request_size_limit_bytes.
Partial readiness. The repo supports response-time PII masking, tenant scoping, and usage auditing. That helps with least-privilege data exposure and auditability. However, GDPR readiness is incomplete without documented data retention policies, deletion workflows, subject access procedures, and infrastructure evidence for storage/backup handling.
Partial readiness. Access control, audit logging, CI security scans, and operational runbooks are present. Missing evidence includes change-management controls outside git/CI, vendor management, centralized secrets management, formal incident program artifacts, and third-party audit evidence.
Not ready based on the reviewed repository. Optional DuckDB at-rest encryption does not supply the broader administrative, contractual, audit-retention, or external-assessment controls HIPAA would require.
For an engineering-led v1 product, AgentFlow shows an above-average application security baseline:
- strong typed validation
- practical tenant isolation
- real SQL safety controls
- concrete key rotation mechanics
- response-time privacy masking
- usable audit trail and CI scanning
The main gap is not the app layer. It is the absence of externally verifiable infrastructure and governance controls. These gaps do not block continued development, package publication, or demos, but they do block enterprise-facing security claims:
- external penetration testing, including report scope, dates, severity summary, remediation map, retest status, and owner
- documented secrets-management architecture
- explicit encryption-at-rest posture for deployment targets
- external immutable retention evidence before claiming WORM/Object Lock/SIEM-backed audit logs
- a short customer-facing security overview aligned to the facts above