fix: close audit_30 backlog — 1 HIGH + 6 MEDIUM (Workstream C / R3) by brownjuly2003-code · Pull Request #118 · brownjuly2003-code/agentflow

brownjuly2003-code · 2026-06-30T08:20:55Z

Workstream C / R3 — fresh adversarial audit + fixes (`road-to-9.8.md`)

A fresh multi-agent adversarial audit of main (5814712), run after the
audit_28 backlog (16 HIGH) was closed: 4 read-only dimension agents
(concurrency · data/SQL/DV2/masking/tenant · security · API/contract/resource) +
1 verifier that executed code to confirm or refute each candidate. It found
1 HIGH + 6 MEDIUM new defects. Each is fixed here as one thematic commit with
a regression test that fails on the old code and passes on the new.

Fixed

D1 (HIGH) — cross-tenant leak. A CTE whose name collides with a real table
hid the physical inner reference from _scope_sql, so a tenant query bound to
the shared main schema and returned every tenant's rows. Rescope physical
refs via sqlglot scope resolution (genuine CTEs untouched). Reproduced
end-to-end.
C1 — alert resolved notification lost. The resolved transition advanced the
alert even when delivery failed → the page was lost and the incident stayed
open. Re-attempt next tick (mirrors the firing/escalation hardening).
C2 — webhook deliveries dropped. An exception mid per-webhook loop unwound
dispatch_new_events after the event was marked seen → later webhooks were
never enqueued. Isolate each webhook; mark seen only after durable enqueue.
S1 — unauthenticated DoS. Analytics middleware ran before auth/rate-limit,
spawning a DB-writing thread and persisting an attacker-controlled body per
request. Record only after successful auth; cap query_text.
A2 — event-loop blocking. lineage / slo / the SSE stream ran sync DuckDB
scans inline, serializing all tenants on the worker. Offload to a threadpool on
a dedicated cursor. (deadletter read handlers + alert/webhook history reads:
same root cause and recipe — tracked as a follow-up.)
A1 — metric window unvalidated. /v1/metrics accepted any window, silently
computed 1h, and echoed the requested window. Validate against
available_windows → 422 (mirrors the alerts router).
D2 — PII masking bypass. Masking keyed on the output column name, so
SELECT email AS contact returned cleartext. Mask by projection lineage.

Verification

Full unit suite 1498 passed; ruff + ruff format clean (285 files);
mypy --strict clean (100 files); webhook integration (no-Docker) 12 passed.
Audit write-up + per-finding repro commands in audit_30_06_26.md (gitignored).

Honest scope

Refuted by the verifier: S2 (admin-key in logs) — the shipped
config/security.yaml already redacts X-Admin-Key/Cookie; no defect at HEAD.
Downgraded to LOW (documented, non-gating): two unbounded .add()-only
seen-sets (global webhook dedup — entangled with cache invalidation; per-SSE —
bounded by connection lifetime).

🤖 Generated with Claude Code

…30 D1) _scope_sql is the sole tenant-isolation mechanism (one DuckDB DB, a schema per tenant, no per-connection search_path). It skipped any table whose name matched any CTE in the statement, so in `WITH orders_v2 AS (SELECT * FROM orders_v2) SELECT * FROM orders_v2` the physical inner orders_v2 stayed unqualified, bound to the shared `main` schema (populated for every tenant by demo-seed and local_pipeline), and returned other tenants' rows — a cross-tenant leak. Replace the global cte_names skip with sqlglot scope resolution: classify each table reference as physical (rescope to the caller's tenant schema) or an in-scope CTE reference (leave alone), and keep the "tenant context required" guard firing for a collision in the no-schema branch. Genuine CTEs that merely share a real table's name are unaffected. Regression test fails on the old code (inner ref leaks to main) and passes on the new (inner ref pinned to the tenant schema). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…dit_30 C1) The resolved-transition branch discarded deliver()'s result and unconditionally advanced the alert to state="resolved"/fired_at=None. If every resolved page failed (receiver briefly down/5xx/timeout) the alert still went to resolved and the next tick fell straight through to "ok" — the resolved notification was lost for good and the receiver kept the incident open. Firing and escalation were already hardened against this (audit_28 #4); the resolved branch was not. Now count only successful resolved deliveries and, if any fail, leave fired_at set and the state unchanged so the next evaluation tick re-attempts (at-least- once, consistent with the firing path). Happy path is unchanged. Regression test fails on the old code (fired_at cleared despite failure) and passes on the new; a success-path test pins that a delivered resolved page still advances the alert. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… after enqueue (audit_30 C2) dispatch_new_events added the event to seen_event_ids *before* the per-webhook loop, and the loop body (_enqueue_delivery -> deliver -> _record_outcome) had no per-webhook isolation. An exception while handling one webhook — e.g. an httpx.InvalidURL deliver() doesn't catch, or a duckdb.Error — unwound the whole method after the event was already seen, so webhooks not yet reached got no durable queue row and were never re-driven: permanently undelivered. Now each matching webhook is wrapped in try/except, the event is marked seen only once every matching webhook is durably enqueued (still runs for zero-webhook events, preserving the metric-cache-invalidation trigger), and _enqueue_delivery returns whether it inserted a new row so a re-scan (when another webhook's enqueue failed and left the event unseen) does not re-POST already-queued deliveries inline. Inline-delivery failures are swallowed — the durable row is re-driven by process_delivery_queue. Regression tests fail on the old code (_enqueue_delivery returned None; the second webhook was never enqueued) and pass on the new. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… query_text (audit_30 S1) The analytics middleware sits OUTSIDE AuthMiddleware in the stack, so it recorded a session — spawning a DB-writing thread and persisting the request body — for every /v1 request regardless of the auth result, including unauthenticated 401s and the unthrottled flood path (per-key and per-IP throttling both live inside AuthMiddleware). An attacker could drive un-throttled thread spawns and write an attacker-controlled ~1 MiB query_text per request: a remote DoS + storage amplification. Gate recording on a present request.state.tenant_key and a non-rejected status (skip 401/403/429/503), on both the success and the exception path, and truncate query_text to the same 1000-char bound /v1/query enforces (analytics runs before the route's Pydantic validation). Authenticated traffic is unaffected and stays rate-limited by AuthMiddleware; 401 visibility remains via the AUTH_FAILURES counter. Regression tests fail on the old code (unauthenticated request scheduled a write; oversized question persisted verbatim) and pass on the new; a positive test pins that an authenticated request still records. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lineage, slo and the SSE stream handlers ran their synchronous DuckDB scans inline in async handlers on the single event-loop thread. The hot paths (entity/metric/query) were already offloaded with run_in_threadpool and pinned by test_agent_query_async, but these were missed: a slow scan froze the loop for every tenant on the worker (noisy-neighbour), and concurrent requests serialized instead of overlapping. The SSE stream re-runs its scan once per second per open connection, so it blocked the loop on every tick. Offload each scan with run_in_threadpool and run it on a dedicated cursor (not the shared connection object) so concurrent requests on different worker threads don't collide. Kept query_engine._conn rather than db_pool.read_conn(): with the default DUCKDB_PATH=":memory:" the pool and the engine hold *separate* in-memory databases, so a pooled read connection would not see the engine's data. Regression test fails on the old code (4 concurrent /v1/lineage requests serialize to ~1.2s) and passes on the new (~0.3s, overlapped). deadletter's read handlers and the alert/webhook history reads share the same root cause and fix recipe and are tracked as a follow-up. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

GET /v1/metrics/{name} never validated the window. The engine maps an unknown window to "1 hour" via WINDOW_MAP.get(window, "1 hour") (and active_sessions, whose template is hard-coded to 30 minutes, ignores it entirely), but the handler echoed the raw requested window back in the response and used it in the cache key. So GET .../revenue?window=2h returned the 1-hour value labelled "2h", and an arbitrary stream of bogus window strings polluted the metric cache. The alerts router already validated the same field; the core read path did not. Reject windows not in the metric's available_windows with 422, mirroring alerts._validate_metric_request, so only declared windows are computed, echoed and cached. Regression test fails on the old code (window=2h returned 200) and passes on the new (422 for revenue?window=2h and active_sessions?window=24h, 200 for a declared window). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… (audit_30 D2) PiiMasker.mask matched rule fields against the result-row dict keys (the output column names), so a query that renamed a PII column — `SELECT email AS contact` or `SELECT lower(email) AS e` — returned the cleartext value under the alias and left pii_masked False (no X-PII-Masked header). Reachable in LLM mode, where the model controls the projection. Resolve projection lineage from the SELECT and mask every output column that *derives* from a rule's source field, not just one literally named after it. SELECT * (and unparseable SQL) keep the previous name-based matching, which is correct because the outputs are the canonical source names. Single-entity and batch masking pass no lineage and are unchanged. Regression tests fail on the old code (`email AS contact` / `lower(email) AS e` came back cleartext) and pass on the new; a SELECT * test pins the fallback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T08:21:15Z

DORA Metrics

Window: last 30 days
Branch: main
Deployment frequency: 129 total / 30.1 per week
Lead time for changes: avg 0.32h / median 0.0h
Change failure rate: 78.29% (101/129)
MTTR: 0.25h across 3 incident(s)

…t_30 A1 follow-up) (#119) The A1 fix (#118) makes GET /v1/metrics reject a window the metric doesn't declare. The e2e support-agent journey and the support-agent example queried active_sessions — a point-in-time gauge whose only declared window is "now" — with window="1h", which the old code silently computed as a 30-minute count and echoed back as "1h". Post-merge that correctly returns 422, which turned E2E Tests + Staging Deploy red on main. Use window="now" in the e2e journey and the example, and extend the A1 unit test to pin that active_sessions?window=now returns 200. Co-authored-by: JuliaEdom <uedomskikh@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…IGH + 2 MEDIUM) (#123) * fix(security): reject WITH RECURSIVE CTE shadowing a tenant table (D1 bypass) The D1 fix (f153b23) re-scopes a physical table reference shadowed by a non-recursive CTE of the same name, but a recursive CTE *can* self- reference, so sqlglot keeps its name in its own body scope: the physical anchor reference (the first UNION branch, which cannot self-reference) is mis-classified as a CTE reference and never re-scoped — it stays bound to the shared `main` schema and leaks every tenant's rows. _scope_sql is the sole tenant-isolation mechanism (one DuckDB, schema-per-tenant), so this is a full cross-tenant read from a single valid SELECT. There is no safe re-scoping of a recursive anchor (genuinely ambiguous with the recursion) and no legitimate query names a recursive CTE after a physical table, so fail closed at both layers: validate_nl_sql rejects the shape (the NL/LLM gate) and _scope_sql raises for any other caller (defense-in-depth). Non-recursive shadows keep the safe re-scope path. Regression tests fail on old code (validate accepts; _scope_sql leaks), pass on new. Verified e2e: the recursive attack through execute_nl_query now returns 403; the legitimate tenant_a query still sees only its rows. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(security): mask PII renamed through a subquery/CTE (D2 lineage bypass) The D2 fix (faa6c77) mapped each output column to the source names in the *outermost* projection only, so a PII column renamed at an inner level — `SELECT contact FROM (SELECT email AS contact FROM users_enriched) t` (and the CTE/double-rename variants) — never matched the `email` rule and was returned as cleartext with no X-PII-Masked signal. Resolve true projection lineage with sqlglot.lineage, tracing each output column to its ultimate source columns across subqueries, CTEs and union branches. Union the deep lineage sources with the shallow columns named directly in the projection: lineage is blind through an inner `SELECT *` (no schema to expand it) where the shallow scan still catches a direct `email AS contact`, and the shallow scan misses the inner rename lineage catches — either alone leaks one shape, the union closes both. A lineage failure falls back to a sentinel source set that matches every rule field (fail closed), so an unresolvable column is masked, never leaked. Regression test covers subquery, CTE and double-rename renames; the full masking + property + router suites stay green. SELECT * still falls back to name-matching (its outputs are the source names verbatim). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(api): serialize lazy table DDL to stop cold-DB concurrent-create 500s The #120 read-handler offload moved deadletter/webhook-log/alert-history reads onto worker threads (run_in_threadpool) with a dedicated cursor, but each still calls its ensure_*_table() — a lazy CREATE TABLE IF NOT EXISTS / ALTER ... ADD COLUMN IF NOT EXISTS — on that cursor. Pre-#120 these ran serialized on the event loop; the offload let them run truly in parallel, and concurrent catalog DDL on one DuckDB raises "Catalog write-write conflict" (across different tables too, not just the same one — the catalog is a single versioned structure). The serving store defaults to :memory:, cold on every restart, so a concurrent burst of cold reads returned HTTP 500s until one request won the CREATE race. Serialize every ensure_*_table behind one shared process-wide lock (src/db_concurrency.catalog_ddl_lock): the first thread creates, the rest see a warm no-op (warm DDL doesn't conflict). The lock is held only for the brief CREATE/ALTER, never around queries and never nested. Also close the freshly-opened read cursor in deadletter._read_cursor if the DDL raises, before the handler's own try/finally takes over (no per-failure leak). Regression tests fire 32 threads at one ensure_* and 12 each across all three behind a Barrier: reliably raise without the lock (verified 16-30 conflicts), green with it. A shared-instance test pins the single lock so a future per-table lock can't reintroduce the cross-table race. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: JuliaEdom <uedomskikh@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

JuliaEdom and others added 7 commits June 30, 2026 10:38

brownjuly2003-code merged commit d4b3b01 into main Jun 30, 2026
22 of 23 checks passed

brownjuly2003-code deleted the fix/audit-30-backlog branch June 30, 2026 08:29

brownjuly2003-code mentioned this pull request Jun 30, 2026

fix(test): query active_sessions with declared window "now" (A1 follow-up) #119

Merged

brownjuly2003-code mentioned this pull request Jun 30, 2026

fix: close 3 defects from a fresh adversarial audit of #118–#122 (1 HIGH + 2 MEDIUM) #123

Merged

brownjuly2003-code mentioned this pull request Jun 30, 2026

fix(security): mask unaliased PII expressions (D2 #125 follow-up) #126

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: close audit_30 backlog — 1 HIGH + 6 MEDIUM (Workstream C / R3)#118

fix: close audit_30 backlog — 1 HIGH + 6 MEDIUM (Workstream C / R3)#118
brownjuly2003-code merged 7 commits into
mainfrom
fix/audit-30-backlog

brownjuly2003-code commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

brownjuly2003-code commented Jun 30, 2026

Workstream C / R3 — fresh adversarial audit + fixes (road-to-9.8.md)

Fixed

Verification

Honest scope

Uh oh!

github-actions Bot commented Jun 30, 2026

DORA Metrics

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Workstream C / R3 — fresh adversarial audit + fixes (`road-to-9.8.md`)