fix: close 2 residual defects from auditing #123/#124 (1 MEDIUM PII leak + 1 LOW DDL race)#125
Merged
Merged
Conversation
…dual) The #123 D2 fix resolves projection lineage so a PII column renamed through a subquery/CTE is masked by what it is built from. But when an inner `SELECT *` sits *below* the rename — e.g. `SELECT c FROM (SELECT email AS c FROM (SELECT * FROM users_enriched) z) t` — sqlglot.lineage walks past the renamed `email` node to the bare `*` leaf and returns a plain `frozenset({'*'})`. That is NOT the `_UnresolvedSources` sentinel (which only fires on a lineage *exception*), so `email` is absent from the source set and the column fails **open** as cleartext with no X-PII-Masked signal; the shallow scan sees only the outer alias `c`. The #123 deep|shallow union only closes a star one level *above* the rename. A `*` lineage leaf means the column could carry any source column of that table, including PII, so treat it as unresolved and fail closed (mask) — the same policy the module already applies on a lineage exception. Regression test (subquery and CTE forms) fails on old code (cleartext, was masked=False) and passes on new; the existing masking/property/mutation suites stay green. Independently reproduced against live DuckDB before and after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The #123 lock fix serialized the three offloaded read-handler `ensure_*_table` helpers behind `catalog_ddl_lock`, but missed `ensure_webhook_delivery_queue_ table`: the dispatcher runs its lazy `CREATE TABLE IF NOT EXISTS webhook_ delivery_queue` on the shared serving connection from the event loop, while an offloaded read handler runs its own (now-locked) `ensure_*` on a worker thread. Concurrent catalog DDL on one cold DuckDB raises a "Catalog write-write conflict" across *different* tables too, so the cross-table 500-on-cold-restart the #123 fix set out to remove was still reachable through this unlocked site. Wrap its CREATE in the same shared `catalog_ddl_lock` as its three siblings (internal-wrap pattern; the two callers do not hold the lock, so no nesting). Regression: add `ensure_webhook_delivery_queue_table` to the concurrency harness's ensurer set so both the 32-thread same-table and the cross-table Barrier hammers cover it — both fail on old code (verified: 23+ and 11+ conflicts on the queue table) and pass with the lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DORA Metrics
|
brownjuly2003-code
added a commit
that referenced
this pull request
Jun 30, 2026
Closes the unaliased-expression PII leak flagged as a known follow-up in #125. An unaliased expression over PII — `SELECT upper(email) FROM users_enriched` (output column `upper(email)`), `SELECT email || '' ...` (`(email || '')`) — has no `alias_or_name`, so `_projection_source_columns` skipped it; the result column kept DuckDB's rendered name, which never matched a rule field, and the PII was returned cleartext with no X-PII-Masked signal (reproduced against live DuckDB: was_masked=False). A name-based fix is impossible because sqlglot's rendering does not reproduce DuckDB's column naming (UPPER(email) vs upper(email); case and parenthesisation differ). Align projections positionally to the real result keys (projection order == result-column order), which mask_query_results already has from the rows, so each projection is keyed by its true output name: aliased/bare projections keep the deep-lineage union shallow resolution (incl. the #125 SELECT*-blinded star leaf), an unaliased expression is masked by the columns it references (upper(email) -> email), and a top-level SELECT * / parse failure / count mismatch still falls back to name-matching. Completes the D2 masking surface. Counterfactual: the unaliased test fails on post-#125 code (cleartext) and passes with the fix; a new invariant test pins that a directly-named non-PII column (SELECT email, user_id) is not over-masked. Full unit suite 1513 passed; ruff/format/mypy --strict clean. masking.py mutation score unaffected (the harness covers the masking primitives, not the projection resolver). Co-authored-by: JuliaEdom <uedomskikh@gmail.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Two residual defects from a fresh adversarial audit of #123/#124 (the
audit-of-the-audit-fixes), each confirmed by executing a repro and each closed
with a regression test that fails on old code and passes on new.
1.
fix(security)— PII renamed above an innerSELECT *leaked cleartext (MEDIUM, D2 #123 residual)The #123 D2 fix resolves projection lineage so a PII column renamed through a
subquery/CTE is masked by what it is built from. But a
SELECT *sittingbelow the rename defeats it:
sqlglot.lineagewalks past the renamedemailnode to the bare*leaf andreturns a plain
frozenset({'*'})— not the_UnresolvedSourcessentinel(which only fires on a lineage exception) — so
emailis absent from thesource set and the column failed open as cleartext with no
X-PII-Maskedsignal. The shallow scan saw only the outer alias
c. Verified against liveDuckDB:
was_masked=False,alice.secret@victim.comreturned in the clear(every PII field — email/phone/full_name/ip_address/shipping_address). The #123
deep|shallow union only closes a star one level above the rename.
Fix: a
*lineage leaf means the column could carry any source column ofthat table (incl. PII), so treat it as unresolved and fail closed — the same
policy the module already applies on a lineage exception.
2.
fix(api)— webhook delivery-queue lazy DDL was outside the catalog lock (LOW, #123 residual)The #123 lock fix serialized the three offloaded read-handler
ensure_*_tablehelpers behind
catalog_ddl_lock, but missedensure_webhook_delivery_queue_table:the dispatcher runs its lazy
CREATE TABLE IF NOT EXISTS webhook_delivery_queueon the shared serving connection from the event loop, concurrently with the
now-locked read-handler DDL on worker threads. Concurrent catalog DDL on one
cold DuckDB raises a
Catalog write-write conflictacross different tablestoo, so the cross-table 500-on-cold-restart the #123 fix targeted was still
reachable through this unlocked site. Fix: wrap its CREATE in the same
shared lock as its three siblings (internal-wrap; callers don't hold the lock,
so no nesting).
Verification
old code — D2 returns cleartext; the queue ensurer raises 23+ same-table and
11+ cross-table
Catalog write-write conflicts — and pass with the fix.ruff check+ruff formatclean;mypy --strictclean on the changed modules. D2 leak independently reproducedclosed against live DuckDB; controls (
email AS contact,upper(email) AS x)still mask, top-level
SELECT *name-match unaffected.masking.pymutation score is unaffected: the mutation harness(
test_masking_mutation.py) covers the masking primitives, not the lineageresolver, so the new branch is outside the mutated surface.
Audit also confirmed CLEAN (executed)
D1 tenant isolation (recursive/nested-CTE shadows all rejected; non-recursive
collisions re-scoped; broad
traverse_scopepositions) and #124's Flinkdead-letter
yield(each element yields exactly one tuple, correct main/DLQdiscriminator) held against every executed attack.
Known follow-up (not in this PR — pre-existing, orthogonal to #123)
An unaliased expression over PII —
SELECT upper(email) FROM users_enriched(output key
upper(email)),SELECT email||'' …(key(email || '')) — stillreturns cleartext: the projection has no
alias_or_name, so it is skippedbefore lineage runs. A robust fix needs positional alignment of projections to
the actual result-row keys (sqlglot's rendering doesn't match DuckDB's column
naming — case/parens differ), a larger change to this mutation-gated module that
belongs in its own PR. Filed for a dedicated follow-up.
🤖 Generated with Claude Code