fix(security): mask unaliased PII expressions (D2 #125 follow-up) #126
Merged
Closes the unaliased-expression PII leak flagged as a known follow-up in #125.

An unaliased expression over PII, such as `SELECT upper(email) FROM users_enriched` (output column `upper(email)`) or `SELECT email || '' ...` (output column `(email || '')`), has no `alias_or_name`, so `_projection_source_columns` skipped it. The result column kept DuckDB's rendered name, which never matched a rule field, and the PII was returned cleartext with no `X-PII-Masked` signal (reproduced against live DuckDB: `was_masked=False`). A name-based fix is impossible because sqlglot's rendering does not reproduce DuckDB's column naming (`UPPER(email)` vs `upper(email)`; case and parenthesisation differ).

Align projections positionally to the real result keys (projection order == result-column order), which `mask_query_results` already has from the rows, so each projection is keyed by its true output name: aliased/bare projections keep the deep-lineage union shallow resolution (incl. the #125 SELECT*-blinded star leaf), an unaliased expression is masked by the columns it references (`upper(email)` -> `email`), and a top-level `SELECT *` / parse failure / count mismatch still falls back to name-matching. Completes the D2 masking surface.

Counterfactual: the unaliased test fails on post-#125 code (cleartext) and passes with the fix; a new invariant test pins that a directly-named non-PII column (`SELECT email, user_id`) is not over-masked. Full unit suite 1513 passed; ruff/format/mypy --strict clean. `masking.py` mutation score unaffected (the harness covers the masking primitives, not the projection resolver).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
What
Closes the unaliased-expression PII leak flagged as a known follow-up in
#125 — the last D2 masking bypass shape.
An unaliased expression has no `alias_or_name`, so `_projection_source_columns` skipped it entirely; the result column kept DuckDB's rendered name (`upper(email)`, `(email || '')`), which never matched a masking rule field, so the PII was returned cleartext with no `X-PII-Masked` signal. Reproduced against live DuckDB: `was_masked=False`.

A name-based fix is impossible because sqlglot's rendering does not reproduce DuckDB's column naming: `UPPER(email)` vs `upper(email)`, `email || ''` vs `(email || '')` (case and parenthesisation differ).

Fix
Align projections positionally to the real result keys (projection order == result-column order), which `mask_query_results` already has from the rows. Each projection is then keyed by its true DuckDB output name:
- aliased/bare projections keep the deep-lineage union shallow resolution (incl. the SELECT*-blinded fail-closed star leaf);
- an unaliased expression is masked by the columns it references (`upper(email)` → `email`);
- a top-level `SELECT *`, a parse failure, or a projection/result-key count mismatch still falls back to name-matching (`return None`), so existing behaviour is unchanged.
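
The positional alignment described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Projection`, `align_positionally`, `mask_rows`, `PII_FIELDS` and `MASK` are hypothetical names invented for the example, the per-projection source columns are assumed to be resolved already (the real resolver uses sqlglot lineage), and the parse-failure case is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

PII_FIELDS = {"email"}  # hypothetical masking-rule fields
MASK = "***"            # hypothetical mask value


@dataclass(frozen=True)
class Projection:
    """One SELECT-list item (hypothetical, pre-parsed shape)."""
    alias_or_name: str              # "" for an unaliased expression
    source_columns: frozenset[str]  # columns the projection reads
    is_star: bool = False


def align_positionally(
    projections: list[Projection], result_keys: list[str]
) -> Optional[dict[str, frozenset[str]]]:
    """Key each projection by the real result column at the same index.

    Returns None (meaning: fall back to name-matching) for a top-level
    star or a projection/result-key count mismatch.
    """
    if any(p.is_star for p in projections):
        return None
    if len(projections) != len(result_keys):
        return None
    return {key: p.source_columns for key, p in zip(result_keys, projections)}


def mask_rows(
    rows: list[dict[str, Any]], projections: list[Projection]
) -> tuple[list[dict[str, Any]], bool]:
    """Mask every result column whose projection reads a PII field."""
    if not rows:
        return [], False
    result_keys = list(rows[0].keys())
    lineage = align_positionally(projections, result_keys)
    if lineage is None:
        # Fallback: plain name-matching against the rule fields.
        masked_keys = {k for k in result_keys if k in PII_FIELDS}
    else:
        masked_keys = {k for k, cols in lineage.items() if cols & PII_FIELDS}
    masked = [
        {k: (MASK if k in masked_keys else v) for k, v in row.items()}
        for row in rows
    ]
    return masked, bool(masked_keys)


# Unaliased expression next to a directly-named non-PII column.
rows = [{"upper(email)": "A@EXAMPLE.COM", "user_id": 7}]
projections = [
    Projection("", frozenset({"email"})),
    Projection("user_id", frozenset({"user_id"})),
]
masked_rows, was_masked = mask_rows(rows, projections)

# Top-level star: alignment declines and name-matching takes over.
star_rows, star_masked = mask_rows(
    [{"email": "a@example.com", "user_id": 7}],
    [Projection("*", frozenset(), is_star=True)],
)
```

In this sketch the result key `upper(email)` is masked because the projection at the same position reads `email`, while `user_id` is left untouched; the star query is still masked through the name-matching fallback because its result key is literally `email`.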
This completes the D2 masking surface: aliased renames, subquery/CTE renames, `SELECT *`-blinded inner renames (#125), and now unaliased expressions are all masked.
Verification
- Counterfactual: the unaliased test fails on post-#125 code (`SELECT upper(email)` → `masked=False`, cleartext) and passes with the fix.
- New invariant test: a directly-named non-PII column alongside a PII one (`SELECT email, user_id`) keeps `user_id` untouched; positional alignment keys each projection to its own column.
- Full unit suite 1513 passed; `ruff` + `ruff format` clean; `mypy --strict` clean.
- Independently verified as closed against live DuckDB; top-level `SELECT *` name-match and all prior D2 shapes still pass.
- `masking.py` mutation score is unaffected (the mutation harness covers the masking primitives, not the lineage/projection resolver).
🤖 Generated with Claude Code