
feat(dsql): enhance query plan explainability with type coercion detection, rewrites, and workflow extraction#162

Open
Morlej wants to merge 7 commits into
awslabs:mainfrom
Morlej:feat/dsql-query-plan-explainability

Conversation

@Morlej

@Morlej Morlej commented May 8, 2026

Summary

  • Extract Workflow 8 from SKILL.md into references/query-plan/workflow.md (SKILL.md: 334 → 281 LOC)
  • Add type coercion index bypass detection — pg_amop-based detection in plan-interpretation.md, indexed column type queries in catalog-queries.md
  • Add query rewrite references — 11 generic patterns split into individual files under query-rewrites/, plus 2 DSQL-specific rewrites (reltuples estimate, split large joins)
  • Add structured trigger criteria, context disambiguation, and routing to the workflow reference
  • Wire rewrites into workflow — loaded at Phase 0, applied at Phase 2

Validation

  • validate-size.py: 281 lines (good, under 300 limit)
  • validate-references.py: 0 broken links, 0 new orphans

Eval Results

Manual qualitative comparison (n=1, Claude Opus 4.6). Full results in tools/evals/databases-on-aws/dsql/query_plan_rewrite_eval_results.md:

| Eval | Scenario | With Skill | Baseline | Key Delta |
|------|----------|------------|----------|-----------|
| 200 | IN-subquery Full Scan | PASS | PARTIAL | Skill recommends specific rewrite patterns from reference |
| 201 | Type coercion index bypass | PASS | PASS | Both identify it; skill adds DSQL-specific pg_amop detail |
| 202 | 12-table join ordering | PASS | PARTIAL | Skill offers full diagnostic workflow with GUC experiments |
| 203 | `COUNT(*)` timeout | PASS | FAIL | Skill recommends pg_class reltuples with staleness warning |
| 204 | Multiple OR to IN | PASS | PARTIAL | Skill identifies pattern from reference |
| 205 | GROUP BY after JOIN | PASS | PARTIAL | Skill recommends subquery aggregation |
| 206–210 | LEFT JOIN, computation push, NOT IN+NULL, UNION ALL, negative | Added in review round | | Coverage for remaining patterns + negative case |

Follow-ups

  • MCP mirror PR: awslabs/mcp src/aurora-dsql-mcp-server/skills/dsql-skill/ needs to be synced with these changes (workflow.md, query-rewrites/ split, updated catalog-queries.md, plan-interpretation.md). Will open companion PR after this merges.
  • Python SQL converter: Per review feedback, deterministic rewrites should migrate to a Python script in a future PR (reference files then document the converter's rules).

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

🤖 Generated with Claude Code

@Morlej Morlej requested review from a team as code owners May 8, 2026 23:33
@Morlej Morlej force-pushed the feat/dsql-query-plan-explainability branch from 8e33741 to 8261713 Compare May 8, 2026 23:36
Contributor

@amaksimo amaksimo left a comment

I have a few general comments:

  1. We should use positive language throughout (LLMs can confuse DO with DO NOT when we trim context)
  2. We should try to use RFC language more frequently throughout
  3. We should break up the references in the query-plan folder as some of the files are very long

Comment thread plugins/databases-on-aws/skills/dsql/references/query-plan/workflow.md Outdated
Comment thread plugins/databases-on-aws/skills/dsql/references/query-plan/workflow.md Outdated
@anwesham-lab anwesham-lab requested a review from amaksimo May 12, 2026 21:48
@anwesham-lab anwesham-lab force-pushed the feat/dsql-query-plan-explainability branch from 07b6baa to 6f97294 Compare May 14, 2026 18:20

**Fallback:** If `awsknowledge` is unavailable, use the defaults above and flag that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/).
**Fallback:** If `awsknowledge` is unavailable, use the defaults above and note to the user
that limits should be verified against [DSQL documentation](https://docs.aws.amazon.com/aurora-dsql/latest/userguide/).
Member

why did we end up adding line breaks?


**When:** MUST load all four at Workflow 8 Phase 0 — [query-plan/plan-interpretation.md](references/query-plan/plan-interpretation.md), [query-plan/catalog-queries.md](references/query-plan/catalog-queries.md), [query-plan/guc-experiments.md](references/query-plan/guc-experiments.md), [query-plan/report-format.md](references/query-plan/report-format.md)
**Contains:** DSQL node types + Node Duration math + estimation-error bands, pg_class/pg_stats/pg_indexes SQL + correlated-predicate verification, GUC experiment procedures + 30-second skip protocol, required report structure + element checklist + support request template
**When:** MUST load [query-plan/workflow.md](references/query-plan/workflow.md) at Workflow 8 entry — it gates the remaining files
Member

why remove the bullet structure?


**SHOULD apply when:** The WHERE clause rejects NULLs from the right-hand side of a LEFT JOIN (e.g., `IS NOT NULL`, equality comparisons, or any predicate that cannot be true for NULL).

**Skip when:** NULLs from the right-hand side are intentionally preserved in the result.
Member

Is this an always? Can we contextualize when/how often? Should this be a "SHOULD skip when"?


When a query uses LEFT JOIN but the WHERE clause rejects NULLs on the joined table, rewrite as INNER JOIN. This enables a simpler, more efficient join plan.

**SHOULD apply when:** The WHERE clause rejects NULLs from the right-hand side of a LEFT JOIN (e.g., `IS NOT NULL`, equality comparisons, or any predicate that cannot be true for NULL).
Member

Is this a SHOULD or a MUST? SHOULD gives the model license to bypass the rule when it is told something like "rush".
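The equivalence this rewrite relies on can be checked on toy data. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in engine rather than DSQL; the `orders` and `customers` tables are hypothetical and not taken from the reference files. It also shows a case where the rewrite must be skipped, because the predicate keeps the NULL-extended rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not DSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'west'), (2, 'east'), (3, NULL);
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 3), (13, 99), (14, NULL);
""")

def run(join_kind, predicate):
    """Run the same query with the given join type and WHERE predicate."""
    sql = (
        "SELECT o.id, c.region FROM orders o "
        f"{join_kind} JOIN customers c ON c.id = o.customer_id "
        f"WHERE {predicate} ORDER BY o.id"
    )
    return conn.execute(sql).fetchall()

# Null-rejecting predicate: the equality can never be true for a
# NULL-extended row, so LEFT JOIN and INNER JOIN return the same rows.
left_rows = run("LEFT", "c.region = 'west'")
inner_rows = run("INNER", "c.region = 'west'")

# Predicate that keeps NULL-extended rows (an anti-join): here the
# rewrite would change the result, so it must be skipped.
anti_left = run("LEFT", "c.id IS NULL")
anti_inner = run("INNER", "c.id IS NULL")

print(left_rows, inner_rows)
print(anti_left, anti_inner)
```

In the first pair the two result sets match, so the rewrite is safe; in the second pair the INNER JOIN form loses the unmatched orders, which is the situation the "Skip when" clause is meant to cover.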

@@ -0,0 +1,48 @@
# Rewrite: Propagate Filter to JOIN Columns
Member

Since so many of these are deterministic, I question the meta structure: should we instead use something like a Python script converter that parses and rewrites the SQL, with the reference files just executing it?

Member

I think I still hold to this? Why are we relying on markdown applications alone? Why are these transformations not written into script patterns/tooling?

@anwesham-lab
Member

PR #162 — Review Summary

feat(dsql): enhance query plan explainability with type coercion detection, rewrites, and workflow extraction

Reviewed at head SHA 07b6baaca029e3336ddfb03438eee26429734a72. Sound direction; eval gains are real (especially eval 203 reltuples). Holding on five correctness bugs in the new SQL example pairs (rows 1–5) — agents will copy these into production rewrites that change result sets — plus a dangling cross-reference cluster (rows 6–8) where workflow.md, plan-interpretation.md, and catalog-queries.md reference an "implicit cast compatibility matrix" and a "Phase 5" that no longer exist after the rewrite. Structural, eval, and process items follow.

| # | Confidence | Area | Finding | Suggestion | Reviewed SHA |
|---|------------|------|---------|------------|--------------|
| 1 | 95 | query-rewrites/push-computation-to-constant.md L9-17 - correctness | First example is not equivalent under integer division. Original `WHERE emp_no * 100 / 5 = 10001` has no integer solution; rewrite `WHERE emp_no = 10001 * 5 / 100` matches `emp_no = 500`. The file's own "Skip when … integer-division rounding" caveat is violated by the leading example. | Replace with a genuinely invertible example (e.g. `emp_no + 100 = 10001` → `emp_no = 9901`), or use a numeric/float column. | 07b6baa |
| 2 | 90 | query-rewrites/not-in-to-not-exists.md L1-26 - correctness | "Sidesteps NULL semantics issues" understates: when the subquery contains NULL, NOT IN returns empty and NOT EXISTS returns the rows; the rewrite changes results, not just performance. | State explicitly: "NOT EXISTS does not preserve NOT IN's NULL-propagation; output differs when the subquery may contain NULLs. Confirm intent with the user before applying." | 07b6baa |
| 3 | 85 | query-rewrites/subquery-unnesting-uncorrelated.md L9-23 - correctness | `SELECT DISTINCT R.*` collapses pre-existing duplicates in R that the original semi-join (`IN (SELECT …)`) preserved, so it fixes one duplicate problem by introducing another. | Either (a) recommend the EXISTS form (true semi-join) or (b) state the assumption "Apply only when S.b is unique (PK/UNIQUE); otherwise DISTINCT changes results." | 07b6baa |
| 4 | 85 | query-rewrites/subquery-unnesting-correlated.md L20-26 - correctness | Same DISTINCT-on-semi-join issue as #3 for the EXISTS→JOIN rewrite. | Same fix: prefer EXISTS, or document the uniqueness precondition. | 07b6baa |
| 5 | 85 | query-rewrites/subquery-unnesting-scalar.md L33-52 - correctness | s_count example: scalar `COUNT(*)` returns 0 for outer rows with no match; LEFT JOIN+GROUP BY rewrite returns NULL. Downstream `WHERE s_count = 0`, `SUM(s_count)`, etc. break silently. The first MAX example is fine (MAX returns NULL on empty). | Wrap with `COALESCE(Agg.s_count, 0) AS s_count`; add a one-line note that COUNT/SUM need COALESCE while MAX/MIN do not. | 07b6baa |
| 6 | 95 | workflow.md L29-31 - correctness | Trigger row references "Phase 5 re-entry for an existing report" but the workflow defines only Phase 0–4 (TOC L7–14). Routing L56 correctly says "append Addendum". | Replace with "Reassessment re-entry: re-runs Phase 1–2 and appends an Addendum per Phase 4." | 07b6baa |
| 7 | 90 | plan-interpretation.md L194-242, workflow.md L100, catalog-queries.md L122-124 - correctness | Three files reference an "implicit cast compatibility matrix below/above/in plan-interpretation.md" that does not exist. The section was rewritten to recommend a live pg_amop query instead. Eval 201's expectation "Mentions implicit cast compatibility matrix" reinforces the phantom artifact. | Replace all four references with "the pg_amop query in catalog-queries.md (B-Tree Cross-Type Operator Support)." Update eval 201 expectation accordingly. | 07b6baa |
| 8 | 80 | plan-interpretation.md L201-214 - correctness | Bullet at L202 says "if an implicit cast exists, the planner can still use the index", which contradicts L211–214; those lines correctly note that B-Tree needs a registered cross-type operator (pg_amop), not just a pg_cast. The two paragraphs disagree. | Drop or rephrase L202: "If a cross-type B-Tree operator is registered (see pg_amop), the index can be used; otherwise the planner applies a per-row cast that defeats index ordering." | 07b6baa |
| 9 | 75 | plan-interpretation.md L214 - correctness/durability | "Cross-type index support is limited to the integer family" stated as fact, no citation, no "verify before asserting" hedge. Will rot the moment DSQL adds a cross-type operator family. | Prefix with "At time of writing…" and route the agent through the pg_amop query before asserting this to a user. | 07b6baa |
| 10 | 70 | catalog-queries.md L136 - correctness | `amopmethod = 10003` is a DSQL-internal magic number (PG mainline B-Tree is 403). No provenance comment; will silently break if the OID changes. | Add inline comment explaining provenance and a `SELECT oid FROM pg_am WHERE amname = 'btree'` recommendation as a hedge. | 07b6baa |
| 11 | 90 | catalog-queries.md TOC L5-13 - structure | TOC omits 3 of the 9 sections, all PR additions: "Column Types for Predicate Columns" (L107), "B-Tree Cross-Type Operator Support" (L125), "Indexed Column Types" (L157). | Add three TOC entries between current items 5 and 6. | 07b6baa |
| 12 | 90 | plan-interpretation.md TOC L3-14 - structure | TOC omits "Type Coercion and Index Bypass" (L186), the headline new section of this PR. | Insert TOC entry, renumber subsequent items. | 07b6baa |
| 13 | 80 | SKILL.md L112-115 - structure | The PR rewrites this block but uses a single combined When:/Contains: while sibling "(modular):" sections give each sub-file its own `####` heading + per-file When/Contains. Loading conditions for plan-interpretation.md, catalog-queries.md, guc-experiments.md, report-format.md, and the rewrite indexes are no longer declared in the entry file. | Either give each query-plan reference its own `####` entry, or explicitly delegate routing to workflow.md and state that as the rule. | 07b6baa |
| 14 | 80 | tools/evals/databases-on-aws/README.md - multi-target sync | New query_plan_rewrite_evals.json is not added to the README's directory tree or per-tier eval section. Sibling evals (evals.json, query_explainability_evals.json) all have entries. The cluster-fixtures table also misses the new schemas (12-table join, 50M-row table). | Add the new eval and a fixtures row to the README. | 07b6baa |
| 15 | 75 | tools/evals/databases-on-aws/dsql/scripts/ - process | New eval JSON has no paired runner script under scripts/. Sibling evals all have one. PR ships only manual query_plan_rewrite_eval_results.md. | Either add run_query_plan_rewrite_evals.py (LLM-judge fits) or document explicitly that this suite is manual-only. | 07b6baa |
| 16 | 70 | query_plan_rewrite_evals.json - tests | Coverage gap: 5 of 11 generic rewrites have no direct eval: left-join-to-inner, propagate-filter, push-computation-to-constant, not-in-to-not-exists, flatten-union-all. NOT IN→NOT EXISTS especially worth covering (correctness, not just perf). No negative cases (where the agent should decline the rewrite). | Add evals 206–212 covering missing patterns + at least one "OR across different columns → does NOT recommend OR-to-IN" negative case. | 07b6baa |
| 17 | 65 | query_plan_rewrite_eval_results.md - tests | Sample size = 1 per cell, no model/version/temperature recorded, no variance analysis. PASS/FAIL is a single human transcript read. | Record model + version + n=3 with majority vote; add a Runs column; or downgrade the table to "qualitative comparison." | 07b6baa |
| 18 | 75 | PR description - pr-body | PR body / commit 82617135 claim "275 lines (good)" but SKILL.md is 279 lines at head. Still under cap; cosmetic but it's a stated correctness claim. | Re-run validate-size.py on 07b6baa and update the PR body / commit message. | 07b6baa |
| 19 | 65 | Multi-target sync (awslabs/mcp) - process | awslabs/mcp@main src/aurora-dsql-mcp-server/skills/dsql-skill/references/query-plan/ does NOT contain workflow.md, query-rewrites/, or the new index files. PR description does not mention an MCP-mirror PR or follow-up. Per the dsql-skill-author placement rules, the default DSQL skill must propagate to the MCP standalone skill + Kiro Power. | Open a companion PR against awslabs/mcp mirroring the new files (and translate workflow.md for Kiro Power), or document explicitly that the mirror is out of scope and link the follow-up issue. | 07b6baa |
| 20 | 70 | SKILL.md L172-173, L266-267 - silent-failure | This PR softens the awsknowledge fallback rule from "flag that" to "note to the user that", which is advisory phrasing, not a MUST. Agent can silently use stale defaults for decisions that turn on the exact value. | Promote to MUST: "MUST tell the user the lookup failed, MUST name the limit and value, MUST refuse the fallback when the recommendation depends on the exact value." | 07b6baa |
| 21 | 70 | query-rewrites/reltuples-estimate.md + eval 203 - silent-failure | reltuples reflects last ANALYZE/autovacuum and may be drastically stale on a fresh or write-heavy table. Doc says "estimate, not exact" but does not require warning the user about staleness; eval 203 lacks the staleness expectation, so the failure mode is unobservable. | Add MUST: "Warn the user that reltuples reflects the last ANALYZE; recommend cross-checking last_analyze when the count drives a decision." Add eval expectation. | 07b6baa |
| 22 | 70 | catalog-queries.md L107-180 (PR-added sections) - security | The 3 new sections this PR adds (Column Types for Predicate Columns, B-Tree Cross-Type Operator Support, Indexed Column Types) introduce fresh `'{schema}'` / `'{table}'` placeholder substitution patterns. SKILL.md Workflow 4 mandates safe_query.build() for query construction; these new examples teach lexical concatenation, an injection sink despite readonly_query. | Add a one-line MUST scoped to the new sections: "Substitute these placeholders via safe_query.build() with ident(); see input-validation.md." | 07b6baa |
| 23 | 60 | `query-rewrites/*.md` (all 13) - style | Every new rewrite file pairs `**SHOULD apply when:**` with `**Skip when:**`. The two are logical complements; per authoring-style.md §Voice reserve prohibition for irreversible harm. | Drop `Skip when:` and tighten `SHOULD apply when:`, or rephrase as a single `**Applies when:**` criterion. | 07b6baa |
| 24 | 80 | workflow.md TOC L7-15 - structure | TOC anchors encode the em dash with double hyphens (e.g., `#phase-0--load-reference-material`). GitHub collapses to a single -, so all five Phase TOC links are broken in the rendered file. | Regenerate as `#phase-0-load-reference-material` through `#phase-4-produce-the-report-invite-reassessment` (and `#phase-3-experiment-conditional`). | 07b6baa |
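Finding 2 can be reproduced on toy data. This is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for DSQL; the tables `r` and `s` are hypothetical. It shows that the NOT IN form returns no rows once the subquery yields a NULL, while the NOT EXISTS form still returns the unmatched rows.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not DSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a INTEGER);
    CREATE TABLE s (b INTEGER);
    INSERT INTO r VALUES (1), (2), (3);
    INSERT INTO s VALUES (1), (NULL);  -- the NULL is what changes the outcome
""")

# Original form: "a NOT IN (1, NULL)" evaluates to NULL for a = 2 and a = 3,
# and to FALSE for a = 1, so no row passes the WHERE clause.
not_in_rows = conn.execute(
    "SELECT a FROM r WHERE a NOT IN (SELECT b FROM s) ORDER BY a"
).fetchall()

# Rewritten form: the correlated equality never matches the NULL row,
# so rows 2 and 3 are returned.
not_exists_rows = conn.execute(
    "SELECT a FROM r WHERE NOT EXISTS (SELECT 1 FROM s WHERE s.b = r.a) "
    "ORDER BY a"
).fetchall()

print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

The two queries agree only when the subquery column cannot be NULL, which is why the suggested wording asks the agent to confirm intent before applying the rewrite.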

Reviewer scope. This review covered the diff at the head SHA (21 files, +1131 / −36) — the new query-plan workflow extraction, type-coercion detection, 11-pattern rewrite library, and eval pair. Prior amaksimo review threads from the predecessor PR #161 (file split, RFC keywords, positive language, DATEADD→NOW()-INTERVAL, psql fallback removal) are addressed at this head; thank you.
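The duplicate-handling concern in findings 3 and 4 can be illustrated the same way. The sketch below again assumes SQLite via Python's `sqlite3` module as a stand-in engine and uses hypothetical tables `r` and `s` in which both sides contain duplicates; it compares the original IN form, the EXISTS form, a plain JOIN, and the JOIN plus DISTINCT form.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not DSQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (a INTEGER, note TEXT);
    CREATE TABLE s (b INTEGER);
    INSERT INTO r VALUES (1, 'x'), (1, 'x'), (2, 'y');  -- duplicate row in r
    INSERT INTO s VALUES (1), (1), (3);                 -- duplicate key in s
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Original semi-join: each matching row of r appears once per row of r.
in_rows = rows("SELECT a, note FROM r WHERE a IN (SELECT b FROM s) ORDER BY a")

# EXISTS is also a semi-join and preserves the duplicates already in r.
exists_rows = rows(
    "SELECT a, note FROM r "
    "WHERE EXISTS (SELECT 1 FROM s WHERE s.b = r.a) ORDER BY a"
)

# A plain JOIN multiplies rows of r by the number of matching rows in s.
join_rows = rows("SELECT r.a, r.note FROM r JOIN s ON s.b = r.a ORDER BY r.a")

# DISTINCT removes the multiplication, but also the duplicates r had itself.
distinct_rows = rows(
    "SELECT DISTINCT r.a, r.note FROM r JOIN s ON s.b = r.a ORDER BY r.a"
)

print(len(in_rows), len(exists_rows), len(join_rows), len(distinct_rows))
```

Only the EXISTS form matches the original row count here; the JOIN plus DISTINCT form matches only when `r` has no duplicate rows of its own and is otherwise a different query.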


🤖 This review was drafted with Claude Code using the dsql-skill-author Workflow 2 (reviewer) procedure and the 17+ sub-agent roster from code-review.md. Findings have been validated through the five-gate filter (re-read at head SHA, applicability, suggestion correctness, customer-value, confidence ≥ 60).

Was this review useful? React with 👍 if the findings were helpful, 👎 if they missed the mark or introduced false positives. Reply with specifics so the review process can improve. Findings you disagree with are valid to push back on — confidence scores are not verdicts.

Morlej added a commit to Morlej/agent-plugins that referenced this pull request May 15, 2026
…vals

Correctness fixes (review items 1-5):
- awslabs#1: push-computation-to-constant — use NUMERIC column 'amount' to
  avoid integer division non-equivalence
- awslabs#2: not-in-to-not-exists — add NULL semantics warning (NOT EXISTS
  does not preserve NOT IN's NULL-propagation; MUST confirm with user)
- awslabs#3/awslabs#4: subquery-unnesting — prefer EXISTS form (true semi-join);
  document uniqueness precondition for JOIN+DISTINCT alternative
- awslabs#5: subquery-unnesting-scalar — add COALESCE(s_count, 0) for
  COUNT/SUM (LEFT JOIN returns NULL, scalar returns 0)

Dangling reference fixes (review items 6-8):
- awslabs#6: workflow.md trigger table — "Phase 5" → reassessment re-entry
- awslabs#7: Replace all "implicit cast compatibility matrix" references
  with "pg_amop query in catalog-queries.md"
- awslabs#8: plan-interpretation.md L202 — fix cast-vs-operator contradiction

Structural fixes (review items 9-14, 24):
- awslabs#9: Hedge "integer family" claim with "at time of writing" + verify
- awslabs#10: amopmethod=10003 — add provenance comment and verification SQL
- awslabs#11: catalog-queries.md TOC — add 3 missing sections
- awslabs#12: plan-interpretation.md TOC — add Type Coercion section
- awslabs#13: SKILL.md — explicitly delegate routing to workflow.md
- awslabs#24: workflow.md — remove em dashes from headings for clean anchors

Other fixes (review items 21-23):
- awslabs#21: reltuples-estimate — add staleness warning (MUST warn user)
- awslabs#22: catalog-queries — add safe_query.build() note for placeholders
- awslabs#23: "Skip when" → "SHOULD skip when" in all rewrite files

Eval improvements (review items 14, 16):
- awslabs#14: README — add query_plan_rewrite_evals to directory tree and
  eval section
- awslabs#16: Add evals 206-210 covering LEFT JOIN, computation push, NOT IN
  with NULL warning, nested UNION ALL, and negative case (OR across
  different columns)
- awslabs#7 (eval): Update eval 201 expectation — pg_amop instead of matrix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Morlej added a commit to Morlej/agent-plugins that referenced this pull request May 15, 2026
- awslabs#17: Downgrade eval results to qualitative comparison, record model
  and version, note n=1 and recommend n>=3 for production confidence
- awslabs#18: SKILL.md is 281 lines (will update PR body)
- awslabs#20: Strengthen awsknowledge fallback to MUST — refuse fallback when
  recommendation depends on exact limit value
- awslabs#21: Already addressed in prior commit (reltuples staleness)
- awslabs#15: Document manual-only status and future Python converter direction
  (per anwesham-lab's suggestion for deterministic rewrites)
- awslabs#19: MCP mirror PR noted as follow-up in PR body

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@krokoko krokoko requested a review from anwesham-lab May 25, 2026 00:31
Morlej and others added 6 commits May 25, 2026 23:18
…ction and rewrite references

- Add structured trigger phrases and routing criteria for query plan diagnosis
- Add type coercion index bypass detection (implicit cast compatibility matrix)
- Extend catalog queries with indexed column type retrieval
- Add generic SQL rewrite reference (11 patterns: OR-to-IN, subquery unnesting, etc.)
- Add DSQL-specific rewrite reference (reltuples estimate, split large joins for DP threshold)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract Workflow 8 (query plan explainability) from SKILL.md into
  references/query-plan/workflow.md to stay under the 300 LOC limit
- Wire query-rewrites-generic.md and query-rewrites-dsql-specific.md
  into the workflow (Phase 0 load list + Phase 2 evidence gathering)
- Add behavioral evals (query_plan_rewrite_evals.json) covering type
  coercion detection, subquery unnesting, OR-to-IN, GROUP BY pushdown,
  large join splitting, and reltuples estimation
- Add eval results (query_plan_rewrite_eval_results.md) with
  with-skill vs baseline comparison

Validation:
- validate-size.py: 275 lines (good)
- validate-references.py: 0 broken links, 0 new orphans

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…, RFC keywords

Review feedback from amaksimo:

- Split query-rewrites-generic.md into 11 individual files under
  query-rewrites/ subdirectory to reduce context consumption
- Split query-rewrites-dsql-specific.md into individual files
- Convert monolithic files to index tables pointing to sub-files
- Fix DATEADD() SQL Server syntax → PostgreSQL NOW() - INTERVAL
- Flip negative language ("Do not apply") to positive ("Skip when")
- Add RFC keywords (MUST, SHOULD, MAY) throughout
- Remove psql fallback from workflow.md (enforce MCP usage)
- Update plan-interpretation.md recommendation template with RFC language
- Make Phase 0 loading explicit: MUST for core refs, SHOULD for rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anwesham-lab anwesham-lab force-pushed the feat/dsql-query-plan-explainability branch from 122b2a3 to 1178334 Compare May 26, 2026 06:18
@anwesham-lab
Member

anwesham-lab commented May 26, 2026

PR #162 — Multi-agent review (revised after empirical validation on a live DSQL cluster)

Reviewed at head SHA 1178334f37ce8abb22e6ee4955929ba1593d714e. Updated comment: original revision had several findings that did not survive a per-finding §4 re-validation pass plus empirical testing on a live DSQL cluster (account 011528302527, region us-west-2, head-SHA-deployed test schema). The empirical results validate the PR's core thesis on every claim I tested: the type-coercion-bypass detection is real and load-bearing (cross-type date = timestamp was 85× slower — Index Cond → Filter, 1.2ms → 102ms with 19,945 rows removed by post-scan filter), and the IN→EXISTS rewrite delivers a real ~43× speedup on DSQL (4,632ms Nested Loop → 108ms Hash Semi Join) because DSQL's planner does not auto-unnest IN (SELECT…) into a semi-join the way mainline PostgreSQL has since 8.4. Two original findings (#9 projection drop, #3 array/JSON regression depth) were graded false-positive / minor and have been removed; one (#7) is reframed because empirical evidence shows the hardcoded OID is correct on DSQL — only the verify-comment is misleading. Twenty review sub-agents ran in parallel, then one Opus grader per finding re-validated against head SHA. Findings below all survive at confidence ≥ 60 post-validation.
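The two headline ratios follow directly from the timings quoted above. A short check of the arithmetic, using only the numbers in the comment (the variable names are illustrative):

```python
# Timings quoted in the review comment, in milliseconds.
index_cond_ms = 1.2      # same-type predicate, served by Index Cond
filter_ms = 102.0        # cross-type date = timestamp, post-scan Filter
nested_loop_ms = 4632.0  # IN (SELECT ...) planned as Nested Loop
hash_semi_ms = 108.0     # EXISTS form planned as Hash Semi Join

coercion_slowdown = filter_ms / index_cond_ms
rewrite_speedup = nested_loop_ms / hash_semi_ms

print(f"type-coercion slowdown: {coercion_slowdown:.0f}x")
print(f"IN -> EXISTS speedup:   {rewrite_speedup:.1f}x")
```

These are single measurements on one test schema, so the ratios indicate magnitude rather than a guaranteed factor.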

# Confidence Area Finding Suggestion Reviewed SHA
1 90 SKILL.md L37-L40 — correctness [.mcp.json](../../.mcp.json) was rewritten to [mcp/.mcp.json](mcp/.mcp.json). The new path resolves to skills/dsql/mcp/.mcp.json, which does not exist in the repo (verified via the GitHub contents API). The canonical file is at the plugin root, and mcp/mcp-setup.md confirms ".mcp.json at the plugin root". Revert to [.mcp.json](../../.mcp.json). 1178334f
2 90 SKILL.md L177-L180 — correctness [scripts/](../../scripts/) and [scripts/README.md](../../scripts/README.md) were rewritten to scripts/... — resolves under the skill, where no scripts/ subdirectory exists. The trailing "and hook configuration" wording was also dropped. validate-references.py does not flag these because the new paths lack its trigger keywords. Revert both paths to ../../scripts/... and restore the dropped wording. 1178334f
3 75 SKILL.md awsknowledge limits table — regression Deletion of | Supported column data types | See docs | aurora dsql supported data types |. PR #155 added this row specifically "so the skill does not drift as DSQL's type surface evolves." Removed without replacement and out of stated PR scope. Restore the row, or call out the deletion in the PR body with rationale. 1178334f
4 90 subquery-unnesting-scalar.md L5 — correctness For COUNT and SUM, MUST wrap with COALESCE(..., 0) because the LEFT JOIN returns NULL (not 0) for unmatched rows — the scalar subquery returns 0. This is wrong for SUM. Per the PostgreSQL aggregate-functions docs: "Most aggregate functions, except count, return null when no rows are selected. For example, sum of no rows returns null, not zero." Both the scalar subquery and the LEFT-JOIN form return NULL for SUM over empty sets — only COUNT differs. Wrapping SUM with COALESCE(..., 0) silently changes results. Restrict the COALESCE rule to COUNT only. For SUM/MIN/MAX, omit COALESCE so the rewrite preserves NULL semantics. 1178334f
5 70 catalog-queries.md L141, L158 — wording / verifier-misleading (Original finding reframed after empirical validation.) The hardcoded WHERE ao.amopmethod = 10003 is correct on DSQL — every DSQL index uses the btree_index access method (OID 10003); the 403/btree access method exists in pg_am but is not used by any actual index. The empirical pg_amop check at 10003 correctly excludes date<->timestamp (which produced an 85× slowdown via post-scan filter on a 20k-row test table), while at 403 it incorrectly includes it. However, the inline verify-comment Verify with: SELECT oid FROM pg_am WHERE amname = 'btree' would mislead a verifier — that returns 403 (regular btree, amtype='i'), not 10003 (btree_index). Update the verify-comment to Verify with: SELECT oid FROM pg_am WHERE amname = 'btree_index' (or, equivalently, switch the query to WHERE ao.amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree_index') so the OID lookup is the database's responsibility). 1178334f
6 78 catalog-queries.md L109-L111 — correctness The MUST substitute … via safe_query.build() with ident() directive is scoped to one section, but the same '{schema}'/'{table}'/'{col}' placeholders are used throughout the file. Worse, per mcp/tools/input-validation.md, ident() emits "value" (double-quoted identifier) and is for table/column names. The placeholders here sit in single-quoted string-literal positions (c.table_schema = '{schema}', n.nspname = '{schema}', IN ('{table1}', '{table2}')) which need regex()/allow() — both emit 'value'. Following the directive verbatim produces invalid SQL like WHERE c.table_schema = "public". (Note: positions like FROM {schema}.{table} and GROUP BY {column} legitimately need ident().) Lift the substitution rule to the file preamble with per-position guidance: identifier positions ({schema}.{table}, GROUP BY {column}) → ident(); literal positions (= '{schema}', IN ('{table}')) → regex() or allow(). 1178334f
7 70 workflow.md L42-L46 — correctness regression The Context Disambiguation row says "offer the psql fallback" when no MCP is connected, but workflow.md never reproduces the fallback (auth-token command, psql <<< heredoc, $? check) the prior SKILL.md Workflow 8 carried. Phase 1 L78 and Safety L127 say readonly_query exclusively, contradicting the offered fallback. Either restore the psql fallback procedure, or change the row to "MUST refuse — no MCP connection means no plan capture." 1178334f
8 70 split-large-joins.md — example consistency File states "e.g., 10 joins for Aurora DSQL" but the worked Original example only joins 7 tables (R1–R7), so by its own rule the rewrite would not trigger. Eval 202 uses 12 tables (consistent with a >10 threshold) — the example is the outlier. The final JOIN sub2 ON sub1.id = sub2.id is also ambiguous: SELECT * from each CTE propagates multiple id columns. Bump example to >10 tables and project explicit columns / alias the join keys per CTE. 1178334f
9 80 plan-interpretation.md L227-L233 — example correctness Recommendation Template shows WHERE col = '42' rewritten as WHERE col = 42::float regardless of column type. The doc's own type matrix says cross-type B-Tree support is integer-family-only — ::float is the wrong example type entirely. Eval 201's expected_output uses '12345'::integer, which is consistent with the matrix. Re-anchor the example to the column's actual type (e.g., for bigint columns: col = 42 or col = 42::bigint), and align with eval 201's ::integer casting. 1178334f
10 65 workflow.md Phase 0 L60-L74 vs SKILL.md L113-L117 — accuracy SKILL.md describes workflow.md as "follow its loading instructions rather than loading all files upfront", but Phase 0 mandates MUST read these four files before starting plus SHOULD also load two indexes — six files at workflow entry, the opposite of lazy loading. (The 13 individual rewrite sub-files are lazy-loaded — that's the genuine lazy-load surface.) Reword SKILL.md's **Contains:** to drop the misleading "rather than loading all files upfront" claim; describe the actual loading model ("loads 4–6 files at Phase 0; rewrite sub-files on-demand at Phase 2"). 1178334f
11 75 in-subquery-to-exists.md + subquery-unnesting-uncorrelated.md — redundancy / routing Both files target the same IN (SELECT …) → EXISTS rewrite. Empirically validated as valuable on DSQL (43× speedup vs. mainline PG's auto-unnesting). The duplication remains: structurally identical examples, and Eval 200's expected_output cites them interchangeably ("subquery-unnesting-uncorrelated.md or in-subquery-to-exists.md"), a tell that the routing is undecidable. The "Uncorrelated IN-subquery" / "Large IN-subquery result set" split in query-rewrites-generic.md carries no deterministic predicate the agent can evaluate at trigger-time. Merge into one canonical file (subquery-unnesting-uncorrelated.md is the broader one: it covers the JOIN alternative and correlation gate; recommend deleting in-subquery-to-exists.md and routing all uncorrelated-IN-to-EXISTS prompts there). Keep correlated and scalar variants separate (genuinely different input shapes). 1178334f
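
The per-position substitution rule in finding 6 can be sketched roughly as follows. This is a minimal illustration, not the project's `safe_query.build()` API: `as_identifier` and `as_literal` are hypothetical stand-ins for the roles that `ident()` and `regex()`/`allow()` play, and the `public.orders` names are made up.

```python
import re

# Hypothetical helpers for illustration only; they are NOT the project's
# safe_query.build() / ident() / regex() / allow() implementations.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _checked(value: str) -> str:
    """Reject anything that is not a plain SQL name."""
    if not _NAME.fullmatch(value):
        raise ValueError(f"unsafe name: {value!r}")
    return value


def as_identifier(value: str) -> str:
    """Identifier position (FROM schema.table, GROUP BY col): double quotes."""
    return '"' + _checked(value) + '"'


def as_literal(value: str) -> str:
    """String-literal position (= 'schema', IN ('table')): single quotes."""
    return "'" + _checked(value) + "'"


schema, table = "public", "orders"  # made-up example names

# Identifier position: the name is part of the SQL grammar.
count_sql = f"SELECT count(*) FROM {as_identifier(schema)}.{as_identifier(table)}"

# Literal position: the name is a value compared against a catalog column.
catalog_sql = (
    "SELECT column_name, data_type FROM information_schema.columns c "
    f"WHERE c.table_schema = {as_literal(schema)} "
    f"AND c.table_name = {as_literal(table)}"
)

print(count_sql)
print(catalog_sql)
```

Using the identifier helper in the second query would yield `c.table_schema = "public"`, which PostgreSQL-compatible engines parse as a column reference rather than a string, which is the failure the finding describes.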

Findings dropped after re-validation (full transcripts available on request):

  • (dropped, false positive) not-in-to-not-exists.md "Additional example" projection drop — re-read of L29-L48 shows Original is SELECT product_id FROM products, not SELECT *; projection is preserved (with table-qualification only).
  • (dropped, < 60) Array/JSON storage rule revert — SKILL.md routes to development-guide.md as authoritative, dev-guide still carries the longer correct form, so SKILL.md is a stale shorthand rather than a behavior regression.
  • (dropped, < 60) Eval results 206–210 missing transcripts — real but borderline; matches the PR body's explicit "n=1, manual qualitative" framing.
  • (dropped, < 60) Manual-only eval suite has no automated grader — author has explicitly committed to a future Python converter; tracking-debt rather than blocking.
  • (dropped, < 60) Three unresolved reviewer threads — process item, not a code defect.

Empirical context (run on live DSQL cluster):

  • Type-coercion bypass detection (the PR's headline feature) is real and load-bearing. A 20,000-row test_typecoerce(d date) USING btree_index showed WHERE d = '2024-06-01'::date → Index Cond, 1.2ms; WHERE d = '2024-06-01'::timestamp → Filter, 19,945 rows removed by Filter, 102ms. The cross-type pair is empirically present in pg_amop @ amopmethod=403 and absent at amopmethod=10003, exactly matching the skill's detection logic.
  • IN→EXISTS rewrite delivers ~43× speedup on DSQL. WHERE customer_id IN (SELECT customer_id FROM orders WHERE order_date > '2024-06-01') → Nested Loop, 4,632ms; equivalent EXISTS form → Hash Semi Join, 108ms. Mainline PG ≥ 8.4 auto-unnests both into a semi-join; DSQL does not. The rewrite is genuinely valuable on DSQL — author was right.
  • Propagate-filter rewrite delivers a cardinality-estimate improvement but no significant timing delta at this scale (84ms vs. 81ms). The planner does not derive the transitive predicate automatically. Effect would compound on larger joins.
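
As a quick arithmetic check, the ratios quoted in the bullets above can be recomputed from the reported timings. This is only a sketch: the dictionary labels are invented here, and treating the propagate-filter timings as (84 ms, 81 ms) in the order listed is an assumption, since the text does not say which run is which.

```python
# Timings in milliseconds, copied from the bullets above.
# Each pair is (slower run, faster run); the labels are illustrative only.
measurements = {
    "type coercion (Filter vs. Index Cond)": (102.0, 1.2),
    "IN vs. EXISTS": (4632.0, 108.0),
    # Assumption: order taken as listed in the text (84 ms vs. 81 ms).
    "propagate filter": (84.0, 81.0),
}

# Ratio of the slower timing to the faster one for each experiment.
ratios = {name: slow / fast for name, (slow, fast) in measurements.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

The first two ratios line up with the 85× and ~43× figures cited in the findings, while the third stays close to 1, consistent with "no significant timing delta at this scale".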

🤖 Generated with Claude Code

If this code review was useful, please react with 👍. Otherwise, react with 👎.

@@ -0,0 +1,48 @@
# Rewrite: Propagate Filter to JOIN Columns

I think I still hold to this? Why are we relying on markdown applications alone? Why are these transformations not written into script patterns/tooling?

| Supported column data types | See docs | `aurora dsql supported data types` |

**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs where hitting a limit would cause failures; any time the exact number can affect user decision.
**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs

No need to verify for general guidance or when the exact number doesn't affect the user's decision.

Rephrase to be prescriptive rather than an avoidant prohibition?


**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs where hitting a limit would cause failures; any time the exact number can affect user decision.
**When to verify:** Before recommending batch sizes, connection pool settings, or schema designs
where hitting a limit would cause failures. No need to verify for general guidance or when

Unnecessary line breaks again? This is bad format/practice in skills: the extra LOC frequently consume additional tokens.

Comment on lines +221 to +224
**Recovery — batch fails midway:** Rows already updated keep their new value (each batch committed
in its own transaction). Resume by filtering on the unset state — e.g. add
`WHERE new_column IS NULL` (or the sentinel value) to the next UPDATE — and continue from there.
Re-running the entire migration is safe because the filter naturally excludes completed rows.

LOC bloating here again?

@anwesham-lab

I think it's worth using our self-review skill and doing a couple of explicit passes with subagents deployed from the code-review and pr-toolkit-review plugins to get to an explicit convergence state that you can audit and post to log changes being made.
