
EXPLAIN / EXPLAIN ANALYZE / EXPLAIN (FORMAT JSON)#23

Merged: whilo merged 1 commit into main from explain-pg-style on May 11, 2026

@whilo commented May 11, 2026

Summary

  • Postgres-style indented EXPLAIN text renderer + DuckDB-shape JSON renderer, replacing the legacy single-strategy summary that was effectively unusable through pgwire.
  • EXPLAIN ANALYZE now runs the query under instrumentation and surfaces per-node actual-rows + time-ms plus an Execution Time: footer — inclusive timing per Postgres/DuckDB convention.
  • SQL parser accepts the full grammar: EXPLAIN, EXPLAIN ANALYZE, EXPLAIN VERBOSE, EXPLAIN (ANALYZE), EXPLAIN (FORMAT JSON), EXPLAIN (ANALYZE, FORMAT JSON).
  • Fixes a rendering bug: EXPLAIN ... WHERE p previously produced byte-identical output to the no-WHERE plan on PSplitAgg sub-plans (the filter was applied at runtime but dropped from the printed tree). Includes a direct regression test.

What changed

| File | Change |
| --- | --- |
| plan.clj | plan->data data model, node-details uniform [label, value] pairs, render-text (Postgres-style), render-json (DuckDB shape), format-pred / format-agg SQL-like printers |
| executor.clj | *explain-collector* dynamic var; execute-node split into wrap + impl for inclusive per-node timing; explain-analyze-query entry point |
| sql.clj | parse-explain-prefix handles all six modifier syntaxes; new richer return shape {:options {...} :inner {...}} |
| server.clj | format-explain-result rewritten to dispatch on :format; one pgwire row per tree line for text, single row for JSON; inline json-write-str (no new dep) |
| api.clj / query.clj | :analyze? / :format opts; legacy :strategy / :n-rows / :columns keys preserved |
| test/stratum/explain_test.clj | 12 tests / 51 assertions: text shape, JSON shape, ANALYZE timing, all SQL variants, end-to-end, legacy back-compat, the PSplitAgg-filter regression |
| test/stratum/asof_join_test.clj | Regex match= → Match Cond: to match the new labeled format |

Sample output

EXPLAIN ANALYZE SELECT MIN(a), MEDIAN(a), MAX(a) FROM t WHERE a > 100:

 PSplitAgg  (est-rows=1 actual-rows=1 time=1.689ms)
    Aggregates: [min(a), median(a), max(a)]
    Strategies: 2
    ->  PDenseGroupBy  (est-rows=1000 actual-rows=1 time=0.696ms)
          Aggregates: [min(a), max(a)]
          Max Key: 1
          Filter: (a > 100)
          ->  PScan  (est-rows=1000 actual-rows=1000 time=0.058ms)
                Columns: [:a]
                Length: 1000
    ->  PPercentileAgg  (est-rows=1 actual-rows=1 time=0.750ms)
          Aggregates: [median(a)]
          Filter: (a > 100)
          ->  PScan  (est-rows=1000 actual-rows=1000 time=0.003ms)
                Columns: [:a]
                Length: 1000
 Execution Time: 1.760 ms
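
The indented layout above can be approximated with a short recursive walk over a plan-data map. This is only a sketch, not the code in plan.clj: the keys `:op`, `:est-rows`, `:actual-rows`, `:time-ms`, `:details` and `:children` follow the shape described in this PR, while the function names `cost-suffix` and `render-text-sketch` and the exact indent widths are assumptions made for illustration.

```clojure
(defn- cost-suffix
  "Builds the parenthesised suffix; ANALYZE fields appear only when present."
  [{:keys [est-rows actual-rows time-ms]}]
  (str "(est-rows=" est-rows
       (when actual-rows (str " actual-rows=" actual-rows))
       (when time-ms
         ;; Locale/ROOT keeps the decimal point stable across JVM locales.
         (String/format java.util.Locale/ROOT " time=%.3fms"
                        (to-array [(double time-ms)])))
       ")"))

(defn render-text-sketch
  "Returns a vector of lines: one header per node, labeled detail lines
   beneath it, and children introduced by `->  ` arrows."
  ([node] (render-text-sketch node 0 true))
  ([{:keys [op details children] :as node} indent root?]
   (let [pad        (apply str (repeat indent \space))
         ;; Hypothetical widths: root details indent by 2, nested ones by 6.
         inner      (+ indent (if root? 2 6))
         detail-pad (apply str (repeat inner \space))
         head       (str pad (if root? "" "->  ") (name op) "  " (cost-suffix node))]
     (into [head]
           (concat
            (for [[label value] details]
              (str detail-pad label ": " value))
            (mapcat #(render-text-sketch % inner false) children))))))
```

Returning a vector of lines rather than one joined string fits the wire layer described in this PR, which emits one pgwire row per tree line.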

EXPLAIN (FORMAT JSON) ...:

[{"name": "PSplitAgg",
  "children": [{"name": "PDenseGroupBy", ...,
                "extra_info": {"Aggregates": "[min(a), max(a)]",
                               "Filter": "(a > 100)",
                               "__estimated_cardinality__": 1000}},
               ...],
  "extra_info": {"Aggregates": "[min(a), median(a), max(a)]",
                 "Strategies": "2",
                 "__estimated_cardinality__": 1}}]
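
A minimal sketch of how a plan-data node could be turned into the `name` / `children` / `extra_info` shape shown above. The input keys follow the shape described in this PR; the function name `render-json-sketch` is hypothetical, and the unit of `__timing__` is left as whatever `:time-ms` holds, which is an assumption rather than something the PR states.

```clojure
(defn render-json-sketch
  "Converts one plan-data node into a DuckDB-shaped map.
   Detail pairs become string entries; cardinalities and timing stay numeric."
  [{:keys [op est-rows actual-rows time-ms details children]}]
  {"name"       (name op)
   "children"   (mapv render-json-sketch children)
   "extra_info" (cond-> (into {} details)
                  ;; 0 is truthy in Clojure, so a zero estimate is still kept.
                  est-rows    (assoc "__estimated_cardinality__" est-rows)
                  actual-rows (assoc "__cardinality__" actual-rows)
                  time-ms     (assoc "__timing__" time-ms))})
```

Wrapping the root in a vector, for example `[(render-json-sketch root)]`, gives the top-level array seen in the sample.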

Out of scope (deliberate)

  • Logical-vs-optimized-vs-physical multi-plan output (we only have one physical plan).
  • HTML / GRAPHVIZ / YAML / MERMAID formats (tooling can consume the JSON).
  • Buffer / IO stats (no instrumentation today).
  • Per-thread breakdown of morsel-parallel nodes (single wall-clock per node).

Test plan

  • Full suite: 1024 tests, 4731 assertions, 0 failures.
  • clj -M:ffix applied (cljfmt clean).
  • Manual: connect via psql, run EXPLAIN, EXPLAIN ANALYZE, EXPLAIN (FORMAT JSON) against a registered table.
  • Manual: verify ANALYZE timings sum sensibly on a deep plan (joins + group-by).

…t + DuckDB-shape JSON

Before this change, `EXPLAIN` over the pgwire interface was effectively
unusable: `format-explain-result` rendered five summary lines that were
all blank for any node the legacy formatter didn't special-case (most
notably PSplitAgg), and `EXPLAIN ANALYZE` was rejected by the regex.
The plan-tree text printer also dropped predicates on PDenseGroupBy /
PPercentileAgg, so `EXPLAIN ... WHERE p` produced byte-identical output
to the no-WHERE plan — making it impossible to tell from the explain
that the filter was pushed down (the filter was applied at runtime).

This rewrite replaces the ad-hoc string formatter with a structured
data model and two renderers, and threads per-node timing through the
executor for ANALYZE.

  plan.clj
    - `plan->data` walks the physical/logical tree and returns a
      serializable map per node:
        {:op :node-id :est-rows :sel :details :children :child-tags
         (:actual-rows :time-ms when ANALYZE)}
    - `node-details` produces uniform `[label string-value]` pairs:
      Filter / Group Keys / Aggregates / Columns / Join Type / Hash
      Cond / Match Cond / Max Key / Extract / etc. Adding a node-type
      now means extending one cond, not 12 ad-hoc printers.
    - `render-text` produces a Postgres-style indented tree with
      `->  ` arrows, labeled sub-lines, and a
      `(est-rows=N sel=S actual-rows=M time=T.TTms)` cost suffix.
      Renders the `Execution Time:` footer when ANALYZE data is present.
    - `render-json` produces a DuckDB-shape `[{name, children,
      extra_info}]` data structure with `__estimated_cardinality__`
      / `__cardinality__` / `__timing__` / `__selectivity__` internal
      keys. Numbers stay JSON numbers, not "N rows" strings.
    - `format-pred` / `format-agg` / `format-col-or-expr` render the
      normalized IR back to SQL-like text (`(a > 100)`, `sum(price)`,
      `(a * b)`).
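
To illustrate the SQL-like printers, here is a small sketch that renders a prefix-vector expression back to text. The IR shape used here (`[:> :a 100]`, `[:sum :price]`) is an assumption for the example; the PR does not show the normalized IR, and the names `format-expr-sketch` and `format-agg-sketch` are hypothetical.

```clojure
(require '[clojure.string :as str])

(defn format-expr-sketch
  "Renders a hypothetical prefix-vector IR as SQL-like text.
   Keywords are column names, vectors are [op arg ...] applications."
  [expr]
  (cond
    (keyword? expr) (name expr)
    (string? expr)  (str "'" expr "'")
    (vector? expr)  (let [[op & args] expr
                          ;; Logical connectives print upper-case, operators as-is.
                          op-str (if (contains? #{:and :or} op)
                                   (str/upper-case (name op))
                                   (name op))]
                      (str "("
                           (str/join (str " " op-str " ")
                                     (map format-expr-sketch args))
                           ")"))
    :else           (str expr)))

(defn format-agg-sketch
  "Renders a hypothetical aggregate spec [fn-keyword arg] as fn(arg)."
  [[f arg]]
  (str (name f) "(" (format-expr-sketch arg) ")"))
```

Under these assumptions, `(format-expr-sketch [:> :a 100])` yields `(a > 100)` and `(format-agg-sketch [:sum :price])` yields `sum(price)`, the same forms quoted in the commit message.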

  executor.clj
    - `*explain-collector*` dynamic var (atom). When bound,
      `execute-node` records `(System/nanoTime)` deltas and output
      row count per call, keyed by node identity-hash.
    - `execute-node` now splits into a public wrap (measures + records)
      and `execute-node-impl` (the dispatch cond). Inclusive timing:
      every recursive child call also goes through the wrap, so a
      parent's recorded time naturally includes its children's —
      matching Postgres / DuckDB EXPLAIN ANALYZE semantics. Negligible
      overhead when the collector is nil (one nil check per call).
    - `count-output` handles the three return shapes
      (column ctx, columnar result, row vec).
    - `explain-analyze-query` runs the query under the collector,
      merges per-node timings into the data tree, and adds a root
      `:execution-time-ms`.
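
The wrap/impl split and the inclusive-timing property can be sketched with a toy two-operator executor. Everything named here (`*collector-sketch*`, `run-node`, `run-node-impl`, the `:scan` / `:filter` ops) is hypothetical and stands in for the real `*explain-collector*` and `execute-node`; only the technique (a dynamic var holding an atom, `System/nanoTime` deltas, keys from the node's identity hash) is taken from the description above.

```clojure
(def ^:dynamic *collector-sketch*
  "Bound to an atom while ANALYZE runs; nil otherwise."
  nil)

(declare run-node)

(defn- run-node-impl
  "Toy dispatch: :scan returns its rows, :filter filters its child's rows.
   Children are executed through `run-node`, so they get timed as well."
  [{:keys [op rows pred child]}]
  (case op
    :scan   rows
    :filter (filterv pred (run-node child))))

(defn run-node
  "Wrapper: when a collector is bound, records elapsed time and output row
   count for this node. The span covers the child calls, so it is inclusive."
  [node]
  (if-let [collector *collector-sketch*]
    (let [start  (System/nanoTime)
          result (run-node-impl node)
          nanos  (- (System/nanoTime) start)]
      (swap! collector assoc (System/identityHashCode node)
             {:actual-rows (count result)
              :time-ms     (/ nanos 1e6)})
      result)
    ;; No collector bound: a single nil check, then the plain dispatch.
    (run-node-impl node)))

(defn analyze-sketch
  "Runs `root` under a fresh collector; returns the result and the timings."
  [root]
  (let [timings (atom {})
        result  (binding [*collector-sketch* timings]
                  (run-node root))]
    {:result result :timings @timings}))
```

Because the parent's clock starts before and stops after every child call, its recorded time can never be smaller than a child's. Note that the sketch relies on eager evaluation (`filterv`); a lazy sequence would move work outside the timed span.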

  sql.clj
    - `parse-explain-prefix` accepts the full DuckDB/Postgres grammar:
        EXPLAIN <sql>
        EXPLAIN ANALYZE <sql>
        EXPLAIN VERBOSE <sql>
        EXPLAIN (ANALYZE) <sql>
        EXPLAIN (FORMAT JSON) <sql>
        EXPLAIN (ANALYZE, FORMAT JSON) <sql>
    - Returns the new richer shape:
        {:explain {:options {:analyze? :format} :inner {:query|:system ...}}}
      so the wire layer can dispatch on format without re-parsing.
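
A rough sketch of how the six prefix forms could be recognised with two regular expressions. It is not the parser in sql.clj: the function names, the `:verbose?` key, the `:text` default for `:format`, and returning the inner statement as a plain string are all assumptions made for the example.

```clojure
(require '[clojure.string :as str])

(def ^:private default-options {:analyze? false :format :text})

(defn- parse-option-list
  "Parses the inside of a parenthesised list such as \"ANALYZE, FORMAT JSON\"."
  [s]
  (reduce (fn [opts item]
            (let [[k v] (str/split (str/upper-case (str/trim item)) #"\s+" 2)]
              (case k
                "ANALYZE" (assoc opts :analyze? true)
                "VERBOSE" (assoc opts :verbose? true)
                "FORMAT"  (assoc opts :format (keyword (str/lower-case (or v "text"))))
                opts)))
          {}
          (str/split s #",")))

(defn parse-explain-prefix-sketch
  "Returns {:options {...} :inner sql-string} for an EXPLAIN statement,
   or nil when the input does not start with EXPLAIN."
  [sql]
  (when-let [[_ rest-sql] (re-matches #"(?is)\s*EXPLAIN\b\s*(.*)" sql)]
    (if-let [[_ opts inner] (re-matches #"(?s)\(([^)]*)\)\s*(.*)" rest-sql)]
      ;; Parenthesised form: EXPLAIN (ANALYZE, FORMAT JSON) <sql>
      {:options (merge default-options (parse-option-list opts))
       :inner   inner}
      ;; Bare keyword form: EXPLAIN [ANALYZE] [VERBOSE] <sql>
      (loop [opts default-options, s rest-sql]
        (if-let [[_ kw more] (re-matches #"(?is)(ANALYZE|VERBOSE)\b\s*(.*)" s)]
          (recur (assoc opts (keyword (str (str/lower-case kw) "?")) true) more)
          {:options opts :inner s})))))
```

One known limitation of this sketch: a statement whose body itself starts with a parenthesis, such as `EXPLAIN (SELECT 1)`, would be misread as an option list, so a real parser needs to check the first token inside the parentheses.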

  server.clj
    - `format-explain-result` rewritten to consume the new shape and
      dispatch on `:format`. Text: one pgwire row per tree line under
      column "QUERY PLAN" (the same column name and row-per-line layout
      as Postgres' EXPLAIN).
      JSON: a single row with the pretty-printed JSON document.
    - Inline `json-write-str` avoids a new dep on `clojure.data.json`.
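
As an illustration of the two pieces above, here is a small dependency-free JSON writer plus the row dispatch. This is a sketch under assumptions: the names `json-write-str-sketch` and `explain-rows-sketch` are hypothetical, the writer emits compact rather than pretty-printed JSON, and it does not handle NaN or infinities.

```clojure
(require '[clojure.string :as str])

(defn- escape-json-string
  "Escapes quotes, backslashes and control characters for a JSON string."
  [s]
  (str/escape s (fn [c]
                  (case c
                    \"       "\\\""
                    \\       "\\\\"
                    \newline "\\n"
                    \return  "\\r"
                    \tab     "\\t"
                    ;; Remaining control characters use the \uXXXX form.
                    (when (< (int c) 0x20)
                      (format "\\u%04x" (int c)))))))

(defn json-write-str-sketch
  "Serialises nil, booleans, numbers, strings, keywords, maps and
   sequential collections to a compact JSON string."
  [x]
  (cond
    (nil? x)        "null"
    (boolean? x)    (str x)
    (ratio? x)      (str (double x))   ; 1/2 is not valid JSON, 0.5 is
    (number? x)     (str x)
    (string? x)     (str "\"" (escape-json-string x) "\"")
    (keyword? x)    (json-write-str-sketch (name x))
    (map? x)        (str "{"
                         (str/join ","
                                   (for [[k v] x]
                                     (str (json-write-str-sketch
                                           (if (keyword? k) (name k) (str k)))
                                          ":"
                                          (json-write-str-sketch v))))
                         "}")
    (sequential? x) (str "[" (str/join "," (map json-write-str-sketch x)) "]")
    :else           (json-write-str-sketch (str x))))

(defn explain-rows-sketch
  "Text: one single-column row per plan line. JSON: one row with the document."
  [{fmt :format :keys [plan-tree plan-json]}]
  (case fmt
    :json [[(json-write-str-sketch plan-json)]]
    (mapv vector (str/split-lines plan-tree))))
```

Keeping numbers as JSON numbers (rather than strings) in the writer matches the `render-json` behaviour described for plan.clj.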

  api.clj / query.clj
    - `explain` accepts `{:analyze? :format}` opts. Result map adds
      `:plan-data` (structured tree), `:plan-tree` (text render),
      `:plan-json` (DuckDB-shape data) when JSON requested,
      `:execution-time-ms` when ANALYZE. Legacy `:strategy` /
      `:n-rows` / `:columns` keys preserved so existing callers don't
      break. SQL EXPLAIN options auto-propagate when calling
      `api/explain` with an `EXPLAIN ...` prefix string.

Tests: 12 new tests / 51 assertions in `test/stratum/explain_test.clj`
covering text shape, JSON shape, ANALYZE timing presence, all six SQL
syntax variants, end-to-end SQL prefix flow, legacy key back-compat,
and a direct regression test that PSplitAgg + WHERE renders
differently from PSplitAgg alone (the original bug). One existing
test updated (`asof_join_test.clj`: `match=` → `Match Cond:`). Full
suite passes: 1024 tests, 4731 assertions, 0 failures.

Out of scope for this PR (deliberate):
  - Logical-vs-optimized-vs-physical multi-plan output. We only have
    one physical plan, so DuckDB's three-banner output is overkill.
  - HTML / GRAPHVIZ / YAML / MERMAID formats. Tooling can consume the
    JSON.
  - Buffer / IO stats. No instrumentation today.
  - Per-thread breakdown of morsel-parallel nodes. Single wall-clock
    span per node is what users want for latency triage.

Signed-off-by: Christian Weilbach <christian@weilbach.name>
@whilo whilo merged commit a118779 into main May 11, 2026
5 of 6 checks passed
@whilo whilo deleted the explain-pg-style branch May 11, 2026 01:40