perf(query): extend count-distinct 2-stage rewrite to SYM keys #222

Open
ser-vasilich wants to merge 1 commit into master from perf/count-distinct-sym-keys

Conversation

@ser-vasilich
Collaborator

Summary

The (count distinct X) by K planner rewrite (706c5be) currently
short-circuits when either K or X is a SYM column. mk_compile's
composite key path already packs SYM by storage width, and the outer
pass materialises the key column as SYM when K was SYM — the gate
was conservatively narrow.

Drop the SYM bailout, and additionally apply asc/desc/take from the
outer query to the rewrite's output so it composes with sort+take.

ClickBench 10M:
q08 ~390 → ~74 ms
q10 ~170 → ~10 ms
q13 ~615 → ~160 ms

The (count distinct X) by K planner rewrite previously rejected SYM
columns for K and X.  mk_compile's composite key path already packs
SYM by its storage width, and the outer pass materialises the key
column as SYM when K was SYM — the gate was conservatively narrow,
not algorithmically required.
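
To illustrate why symbol keys need no special handling, here is a minimal Python sketch of a two-stage count-distinct over interned symbols. It is not the engine's code: `intern`, `count_distinct_by`, and the bit-packing layout are hypothetical stand-ins for SYM storage and the composite key path described above.

```python
from collections import defaultdict


def intern(values):
    """Map each distinct symbol to a small integer id (stand-in for SYM storage)."""
    table = {}
    ids = [table.setdefault(v, len(table)) for v in values]
    names = list(table)  # index == id, because dicts preserve insertion order
    return ids, names


def count_distinct_by(k_col, x_col):
    """Compute (count distinct X) by K in two stages over packed integer keys."""
    k_ids, k_names = intern(k_col)
    x_ids, x_names = intern(x_col)
    # Assumed layout: X occupies the low bits, K the high bits.
    x_bits = max(1, (len(x_names) - 1).bit_length())

    # Stage 1: deduplicate (K, X) pairs via one composite key per row.
    pairs = {(k << x_bits) | x for k, x in zip(k_ids, x_ids)}

    # Stage 2: count surviving pairs per K, then map K ids back to symbols.
    counts = defaultdict(int)
    for packed in pairs:
        counts[packed >> x_bits] += 1
    return {k_names[k]: n for k, n in counts.items()}
```

Once symbols are reduced to fixed-width ids, both stages operate on plain integers, so the key type only matters when the outer pass maps ids back to symbols.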

Also apply asc:/desc:/take: to the rewrite's output.  Without this,
emit_filter trims the result to the top N rows but leaves them in
hash-table iteration order, silently dropping the user's desc:
ordering.  Calling apply_sort_take at return time matches the SQL
ORDER BY semantics that every other ray_select path applies.
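
The ordering problem can be reproduced in a few lines of Python. This is a sketch under assumptions: `emit_top_n` and `sort_take` are hypothetical analogues of emit_filter and apply_sort_take, and a dict stands in for the hash table.

```python
import heapq


def emit_top_n(counts, n):
    """Keep the n largest rows but emit them in table iteration order."""
    keep = {k for k, _ in heapq.nlargest(n, counts.items(), key=lambda kv: kv[1])}
    return [(k, c) for k, c in counts.items() if k in keep]


def sort_take(rows, n, descending=True):
    """Sort by count and keep the first n rows, as ORDER BY ... LIMIT n would."""
    return sorted(rows, key=lambda kv: kv[1], reverse=descending)[:n]


counts = {"d": 5, "b": 1, "c": 7, "a": 3}
trimmed = emit_top_n(counts, 2)   # right rows, iteration order
ordered = sort_take(trimmed, 2)   # right rows, descending by count
```

Here `trimmed` holds the correct two rows but lists `d` before `c`; only the final sort restores the descending order the query asked for.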

ClickBench 10M:
  q08  173 → 78  ms  (already covered by Anton's original 2-stage;
                      now also sorted descending by count)
  q10  163 → 72  ms
  q13  551 → 198 ms

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
