feat(cubesql): merge view joins on a shared cube member into a single CubeScan #10977
paveltiunov wants to merge 8 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #10977 +/- ##
==========================================
+ Coverage 83.32% 83.38% +0.05%
==========================================
Files 255 256 +1
Lines 77261 77581 +320
==========================================
+ Hits 64379 64689 +310
- Misses 12882 12892 +10
Claude finished @paveltiunov's task in 2m 14s
Claude Code Review
Verdict: LGTM with a few nits. No correctness blockers found. 1 low (test gap), 2 nits (cleanliness/efficiency).
Summary
The change cleanly generalizes
Findings
[Low] Test coverage gap for RIGHT JOIN and FULL JOIN.
[Nit] Dangling egraph mutations on no-op outcome.
[Nit] Composite-key cube name only carries the last column.
Other notes
…eScan Generalize the push-down-cube-join rewrite so that a join between two CubeScans (typically views) on a dimension that resolves to the same underlying cube member is merged into a single CubeScan, just like the existing __cubeJoinField cube-to-cube join. A view dimension keeps its original cube.dimension path in alias_member, which is used to detect that both sides of the equi-join reference the same shared key. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Mirror the motivating query exactly: SELECT c.customer_city, measure(o.revenue), measure(c.avg_age) FROM customers_view c LEFT JOIN orders_view o ON o.customer_city = c.customer_city GROUP BY 1 and assert it merges into a single grouped multi-fact CubeScan. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
When merging a join between two views on a shared cube member, the downstream multi-fact query is rendered as a FULL OUTER JOIN over the shared key. To recover the requested join semantics, the rewrite now adds a measure 'set' filter on each side that must be present:
- INNER: both sides required
- LEFT: left side required
- RIGHT: right side required
- FULL: no extra filter
Branch presence is detected via a measure of the side (the grouping key is COALESCEd across sides downstream, so it cannot tell sides apart). Covered with left/inner group-by tests.
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…e filter
Detect side presence with the side's join-key dimension instead of an arbitrary measure. The join key is always available and is the actual shared-key marker, avoiding the nullable-measure caveat and the case where a side has no selected measure.
- LEFT: left join key must be set
- RIGHT: right join key must be set
- INNER: both join keys must be set
- FULL: no extra filter
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…n dimensions Make the merge gate explicit: the entire join key must resolve to dimensions (or time dimensions) on both sides and to the same underlying cube member. A join key that touches a measure/segment/etc. is rejected and the join falls back to normal (non-merged) handling. Add a negative test that joining two views on measures is not merged. Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
- Add #[cfg(test)] to the test_cube_join_views module so it is not compiled into non-test builds (fixes unresolved pretty_assertions and unused-import errors under clippy -D warnings and the native builds).
- Remove the unused right_filters_var from push_down_cube_join.
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…shift) Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
21aa80a to 9a682af
Claude finished @cursor[bot]'s task in 46s
Claude Code Review
Verdict: LGTM. No correctness blockers. The merge gate ("join key fully within dimensions, same underlying member on both sides") is conservative, so non-view/unrelated joins fall through unchanged. The
Notes: 1 low, 2 nits; all re-affirmed from the prior review pass
[Low] No test for RIGHT or FULL join semantics.
[Nit] Egraph/subst mutation runs before the alias_to_cube validation.
[Nit] Composite-key cube name only carries the last column.
Style: the verbose
Summary
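Implements support for joining two views (or cubes) in the SQL API when the join is on a dimension that resolves to the same underlying cube member, e.g. the motivating query quoted in the commit history:
SELECT c.customer_city, measure(o.revenue), measure(c.avg_age) FROM customers_view c LEFT JOIN orders_view o ON o.customer_city = c.customer_city GROUP BY 1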
This is a purely local rewrite in the SQL API (CubeSQL egg rewriter). It converts a join between two CubeScans on views into a single CubeScan over the combined members, exactly like the existing __cubeJoinField cube-to-cube join rewrite. The merged scan is then handled by the query planner (Tesseract) as a multi-fact query over the shared key.
Approach
The change generalizes the existing push-down-cube-join rewrite. In addition to the classic left.__cubeJoinField = right.__cubeJoinField condition, the transform now also recognizes an equi-join whose left/right columns resolve to dimension members that share the same underlying cube.dimension.
A view dimension keeps its original cube.dimension path in the aliasMember field of the metadata; this is used to detect that both sides of the join reference the same shared key. When they do, the two scans are merged with combined members, filters, and join hints, identical to any other cube-to-cube join.
Merge gate: the join key must be fully within dimensions
The merge only fires when the entire join key resolves to dimensions (or time dimensions) on both sides and to the same underlying cube member. A join key that touches a measure/segment/etc., or that mixes underlying members, is rejected and the join falls back to normal (non-merged) handling. This also naturally scopes the rule: unrelated cubes (dimensions resolving to distinct members) are never merged, so existing behavior is unchanged.
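The gate described above can be sketched in isolation. This is a minimal illustration, not the code in members.rs: the `MemberKind`, `ResolvedColumn`, `col`, and `can_merge` names are hypothetical, and the only detail taken from the PR is that each join column is resolved to a member kind plus its `alias_member` path.

```rust
/// Kind of member a join column resolves to (hypothetical, for illustration only).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum MemberKind {
    Dimension,
    TimeDimension,
    Measure,
    Segment,
}

/// A join column after resolution against view metadata (hypothetical shape).
#[derive(Debug, Clone)]
struct ResolvedColumn {
    kind: MemberKind,
    /// Underlying `cube.dimension` path, e.g. "customers.city".
    alias_member: Option<String>,
}

/// Small constructor to keep examples short (hypothetical helper).
fn col(kind: MemberKind, alias_member: Option<&str>) -> ResolvedColumn {
    ResolvedColumn {
        kind,
        alias_member: alias_member.map(str::to_string),
    }
}

/// Returns true only when every (left, right) pair of the join key consists of
/// dimensions (or time dimensions) that share the same underlying cube member.
fn can_merge(join_key: &[(ResolvedColumn, ResolvedColumn)]) -> bool {
    fn is_dim(c: &ResolvedColumn) -> bool {
        matches!(c.kind, MemberKind::Dimension | MemberKind::TimeDimension)
    }
    // An empty key is not an equi-join on a shared member, so reject it.
    !join_key.is_empty()
        && join_key.iter().all(|(l, r)| {
            is_dim(l)
                && is_dim(r)
                && match (&l.alias_member, &r.alias_member) {
                    (Some(a), Some(b)) => a == b,
                    // A column with no known underlying member never matches.
                    _ => false,
                }
        })
}
```

Under these assumptions, two view dimensions that both resolve to `customers.city` pass the gate, while a key that includes a measure, or dimensions that resolve to different members, fails it and the join is left for normal handling.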
Join semantics (inner / left / right / full)
The downstream multi-fact planner renders the stitched scan as a FULL OUTER JOIN over the shared key. To recover the SQL join semantics requested by the query, the rewrite adds a set filter on the join key of each side that must be present:
- FULL: no extra filter
- INNER: left and right join keys must be set
- LEFT: left join key must be set
- RIGHT: right join key must be set
The join key (the ON-clause dimension on each side) is always present and is the actual shared-key presence marker, so it is used instead of a measure.
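To see why a per-side presence filter over a FULL OUTER JOIN recovers the other join types, here is a small self-contained simulation. It is a sketch under stated assumptions, not the rewrite itself: `JoinType`, `required_sides`, `full_outer`, and `apply_join` are hypothetical names, each side is modelled as a list of distinct key values, and "set" is modelled as the side's key being `Some`.

```rust
use std::collections::BTreeMap;

/// SQL join type requested by the query (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Which sides need a `set` filter on their join key: (left, right).
fn required_sides(join_type: JoinType) -> (bool, bool) {
    match join_type {
        JoinType::Inner => (true, true),
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
    }
}

/// One stitched row: the left side's key and the right side's key.
/// `None` means that side had no row for this key.
type Row = (Option<String>, Option<String>);

/// FULL OUTER JOIN of two key lists, assuming keys are distinct within a side.
fn full_outer(left: &[&str], right: &[&str]) -> Vec<Row> {
    let mut seen: BTreeMap<String, (bool, bool)> = BTreeMap::new();
    for k in left {
        seen.entry(k.to_string()).or_default().0 = true;
    }
    for k in right {
        seen.entry(k.to_string()).or_default().1 = true;
    }
    seen.into_iter()
        .map(|(k, (l, r))| (l.then(|| k.clone()), r.then(|| k.clone())))
        .collect()
}

/// Apply the presence filters on top of the full outer stitch.
fn apply_join(join_type: JoinType, left: &[&str], right: &[&str]) -> Vec<Row> {
    let (need_left, need_right) = required_sides(join_type);
    full_outer(left, right)
        .into_iter()
        .filter(|(l, r)| (!need_left || l.is_some()) && (!need_right || r.is_some()))
        .collect()
}
```

With a left side of Austin and Boston and a right side of Boston and Chicago, the unfiltered stitch yields three rows. Requiring the left key keeps Austin and Boston, requiring the right key keeps Boston and Chicago, and requiring both keeps only Boston.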
Changes
- rust/cubesql/.../rewrite/rules/members.rs: extends the push_down_cube_join transform to also accept a shared-underlying-member dimension equi-join (resolving each join column's dimension to its alias_member), and adds set filters on the required side(s) to enforce inner/left/right semantics on the FULL OUTER multi-fact stitch.
- rust/cubesql/.../test/test_cube_join_views.rs: tests for the merge, the join-semantics filtering (ungrouped, grouped LEFT, grouped INNER), and a negative test that a join on a measure is not merged.
Testing
cargo test -p cubesql --lib compile:: reported 663 passed, 0 failed (includes the existing join suites and the four new view-join tests).