feat(cubesql): merge view joins on a shared cube member into a single CubeScan#10977

Open
paveltiunov wants to merge 8 commits into master from cursor/view-join-shared-member-375b

Conversation

@paveltiunov
Member

@paveltiunov paveltiunov commented May 31, 2026

Summary

Implements support for joining two views (or cubes) in the SQL API when the join is on a dimension that resolves to the same underlying cube member, e.g.:

SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
FROM customers_view c
LEFT JOIN orders_view o ON o.customer_city = c.customer_city
GROUP BY 1

This is a purely local rewrite in the SQL API (CubeSQL egg rewriter). It converts a join between two CubeScans on views into a single CubeScan over the combined members — exactly like the existing __cubeJoinField cube-to-cube join rewrite. The merged scan is then handled by the query planner (Tesseract) as a multi-fact query over the shared key.

Approach

The change generalizes the existing push-down-cube-join rewrite. In addition to the classic left.__cubeJoinField = right.__cubeJoinField condition, the transform now also recognizes an equi-join whose left/right columns resolve to dimension members that share the same underlying cube.dimension.

A view dimension keeps its original cube.dimension path in the aliasMember field of the metadata; this is used to detect that both sides of the join reference the same shared key. When they do, the two scans are merged with combined members, filters, and join hints — identical to any other cube-to-cube join.
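The shared-key detection described above can be sketched as follows. This is a simplified illustration, not the code in members.rs: `DimensionMeta`, `underlying_member`, and `shares_underlying_member` are hypothetical names, and falling back to the dimension's own name when no alias is recorded is an assumption about how plain cube dimensions would be treated.

```rust
/// Hypothetical, simplified stand-in for the dimension metadata.
#[derive(Debug, Clone)]
struct DimensionMeta {
    /// Name as exposed to SQL, e.g. "orders_view.customer_city".
    name: String,
    /// Original `cube.dimension` path kept for view dimensions.
    alias_member: Option<String>,
}

impl DimensionMeta {
    /// The member a join column ultimately refers to.
    /// Assumption: a dimension without an alias resolves to its own name.
    fn underlying_member(&self) -> &str {
        self.alias_member.as_deref().unwrap_or(&self.name)
    }
}

/// True when both join columns resolve to the same underlying cube member.
fn shares_underlying_member(left: &DimensionMeta, right: &DimensionMeta) -> bool {
    left.underlying_member() == right.underlying_member()
}
```

With this shape, two view dimensions that both carry the same alias path compare equal, while dimensions resolving to distinct members do not.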

Merge gate: the join key must be fully within dimensions

The merge only fires when the entire join key resolves to dimensions (or time dimensions) on both sides and to the same underlying cube member. A join key that touches a measure/segment/etc., or that mixes underlying members, is rejected and the join falls back to normal (non-merged) handling. This also naturally scopes the rule: unrelated cubes (dimensions resolving to distinct members) are never merged, so existing behavior is unchanged.
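As a rough sketch of the gate, assuming each join column has already been resolved to a member kind and an underlying member (the types `MemberKind` and `JoinColumn` and the function `can_merge` are hypothetical, and rejecting an empty key is an assumption made here):

```rust
/// Hypothetical classification of what a join column resolves to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MemberKind {
    Dimension,
    TimeDimension,
    Measure,
    Segment,
}

/// Hypothetical resolved join column: its kind plus the underlying member path.
#[derive(Debug, Clone)]
struct JoinColumn {
    kind: MemberKind,
    underlying_member: String,
}

fn is_dimension_like(column: &JoinColumn) -> bool {
    matches!(column.kind, MemberKind::Dimension | MemberKind::TimeDimension)
}

/// The gate: every (left, right) pair of the join key must be dimension-like
/// on both sides and resolve to the same underlying member.
/// Assumption: an empty join key is never merged.
fn can_merge(join_key: &[(JoinColumn, JoinColumn)]) -> bool {
    !join_key.is_empty()
        && join_key.iter().all(|(left, right)| {
            is_dimension_like(left)
                && is_dimension_like(right)
                && left.underlying_member == right.underlying_member
        })
}
```

A single failing pair is enough to reject the whole key, which is what keeps joins on measures or on unrelated cubes on the existing non-merged path.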

Join semantics (inner / left / right / full)

The downstream multi-fact planner renders the stitched scan as a FULL OUTER JOIN over the shared key. To recover the SQL join semantics requested by the query, the rewrite adds a set filter on the join key of each side that must be present:

| SQL join | Filters added                            |
|----------|------------------------------------------|
| FULL     | none (full outer is already correct)     |
| INNER    | the join key of both sides must be set   |
| LEFT     | the left join key must be set            |
| RIGHT    | the right join key must be set           |

The join key (the ON-clause dimension on each side) is always present and is the actual shared-key presence marker, so it is used instead of a measure.
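The mapping from join type to presence filters can be illustrated with a small standalone sketch. `JoinType`, `SetFilter`, and `presence_filters` are hypothetical names used only for illustration; the real rewrite builds filter nodes in the e-graph rather than returning a vector.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Hypothetical simplified filter meaning "this member must be set".
#[derive(Debug, Clone, PartialEq, Eq)]
struct SetFilter {
    member: String,
}

/// Returns the set filters needed to recover the requested join semantics
/// on top of a FULL OUTER stitch over the shared key.
fn presence_filters(join_type: JoinType, left_key: &str, right_key: &str) -> Vec<SetFilter> {
    // Which sides must be present for a row to survive.
    let (need_left, need_right) = match join_type {
        JoinType::Inner => (true, true),
        JoinType::Left => (true, false),
        JoinType::Right => (false, true),
        JoinType::Full => (false, false),
    };
    let mut filters = Vec::new();
    if need_left {
        filters.push(SetFilter { member: left_key.to_string() });
    }
    if need_right {
        filters.push(SetFilter { member: right_key.to_string() });
    }
    filters
}
```

For the motivating LEFT JOIN, this would produce a single set filter on the left view's join-key dimension and nothing for the right side.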

Changes

  • rust/cubesql/.../rewrite/rules/members.rs:
    • generalize the push_down_cube_join transform to also accept a shared-underlying-member dimension equi-join (resolving each join column's dimension to its alias_member);
    • gate the merge on the join key being fully within dimensions;
    • capture the matched per-side join-key members and add set filters on the required side(s) to enforce inner/left/right semantics on the FULL OUTER multi-fact stitch.
  • rust/cubesql/.../test/test_cube_join_views.rs: tests for the merge, the join-semantics filtering (ungrouped, grouped LEFT, grouped INNER), and a negative test that a join on a measure is not merged.

Testing

  • cargo test -p cubesql --lib compile:: reported 663 passed, 0 failed (includes the existing join suites and the four new view-join tests).

@github-actions github-actions Bot added the rust Pull requests that update Rust code label May 31, 2026
@codecov

codecov Bot commented Jun 1, 2026

Codecov Report

❌ Patch coverage is 96.27329% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.38%. Comparing base (0d6393b) to head (9a682af).
⚠️ Report is 4 commits behind head on master.

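| Files with missing lines                                 | Patch % | Lines        |
|----------------------------------------------------------|---------|--------------|
| ...besql/cubesql/src/compile/rewrite/rules/members.rs    | 93.44%  | 8 Missing ⚠️ |
| ...l/cubesql/src/compile/test/test_cube_join_views.rs    | 98.00%  | 4 Missing ⚠️ |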
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10977      +/-   ##
==========================================
+ Coverage   83.32%   83.38%   +0.05%     
==========================================
  Files         255      256       +1     
  Lines       77261    77581     +320     
==========================================
+ Hits        64379    64689     +310     
- Misses      12882    12892      +10     
| Flag    | Coverage Δ                   |
|---------|------------------------------|
| cubesql | 83.38% <96.27%> (+0.05%) ⬆️  |

@paveltiunov paveltiunov marked this pull request as ready for review June 3, 2026 01:17
@paveltiunov paveltiunov requested a review from a team as a code owner June 3, 2026 01:17
@claude
Contributor

claude Bot commented Jun 3, 2026

Claude finished @paveltiunov's task in 2m 14s


Claude Code Review

Verdict: LGTM with a few nits. No correctness blockers found. 1 low (test gap), 2 nits (cleanliness/efficiency).

Full review
  • Read PR context
  • Read changed files (members.rs push_down_cube_join + test_cube_join_views.rs)
  • Check existing review threads (none)
  • Post review

Summary

The change cleanly generalizes push_down_cube_join to also match dimension equi-joins whose left/right columns resolve to the same underlying cube member (via alias_member). The merge gate (join key fully within dimensions, every column pair resolving to the same underlying member) is appropriately conservative: joins on measures/segments or on different underlying members fall through to the existing non-merged handling, so the risk of regression in unrelated query shapes is low. Re-introducing INNER/LEFT/RIGHT semantics on top of the multi-fact FULL OUTER stitch by injecting set filters on the required-side join keys is a sensible approach and matches this PR's earlier refactor away from a measure-based presence marker.

Findings

[Low] Test coverage gap for RIGHT JOIN and FULL JOIN.
test_cube_join_views.rs covers the un-grouped LEFT, grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set filter), but neither is asserted. A 4th case for RIGHT JOIN (and optionally a FULL JOIN asserting filters: Some(vec![])) would lock in the behavior described in the table — members.rs:2947-2955 is otherwise the only spec for it.

[Nit] Dangling egraph mutations on no-op outcome.
In push_down_cube_join (members.rs:2966-2986), egraph.add(...) for the filter chain and the subst.insert(left_filters_var, acc) run before the nested 4-deep var_iter! loops at members.rs:2989-3034. If those loops yield zero combinations, the function returns false while leaving orphan filter e-nodes in the egraph and a stale subst mutation. egg dedups identical adds so it's not a correctness or blow-up risk, but it's tidier to construct the filter chain inside the innermost loop right before return true, or after the alias_to_cube validation passes. Today's pattern relies on egg's idempotency and on the outer loops effectively always being non-empty (because is_proper_cube_join_condition / the shared-member detection has already validated members exist).

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name / right_cube_name each loop iteration via name.split('.').next(). For a single-column join (the only realistic case today this rewrite fires on) this is fine, but for a hypothetical composite key spanning columns from different cubes on one side, the code would silently pick whichever cube the last column belongs to. Worth either asserting all columns share a cube prefix (return None otherwise) or adding a comment that single-cube-per-side is the assumed precondition.
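One way the suggested single-cube-per-side check could look, as a standalone sketch. `common_cube_name` is a hypothetical helper, not an existing function in members.rs; it assumes member names have the form `cube.member` and derives the cube name with the same `split('.').next()` approach the nit describes.

```rust
/// Returns the cube prefix shared by every member on one side of the join,
/// or None if the list is empty or the members span more than one cube.
fn common_cube_name<'a>(members: &[&'a str]) -> Option<&'a str> {
    let mut cube: Option<&'a str> = None;
    for &member in members {
        // `split` always yields at least one item, so this is the text
        // before the first '.' (or the whole name if there is no '.').
        let prefix = member.split('.').next()?;
        match cube {
            None => cube = Some(prefix),
            Some(seen) if seen == prefix => {}
            // A second, different cube on the same side: refuse to pick one.
            Some(_) => return None,
        }
    }
    cube
}
```

Returning None for a mixed-cube key would make the rewrite decline the merge instead of silently using the cube of the last column.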

Other notes

  • The verbose crate::compile::rewrite::FilterMember{Member,Op,Values} paths at members.rs:2969-2977 are inconsistent with the rest of the file's import style. Optional cleanup — add to the existing use block at the top.
  • The shared-member path does not gate on member_name_to_expr.is_some() the way is_proper_cube_join_condition does (members.rs:3229-3237). In practice the e-class analysis is populated by the time joins are considered, but it's worth a quick check or a comment noting why the guard isn't needed here.
  • Comments are thorough and the rationale (FULL OUTER stitch + set filters to recover join semantics) is well captured at the code level.
· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

cursoragent and others added 8 commits June 5, 2026 23:46
…eScan

Generalize the push-down-cube-join rewrite so that a join between two
CubeScans (typically views) on a dimension that resolves to the same
underlying cube member is merged into a single CubeScan, just like the
existing __cubeJoinField cube-to-cube join. A view dimension keeps its
original cube.dimension path in alias_member, which is used to detect
that both sides of the equi-join reference the same shared key.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Mirror the motivating query exactly:
  SELECT c.customer_city, measure(o.revenue), measure(c.avg_age)
  FROM customers_view c
  LEFT JOIN orders_view o ON o.customer_city = c.customer_city
  GROUP BY 1
and assert it merges into a single grouped multi-fact CubeScan.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
When merging a join between two views on a shared cube member, the
downstream multi-fact query is rendered as a FULL OUTER JOIN over the
shared key. To recover the requested join semantics, the rewrite now
adds a measure 'set' filter on each side that must be present:
- INNER: both sides required
- LEFT:  left side required
- RIGHT: right side required
- FULL:  no extra filter

Branch presence is detected via a measure of the side (the grouping key
is COALESCEd across sides downstream, so it cannot tell sides apart).
Covered with left/inner group-by tests.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…e filter

Detect side presence with the side's join-key dimension instead of an
arbitrary measure. The join key is always available and is the actual
shared-key marker, avoiding the nullable-measure caveat and the case
where a side has no selected measure.

- LEFT:  left join key must be set
- RIGHT: right join key must be set
- INNER: both join keys must be set
- FULL:  no extra filter

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…n dimensions

Make the merge gate explicit: the entire join key must resolve to
dimensions (or time dimensions) on both sides and to the same underlying
cube member. A join key that touches a measure/segment/etc. is rejected
and the join falls back to normal (non-merged) handling. Add a negative
test that joining two views on measures is not merged.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
- Add #[cfg(test)] to the test_cube_join_views module so it is not
  compiled into non-test builds (fixes unresolved pretty_assertions and
  unused-import errors under clippy -D warnings and the native builds).
- Remove the unused right_filters_var from push_down_cube_join.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
…shift)

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
@cursor cursor Bot force-pushed the cursor/view-join-shared-member-375b branch from 21aa80a to 9a682af Compare June 5, 2026 23:51
@claude
Contributor

claude Bot commented Jun 5, 2026

Claude finished @cursor[bot]'s task in 46s


Claude Code Review

Verdict: LGTM. No correctness blockers. The merge gate ("join key fully within dimensions, same underlying member on both sides") is conservative, so non-view/unrelated joins fall through unchanged. The set-filter trick to recover INNER/LEFT/RIGHT on top of the FULL OUTER multi-fact stitch is consistent with this PR's earlier refactor away from a measure-based presence marker.

Notes: 1 low, 2 nits; all re-affirmed from the prior review pass

[Low] No test for RIGHT or FULL join semantics.
test_cube_join_views.rs covers ungrouped/grouped LEFT, grouped INNER, and a negative (measure key). The semantics table in the PR description lists FULL (no filters) and RIGHT (right-side set). Both branches in members.rs:2947-2955 are otherwise unexercised — a quick RIGHT case and a FULL case (asserting filters: Some(vec![])) would lock the table in.

[Nit] Egraph/subst mutation runs before the alias_to_cube validation.
members.rs:2966-2986 adds filter e-nodes and calls subst.insert(left_filters_var, acc) before the nested var_iter! loops at members.rs:2989-3034 decide whether to return true. If those loops yield zero combinations (rare in practice — is_proper_cube_join_condition/shared-member detection already validated the members), the function returns false having left orphan filter nodes and a stale subst entry. egg dedupes identical adds, so this isn't a correctness or blow-up risk, but constructing the filter chain inside the innermost loop right before return true (or after the alias_to_cube iteration produces ≥1 pair) would be tidier.

[Nit] Composite-key cube name only carries the last column.
members.rs:2910-2914 overwrites left_cube_name/right_cube_name each loop iteration via name.split('.').next(). Fine for single-column joins (the only case the rewrite realistically fires on today), but for a hypothetical composite key whose columns span different cubes on one side, the code silently picks whichever cube the last column belongs to. Either assert all columns share a cube prefix (return None otherwise) or add a comment that single-cube-per-side is the assumed precondition.

Style: the verbose crate::compile::rewrite::FilterMember* paths at members.rs:2969-2977 are inconsistent with the file's import style — fold them into the existing use block when convenient.

· [branch](https://github.com/cube-js/cube/tree/cursor/view-join-shared-member-375b)

Labels

rust Pull requests that update Rust code

3 participants