Conversation
📝 WalkthroughWalkthroughThe pull request implements comprehensive support for transforms on cross-model measures by introducing dedicated CTE-generation methods, refactoring the core generation logic to distinguish window transforms from self-join transforms, and adding integration test coverage for the new functionality. Changes
Sequence Diagram(s)sequenceDiagram
participant Query as SlayerQuery
participant Gen as SQLGenerator
participant CMBase as _build_cm_cte_sql()
participant TransformRouter as Transform Router
participant Window as Window Transforms<br/>(cumsum, lag, lead...)
participant SelfJoin as _build_cm_self_join_ctes()<br/>(change, change_pct...)
participant Final as Final SELECT
Query->>Gen: _generate_with_cross_model()
Gen->>CMBase: Build base cross-model CTE
CMBase->>Gen: Returns _main CTE<br/>(base aggregation + joins)
Gen->>TransformRouter: Categorize post-CM transforms
alt Window Transforms
TransformRouter->>Window: Apply as window functions<br/>over _main CTE
Window->>Final: Window function results
else Self-Join Transforms
TransformRouter->>SelfJoin: Generate CTE chain<br/>(ROW_NUMBER scaffold,<br/>time_offset calendar,<br/>final LEFT JOIN)
SelfJoin->>Final: Self-join CTE results
end
Final->>Gen: SELECT from appropriate CTE<br/>(deduplicated joins)
Gen->>Query: Materialized SQL
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/integration/test_integration.py (1)
1234-1261: LGTM, but consider adding test for multiple self-join transforms on same measure.This test validates the mixed window + self-join scenario well. However, given the potential issue identified in
generator.pywith multiple self-join transforms targeting the same cross-model measure, consider adding a test case like:Field(formula="change(customers.avg_score)", name="score_change"), Field(formula="time_shift(customers.avg_score, -1)", name="prev_score"),This would help verify (or catch) the edge case where two self-join CTE chains are needed for the same CM.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_integration.py` around lines 1234 - 1261, Add a new integration test (alongside test_mixed_window_and_selfjoin_on_cross_model) that constructs a SlayerQuery using the same cross-model measure twice via two self-join transforms (e.g., Field(formula="change(customers.avg_score)", name="score_change") and Field(formula="time_shift(customers.avg_score, -1)", name="prev_score")) to reproduce the generator.py edge case; ensure the test uses the same TimeDimension and ordering as the existing test, executes via engine.execute(query), and asserts correct row_count and expected values for both the multiple self-join-derived fields (verifying NULLs/shifted values where appropriate) so the code path that builds multiple self-join CTE chains is exercised.slayer/sql/generator.py (1)
659-659: Consider removing the unusedis_cteparameter.The parameter is assigned to
_to silence warnings, but all code paths now wrapbase_sqlas a CTE. If this parameter is truly obsolete, removing it would be cleaner than silencing it.♻️ Suggested cleanup
def _generate_with_cross_model(self, enriched: EnrichedQuery, - base_sql: str, is_cte: bool = False) -> str: + base_sql: str) -> str: """Wrap the main query with cross-model measure sub-queries. ... """ - _ = is_cte # All paths wrap base_sql as a CTE # Wrap the base/computed SQL as a CTEAnd update the call site at line 75-76:
- sql = self._generate_with_cross_model(enriched=enriched, base_sql=sql if has_computed else base_sql, - is_cte=has_computed) + sql = self._generate_with_cross_model(enriched=enriched, base_sql=sql if has_computed else base_sql)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@slayer/sql/generator.py` at line 659, Remove the now-unused is_cte parameter: delete it from the function signature that currently does "_ = is_cte" in slayer/sql/generator.py, remove the dummy assignment, and update the function's docstring/type hints accordingly; then update all call sites that pass an is_cte argument to that function to stop supplying it (and adjust any callers' argument ordering if necessary), and run/update unit tests or type checks that reference this function to ensure no remaining usages remain.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@slayer/sql/generator.py`:
- Around line 733-740: The current loop that sets source_cte for a cross-model
measure stops at the first matching sjoin CTE (it uses a break) so subsequent
self-join transforms for the same cm.alias are never exposed in the FROM/JOINs;
update the logic that iterates sjoin_cte_map.values() so every sjoin_cte_name
whose sjoin_cm.alias == cm.alias is handled (do not break), and make final_parts
reference the correct specific sjoin_cte_name for each transform rather than a
single shared source_cte; also ensure the JOIN construction code that consumes
these CTE names iterates over all matching sjoin_cte_names so each sjoin CTE is
actually joined into _main.
- Around line 768-787: The JOIN loop currently stops at the first sjoin CTE
matching a cross-model (due to the break), leaving additional CTEs unjoined;
update the logic in the block that builds FROM/_main so that for each (cte_name,
cm) in cm_cte_names you collect all sjoin CTE names whose sjoin_cm.alias ==
cm.alias (iterate over sjoin_cte_map.values() without breaking) and treat each
matching sjoin_cte_name as a separate join target (fall back to the base
cte_name if none match); for each join target (join_cte) that is not yet in
joined_ctes compute join_on_parts from cm.shared_dimensions and
cm.shared_time_dimensions and append a LEFT JOIN {join_cte} ON ... so every CTE
produced for that CM is joined rather than only the first.
---
Nitpick comments:
In `@slayer/sql/generator.py`:
- Line 659: Remove the now-unused is_cte parameter: delete it from the function
signature that currently does "_ = is_cte" in slayer/sql/generator.py, remove
the dummy assignment, and update the function's docstring/type hints
accordingly; then update all call sites that pass an is_cte argument to that
function to stop supplying it (and adjust any callers' argument ordering if
necessary), and run/update unit tests or type checks that reference this
function to ensure no remaining usages remain.
In `@tests/integration/test_integration.py`:
- Around line 1234-1261: Add a new integration test (alongside
test_mixed_window_and_selfjoin_on_cross_model) that constructs a SlayerQuery
using the same cross-model measure twice via two self-join transforms (e.g.,
Field(formula="change(customers.avg_score)", name="score_change") and
Field(formula="time_shift(customers.avg_score, -1)", name="prev_score")) to
reproduce the generator.py edge case; ensure the test uses the same
TimeDimension and ordering as the existing test, executes via
engine.execute(query), and asserts correct row_count and expected values for
both the multiple self-join-derived fields (verifying NULLs/shifted values where
appropriate) so the code path that builds multiple self-join CTE chains is
exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f10ae792-7e87-4cf9-bb79-60dafd84dd6c
📒 Files selected for processing (3)
docs/examples/05_joined_measures/joined_measures.mdslayer/sql/generator.pytests/integration/test_integration.py
| # If a self-join transform targets this CM, get the measure from the | ||
| # sjoin CTE (which carries it through); otherwise from the base CM CTE | ||
| source_cte = cte_name | ||
| for sjoin_cte_name, sjoin_cm in sjoin_cte_map.values(): | ||
| if sjoin_cm.alias == cm.alias: | ||
| source_cte = sjoin_cte_name | ||
| break | ||
| final_parts.append(f'{source_cte}."{cm.alias}"') |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the sjoin CTE structure - does it carry through the base measure?
# Look at _build_cm_self_join_ctes to see what columns the sjoin CTE selects
ast-grep --pattern $'def _build_cm_self_join_ctes($$$) {
$$$
}'Repository: MotleyAI/slayer
Length of output: 41
🏁 Script executed:
#!/bin/bash
# Search for the SELECT in sjoin CTE construction
rg -n "sjoin_sql" slayer/sql/generator.py -A5 -B5Repository: MotleyAI/slayer
Length of output: 797
🏁 Script executed:
# Find where sjoin_cte_map is initialized and populated
rg -n "sjoin_cte_map" slayer/sql/generator.py | head -20Repository: MotleyAI/slayer
Length of output: 448
🏁 Script executed:
# Look at the broader context around line 638 to understand join_cols construction
sed -n '620,660p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 1809
🏁 Script executed:
# Search for where _build_cm_self_join_ctes or similar function is called
rg -n "build.*sjoin|self.*join.*cte" slayer/sql/generator.py -iRepository: MotleyAI/slayer
Length of output: 833
🏁 Script executed:
# Look at the loop that builds sjoin_cte_map to understand structure
rg -n "for.*sjoin|sjoin_cte_map\[" slayer/sql/generator.py -B5 -A5Repository: MotleyAI/slayer
Length of output: 3587
🏁 Script executed:
# Get lines 768-790 to see how JOINs are built
sed -n '768,790p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 1171
🏁 Script executed:
# Also check what happens after the loop - is the full FROM clause constructed?
sed -n '778,810p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 1537
🏁 Script executed:
# Search for how cm_cte_names is populated to understand if duplicates are possible
rg -n "cm_cte_names" slayer/sql/generator.py -B3 -A3Repository: MotleyAI/slayer
Length of output: 2667
🏁 Script executed:
# Look for _build_cm_self_join_ctes function definition to see if it can chain sjoin CTEs
rg -n "def _build_cm_self_join_ctes" slayer/sql/generator.py -A20Repository: MotleyAI/slayer
Length of output: 974
🏁 Script executed:
# Search for test cases with multiple transforms on same cross-model measure
rg -n "change.*time_shift|time_shift.*change" --type pyRepository: MotleyAI/slayer
Length of output: 1357
🏁 Script executed:
# Check if there are any safeguards or comments about multiple transforms on same CM
rg -n "multiple.*transform|duplicate.*transform" slayer/sql/generator.py -iRepository: MotleyAI/slayer
Length of output: 41
🏁 Script executed:
# Verify: when second transform is processed, does it use base CM CTE or previous sjoin CTE?
# Check the loop context around line 705-723 more carefully
sed -n '705,725p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 1020
Multiple self-join transforms on the same cross-model measure will fail with a missing table error.
If a query applies multiple self-join transforms to the same cross-model measure (e.g., both change(customers.avg_score) and time_shift(customers.avg_score, -2)), each transform creates its own sjoin CTE. However, the JOIN construction at lines 771-777 breaks after adding the first matching sjoin_cte for a given CM, leaving subsequent sjoin_ctes in the CTE definition but not in the FROM clause. When the SELECT clause tries to reference all transform results (lines 765-766), it fails because only the first sjoin_cte is properly joined to _main.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@slayer/sql/generator.py` around lines 733 - 740, The current loop that sets
source_cte for a cross-model measure stops at the first matching sjoin CTE (it
uses a break) so subsequent self-join transforms for the same cm.alias are never
exposed in the FROM/JOINs; update the logic that iterates sjoin_cte_map.values()
so every sjoin_cte_name whose sjoin_cm.alias == cm.alias is handled (do not
break), and make final_parts reference the correct specific sjoin_cte_name for
each transform rather than a single shared source_cte; also ensure the JOIN
construction code that consumes these CTE names iterates over all matching
sjoin_cte_names so each sjoin CTE is actually joined into _main.
| # Build JOINs: join each cross-model CTE (or its sjoin CTE) to _main | ||
| from_clause = "FROM _main" | ||
| joined_ctes = set() | ||
| for cte_name, cm in cm_cte_names: | ||
| if cte_name in joined_ctes: | ||
| # Determine which CTE to join: sjoin CTE if self-join transforms exist, else base CM CTE | ||
| join_cte = cte_name | ||
| for sjoin_cte_name, sjoin_cm in sjoin_cte_map.values(): | ||
| if sjoin_cm.alias == cm.alias: | ||
| join_cte = sjoin_cte_name | ||
| break | ||
| if join_cte in joined_ctes: | ||
| continue | ||
| joined_ctes.add(cte_name) | ||
| joined_ctes.add(join_cte) | ||
| join_on_parts = [] | ||
| for dim in cm.shared_dimensions: | ||
| join_on_parts.append(f'_main."{dim.alias}" = {cte_name}."{dim.alias}"') | ||
| join_on_parts.append(f'_main."{dim.alias}" = {join_cte}."{dim.alias}"') | ||
| for td in cm.shared_time_dimensions: | ||
| join_on_parts.append(f'_main."{td.alias}" = {cte_name}."{td.alias}"') | ||
| join_on_parts.append(f'_main."{td.alias}" = {join_cte}."{td.alias}"') | ||
| if join_on_parts: | ||
| from_clause += f"\nLEFT JOIN {cte_name} ON {' AND '.join(join_on_parts)}" | ||
| from_clause += f"\nLEFT JOIN {join_cte} ON {' AND '.join(join_on_parts)}" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if there are tests for multiple self-join transforms on the same measure
rg -n "change.*time_shift|time_shift.*change" tests/ --type pyRepository: MotleyAI/slayer
Length of output: 324
🏁 Script executed:
#!/bin/bash
# Find where sjoin_cte_map is populated and understand its structure
rg -n "sjoin_cte_map\[" slayer/sql/generator.py -A 2 -B 2Repository: MotleyAI/slayer
Length of output: 579
🏁 Script executed:
#!/bin/bash
# Find where cm_cte_names is built
rg -n "cm_cte_names" slayer/sql/generator.py | head -20Repository: MotleyAI/slayer
Length of output: 394
🏁 Script executed:
#!/bin/bash
# Look for tests with multiple distinct self-join transforms on same cross-model measure
rg -n "change.*customers\.|time_shift.*customers\." tests/ --type py -A 3 -B 3Repository: MotleyAI/slayer
Length of output: 2408
🏁 Script executed:
#!/bin/bash
# Find where cm_self_join_transforms is defined and populated
rg -n "cm_self_join_transforms" slayer/sql/generator.py -B 5 -A 2Repository: MotleyAI/slayer
Length of output: 1263
🏁 Script executed:
#!/bin/bash
# Look at the context around lines 771-786 to understand the full JOIN logic
sed -n '680,790p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 5359
🏁 Script executed:
#!/bin/bash
# Check if there's a test with both change() and time_shift() on the same cross-model measure
rg -n "change\(" tests/ --type py -A 10 | rg -A 10 "time_shift"Repository: MotleyAI/slayer
Length of output: 2829
🏁 Script executed:
#!/bin/bash
# Search for any test with multiple distinct self-join transforms on same measure
# (not nested, but parallel like Field(change(...)) and Field(time_shift(...)))
rg -n "Field\(" tests/test_sql_generator.py -A 1 | rg -B 1 "change\(.*\)" | rg -A 10 "time_shift"Repository: MotleyAI/slayer
Length of output: 552
🏁 Script executed:
#!/bin/bash
# Check if any test has both change and time_shift in the same query (as separate fields)
rg "change\(" tests/test_sql_generator.py -A 5 | grep -c "time_shift"Repository: MotleyAI/slayer
Length of output: 60
🏁 Script executed:
#!/bin/bash
# Check the integration tests more carefully for same-measure scenarios
sed -n '1090,1110p' tests/integration/test_integration.pyRepository: MotleyAI/slayer
Length of output: 852
🏁 Script executed:
#!/bin/bash
# Look for tests with multiple fields on same cross-model measure
rg -n "Field.*change.*customers\.|Field.*time_shift.*customers\." tests/ --type py -B 5 -A 10 | head -40Repository: MotleyAI/slayer
Length of output: 3395
🏁 Script executed:
#!/bin/bash
# Look for test with BOTH change AND time_shift as separate fields on same CM
grep -n "Field.*formula.*change" tests/test_sql_generator.py -A 5 -B 5 | grep -A 5 -B 5 "time_shift"Repository: MotleyAI/slayer
Length of output: 1013
🏁 Script executed:
#!/bin/bash
# Create a minimal test case to see if this scenario is tested anywhere
rg "fields=\[" tests/ --type py -A 20 | rg -B 15 "change\(" | rg -A 15 "time_shift\(" | head -50Repository: MotleyAI/slayer
Length of output: 3899
🏁 Script executed:
#!/bin/bash
# Check the _build_cm_self_join_ctes method to confirm each transform creates separate CTE chains
rg -n "_build_cm_self_join_ctes" slayer/sql/generator.py -A 1Repository: MotleyAI/slayer
Length of output: 348
🏁 Script executed:
#!/bin/bash
# Look at the method signature and how it's called
sed -n '595,730p' slayer/sql/generator.py | grep -A 50 "_build_cm_self_join_ctes"Repository: MotleyAI/slayer
Length of output: 743
🏁 Script executed:
#!/bin/bash
# Confirm the scenario by looking at the actual flow:
# When we have two distinct self-join transforms on the same CM,
# each creates separate sjoin_cte entries, but the JOIN loop only uses first one
sed -n '771,790p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 1031
🏁 Script executed:
#!/bin/bash
# Check if there's defensive logic we missed - maybe duplicate CM entries or multi-CTE joins
sed -n '745,790p' slayer/sql/generator.pyRepository: MotleyAI/slayer
Length of output: 2207
Multiple self-join transforms targeting the same cross-model measure will fail with unjoined CTE reference.
When two distinct self-join transforms reference the same cross-model measure (e.g., Field(formula="change(customers.avg_score)") and Field(formula="time_shift(customers.avg_score, -1)")), each generates its own CTE chain. The JOIN building loop (lines 771-786) only joins the first matching CTE per CM due to the break statement, leaving the second transform's CTE unjoined. The subsequent SELECT references both CTEs (line 765-766), resulting in a SQL error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@slayer/sql/generator.py` around lines 768 - 787, The JOIN loop currently
stops at the first sjoin CTE matching a cross-model (due to the break), leaving
additional CTEs unjoined; update the logic in the block that builds FROM/_main
so that for each (cte_name, cm) in cm_cte_names you collect all sjoin CTE names
whose sjoin_cm.alias == cm.alias (iterate over sjoin_cte_map.values() without
breaking) and treat each matching sjoin_cte_name as a separate join target (fall
back to the base cte_name if none match); for each join target (join_cte) that
is not yet in joined_ctes compute join_on_parts from cm.shared_dimensions and
cm.shared_time_dimensions and append a LEFT JOIN {join_cte} ON ... so every CTE
produced for that CM is joined rather than only the first.
Support all transforms on cross-model measures
Summary
Problem
Cross-model measures (e.g., customers.avg_score queried from orders) are computed as separate CTE sub-queries and joined back to the main query. Transforms on these measures were applied post-join in the outer SELECT via _build_transform_sql(), which only handles window functions. Self-join transforms (change, change_pct, time_shift) hit a raise ValueError because they require their own CTE chain (shifted base + self-join), not a window function.
Approach
Extended _generate_with_cross_model() in generator.py to handle self-join transforms by generating additional CTEs at the same level as the cross-model CTE — no nested WITH clauses needed.
For example, change(customers.avg_score) now produces:
WITH
_main AS (...),
_cm_customers_avg_score AS (SELECT month, AVG(score) ... GROUP BY month),
_cm_customers_avg_score_rn AS (SELECT ..., ROW_NUMBER() OVER (...) FROM _cm_customers_avg_score),
shifted_base_cm_score_change AS (...),
shifted_cm_score_change AS (SELECT ..., ROW_NUMBER() OVER (...) FROM shifted_base_cm_score_change),
sjoin_cm_score_change AS (SELECT ..., cur - prev AS change FROM ... LEFT JOIN ... ON rn offset)
SELECT _main.*, sjoin_cm_score_change.avg_score, sjoin_cm_score_change.change
FROM _main LEFT JOIN sjoin_cm_score_change ON shared_dims
Key changes to generator.py:
calendar-based joins. Reuses existing _build_row_number_join(), _build_self_join_column(), and _build_time_offset_expr().
CTE chain via the new helper.
Summary by CodeRabbit
New Features
change(),change_pct(),time_shift(),lag(),lead(), andrank().Documentation