Skip to content

Refactor query_engine.py#17

Merged
AivanF merged 2 commits intomainfrom
aivan/2026-04-09-Refactor
Apr 9, 2026
Merged

Refactor query_engine.py#17
AivanF merged 2 commits intomainfrom
aivan/2026-04-09-Refactor

Conversation

@AivanF
Copy link
Copy Markdown
Collaborator

@AivanF AivanF commented Apr 9, 2026

Summary by CodeRabbit

Release Notes

  • Refactor

    • Restructured query processing pipeline to improve code organization and maintainability
  • Documentation

    • Enhanced example code with improved formatting and clearer terminology for better readability
  • Tests

    • Updated test suite to align with refactored architecture

@AivanF AivanF requested a review from ZmeiGorynych April 9, 2026 10:52
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 9, 2026

📝 Walkthrough

Walkthrough

A new query enrichment module extracts the enrichment phase from the query engine, transforming SlayerQuery into EnrichedQuery while handling dimension/join resolution, filter classification, and cross-model measure support. Supporting code is refactored to delegate enrichment and align terminology from "rollup" to "joins".

Changes

Cohort / File(s) Summary
New Enrichment Module
slayer/engine/enrichment.py
Introduces 709 lines implementing the enrichment phase: enrich_query() orchestrates resolution of dimensions, time dimensions, joins, filters, and cross-model measures. Helper functions handle dimension resolution, time-alias selection, last-aggregation detection, join dependency discovery via BFS, filter transform extraction, column qualification, and WHERE/HAVING/post-filter classification.
Query Engine Refactoring
slayer/engine/query_engine.py
Delegates enrichment to new enrich_query() function via engine callbacks. Removed 666 lines of inline enrichment logic (dimension/time resolution, filter preprocessing, join discovery). _resolve_join_target callback now returns tuple of table SQL and target instead of full join tuples.
Example/Test Verification Updates
examples/verify_common.py, tests/test_sql_generator.py
Updated to use new enrich_query() from enrichment module. verify_common.py renamed "rollup" concept to "joins", changed join detection from dotted dimension inspection to explicit get("joins", []), and updated cross-model query paths from regions.name to customers.regions.name. Test helper uses no-op resolver lambdas.
Formatting & Minor Adjustments
examples/embedded/verify.py, examples/seed.py
Converted single-line engine.execute() calls to multi-line format; adjusted string formatting spacing in separators and f-strings. Simplified imports in seed.py to keep only datetime.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Possibly related PRs

  • Dynamic Joins & Multi-Stage Queries #9: Directly related—both PRs enhance query enrichment, cross-model join/measure handling, EnrichedQuery fields, join resolution logic, and SQL generation for cross-model measures.

Suggested reviewers

  • ZmeiGorynych

Poem

🐰 Enrichment flows through pipelines clean,
Dimensions bloom where joins convene,
From query spark to SQL bright,
We parse and classify just right!
Cross-model fields now take their flight ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Refactor query_engine.py' is overly generic and does not clearly convey the main architectural change: extracting enrichment logic into a dedicated module. Consider a more descriptive title such as 'Extract query enrichment logic into dedicated module' or 'Move enrichment to separate engine.enrichment module' to better reflect the primary refactoring intent.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch aivan/2026-04-09-Refactor

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
slayer/engine/enrichment.py (1)

459-464: Substring match could produce false positives in edge cases.

The time dimension name lookup uses a simple substring check (if td_name in f_str), which could match unintended occurrences (e.g., dimension "at" matching within "status").

This is a fallback heuristic only triggered when no explicit time dimension is provided, so the impact is limited. Consider using word boundaries if this causes issues in practice:

♻️ Optional: Use word-boundary matching
     if query.filters:
         time_dim_names = {d.name for d in model.dimensions if d.type in (DataType.TIMESTAMP, DataType.DATE)}
         for f_str in query.filters or []:
             for td_name in time_dim_names:
-                if td_name in f_str:
+                if re.search(rf"\b{re.escape(td_name)}\b", f_str):
                     return f"{model.name}.{td_name}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slayer/engine/enrichment.py` around lines 459 - 464, The current heuristic
inspects filters with a substring check (`if td_name in f_str`) which can
produce false positives (e.g., matching "at" inside "status"); replace this with
a whole-word match using word-boundary regex or tokenization so you only match
complete dimension identifiers: in the block that computes time_dim_names from
model.dimensions and iterates query.filters, use a compiled regex like
r'\b{td_name}\b' (with proper escaping) or split tokens to test membership, and
ensure you import/prepare the regex beforehand and use it in place of the
substring check so return f"{model.name}.{td_name}" only on true whole-word
matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@slayer/engine/enrichment.py`:
- Around line 459-464: The current heuristic inspects filters with a substring
check (`if td_name in f_str`) which can produce false positives (e.g., matching
"at" inside "status"); replace this with a whole-word match using word-boundary
regex or tokenization so you only match complete dimension identifiers: in the
block that computes time_dim_names from model.dimensions and iterates
query.filters, use a compiled regex like r'\b{td_name}\b' (with proper escaping)
or split tokens to test membership, and ensure you import/prepare the regex
beforehand and use it in place of the substring check so return
f"{model.name}.{td_name}" only on true whole-word matches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14c550a9-22eb-4b87-b4b9-40b40c053362

📥 Commits

Reviewing files that changed from the base of the PR and between b7191ec and 012ef83.

📒 Files selected for processing (6)
  • examples/embedded/verify.py
  • examples/seed.py
  • examples/verify_common.py
  • slayer/engine/enrichment.py
  • slayer/engine/query_engine.py
  • tests/test_sql_generator.py

@AivanF AivanF merged commit 56fe230 into main Apr 9, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant