This repository contains my undergraduate thesis implementation on top of LingoDB.
The work targets repeated Join computation in multi-query workloads and introduces optimizations at both the RelAlg and the SubOperator levels.
Core ideas:
- detect equivalent Join sub-expressions and reuse them;
- adjust build/probe roles to enable better hash sharing;
- reuse hash multi-map states instead of rebuilding them;
- compile multiple SQL statements into one MLIR module to expose cross-query optimization opportunities.
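The first idea, detecting equivalent Join sub-expressions, can be illustrated with canonical signatures. This is a minimal sketch that assumes a toy plan made only of scans and inner equi-joins, not LingoDB's MLIR RelAlg operations (the actual pass lives in `joinCSE.cpp` and `GraphMatcher.h`). `Plan`, `scan`, `join`, `signature` and `dedup` are hypothetical names. Each subtree gets a canonical string, and the two sides of an inner join are put in a fixed order so that `A JOIN B` and `B JOIN A` collide. Later duplicates are then replaced by the first occurrence.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy plan node: a base-table scan or an inner equi-join.
struct Plan {
    std::string table;                  // set for a scan
    std::shared_ptr<Plan> left, right;  // set for a join
    std::string leftKey, rightKey;      // join predicate: leftKey = rightKey
};
using PlanPtr = std::shared_ptr<Plan>;

PlanPtr scan(const std::string& table) {
    auto p = std::make_shared<Plan>();
    p->table = table;
    return p;
}

PlanPtr join(PlanPtr l, PlanPtr r, const std::string& lk, const std::string& rk) {
    auto p = std::make_shared<Plan>();
    p->left = std::move(l);
    p->right = std::move(r);
    p->leftKey = lk;
    p->rightKey = rk;
    return p;
}

// Canonical signature: inner equi-joins are commutative, so the two
// (input, key) sides are put in a fixed order before concatenation.
std::string signature(const PlanPtr& p) {
    if (!p->left) return p->table;
    std::pair<std::string, std::string> a{signature(p->left), p->leftKey};
    std::pair<std::string, std::string> b{signature(p->right), p->rightKey};
    if (b < a) std::swap(a, b);
    return "join(" + a.first + "," + b.first + ";" + a.second + "=" + b.second + ")";
}

// Bottom-up: the first subtree with a given signature is kept, and every
// later equivalent subtree is replaced by a pointer to it.
PlanPtr dedup(const PlanPtr& p, std::map<std::string, PlanPtr>& seen) {
    if (p->left) {
        p->left = dedup(p->left, seen);
        p->right = dedup(p->right, seen);
    }
    return seen.emplace(signature(p), p).first->second;
}
```

If one `seen` map is shared across all statements, a join that appears in two queries is kept only once. A real implementation would also have to compare filters, projections and column bindings, which this sketch ignores.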
- Thesis title: Multi-Query Optimization via Common Sub-Join Reuse in Relational Compiled Databases
- Thesis PDF: undergraduate thesis
- Completion date: May 20, 2024
- Baseline used in evaluation: LingoDB commit `eee8c78847b2377ddc8b84974585182a7bb67700`
The following four snapshots show how the optimizer rewrites plans in representative scenarios:
- Repeated join fragments are detected and merged to avoid redundant join work.
- Equivalent plan structures are normalized so downstream passes can apply stronger rewrites.
- Hash build states are shared across compatible contexts, reducing repeated hash construction.
- Multiple statements in one module expose larger reuse opportunities than isolated execution.
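Sharing hash build states can be pictured as a cache of hash tables. This sketch assumes that two builds are interchangeable whenever they hash the same relation on the same key column. `HashStateCache`, `Row` and `probeCount` are hypothetical names. In the thesis the reuse is performed as an IR rewrite in `ReuseHashMultiMap.cpp`, not by a runtime object like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical row: a join key and a payload value.
struct Row { int key; std::string payload; };

using MultiMap = std::unordered_multimap<int, std::string>;

// Hypothetical registry: at most one hash multi-map per (relation, key column).
class HashStateCache {
public:
    std::shared_ptr<const MultiMap> get(const std::string& relation,
                                        const std::string& keyColumn,
                                        const std::vector<Row>& rows) {
        auto id = std::make_pair(relation, keyColumn);
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;  // reuse: no rebuild
        auto table = std::make_shared<MultiMap>();
        for (const Row& r : rows) table->emplace(r.key, r.payload);
        ++builds_;
        cache_.emplace(id, table);
        return table;
    }
    int builds() const { return builds_; }

private:
    std::map<std::pair<std::string, std::string>, std::shared_ptr<const MultiMap>> cache_;
    int builds_ = 0;
};

// Probe a (possibly shared) multi-map and count matching entries.
std::size_t probeCount(const MultiMap& table, const std::vector<int>& keys) {
    std::size_t matches = 0;
    for (int k : keys) matches += table.count(k);
    return matches;
}
```

A multi-map rather than a plain map is used because a join key can match several build rows. Keying the cache on the relation name alone is a simplification: a real pass must also verify that both builds see the same filtered input.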
| Area | Contribution | Main files |
|---|---|---|
| RelAlg | Equivalent Join edge matching for CSE | lib/RelAlg/Transforms/MatOpt/joinCSE.cpp, include/mlir/Dialect/RelAlg/Transforms/MatOpt/GraphMatcher.h |
| RelAlg | Plan rebuild to realize Join reuse | include/mlir/Dialect/RelAlg/Transforms/MatOpt/RebuildPlan.h |
| RelAlg | Alias cleanup and column reference repair | lib/RelAlg/Transforms/MatOpt/EraseAlias.cpp |
| RelAlg | Build/probe role adjustment for hash reuse | lib/RelAlg/Transforms/MatOpt/SwitchBuildProbe.cpp |
| SubOperator | Hash multi-map reuse pass | lib/SubOperator/Transforms/ReuseHashMultiMap.cpp |
| Pipeline wiring | Inject new passes into optimization flow | lib/RelAlg/Passes.cpp, lib/execution/Execution.cpp |
| Multi-query input | Parse SQL file into multi-statement module | lib/execution/Frontend.cpp |
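The build/probe role adjustment listed in the table can be sketched as a greedy pass. This is an illustration under stated assumptions, not the logic of `SwitchBuildProbe.cpp`. The pass remembers which inputs already have a hash table. When an inner join probes such an input while building a fresh one, it swaps the two roles so the existing table can be reused. `HashJoin` and `switchBuildProbe` are hypothetical, and build-side size and cost are ignored.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hash join: the build side is hashed, the probe side streams.
struct HashJoin {
    std::string build;  // canonical signature of the build input
    std::string probe;  // canonical signature of the probe input
    bool inner;         // only inner joins may swap roles freely here
};

// Greedy pass in plan order: flip a join when its probe input already has
// a hash table and its build input does not, so that table can be shared.
int switchBuildProbe(std::vector<HashJoin>& joins) {
    std::set<std::string> built;
    int swaps = 0;
    for (HashJoin& j : joins) {
        if (j.inner && !built.count(j.build) && built.count(j.probe)) {
            std::swap(j.build, j.probe);
            ++swaps;
        }
        built.insert(j.build);
    }
    return swaps;
}
```

Outer and semi joins are left untouched because swapping their inputs changes the result unless the join type is rewritten as well.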
All numbers below come from Chapter 8 of the thesis.
```mermaid
xychart-beta
title "Join CSE Improvement (%)"
x-axis ["1a", "1b", "1c", "2a", "2b"]
y-axis "Improvement %" 0 --> 100
bar [61, 66, 49, 97, 36]
```
| Case | Baseline (s) | Join CSE (s) | Improvement |
|---|---|---|---|
| 1a | 3.32 | 1.31 | 61% |
| 1b | 6.03 | 2.10 | 66% |
| 1c | 14.68 | 7.50 | 49% |
| 2a | 0.22 | 0.007 | 97% |
| 2b | 17.50 | 11.30 | 36% |
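The Improvement column appears to be the relative runtime reduction. This is an assumption, since the formula is not stated here. Under that reading the column agrees with the tabulated times to within one percentage point for every row: 1b and 2b come out at 65% and 35%, so the thesis may round slightly differently. `improvementPercent` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Relative runtime reduction in percent, rounded to the nearest integer:
// 100 * (baseline - optimized) / baseline.
long improvementPercent(double baselineSec, double optimizedSec) {
    return std::lround(100.0 * (baselineSec - optimizedSec) / baselineSec);
}
```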
```mermaid
xychart-beta
title "HashMap Reuse Improvement (%)"
x-axis ["3a", "3b", "3c"]
y-axis "Improvement %" 0 --> 100
bar [58, 91, 89]
```
| Case | Baseline (s) | HashMap Reuse (s) | Improvement |
|---|---|---|---|
| 3a | 2.60 | 1.11 | 58% |
| 3b | 36.96 | 3.45 | 91% |
| 3c | 38.74 | 4.49 | 89% |
```mermaid
xychart-beta
title "Batch Execution Improvement (%)"
x-axis ["4a", "4b", "4c"]
y-axis "Improvement %" 0 --> 100
bar [59, 66, 67]
```
| Case | Baseline (s, sequential) | Optimized (s, batch) | Improvement |
|---|---|---|---|
| 4a | 2.25 + 3.32 | 2.30 | 59% |
| 4b | 2.75 + 6.03 | 2.97 | 66% |
| 4c | 6.33 + 14.68 | 6.95 | 67% |
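For the batch rows, the baseline appears to be the sum of the two sequentially executed statements, as the "2.25 + 3.32" entries suggest. Under that assumption, writing $t_1, t_2$ for the sequential runtimes and $t_{\text{batch}}$ for the single-module runtime, case 4a works out as follows:

```math
\text{Improvement} = \frac{(t_1 + t_2) - t_{\text{batch}}}{t_1 + t_2}
= \frac{(2.25 + 3.32) - 2.30}{2.25 + 3.32}
= \frac{3.27}{5.57} \approx 0.587 \approx 59\%
```

The same formula gives $5.81 / 8.78 \approx 66\%$ for 4b and $14.06 / 21.01 \approx 67\%$ for 4c, matching the table.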
```mermaid
pie showData
title Average Improvement by Optimization Type
"Join CSE" : 60
"HashMap Reuse" : 79
"Batch (Join CSE)" : 64
```
- The optimization is integrated as compiler passes, not a one-off executor hack.
- Reuse operates on hash structures, which avoids much of the extra materialization cost.
- Multi-statement compilation increases optimization visibility and can amplify gains.
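Compiling several statements into one module first requires cutting a SQL file into individual statements. The sketch below shows only that preliminary step. It is a hypothetical splitter, not the code in `lib/execution/Frontend.cpp`. It splits at `;` characters that lie outside single-quoted literals. Because SQL escapes a quote inside a literal as `''`, the in-literal flag toggles twice and needs no special case. Comments and other quoting styles are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Strip leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical splitter: cut a script at ';' outside single-quoted literals,
// dropping empty statements (e.g. from ";;").
std::vector<std::string> splitStatements(const std::string& script) {
    std::vector<std::string> statements;
    std::string current;
    bool inLiteral = false;
    for (char c : script) {
        if (c == '\'') inLiteral = !inLiteral;
        if (c == ';' && !inLiteral) {
            std::string stmt = trim(current);
            if (!stmt.empty()) statements.push_back(stmt);
            current.clear();
        } else {
            current += c;
        }
    }
    std::string tail = trim(current);  // statement without a trailing ';'
    if (!tail.empty()) statements.push_back(tail);
    return statements;
}
```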
You can still use LingoDB via:
- Hosted SQL web interface
- Python package: `pip install lingodb`
- Docker image
- Build from source: official docs

Official documentation: lingo-db docs

Docs repository: github.com/lingo-db/docs



