feat(sql): support multiple right-hand-side tables in HORIZON JOIN#6881
feat(sql): support multiple right-hand-side tables in HORIZON JOIN#6881
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR ReviewCritical1.
|
| Original claim | Why dismissed |
|---|---|
Ambiguous column name resolution in buildMultiHorizonColumnMappings no-alias fallback silently picks first slave |
The no-alias path is a defensive fallback; JoinRecordMetadata stores alias-prefixed names, so the alias path handles real queries. |
closeSlaveOnFailure not restored after pending horizon slave accumulation |
Line 5059 resets closeSlaveOnFailure = true at top of each iteration. No bug. |
MultiHorizonJoinSymbolTableSource mutable masterSource field not volatile |
of() called before dispatchAndAwait() creates happens-before edge. Safe. |
Bare timestamp/offset names mapped to SOURCE_SEQUENCE could shadow real columns |
JoinRecordMetadata stores alias-prefixed names. Not a real risk. |
| Single HORIZON JOIN without RANGE/LIST now parsed successfully (deferred error) | Post-loop check at line 5286-5292 catches this and produces a clear error. |
Summary
Verdict: Request changes — the perWorkerFilters null bug (Finding 1) is a confirmed regression that causes resource leaks and thread-safety violations on the parallel execution path with stolen filters. It must be fixed before merge.
12 draft findings verified, 5 downgraded as false positives, 7 confirmed findings reported.
|
@bluestreak01 all items but 3 and 5 are addressed in the following commits: ed88dcb, 802ef32, 76296af |
Move the adaptive backward/forward scan state (prevAsOfRowId, isForwardScanMode, bwdScanRowsAtPositionStart) and threshold config from the caller sites into HorizonJoinTimeFrameHelper. Add findKeyedAsOfMatch() which encapsulates the ~30-line adaptive scan block that was previously copy-pasted across all 8 processHorizonTimestamps variants. This eliminates the duplicated orchestration logic from 4 single-slave and 4 multi-slave factory files, replacing each copy with a single method call. The scan state now lives on the helper alongside the backward/forward watermarks it interacts with. toTop() resets all state per frame. The per-worker scan-state arrays (prevAsOfRowIds, isForwardScanModes, bwdScanRowsAtPositionStarts) and their getters are removed from BaseAsyncMultiHorizonJoinAtom, and the threshold getters are removed from both atom base classes. Each helper instance is already per-worker per-slave, so thread safety is preserved. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap testHorizonJoinParallelExecution in assertMemoryLeak() so that native memory leaks on the parallel execution path are detected. Add testMultiHorizonJoinMasterNoDesignatedTimestamp to cover the multi-slave code path where the master table has no designated timestamp. The general ASOF join validation in validateBothTimestamps() catches this before generateMultiHorizonJoinFactory() runs, so the redundant SqlException there is replaced with an assert. Add testMultiHorizonJoinDifferentSlaveTimestampTypes where each source uses a distinct timestamp type (master=NS, slave0=MICRO, slave1=NS). This exercises per-slave timestamp scaling with different scale factors across slaves, which the randomized tests miss because both slaves share a single rightTableTimestampType. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
[PR Coverage check]😍 pass : 1345 / 1965 (68.45%) file detail
|
Documentation PR: questdb/documentation#397
Summary
LongListoffsets withlong[]to avoid on-heap allocations on the hot path, simplifiesBaseAsyncHorizonJoinAtomconstructor by removing redundant@Transientparameters already baked into the compiledRecordSinkclasses, and replaces the always-forward scan strategy inprocessHorizonTimestampswith an adaptive backward/forward approach controlled by configurable thresholds (cairo.sql.horizon.join.bwd.scan.*properties) — the new heuristic starts with backward-only scans and switches to forward scans when the backward scan cost exceeds a gap-relative or absolute thresholdSQL syntax
The last HORIZON JOIN in the chain carries the
RANGE/LISTandASclauses; preceding HORIZON JOINs omit them:TODOs
Multi*factory performance vs. single-RHS table HORIZON JOIN