fix(agg): use f64 sum for integer AVG to avoid overflow#221
Open
ser-vasilich wants to merge 1 commit into
Open
fix(agg): use f64 sum for integer AVG to avoid overflow#221ser-vasilich wants to merge 1 commit into
ser-vasilich wants to merge 1 commit into
Conversation
The scalar reduction path computed AVG on integer columns as (double)acc.sum_i / cnt where sum_i is accumulated via uint64 wrap to dodge UBSan. For columns whose true sum exceeds 2^63 (e.g. ClickBench UserID, signed values around ±9e18 × 10M rows, true sum ~7.6e24) the wrap leaves garbage and the mean is wrong by orders of magnitude. Add a parallel f64 sum_d field to reduce_acc_t, populate it in every integer reduction kernel (REDUCE_LOOP_I, BOOL/U8 path) and in reduce_merge. AVG (and VAR/STDDEV mean) now read sum_d instead of sum_i. F64 path is unchanged.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The scalar reduction path computed AVG on integer columns as
(double)acc.sum_i / cntwheresum_iis accumulated throughunsigned wrap (to dodge UBSan). For columns whose true sum exceeds
2^63 — e.g. an i64 user-id column with signed values spread across
±9e18 over millions of rows, true sum ~7.6e24 — the wrap leaves
garbage in
sum_iand the mean is wrong by orders of magnitude.Add a parallel f64
sum_dfield toreduce_acc_t, populate it inevery integer reduction kernel (REDUCE_LOOP_I, BOOL/U8 path) and in
reduce_merge.OP_AVG(and VAR/STDDEV mean) now readsum_dinstead of
sum_i. F64 path is unchanged.Reduction kernels gain one f64 add per element; vectorises with the
existing integer SIMD store.