fix(agg): use f64 sum for integer AVG to avoid overflow #221

Open
ser-vasilich wants to merge 1 commit into master from fix/avg-i64-overflow

@ser-vasilich
Collaborator

Summary

The scalar reduction path computed AVG on integer columns as
(double)acc.sum_i / cnt, where sum_i is accumulated with unsigned
wraparound (to avoid UBSan signed-overflow reports). For columns whose
true sum exceeds 2^63 in magnitude (e.g. an i64 user-id column with
signed values spread across ±9e18 over millions of rows, true sum
~7.6e24), the wrap leaves garbage in sum_i and the mean is wrong by
orders of magnitude.
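
To make the failure concrete, here is a minimal sketch contrasting the two accumulation strategies. The function names avg_wrapped and avg_f64 are hypothetical and not taken from this codebase; the sketch only assumes what the description says, namely that the old path sums through uint64 wraparound and the new one sums in f64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old behaviour as described above: sum in uint64_t so overflow wraps
 * modulo 2^64 instead of being signed-overflow UB, then reinterpret the
 * result as signed. Converting an out-of-range uint64_t to int64_t is
 * implementation-defined in ISO C; GCC and Clang wrap modulo 2^64. */
double avg_wrapped(const int64_t *v, size_t n)
{
    assert(n > 0);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (uint64_t)v[i];
    return (double)(int64_t)acc / (double)n;
}

/* New behaviour as described above: sum in double. The result is
 * rounded to a 53-bit significand but keeps the correct magnitude. */
double avg_f64(const int64_t *v, size_t n)
{
    assert(n > 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)v[i];
    return acc / (double)n;
}
```

With four values of 9e18 the true sum is 3.6e19, which does not fit in 64 bits: avg_wrapped returns a negative mean, while avg_f64 returns 9e18.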

Add a parallel f64 sum_d field to reduce_acc_t, populate it in
every integer reduction kernel (REDUCE_LOOP_I, BOOL/U8 path) and in
reduce_merge. OP_AVG (and VAR/STDDEV mean) now read sum_d
instead of sum_i. F64 path is unchanged.
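
The shape of the change can be sketched as follows. This is an illustration under assumptions, not the actual patch: acc_sketch_t stands in for reduce_acc_t (only sum_i, sum_d and cnt are named in this description; the field types and any other fields are guesses), and the acc_sketch_* functions are hypothetical stand-ins for the REDUCE_LOOP_I kernel, reduce_merge and the AVG finaliser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for reduce_acc_t. */
typedef struct {
    int64_t sum_i;  /* integer sum, accumulated with unsigned wraparound */
    double  sum_d;  /* parallel f64 sum, read by AVG */
    int64_t cnt;    /* number of values accumulated */
} acc_sketch_t;

/* Integer kernel: keeps the wrapping sum and adds one f64 add per element. */
void acc_sketch_add_i64(acc_sketch_t *acc, const int64_t *v, size_t n)
{
    assert(acc != NULL);
    for (size_t i = 0; i < n; i++) {
        acc->sum_i = (int64_t)((uint64_t)acc->sum_i + (uint64_t)v[i]);
        acc->sum_d += (double)v[i];
    }
    acc->cnt += (int64_t)n;
}

/* Merge of two partial accumulators: sum_d must be combined as well,
 * otherwise the merged AVG would only see one partition. */
void acc_sketch_merge(acc_sketch_t *dst, const acc_sketch_t *src)
{
    dst->sum_i = (int64_t)((uint64_t)dst->sum_i + (uint64_t)src->sum_i);
    dst->sum_d += src->sum_d;
    dst->cnt   += src->cnt;
}

/* AVG reads sum_d instead of sum_i. Returning 0.0 for an empty input is
 * an assumption of this sketch; a real engine may return NULL instead. */
double acc_sketch_avg(const acc_sketch_t *acc)
{
    return acc->cnt ? acc->sum_d / (double)acc->cnt : 0.0;
}
```

Because sum_d is an f64, it is rounded once the running total needs more than 53 significant bits, so the mean is approximate in the low digits rather than wrong in magnitude.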

Cost: each integer reduction kernel gains one f64 add per element,
which vectorises with the existing integer SIMD store.

The scalar reduction path computed AVG on integer columns as
(double)acc.sum_i / cnt where sum_i is accumulated via uint64 wrap
to dodge UBSan. For columns whose true sum exceeds 2^63
(e.g. ClickBench UserID, signed values around ±9e18 × 10M rows,
true sum ~7.6e24) the wrap leaves garbage and the mean is wrong
by orders of magnitude.

Add a parallel f64 sum_d field to reduce_acc_t, populate it in
every integer reduction kernel (REDUCE_LOOP_I, BOOL/U8 path) and
in reduce_merge. AVG (and VAR/STDDEV mean) now read sum_d
instead of sum_i. F64 path is unchanged.