Summary
Numeric columns with values near or above 2^53 (just past JavaScript's Number.MAX_SAFE_INTEGER, which is 2^53 - 1) produce no histogram. The widget renders fine, but the histogram row is empty for these columns.
Cause
PR #631 fixed a crash where np.histogram() raised ValueError for these columns (see #632). The fix catches the error and returns empty histogram_args, which means the column falls through to no histogram at all.
The underlying issue is that at values of 2^53 and above, the spacing between adjacent float64 values is ≥2.0, so consecutive integers are no longer distinguishable, and percentile trimming can leave a range too narrow for np.histogram() to create bins.
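The precision loss can be checked directly with NumPy. This is a minimal sketch, independent of buckaroo's code; the variable names are illustrative only, and the IDs are the ones from the reproduction in this issue.

```python
import numpy as np

# Gap between 2^53 and the next representable float64.
big = np.float64(2**53)
gap = np.spacing(big)  # 2.0

# The same IDs as in the reproduction, held exactly as int64.
ids = np.array(
    [9007199254740993, 9007199254740994, 9007199254740995],
    dtype=np.int64,
)

# Converting to float64 rounds the odd values to a neighbouring even value.
as_float = ids.astype(np.float64)

# Element-wise check of which IDs survive the int64 -> float64 -> int64 trip.
round_trip_ok = as_float.astype(np.int64) == ids

print(gap)
print(round_trip_ok)
```

Only the even ID survives the round trip, so any step that converts the column to float64 before binning is already working with altered values.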
Suggested fix
Use offset-based binning: call np.histogram(meat - meat.min(), 10) instead of np.histogram(meat, 10). The offsets from the minimum are small and float64-safe even when the absolute values aren't, provided the subtraction happens in the column's integer dtype before any conversion to float. Then shift the bin edges back by adding meat.min().
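A rough sketch of the offset-based approach follows. The helper name offset_histogram and its return shape are hypothetical and not part of buckaroo; it assumes the input is an integer array like the meat values above and that max minus min fits comfortably in the dtype.

```python
import numpy as np


def offset_histogram(values, bins=10):
    """Hypothetical helper: bin values relative to their minimum."""
    arr = np.asarray(values)
    lo = arr.min()
    # Subtract in the original integer dtype, so the offsets are exact.
    shifted = arr - lo
    counts, rel_edges = np.histogram(shifted, bins)
    # Return the offset separately instead of folding it into the edges.
    return counts, rel_edges, lo


meat = np.array(
    [9007199254740993, 9007199254740994, 9007199254740995],
    dtype=np.int64,
)
counts, rel_edges, lo = offset_histogram(meat)
print(counts, rel_edges, lo)
```

One caveat worth checking before adopting this: adding lo back to the float64 edges re-enters the range where adjacent floats are 2.0 apart, so the shifted-back edges may collapse onto each other. Keeping the relative edges and the integer offset as separate values, as the sketch does, is one way to avoid that.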
Reproduction
import pandas as pd
from buckaroo.buckaroo_widget import BuckarooWidget
df = pd.DataFrame({
    'big_id': [9007199254740993, 9007199254740994, 9007199254740995],
    'label': ['a', 'b', 'c'],
})
bw = BuckarooWidget(df)
# Widget works, but big_id column has no histogram
Who's affected
Columns with large integer IDs: Snowflake IDs, Discord IDs, database surrogate keys > 2^53, etc.