Histogram missing for large-integer columns near 2^53 #636

@paddymul

Summary

Numeric columns with values near or above 2^53 (just past JavaScript's Number.MAX_SAFE_INTEGER, which is 2^53 - 1) produce no histogram. The widget renders fine, but the histogram row is empty for these columns.

Cause

PR #631 fixed a crash where np.histogram() raised ValueError for these columns (see #632). The fix catches the error and returns empty histogram_args, which means the column falls through to no histogram at all.

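The underlying issue is that near 2^53 the spacing between adjacent float64 values is 2.0 or more, so neighbouring integers stop being distinguishable once converted to float64, and percentile trimming can leave a range too narrow for np.histogram() to create bins.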
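A minimal sketch of the precision problem, using only NumPy and the values from the reproduction below. It does not call buckaroo, and it builds the bin edges with np.linspace purely for illustration (an assumption about how evenly spaced edges behave, not a trace of np.histogram's internals):

```python
import numpy as np

# Same values as the reproduction case, kept as exact int64.
big = np.array(
    [9007199254740993, 9007199254740994, 9007199254740995],
    dtype=np.int64,
)

# Distance from 2^53 to the next representable float64.
gap = np.spacing(np.float64(2**53))

# Converting to float64 snaps each value onto that coarse grid.
as_float = big.astype(np.float64)

# Ten bins need eleven distinct edges, but evenly spaced edges over
# such a narrow range collapse onto the same few float64 values.
edges = np.linspace(as_float.min(), as_float.max(), 11)
distinct_edges = np.unique(edges).size

print("float64 spacing at 2^53:", gap)
print("distinct edges out of 11:", distinct_edges)
```

With fewer distinct edges than bins, some bins have zero width, which is consistent with np.histogram() being unable to build them.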

Suggested fix

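Use offset-based binning: np.histogram(meat - meat.min(), 10) instead of np.histogram(meat, 10). The offsets from the minimum are small and exactly representable in float64 even when the absolute values aren't, provided the subtraction happens in the column's integer dtype before any conversion to float. Then adjust bin edges back by adding meat.min().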
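The suggestion could be sketched roughly as follows. The function name offset_histogram and its return shape are hypothetical, chosen for illustration; this is not buckaroo's actual code, and it assumes the input is still an integer array when it arrives:

```python
import numpy as np

def offset_histogram(meat, bins=10):
    """Hypothetical helper: bin on offsets from the minimum.

    Returns (counts, edges), where edges have been shifted back by
    the minimum so they are on the original scale.
    """
    arr = np.asarray(meat)
    lo = arr.min()
    # For integer input this subtraction is exact, so the offsets
    # stay distinct even when the raw values exceed 2^53.
    offsets = arr - lo
    counts, offset_edges = np.histogram(offsets, bins)
    # Shifting back goes through float64, so neighbouring edges may
    # round to the same value. The counts are already computed, so
    # this only affects how edges are labelled.
    edges = offset_edges + np.float64(lo)
    return counts, edges

big_id = np.array(
    [9007199254740993, 9007199254740994, 9007199254740995],
    dtype=np.int64,
)
counts, edges = offset_histogram(big_id)
print(counts)
```

One design note: if exact edge labels matter, it may be safer to keep the offset edges and the integer minimum separately rather than adding them in float64.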

Reproduction

import pandas as pd
from buckaroo.buckaroo_widget import BuckarooWidget

df = pd.DataFrame({
    'big_id': [9007199254740993, 9007199254740994, 9007199254740995],
    'label': ['a', 'b', 'c'],
})
bw = BuckarooWidget(df)
# Widget works, but big_id column has no histogram

Who's affected

Columns with large integer IDs: Snowflake IDs, Discord IDs, database surrogate keys > 2^53, etc.

Refs: #632, #631, #538
