
Conversation

@AKHIL-149 (Contributor)

Summary

Fixes #63314: pivot_table creating duplicate indices on Python 3.14 with NumPy 1.26.

Tracked down the actual bug. It wasn't in compress_group_index like I thought; it's NumPy's searchsorted that misbehaves with this version combo.

What was happening

  • unstack uses searchsorted to build the compressor array
  • with Python 3.14 + NumPy 1.26, searchsorted returns duplicate values instead of unique positions
  • this causes multiple different index values to map to the same output row (see the sketch below)
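
A minimal sketch of the two compressor computations (a simplified stand-in for the pandas internals; the names comp_index and ngroups mirror the code this PR touches):

import numpy as np

# Simplified stand-in for the sorted group-id array that unstack builds
comp_index = np.repeat(np.arange(5), 3)  # 5 groups, 3 rows each
ngroups = 5

# Sorted path: first position of each group id via searchsorted.
# This is the call that reportedly misbehaves on py3.14 + numpy 1.26.
compressor_fast = comp_index.searchsorted(np.arange(ngroups))

# Non-sorted path: index of the first occurrence of each unique value.
compressor_safe = np.sort(np.unique(comp_index, return_index=True)[1])

# On a working setup both give [0, 3, 6, 9, 12]
assert (compressor_fast == compressor_safe).all()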

The fix

Fall back to the np.unique approach when on Python 3.14 + NumPy < 2.0. This is the same method the non-sorted path already uses, so it is already covered by tests; a sketch of the routing follows.
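
A hedged sketch of that routing (a hypothetical helper, not the actual diff; has_searchsorted_bug is the guard shown in the review comment below):

import numpy as np

def build_compressor(comp_index, ngroups, has_searchsorted_bug):
    # Hypothetical helper illustrating the fallback routing in this PR
    if has_searchsorted_bug:
        # py3.14 + numpy < 2.0: reuse the non-sorted path's method
        return np.sort(np.unique(comp_index, return_index=True)[1])
    # Normal case: searchsorted on the already-sorted group ids
    return comp_index.searchsorted(np.arange(ngroups))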

Testing

Tested with the reproduction case from the issue (100k rows, 3 metrics); the output is now correct.


Comment on lines 239 to 241
# GH 63314: avoid searchsorted bug with py3.14 + numpy < 2.0
numpy_major = int(np.__version__.split(".")[0])
has_searchsorted_bug = sys.version_info >= (3, 14) and numpy_major < 2
Member

There are some existing constants you can use for this:

from pandas.compat.numpy import np_version_gt2
from pandas.compat import PY314
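
A sketch of the guard rewritten with those constants (assuming np_version_gt2 is True for NumPy 2.0 and later, matching its name):

from pandas.compat import PY314
from pandas.compat.numpy import np_version_gt2

# GH 63314: same check as the diff above, without manual version parsing
has_searchsorted_bug = PY314 and not np_version_gt2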

@jorisvandenbossche (Member)

@AKHIL-149 Thanks for the analysis!

Can you reproduce the issue with Python 3.14 + NumPy 1.26 with a small example as well? (We don't have that combo in CI, and it's also not really a combination supported by NumPy.) For example:

>>> arr = np.repeat(np.arange(100_000), 3)
>>> res1 = arr.searchsorted(np.arange(100_000))
>>> res2 = np.sort(np.unique(arr, return_index=True)[1])
>>> np.allclose(res1, res2)
True

@AKHIL-149 (Contributor Author) commented Dec 12, 2025

@jorisvandenbossche Thanks for the review; updated to use the existing constants.

Regarding the searchsorted test, the simple example you provided works fine:

>>> arr = np.repeat(np.arange(100_000), 3)
>>> res1 = arr.searchsorted(np.arange(100_000))
>>> res2 = np.sort(np.unique(arr, return_index=True)[1])
>>> np.allclose(res1, res2)
True

However, the bug manifests specifically in the pivot_table/unstack scenario with the comp_index array that gets created. The issue seems to be triggered by the specific pattern of values and array size that occurs during the unstack operation, not with a simple repeated array pattern.

I tested with Python 3.14.2 + NumPy 1.26.4 and can reproduce the pivot_table bug consistently at around 15k+ indices (45k+ rows), where it returns 5001 unique indices instead of 15000 (exactly 1/3 ratio). The workaround using np.unique fixes the issue.
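
A reproduction sketch along those lines (column names are hypothetical; the exact frame from the issue may differ):

import numpy as np
import pandas as pd

n = 15_000  # bug reported at around 15k+ distinct index values (45k+ rows)
df = pd.DataFrame(
    {
        "key": np.repeat(np.arange(n), 3),      # 3 rows per key
        "metric": np.tile(["a", "b", "c"], n),  # 3 metrics
        "value": np.arange(3 * n, dtype=float),
    }
)
result = df.pivot_table(index="key", columns="metric", values="value")
# On Python 3.14 + NumPy 1.26 this reportedly yields about n/3 unique
# index values; on a working setup it is exactly n.
assert result.index.nunique() == n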


@rhshadrach (Member) left a comment

With NumPy 1.26 EOL prior to the release of Python 3.14, I'm not sure we should be making changes in pandas to support this.

@AKHIL-149 (Contributor Author) left a comment

Yeah, that's a fair point about the EOL timing. The main reason I looked at this is that people are still hitting it in the wild (like the OP), probably from locked environments or slow upgrades.

The fix itself is pretty minimal, just routing to the existing fallback path. But I get it if supporting an EOL combo doesn't make sense for pandas.

We could also just close this and document it as a known issue with that specific version combo, if you prefer.

@rhshadrach (Member)

It seems to me a good resolution here would be to enforce numpy>=2.0 when the user has Python 3.14.

@AKHIL-149 (Contributor Author)

Ah, yeah, that's probably cleaner than the workaround. So basically: require numpy>=2.0 when python_version >= '3.14' in the dependency specs? Something like the sketch below.

Makes sense; it forces people to upgrade instead of patching around the bug. I can update the PR to do that instead if you want, or just close this one.
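
A hypothetical sketch of that constraint using environment markers in pyproject.toml (the exact pins are up to the maintainers):

[project]
dependencies = [
    "numpy>=1.26.0; python_version < '3.14'",
    "numpy>=2.0.0; python_version >= '3.14'",
]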
