Conversation


@bodduv bodduv commented Nov 4, 2025

Fixes #14216

The problem is described in the issue as well as in the dev mailing list post.

The changes proposed in this PR take a simple approach to fixing the comparison bug, directly addressing correctness issues in both ManifestEntry and ManifestFile filtering. Note that the changes alter UUID comparison semantics going forward, but make them compliant with the UUID RFCs.
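To illustrate the bug being fixed (a standalone sketch, not the PR's actual code): Java's `UUID.compareTo()` compares the most/least significant halves as signed longs, so any UUID whose first hex digit is 8-f sorts before every UUID starting with 0-7. The `unsignedCompare` helper below is a hypothetical stand-in for an RFC-style comparator:

```java
import java.util.UUID;

public class UuidCompareSketch {
  // RFC-style comparison: treat the 128 bits as unsigned (illustrative helper,
  // not Iceberg's actual Comparators.uuids() implementation).
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID low  = UUID.fromString("00000000-0000-0000-0000-000000000000");
    UUID high = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

    // Java's built-in compareTo() compares the two halves as SIGNED longs,
    // so 0xffff... (negative as a long) sorts before 0x0000...
    System.out.println(high.compareTo(low));            // -1
    System.out.println(unsignedCompare(high, low) > 0); // true
  }
}
```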

@github-actions github-actions bot added the API label Nov 4, 2025
Author

bodduv commented Nov 4, 2025

Tagging @nastra and @Fokko based on your previous work on UUIDs, and @dimas-b since I touch the TestComparators file you added.

@github-actions github-actions bot added the flink label Nov 5, 2025

jhrotko commented Nov 7, 2025

Thanks for handling this!

Contributor

@dimas-b dimas-b left a comment


The proposed change looks reasonable to me given the explanation in #14216... however, I cannot assess the impact on existing Iceberg clients and tables. I hope other reviewers can help with that.

@bodduv bodduv requested a review from nastra November 16, 2025 20:58
Contributor

nastra commented Nov 17, 2025

overall the change LGTM, just left a few minor comments but we need to decide in the community how we want to proceed with this change

Contributor

pvary commented Nov 17, 2025

@bodduv: Could you please help me understand the effect of this change on the current tables?
What happens if:

  • We have a table with a UUID column
  • We inserted 2 rows into the table with UUID_MIN and UUID_MAX with Java Iceberg 1.10.0, and calculated column stats (min=UUID_MAX, max=UUID_MIN)
  • We run a query which filters on UUID_MIDDLE.
    • I expect that the metadata filtering will return the new file (UUID_MAX < UUID_MIDDLE < UUID_MIN), and we will find the row

Am I correct that after the upgrade the metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX), filtered out by the wrong min value?
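The scenario above can be reproduced in a standalone sketch. The UUID values are chosen for illustration, and the min/max inversion and pruning check are simplified stand-ins for Iceberg's actual metrics evaluators:

```java
import java.util.UUID;

public class StaleStatsSketch {
  // Unsigned (RFC-style) UUID comparison, as discussed in this thread.
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID uuidMin = UUID.fromString("00000000-0000-0000-0000-000000000000");
    UUID uuidMax = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");
    UUID uuidMid = UUID.fromString("80000000-0000-0000-0000-000000000000");

    // Stats as written with Java's signed compareTo(): 0xff... is negative as a
    // long, so it becomes the recorded "min" and 0x00... the recorded "max".
    UUID storedMin = uuidMin.compareTo(uuidMax) <= 0 ? uuidMin : uuidMax;
    UUID storedMax = uuidMin.compareTo(uuidMax) <= 0 ? uuidMax : uuidMin;
    System.out.println(storedMin.equals(uuidMax)); // true: min and max are swapped

    // A reader using the RFC-compliant comparator then prunes the file:
    boolean mayContainMid =
        unsignedCompare(uuidMid, storedMin) >= 0 && unsignedCompare(uuidMid, storedMax) <= 0;
    System.out.println(mayContainMid); // false: the file is wrongly skipped
  }
}
```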

Author

bodduv commented Nov 17, 2025

Thank you for the comment @pvary

  • We have a table with a UUID column
  • We inserted 2 rows into the table with UUID_MIN and UUID_MAX with Java Iceberg 1.10.0, and calculated column stats (min=UUID_MAX, max=UUID_MIN)

It matters how a query engine prepares min/max values for UUID columns before handing them over for writing manifest files and manifest lists. Some engines could use min and max values as prepared by Parquet Java (which is RFC compliant) during writes.

  • We run a query which filter on UUID_MIDDLE.

    • I expect that the metadata filtering will return the new file (UUID_MAX < UUID_MIDDLE < UUID_MIN), and we will find the row

Am I correct that after the upgrade the metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX), filtered out by the wrong min value?

Yes, if the min and max metrics persisted in the manifest file and manifest list were constructed using the faulty non-RFC-compliant UUID comparison, then we would not be able to read the new file back with such a filter (on the UUID column) after upgrading. What is even more problematic: even an equality filter uuid_col = ... will leave out records that should be returned. Note that a full table scan would still read the new file.

A remedy would be to migrate the table (doing a full table scan) to rewrite metrics accurately.

Note: This issue exists only in the Java implementation of the spec. The Go, Rust, and C++ implementations are RFC compliant, which makes the bug more severe: if the same table is read with a filter using the Go implementation, it produces correct records, but different ones than the Java implementation produces.

Contributor

pvary commented Nov 17, 2025

This is a serious behavioral change which could affect correctness.
I agree that we should find a way to move forward, but the current RFC-incompatible comparison works for Java-only deployments, and this change would break them. We need to find a solution which allows fixing this issue without affecting correctness.

I would try to resurrect the mailing list thread with a summary (a short, easily understandable problem statement) and a focused, more detailed description.

Also, I would add this to the next community sync topics.

Author

bodduv commented Nov 17, 2025

This is a serious behavioral change which could affect correctness.

It is. But I also think this is a serious data correctness bug in the Java implementation of the Iceberg spec. If we preserve the old non-RFC-compliant UUID comparison, then there is a disparity with the other implementations of the spec. So the actual question is: what does the Iceberg spec say about UUID comparisons, and how should UUID values be compared?

I agree that we should find a way to move forward, but the current RFC-incompatible comparison works for Java-only deployments, and this change would break them. We need to find a solution which allows fixing this issue without affecting correctness.

I should clarify this point. If we stick to RFC compliance, then we do NOT need a solution for implementations other than Java, as the other implementations are not affected by this UUID comparison bug. For example, if one uses the Go implementation of the spec to create an Iceberg table with a UUID column just like above, then min=UUID_MIN and max=UUID_MAX, compliant with the RFC. No surprises when filtering on UUID_MIDDLE; the new file is read correctly.

Query engines using Java implementation of the spec might need to revisit UUID comparisons.

There is another approach: disabling manifest entry filtering (data file filtering) and manifest file filtering (partition pruning) altogether, so as to not trigger any UUID comparisons (via the Iceberg Java APIs). Some engines are taking this approach, I believe Trino among them [1], although it comes with significant performance implications. But query engines must approach this from a data correctness point of view.

I would try to resurrect the thread with a summary (short/easily understandable problem statement), and with a focused more detailed description.

Also, I would add this to the next community sync topics.

Thank you @pvary for effort and taking a closer look into this.

[1] Trino fixed trinodb/trino#12834 by trinodb/trino#12911.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment


I echo @pvary's concern that this is a pretty significant behavior change, as the ordering won't be stable for UUIDs between versions. I'm digging into the RFC's exact language, but I also came across https://stackoverflow.com/questions/79489893/does-the-uuid-comparison-in-java-violate-the-uuid-standard ; it sounds like this sorting behavior is explicitly defined for UUID v6/v7, while sorting is not prescribed for v4. At the same time, the OpenJDK folks acknowledge that this is a bug, so I'm not sure yet (like I said, still digging into it): https://bugs.openjdk.org/browse/JDK-7025832

In general, I think we should close this ambiguity on the sorting of UUIDs in the spec definition; as you pointed out, implementations are inconsistent in how this is performed. Whether this definition should be based on the RFC, or whether there's a good argument to retroactively work from the Java reference implementation behavior, is something I'm still thinking through, and I think we should discuss it on the mailing list.

Author

bodduv commented Nov 17, 2025

I'm digging into the RFC's exact language, but I also came across https://stackoverflow.com/questions/79489893/does-the-uuid-comparison-in-java-violate-the-uuid-standard ; it sounds like this sorting behavior is explicitly defined for UUID v6/v7, while sorting is not prescribed for v4. At the same time, the OpenJDK folks acknowledge that this is a bug, so I'm not sure yet (like I said, still digging into it): https://bugs.openjdk.org/browse/JDK-7025832

I have addressed how the RFCs define ordering among UUIDs in #14216.
If you take a closer look at RFC 4122, page 4, the paragraph under the heading "Rules for Lexical Equivalence" specifies ordering. RFC 4122 covers versions 1 through 4. It would be surprising if UUID v6/v7 had different comparison semantics from v1-v4.

RFC 9562 section "6.1.1. Sorting" mentions "UUID formats created by this specification are intended to be lexicographically sortable while in the textual representation."
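The lexicographic-sortability claim can be checked directly: for the canonical lowercase textual form, string order coincides with unsigned 128-bit order, while Java's signed compareTo() disagrees. A standalone sketch (UUID values chosen for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

public class LexicalSortSketch {
  // Unsigned 128-bit comparison (illustrative RFC-style helper).
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    List<UUID> uuids = List.of(
        UUID.fromString("ffffffff-0000-0000-0000-000000000001"),
        UUID.fromString("00000000-0000-0000-0000-000000000002"),
        UUID.fromString("7fffffff-ffff-ffff-ffff-ffffffffffff"));

    List<UUID> byText = new ArrayList<>(uuids);
    byText.sort(Comparator.comparing(UUID::toString)); // lexicographic on canonical text

    List<UUID> byBits = new ArrayList<>(uuids);
    byBits.sort(LexicalSortSketch::unsignedCompare);   // unsigned 128-bit order

    List<UUID> bySigned = new ArrayList<>(uuids);
    bySigned.sort(Comparator.naturalOrder());          // Java's signed compareTo()

    System.out.println(byText.equals(byBits));   // true: textual == unsigned order
    System.out.println(bySigned.equals(byText)); // false: signed order differs
  }
}
```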

if there's a good argument to retroactively work from the Java reference implementation behavior

Maintaining backward compatibility is an argument in this case.

But on the other hand, looking forward, is there a possibility to communicate breaking changes and provide table migration strategies in the release docs? I suppose it's a topic for the mailing list.

Contributor

pvary commented Nov 17, 2025

Anything which could be a breaking change needs to be discussed on the dev list and, if needed, in the community sync.

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

Author

bodduv commented Jan 16, 2026

@pvary Thank you for promptly jumping in (during the previous community sync) while I could not recollect the details of why we wanted to pause on the behavior change.

It was discussed that we could evaluate expressions once with the RFC-compliant comparator and again with the signed comparator, and allow the filter to pass if either evaluation is true. I attempt this in fe9faf5. Following is a short description of the changes.

  • Write Path: while writing new Parquet files, metrics are prepared using RFC-compliant UUID comparison (Comparators.uuids()), producing correct min/max statistics.
  • Read Path: uses two expression evaluations: once with the RFC-compliant comparator (to work with newly written files) and again with the signed comparator, for backward compatibility with older files whose UUID statistics may have been computed with Java's signed UUID.compareTo().
    Affected evaluators: InclusiveMetricsEvaluator, ManifestEvaluator, ParquetMetricsRowGroupFilter
  • Performance: (almost) zero overhead for non-UUID queries. Filter expressions may be evaluated twice, so there is some non-zero performance impact, but I suppose we would prioritize correctness.

The implementation around expression evaluation was not conducive to such twofold evaluation, so I had to force it. I hope the increased code complexity is acceptable.
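The OR-combined dual evaluation described above can be sketched as follows. The helper and names are hypothetical, not the PR's actual code; the range check stands in for Iceberg's metrics evaluators:

```java
import java.util.Comparator;
import java.util.UUID;

public class DualEvalSketch {
  // Hypothetical inclusive range check standing in for a metrics evaluator.
  static boolean mayMatch(UUID lower, UUID upper, UUID value, Comparator<UUID> cmp) {
    return cmp.compare(value, lower) >= 0 && cmp.compare(value, upper) <= 0;
  }

  public static void main(String[] args) {
    Comparator<UUID> signed = Comparator.naturalOrder(); // Java's UUID.compareTo()
    Comparator<UUID> rfc = (a, b) -> {                   // unsigned 128-bit order
      int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
      return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    // Stats as an OLD writer recorded them with the signed comparator:
    // 0xff... (negative as a signed long) became the stored "min".
    UUID storedMin = UUID.fromString("ffffffff-0000-0000-0000-000000000000");
    UUID storedMax = UUID.fromString("00000000-0000-0000-0000-000000000001");
    UUID value     = UUID.fromString("00000000-0000-0000-0000-000000000000");

    boolean rfcSaysKeep    = mayMatch(storedMin, storedMax, value, rfc);
    boolean signedSaysKeep = mayMatch(storedMin, storedMax, value, signed);

    // Keep the file if EITHER evaluation says it may match: correct for both
    // old (signed) and new (RFC-compliant) statistics, at the cost of a second pass.
    System.out.println(rfcSaysKeep);                   // false
    System.out.println(rfcSaysKeep || signedSaysKeep); // true
  }
}
```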

Java's UUID.compareTo() uses signed comparison of most significant bits
and least significant bits, which is not compliant with RFC 4122, RFC 9562.
This causes incorrect ManifestEntry and ManifestFile filtering/pruning
in the presence of UUID filters.
