Conversation


@bodduv bodduv commented Nov 4, 2025

Fixes #14216

The problem is described in the issue as well as in the dev mailing list post.

The changes proposed in this PR take a simple approach to fixing the comparison bug, directly addressing correctness issues in both ManifestEntry and ManifestFile filtering. Note that the changes alter UUID comparison semantics going forward, but make them compliant with the UUID RFCs.
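To illustrate the bug being fixed (a standalone sketch, not the PR's actual code): Java's `UUID.compareTo()` compares the most/least significant halves as signed longs, so any UUID whose first hex digit is 8-f sorts before every UUID starting with 0-7. The `unsignedCompare` helper below is a hypothetical stand-in for an RFC-style comparator:

```java
import java.util.UUID;

public class UuidCompareSketch {
  // RFC-style comparison: treat the 128 bits as unsigned (illustrative helper,
  // not Iceberg's actual Comparators.uuids() implementation).
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID low  = UUID.fromString("00000000-0000-0000-0000-000000000000");
    UUID high = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");

    // Java's built-in compareTo() compares the two halves as SIGNED longs,
    // so 0xffff... (negative as a long) sorts before 0x0000...
    System.out.println(high.compareTo(low));            // -1
    System.out.println(unsignedCompare(high, low) > 0); // true
  }
}
```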

@github-actions github-actions bot added the API label Nov 4, 2025
Author

bodduv commented Nov 4, 2025

Tagging @nastra and @Fokko based on your previous work on UUIDs, and @dimas-b since I touch the TestComparators file you added.

@github-actions github-actions bot added the flink label Nov 5, 2025

jhrotko commented Nov 7, 2025

Thanks for handling this!

Contributor

@dimas-b dimas-b left a comment


The proposed change looks reasonable to me given the explanation in #14216... however, I cannot assess the impact on existing Iceberg clients and tables. I hope other reviewers can help with that.

@bodduv bodduv requested a review from nastra November 16, 2025 20:58
Contributor

nastra commented Nov 17, 2025

overall the change LGTM, just left a few minor comments but we need to decide in the community how we want to proceed with this change

Contributor

pvary commented Nov 17, 2025

@bodduv: Could you please help me understand the effect of this change on the current tables?
What happens if:

  • We have a table with a UUID column
  • We inserted 2 rows into the table with UUID_MIN and UUID_MAX with Java Iceberg 1.10.0, and calculated column stats (min=UUID_MAX, max=UUID_MIN)
  • We run a query which filters on UUID_MIDDLE.
    • I expect that the metadata filtering will return the new file (UUID_MAX < UUID_MIDDLE < UUID_MIN), and we will find the row

Am I correct that after the upgrade the metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX), filtered out by the wrong min value?
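The scenario above can be reproduced in a standalone sketch. The UUID values are chosen for illustration, and the min/max inversion and pruning check are simplified stand-ins for Iceberg's actual metrics evaluators:

```java
import java.util.UUID;

public class StaleStatsSketch {
  // Unsigned (RFC-style) UUID comparison, as discussed in this thread.
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    UUID uuidMin = UUID.fromString("00000000-0000-0000-0000-000000000000");
    UUID uuidMax = UUID.fromString("ffffffff-ffff-ffff-ffff-ffffffffffff");
    UUID uuidMid = UUID.fromString("80000000-0000-0000-0000-000000000000");

    // Stats as written with Java's signed compareTo(): 0xff... is negative as a
    // long, so it becomes the recorded "min" and 0x00... the recorded "max".
    UUID storedMin = uuidMin.compareTo(uuidMax) <= 0 ? uuidMin : uuidMax;
    UUID storedMax = uuidMin.compareTo(uuidMax) <= 0 ? uuidMax : uuidMin;
    System.out.println(storedMin.equals(uuidMax)); // true: min and max are swapped

    // A reader using the RFC-compliant comparator then prunes the file:
    boolean mayContainMid =
        unsignedCompare(uuidMid, storedMin) >= 0 && unsignedCompare(uuidMid, storedMax) <= 0;
    System.out.println(mayContainMid); // false: the file is wrongly skipped
  }
}
```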

Author

bodduv commented Nov 17, 2025

Thank you for the comment @pvary

  • We have a table with a UUID column
  • We inserted 2 rows into the table with UUID_MIN and UUID_MAX with Java Iceberg 1.10.0, and calculated column stats (min=UUID_MAX, max=UUID_MIN)

It matters how a query engine prepares min/max values for UUID columns before handing them over for writing manifest files and manifest lists. Some engines could use min and max values as prepared by Parquet Java (which is RFC compliant) during writes.

  • We run a query which filter on UUID_MIDDLE.

    • I expect that the metadata filtering will return the new file (UUID_MAX < UUID_MIDDLE < UUID_MIN), and we will find the row

Am I correct that after the upgrade the metadata filtering will skip the new file (UUID_MIDDLE < UUID_MAX), filtered out by the wrong min value?

Yes, if the min and max metrics persisted in the manifest file and manifest list were constructed using the faulty non-RFC-compliant UUID comparison, then we would not be able to read the new file back with such a filter (on the UUID column) after upgrading. What is even more problematic: even an equality filter uuid_col = ... will leave out records that should be returned. Note that a full table scan would still read the new file.

A remedy would be to migrate the table (doing a full table scan) to rewrite metrics accurately.

Note: This issue exists only in the Java implementation of the spec. The Go, Rust, and C++ implementations are RFC compliant, which makes the bug more severe: if the same table is read with a filter using the Go implementation, it produces correct records, but different ones than the Java implementation produces.

Contributor

pvary commented Nov 17, 2025

This is a serious behavioral change which could affect correctness.
I agree that we should find a way to move forward, but the current RFC-incompatible comparison works for Java-only deployments, and this change would break them. We need to find a solution which allows fixing this issue without affecting correctness.

I would try to resurrect the mailing list thread with a summary (a short, easily understandable problem statement) and a focused, more detailed description.

Also, I would add this to the next community sync topics.

Author

bodduv commented Nov 17, 2025

This is a serious behavioral change which could affect correctness.

It is. But I also think this is a serious data correctness bug in the Java implementation of the Iceberg spec. If we preserve the old non-RFC-compliant UUID comparison, then there is a disparity with the other implementations of the spec. So the actual question is: what does the Iceberg spec say about UUID comparisons, and how should UUID values be compared?

I agree that we should find a way to move forward, but the current RFC-incompatible comparison works for Java-only deployments, and this change would break them. We need to find a solution which allows fixing this issue without affecting correctness.

I should clarify this point. If we stick to RFC compliance, then we do NOT need a solution for implementations other than Java, as the other implementations are not affected by this UUID comparison bug. For example, if one uses the Go implementation of the spec to create an Iceberg table with a UUID column just like above, then min=UUID_MIN and max=UUID_MAX, compliant with the RFC. No surprises when filtering on UUID_MIDDLE; the new file is read correctly.

Query engines using Java implementation of the spec might need to revisit UUID comparisons.

There is another approach: disabling manifest entry filtering (data file filtering) and manifest file filtering (partition pruning) altogether, so as to not trigger any UUID comparisons (via the Iceberg Java APIs). Some engines are taking this approach, I believe Trino among them [1], although it comes with significant performance implications. But query engines must approach this from a data correctness point of view.

I would try to resurrect the thread with a summary (short/easily understandable problem statement), and with a focused more detailed description.

Also, I would add this to the next community sync topics.

Thank you @pvary for effort and taking a closer look into this.

[1] Trino fixed trinodb/trino#12834 by trinodb/trino#12911.

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment


I echo @pvary's concern that this is a pretty significant behavior change, as the ordering won't be stable for UUIDs between versions. I'm digging into the RFC's exact language, but I also came across https://stackoverflow.com/questions/79489893/does-the-uuid-comparison-in-java-violate-the-uuid-standard ; it sounds like this sorting behavior is explicitly defined for UUID v6/v7, while sorting is not prescribed for v4. At the same time, the OpenJDK folks acknowledge that this is a bug, so I'm not sure yet (like I said, still digging into it): https://bugs.openjdk.org/browse/JDK-7025832

In general, I think we should close this ambiguity on the sorting of UUIDs in the spec definition; as you pointed out, implementations are inconsistent in how this is performed. Whether this definition should be based on the RFC, or whether there's a good argument to retroactively work from the Java reference implementation behavior, is something I'm still thinking through, and I think we should discuss it on the mailing list.

Author

bodduv commented Nov 17, 2025

I'm digging into the RFC's exact language, but I also came across https://stackoverflow.com/questions/79489893/does-the-uuid-comparison-in-java-violate-the-uuid-standard ; it sounds like this sorting behavior is explicitly defined for UUID v6/v7, while sorting is not prescribed for v4. At the same time, the OpenJDK folks acknowledge that this is a bug, so I'm not sure yet (like I said, still digging into it): https://bugs.openjdk.org/browse/JDK-7025832

I have addressed how the RFCs define ordering among UUIDs in #14216.
If you take a closer look at RFC 4122, page 4, the paragraph under the heading "Rules for Lexical Equivalence" specifies ordering. RFC 4122 covers versions 1 through 4. It would be surprising if UUID v6/v7 had different comparison semantics from v1-v4.

RFC 9562 section "6.1.1. Sorting" mentions "UUID formats created by this specification are intended to be lexicographically sortable while in the textual representation."
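The lexicographic-sortability claim can be checked directly: for the canonical lowercase textual form, string order coincides with unsigned 128-bit order, while Java's signed compareTo() disagrees. A standalone sketch (UUID values chosen for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;

public class LexicalSortSketch {
  // Unsigned 128-bit comparison (illustrative RFC-style helper).
  static int unsignedCompare(UUID a, UUID b) {
    int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
  }

  public static void main(String[] args) {
    List<UUID> uuids = List.of(
        UUID.fromString("ffffffff-0000-0000-0000-000000000001"),
        UUID.fromString("00000000-0000-0000-0000-000000000002"),
        UUID.fromString("7fffffff-ffff-ffff-ffff-ffffffffffff"));

    List<UUID> byText = new ArrayList<>(uuids);
    byText.sort(Comparator.comparing(UUID::toString)); // lexicographic on canonical text

    List<UUID> byBits = new ArrayList<>(uuids);
    byBits.sort(LexicalSortSketch::unsignedCompare);   // unsigned 128-bit order

    List<UUID> bySigned = new ArrayList<>(uuids);
    bySigned.sort(Comparator.naturalOrder());          // Java's signed compareTo()

    System.out.println(byText.equals(byBits));   // true: textual == unsigned order
    System.out.println(bySigned.equals(byText)); // false: signed order differs
  }
}
```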

if there's a good argument to retroactively work from the Java reference implementation behavior

Maintaining backward compatibility is an argument in this case.

But on the other hand, looking forward, is there a possibility to communicate breaking changes and provide table migration strategies in the release docs? I suppose it's a topic for the mailing list.

Contributor

pvary commented Nov 17, 2025

Anything which could be a breaking change needs to be discussed on the dev list and, if needed, in the community sync.

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

Author

bodduv commented Jan 16, 2026

@pvary Thank you for promptly jumping in (during the previous community sync) while I could not recollect the details of why we wanted to pause on the behavior change.

It was discussed that we could evaluate expressions once with the RFC-compliant comparator and again with the signed comparator, and allow the filter to pass if either evaluation is true. I attempt this in fe9faf5. Following is a short description of the changes.

  • Write Path: while writing new Parquet files, metrics are prepared using RFC-compliant UUID comparison (Comparators.uuids()), producing correct min/max statistics.
  • Read Path: uses two expression evaluations: once with the RFC-compliant comparator (to work with newly written files) and again with the signed comparator, for backward compatibility with older files whose UUID statistics may have been computed with Java's signed UUID.compareTo().
    Affected evaluators: InclusiveMetricsEvaluator, ManifestEvaluator, ParquetMetricsRowGroupFilter
  • Performance: (almost) zero overhead for non-UUID queries. Filter expressions may be evaluated twice, so there is some non-zero performance impact, but I suppose we would prioritize correctness.

The implementation around expression evaluation was not conducive to such twofold evaluation, so I had to force it. I hope the increased code complexity is acceptable.
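The OR-combined dual evaluation described above can be sketched as follows. The helper and names are hypothetical, not the PR's actual code; the range check stands in for Iceberg's metrics evaluators:

```java
import java.util.Comparator;
import java.util.UUID;

public class DualEvalSketch {
  // Hypothetical inclusive range check standing in for a metrics evaluator.
  static boolean mayMatch(UUID lower, UUID upper, UUID value, Comparator<UUID> cmp) {
    return cmp.compare(value, lower) >= 0 && cmp.compare(value, upper) <= 0;
  }

  public static void main(String[] args) {
    Comparator<UUID> signed = Comparator.naturalOrder(); // Java's UUID.compareTo()
    Comparator<UUID> rfc = (a, b) -> {                   // unsigned 128-bit order
      int c = Long.compareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
      return c != 0 ? c : Long.compareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
    };

    // Stats as an OLD writer recorded them with the signed comparator:
    // 0xff... (negative as a signed long) became the stored "min".
    UUID storedMin = UUID.fromString("ffffffff-0000-0000-0000-000000000000");
    UUID storedMax = UUID.fromString("00000000-0000-0000-0000-000000000001");
    UUID value     = UUID.fromString("00000000-0000-0000-0000-000000000000");

    boolean rfcSaysKeep    = mayMatch(storedMin, storedMax, value, rfc);
    boolean signedSaysKeep = mayMatch(storedMin, storedMax, value, signed);

    // Keep the file if EITHER evaluation says it may match: correct for both
    // old (signed) and new (RFC-compliant) statistics, at the cost of a second pass.
    System.out.println(rfcSaysKeep);                   // false
    System.out.println(rfcSaysKeep || signedSaysKeep); // true
  }
}
```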

Java's UUID.compareTo() uses signed comparison of most significant bits
and least significant bits, which is not compliant with RFC 4122, RFC 9562.
This causes incorrect ManifestEntry and ManifestFile filtering/pruning
in the presence of UUID filters.
