-
Notifications
You must be signed in to change notification settings - Fork 3k
Include key metadata in manifest tables #14750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include key metadata in manifest tables #14750
Conversation
|
cc @ggershinsky |
|
@tom-s-powell Thanks for the PR! The fix makes sense to me. Would it be possible to add a regression test that fails without this change and passes with it? |
|
We are also going to need #14751 for full testable change. The test I would like to add to https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestTableEncryption.java is: Without the changes in #14751 we get the following: If we have the changes from #14751 without the changes in this PR we get the following: |
| private static final String SCHEMA = "schema"; | ||
| private static final String SPECS = "partition-specs"; | ||
| private static final String MANIFEST_LIST_LOCATION = "manifest-list-Location"; | ||
| private static final String MANIFEST_LIST_KEY_ID = "manifest-list-key-id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it have to follow the https://github.com/apache/iceberg/blob/main/format/spec.md ? If so, the field is named "key-id" there. This string is also used in https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L56
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not. The "manifest-list-Location" is also different from spec's version (and https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L52). Is this ok?
|
@huaxingao curious if you had more thoughts on this |
| ManifestsTable.partitionSummariesToRows(spec, manifest.partitions()), | ||
| referenceSnapshotId); | ||
| referenceSnapshotId, | ||
| manifest.keyMetadata() == null ? null : manifest.keyMetadata().array()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: manifest.keyMetadata().array() assumes the buffer is array‑backed. To be defensive against non‑array‑backed ByteBuffers, could we use ByteBuffers.toByteArray(manifest.keyMetadata()) here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to that @huaxingao
| Types.NestedField.optional(13, "upper_bound", Types.StringType.get())))), | ||
| REF_SNAPSHOT_ID); | ||
| REF_SNAPSHOT_ID, | ||
| Types.NestedField.optional(19, "key_metadata", Types.BinaryType.get())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szehon-ho Just wondering if you have any comments on adding key_metadata in MANIFEST_FILE_SCHEMA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not familiar with key_metadata. Which table is it in today? Files/Entries?
Do we need one in regular (not all) manifest table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key_metadata is the encryption key metadata. Today it’s in both files and entries. This PR is adding key_metadata for manifest files (i.e., ManifestFile.keyMetadata()) to .all_manifests, since Spark actions read manifests via .all_manifests and need the manifest’s key metadata to open/decrypt encrypted manifest files. I think we should add it to the regular .manifests table too for parity, but I guess it can be done in a followup.
|
@huaxingao - are you happy with this in the current state, perhaps with a FLUP to add it to other tables? |
huaxingao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
Thanks @tom-s-powell for the PR! Thanks @ggershinsky @ggershinsky for the review! |
|
@tom-s-powell Now iceberg supports Spark 4.1. Could you please have a follow-up PR to port the changes to 4.1? |
Adding
key_metadatato manifest tables such that these will work when usingEncryptingFileIO.