-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Include key metadata in manifest tables #14750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -37,6 +37,7 @@ class AllManifestsTableTaskParser { | |
| private static final String SCHEMA = "schema"; | ||
| private static final String SPECS = "partition-specs"; | ||
| private static final String MANIFEST_LIST_LOCATION = "manifest-list-Location"; | ||
| private static final String MANIFEST_LIST_KEY_ID = "manifest-list-key-id"; | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. does it have to follow the https://github.com/apache/iceberg/blob/main/format/spec.md ? If so, the field is named "key-id" there. This string is also used in https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L56
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe not. The "manifest-list-Location" is also different from spec's version (and https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotParser.java#L52). Is this ok? |
||
| private static final String RESIDUAL = "residual-filter"; | ||
| private static final String REFERENCE_SNAPSHOT_ID = "reference-snapshot-id"; | ||
|
|
||
|
|
@@ -63,7 +64,10 @@ static void toJson(AllManifestsTable.ManifestListReadTask task, JsonGenerator ge | |
|
|
||
| generator.writeEndArray(); | ||
|
|
||
| generator.writeStringField(MANIFEST_LIST_LOCATION, task.manifestListLocation()); | ||
| generator.writeStringField(MANIFEST_LIST_LOCATION, task.manifestList().location()); | ||
| if (task.manifestList().encryptionKeyID() != null) { | ||
| generator.writeStringField(MANIFEST_LIST_KEY_ID, task.manifestList().encryptionKeyID()); | ||
| } | ||
|
|
||
| generator.writeFieldName(RESIDUAL); | ||
| ExpressionParser.toJson(task.residual(), generator); | ||
|
|
@@ -92,6 +96,7 @@ static AllManifestsTable.ManifestListReadTask fromJson(JsonNode jsonNode) { | |
|
|
||
| Map<Integer, PartitionSpec> specsById = PartitionUtil.indexSpecs(specsBuilder.build()); | ||
| String manifestListLocation = JsonUtil.getString(MANIFEST_LIST_LOCATION, jsonNode); | ||
| String manifestListKeyId = JsonUtil.getStringOrNull(MANIFEST_LIST_KEY_ID, jsonNode); | ||
| Expression residualFilter = ExpressionParser.fromJson(JsonUtil.get(RESIDUAL, jsonNode)); | ||
| long referenceSnapshotId = JsonUtil.getLong(REFERENCE_SNAPSHOT_ID, jsonNode); | ||
|
|
||
|
|
@@ -100,7 +105,7 @@ static AllManifestsTable.ManifestListReadTask fromJson(JsonNode jsonNode) { | |
| fileIO, | ||
| schema, | ||
| specsById, | ||
| manifestListLocation, | ||
| new BaseManifestListFile(manifestListLocation, manifestListKeyId), | ||
| residualFilter, | ||
| referenceSnapshotId); | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@szehon-ho Just wondering if you have any comments on adding
key_metadatainMANIFEST_FILE_SCHEMAThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not familiar with key_metadata. Which table is it in today? Files/Entries?
Do we need one in regular (not all) manifest table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
key_metadatais the encryption key metadata. Today it’s in both files and entries. This PR is addingkey_metadatafor manifest files (i.e., ManifestFile.keyMetadata()) to .all_manifests, since Spark actions read manifests via .all_manifests and need the manifest’s key metadata to open/decrypt encrypted manifest files. I think we should add it to the regular .manifests table too for parity, but I guess it can be done in a followup.