Out-of-spec by design — filing for discussion before implementation.
Engines that mirror an external table format into Iceberg sometimes need to persist engine-private metadata that doesn't fit data_file columns (file lineage to original storage path, engine schema versions, encoded partition keys as raw bytes, etc.). Today this requires forking FastAppend / SnapshotUpdate because there's no callback to write a parallel sidecar atomically with the manifest set.
Proposed extension point
class SnapshotUpdate {
 public:
  // Called after manifests are written but before metadata.json is committed.
  // Receives the produced ManifestFile list so the user can write a sidecar
  // keyed by manifest path. A failed Status aborts the commit.
  auto& WithSidecarWriter(this auto& self,
                          std::function<Status(std::span<const ManifestFile>)> writer);
};
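To make the intended ordering concrete, here is a minimal sketch of how a commit path might invoke the hook. Everything in it is a stand-in for discussion: `Status`, `ManifestFile`, `Commit()`, `WriteManifests()`, and `metadata_committed()` are simplified, hypothetical versions, not the library's real types. It also swaps `this auto& self` and `std::span` for a plain reference return and `const std::vector&` so it builds as C++17.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real Status and ManifestFile carry far more.
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return {}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

struct ManifestFile {
  std::string path;
};

using SidecarWriter = std::function<Status(const std::vector<ManifestFile>&)>;

class SnapshotUpdate {
 public:
  // Registers the hook; returns *this so calls can be chained.
  SnapshotUpdate& WithSidecarWriter(SidecarWriter writer) {
    sidecar_writer_ = std::move(writer);
    return *this;
  }

  // Simplified commit: write manifests, run the hook, then commit metadata.
  Status Commit() {
    std::vector<ManifestFile> manifests = WriteManifests();
    if (sidecar_writer_) {
      Status st = sidecar_writer_(manifests);
      if (!st.ok) return st;  // abort before metadata.json is touched
    }
    metadata_committed_ = true;  // stands in for the metadata.json commit
    return Status::OK();
  }

  bool metadata_committed() const { return metadata_committed_; }

 private:
  // Placeholder for the real manifest-writing step.
  std::vector<ManifestFile> WriteManifests() {
    return {ManifestFile{"metadata/manifest-0.avro"},
            ManifestFile{"metadata/manifest-1.avro"}};
  }

  SidecarWriter sidecar_writer_;
  bool metadata_committed_ = false;
};
```

In this sketch a writer that returns `Status::Error(...)` leaves `metadata_committed()` false. Any sidecar bytes already written at that point are left behind as orphans.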
Sidecars written before a failed commit become orphans. This would pair naturally with RemoveOrphanFiles also walking sidecars once the convention exists.
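As a rough illustration of what "walking sidecars" could mean, the sketch below assumes a purely hypothetical naming convention in which a sidecar sits next to its manifest as `<manifest path>.sidecar`. `SidecarPathFor` and `FindOrphanSidecars` are made-up helpers, not existing APIs, and the convention itself is exactly what this discussion would need to settle.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical convention: the sidecar path is the manifest path plus a suffix.
std::string SidecarPathFor(const std::string& manifest_path) {
  return manifest_path + ".sidecar";
}

// Returns the listed sidecar files that no live manifest accounts for.
std::vector<std::string> FindOrphanSidecars(
    const std::vector<std::string>& live_manifest_paths,
    const std::vector<std::string>& listed_sidecar_paths) {
  std::unordered_set<std::string> expected;
  for (const auto& manifest : live_manifest_paths) {
    expected.insert(SidecarPathFor(manifest));
  }

  std::vector<std::string> orphans;
  for (const auto& sidecar : listed_sidecar_paths) {
    if (expected.count(sidecar) == 0) orphans.push_back(sidecar);
  }
  return orphans;
}
```

A real cleanup would presumably also need a guard, such as a minimum file age, so that sidecars belonging to an in-flight commit are not deleted. That is left out here.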
Open question: should sidecars be a first-class catalog concept, or a pure callback that the user manages on their own? I'd argue the latter is enough.