
Add extension hook in SnapshotUpdate to write engine-specific sidecar files atomically #11

@manuzhang


Out of spec by design; filing this for discussion before implementation.

Engines that mirror an external table format into Iceberg sometimes need to persist engine-private metadata that doesn't fit into the data_file columns. Examples include lineage from each file back to its original storage path, engine schema versions, and partition keys encoded as raw bytes. Today this requires forking FastAppend / SnapshotUpdate, because there is no callback for writing a parallel sidecar atomically with the manifest set.

Proposed extension point

```cpp
#include <functional>
#include <span>

// Status and ManifestFile are the library's existing types.
class SnapshotUpdate {
 public:
  // Called after manifests are written but before metadata.json is committed.
  // Receives the produced ManifestFile list so the user can write a sidecar
  // keyed by manifest path. A non-OK Status aborts the commit.
  auto& WithSidecarWriter(
      this auto& self,
      std::function<Status(std::span<const ManifestFile>)> sidecar_writer);
};
```

If the commit fails, any sidecars already written become orphan files. This would pair naturally with RemoveOrphanFiles also walking sidecars once the convention exists.

Open question: should sidecars be a first-class catalog concept, or a pure callback whose output the user manages? I'd argue the latter is enough.
