Add RemoveOrphanFiles maintenance action #8

@manuzhang

Description

iceberg-cpp implements ExpireSnapshots (src/iceberg/update/expire_snapshots.{h,cc}) but does not have an equivalent of Java's RemoveOrphanFiles action. After expiration, table-scoped GC of unreferenced files (data files, delete files, manifests, manifest lists, statistics files, sidecar files) is left to the user.

Scope

  • Walk the table location; collect every file path
  • Collect every file reachable from the snapshots in TableMetadata.snapshots: manifest lists, manifests, the data and delete files those manifests reference, statistics files, partition statistics files, and the metadata.json files listed in metadata-log
  • Optional staleness threshold (skip files newer than N hours, mirroring the Java action's olderThan to avoid racing with concurrent writers)
  • Pluggable delete_func similar to SnapshotUpdate::DeleteWith
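
A minimal sketch of how the pieces in this scope could fit together, assuming the directory listing and the reachable-file set have already been produced. `ListedFile` and `RemoveOrphans` are hypothetical names used only for illustration and are not part of iceberg-cpp. Paths are compared as exact strings here, which is a simplification: a real implementation would probably need to normalize schemes and authorities before comparing.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical types for illustration; not part of iceberg-cpp.
using TimePoint = std::chrono::system_clock::time_point;

struct ListedFile {
  std::string path;         // path found while walking the table location
  TimePoint last_modified;  // modification time reported by the file system
};

// Deletes every listed file that is neither referenced by table metadata nor
// newer than `older_than`, and returns the paths handed to `delete_func`.
std::vector<std::string> RemoveOrphans(
    const std::vector<ListedFile>& listed,
    const std::unordered_set<std::string>& referenced,
    TimePoint older_than,
    const std::function<void(const std::string&)>& delete_func) {
  assert(delete_func && "delete_func must be callable");
  std::vector<std::string> removed;
  for (const auto& file : listed) {
    // Still reachable from some snapshot or the metadata log: keep it.
    if (referenced.count(file.path) > 0) continue;
    // Too recent: it may belong to a concurrent write that has not committed.
    if (file.last_modified >= older_than) continue;
    delete_func(file.path);
    removed.push_back(file.path);
  }
  return removed;
}
```

Passing the delete step in as a callable keeps the orphan computation independent of the storage backend, so a caller can do a dry run by supplying a function that only records paths.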

Reference

  • org.apache.iceberg.actions.RemoveOrphanFiles
