Add recovery API to repair filesystem catalog's latest snapshot pointer #10

@manuzhang

Description

Depends on the filesystem catalog issue.

A filesystem catalog uses version-hint.text as the latest-version pointer. A crash between writing vN.metadata.json and updating version-hint.text can leave the pointer trailing the latest committed metadata. Recovery should:

  1. List metadata/v*.metadata.json
  2. Parse each, identify the highest version with a consistent snapshot lineage
  3. Optionally rewrite version-hint.text to point at it (or surface the discrepancy and let the caller decide)
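
A minimal sketch of these three steps, assuming C++17 and a table whose metadata directory is on a local filesystem. `PointerCheck`, `ReadHint`, `CheckLatestPointer` and `RewriteHint` are hypothetical names, not existing API. The `is_valid` callback stands in for step 2 (parsing the metadata JSON and checking snapshot lineage), which is out of scope here.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <functional>
#include <optional>
#include <regex>
#include <string>

namespace fs = std::filesystem;

// Hypothetical report: what the hint says versus what is on disk.
struct PointerCheck {
  std::optional<int64_t> hinted;  // contents of version-hint.text, if readable
  std::optional<int64_t> latest;  // highest vN.metadata.json accepted by is_valid
  bool Stale() const { return latest && (!hinted || *hinted < *latest); }
};

// Reads version-hint.text; returns nullopt if it is missing or not a number.
inline std::optional<int64_t> ReadHint(const fs::path& metadata_dir) {
  std::ifstream in(metadata_dir / "version-hint.text");
  int64_t v = 0;
  if (in >> v) return v;
  return std::nullopt;
}

// Steps 1 and 2: list v*.metadata.json and keep the highest version whose
// file passes is_valid (a stand-in for parsing and lineage checks).
// Throws std::filesystem::filesystem_error if metadata_dir does not exist.
inline PointerCheck CheckLatestPointer(
    const fs::path& metadata_dir,
    const std::function<bool(const fs::path&)>& is_valid) {
  static const std::regex kPattern(R"(v(\d+)\.metadata\.json)");
  PointerCheck check;
  check.hinted = ReadHint(metadata_dir);
  for (const auto& entry : fs::directory_iterator(metadata_dir)) {
    const std::string name = entry.path().filename().string();
    std::smatch m;
    if (!std::regex_match(name, m, kPattern)) continue;
    const int64_t version = std::stoll(m[1].str());
    if (check.latest && version <= *check.latest) continue;
    if (is_valid(entry.path())) check.latest = version;
  }
  return check;
}

// Step 3 (optional): rewrite the hint through a temp file plus rename, so a
// crash during recovery does not leave a half-written pointer.
inline void RewriteHint(const fs::path& metadata_dir, int64_t version) {
  assert(version >= 0);
  const fs::path tmp = metadata_dir / "version-hint.text.tmp";
  {
    std::ofstream out(tmp, std::ios::trunc);
    out << version;
  }
  fs::rename(tmp, metadata_dir / "version-hint.text");
}
```

Keeping the check separate from the rewrite covers both options in step 3: a caller can inspect `Stale()` and decide, or call `RewriteHint` immediately.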

Spark and the Java implementation sidestep this because the fix is implicit on the next read (HadoopTableOperations rescans for newer metadata files), but an embedded C++ user committing from a long-lived process needs an explicit recovery entry point.

API sketch

class FilesystemCatalog : public Catalog {
 public:
  // Recovery entry point: re-points version-hint.text at the latest committed metadata.
  Result<int64_t> RepairLatestPointer(const TableIdentifier& id);
};
