[core] Parallelize manifest reads in snapshot expiration #8233

Draft
leaves12138 wants to merge 1 commit into apache:master from leaves12138:codex/parallel-expire-manifest-read


@leaves12138

Contributor

Purpose

Snapshot expiration currently reads manifest data serially in several cleanup paths. Serial reads leave available IO and CPU capacity unused, so expiration can be slow, especially for tables with many snapshots, manifest lists, and index manifests.

Changes

  • Parallelize reading data manifest entries during data file cleanup while preserving the original manifest order for merge/delete decisions.
  • Parallelize reading manifest lists and index manifests across snapshots during metadata cleanup, then apply deletion decisions sequentially against the shared skipping set.
  • Parallelize building the retained manifest/index skipping set across retained/tagged snapshots.
  • Keep the existing best-effort behavior for unavailable manifest lists and missing index manifests.
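The pattern in the changes above (parallel reads, order-preserving results, sequential decisions against a shared skipping set, best-effort handling of missing lists) can be sketched as follows. This is a minimal illustration, not Paimon's actual cleanup code: `readInOrder`, `manifestsToDelete`, and the in-memory map standing in for storage are all hypothetical, and manifests are assumed to be identified by file name. It also assumes that a manifest scheduled for deletion is added to the skipping set so a manifest shared by two expiring snapshots is deleted only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelExpireSketch {

    /** Runs reader on every input in parallel, but returns results in input order. */
    static <T, R> List<R> readInOrder(List<T> inputs, Function<T, R> reader, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        List<Future<R>> futures = new ArrayList<>(inputs.size());
        for (T input : inputs) {
            futures.add(pool.submit(() -> reader.apply(input)));
        }
        // Joining in submission order means result i always belongs to input i,
        // no matter which task finished first.
        List<R> results = new ArrayList<>(inputs.size());
        for (Future<R> future : futures) {
            results.add(future.get());
        }
        return results;
    }

    /** Hypothetical cleanup: returns manifests of expiring snapshots that no retained snapshot uses. */
    static List<String> manifestsToDelete(
            List<String> expiring,
            List<String> retained,
            Map<String, List<String>> storage,
            ExecutorService pool) throws Exception {
        // Best-effort reader: a missing manifest list yields Optional.empty() instead of failing.
        Function<String, Optional<List<String>>> reader = s -> Optional.ofNullable(storage.get(s));

        // 1. Build the skipping set from retained/tagged snapshots in parallel, then merge.
        Set<String> skipping = new HashSet<>();
        for (Optional<List<String>> manifests : readInOrder(retained, reader, pool)) {
            manifests.ifPresent(skipping::addAll);
        }

        // 2. Read the manifest lists of expiring snapshots in parallel (order preserved).
        List<Optional<List<String>>> expiringManifests = readInOrder(expiring, reader, pool);

        // 3. Apply deletion decisions sequentially, so the shared set is never mutated concurrently.
        List<String> toDelete = new ArrayList<>();
        for (Optional<List<String>> manifests : expiringManifests) {
            if (!manifests.isPresent()) {
                continue; // unavailable manifest list: skip it
            }
            for (String manifest : manifests.get()) {
                // add() returns false if the manifest is retained or already scheduled.
                if (skipping.add(manifest)) {
                    toDelete.add(manifest);
                }
            }
        }
        return toDelete;
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> storage = new HashMap<>();
        storage.put("snap-1", Arrays.asList("m1", "m2"));
        storage.put("snap-2", Arrays.asList("m2", "m3"));
        // snap-3's manifest list is intentionally missing.
        storage.put("snap-4", Arrays.asList("m3", "m4"));

        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<String> result = manifestsToDelete(
                    Arrays.asList("snap-1", "snap-2", "snap-3"),
                    Collections.singletonList("snap-4"),
                    storage,
                    pool);
            System.out.println(result); // prints [m1, m2]
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the reads fan out to the pool; the decision loop stays single-threaded, which keeps the result deterministic and avoids needing a concurrent set for the skipping logic.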

Tests

  • mvn -N -Pfast-build -DskipTests install
  • mvn -pl paimon-api,paimon-test-utils,paimon-common,paimon-codegen,paimon-codegen-loader,paimon-arrow,paimon-format -Pfast-build -DskipTests install
  • mvn -pl paimon-core -Pfast-build -DskipTests compile
  • mvn -pl paimon-core -Pfast-build -Dtest=FileDeletionTest,ExpireSnapshotsTest,IndexFileExpireTableTest test
  • mvn -pl paimon-core -DskipTests validate

@leaves12138 force-pushed the codex/parallel-expire-manifest-read branch from dc31ab8 to 58594ff on June 15, 2026 02:47
