[Improvement]: Refactor snapshot-expiring via ProcessFactory plugin by baiyangtx · Pull Request #4107 · apache/amoro

baiyangtx · 2026-03-05T14:11:10Z

Why are the changes needed?

Close #xxx.

Brief change log

How was this patch tested?

Add some test cases that check the changes thoroughly including negative and positive cases if possible
Add screenshots for manual tests if appropriate
Run test locally before making a pull request

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

…actory Co-Authored-By: Aime <aime@bytedance.com> Change-Id: Idfac8a56427baccaeeca27e8f71719d476d7839a

czy006 · 2026-03-13T08:12:22Z

amoro-ams/src/main/java/org/apache/amoro/server/process/iceberg/IcebergProcessFactory.java

+      ConfigOptions.key("expire-snapshots.enabled").booleanType().defaultValue(true);
+
+  public static final ConfigOption<Duration> SNAPSHOT_EXPIRE_INTERVAL =
+      ConfigOptions.key("expire-snapshot.interval")


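The YAML config uses expire-snapshots.interval (plural), but the key defined here is expire-snapshot.interval, so the two do not match.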

czy006 · 2026-03-13T08:17:19Z

amoro-common/src/main/java/org/apache/amoro/process/LocalExecutionEngine.java

+        properties.keySet().stream()
+            .filter(key -> key.startsWith(POOL_CONFIG_PREFIX))
+            .map(key -> key.substring(POOL_CONFIG_PREFIX.length()))
+            .map(key -> key.substring(0, key.indexOf(".") + 1))


The + 1 keeps the trailing dot in the extracted name, so the final lookup key becomes pool.default..thread-count / pool.snapshots-expiring..thread-count (double dot) and the pool config is never found.
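One way the extraction could be written so the pool name excludes the dot is sketched below. This is only an illustration: the class PoolNames, the method extract, and the value "pool." for POOL_CONFIG_PREFIX are assumptions for the example, not code taken from this PR.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PoolNames {
  // Assumed value for illustration; the real constant lives in LocalExecutionEngine.
  static final String POOL_CONFIG_PREFIX = "pool.";

  /** Returns the distinct pool names found in keys such as "pool.default.thread-count". */
  static Set<String> extract(Map<String, String> properties) {
    Set<String> names = new TreeSet<>();
    for (String key : properties.keySet()) {
      if (!key.startsWith(POOL_CONFIG_PREFIX)) {
        continue;
      }
      String rest = key.substring(POOL_CONFIG_PREFIX.length());
      int dot = rest.indexOf('.');
      if (dot <= 0) {
        continue; // no pool-name segment before an option name
      }
      // The end index is exclusive, so the dot itself is not part of the name.
      names.add(rest.substring(0, dot));
    }
    return names;
  }
}
```

With names extracted this way, a lookup key built as prefix + name + "." + option contains a single dot between the pool name and the option.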

czy006 · 2026-03-13T08:20:30Z

.../src/main/resources/META-INF/services/org.apache.amoro.server.process.executor.ExecuteEngine

org.apache.amoro.process.ExecuteEngine

czy006 · 2026-03-13T08:28:29Z

amoro-ams/src/main/java/org/apache/amoro/server/process/iceberg/SnapshotsExpiringProcess.java

+  @Override
+  public void run() {
+    try {
+      AmoroTable<?> amoroTable = tableRuntime.loadTable();


The problem is that the new scheduling path no longer preserves the old “run, then record cleanup time” behavior for snapshot expiration.

In the old implementation, SnapshotsExpiringExecutor.java executed tableMaintainer.expireSnapshots() synchronously. Only after that finished did PeriodicTableScheduler.java (line 125) update lastCleanTime and schedule the next run. So the interval was effectively measured from the end of the previous cleanup.

In the new path, ActionCoordinatorScheduler.java (line 103) only submits/registers a process and returns immediately. After that return, PeriodicTableScheduler still updates lastCleanTime right away, even though the real cleanup work has not finished yet. The actual cleanup now happens later in SnapshotsExpiringProcess.java (line 53).
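A possible way to keep the old "run, then record cleanup time" semantics with asynchronous submission is to update the timestamp from a completion callback rather than right after submit. The sketch below uses only java.util.concurrent; CleanTimeTracker and its methods are hypothetical names, not Amoro classes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

final class CleanTimeTracker {
  // 0 means "no cleanup has finished yet" in this sketch.
  private final AtomicLong lastCleanTime = new AtomicLong(0L);

  /**
   * Submits the cleanup asynchronously. The timestamp is written by a dependent
   * stage, so it is recorded only after the cleanup has completed normally.
   */
  CompletableFuture<Void> submit(Runnable cleanup, Executor executor) {
    return CompletableFuture.runAsync(cleanup, executor)
        .thenRun(() -> lastCleanTime.set(System.currentTimeMillis()));
  }

  long lastCleanTime() {
    return lastCleanTime.get();
  }
}
```

If the cleanup throws, thenRun is skipped and the returned future completes exceptionally, so a failed run does not advance lastCleanTime. Whether a failed run should still push the next schedule forward is a separate design decision.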

Building on your observation: the async submission also introduces a state-loss issue in LocalExecutionEngine.getStatus().

getStatus() removes the Future from the map on terminal states (isDone/isCancelled), making it non-idempotent:

Call 1: future.isDone() == true → remove → SUCCESS
Call 2: future == null → KILLED (wrong!)

TableProcessExecutor polls getStatus() in a loop (line 107), so if any retry or concurrent access queries the same identifier twice after completion, it gets KILLED instead of the real result.

There's also a TOCTOU race between containsKey and get across cancelingInstances/activeInstances (lines 67-70), since the compound check-then-act isn't atomic even with ConcurrentHashMap.
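A minimal sketch of an idempotent status lookup follows, assuming it is acceptable to cache terminal results. StatusRegistry, its Status enum, and the UNKNOWN value are hypothetical and do not mirror the actual LocalExecutionEngine types.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class StatusRegistry {
  enum Status { RUNNING, SUCCESS, FAILED, CANCELED, UNKNOWN }

  private final Map<String, Future<?>> active = new ConcurrentHashMap<>();
  private final Map<String, Status> finished = new ConcurrentHashMap<>();

  void register(String id, Future<?> future) {
    active.put(id, future);
  }

  /** Repeated calls after completion keep returning the same terminal status. */
  Status getStatus(String id) {
    Status cached = finished.get(id);
    if (cached != null) {
      return cached;
    }
    // Single get() instead of containsKey + get, so there is no check-then-act gap.
    Future<?> future = active.get(id);
    if (future == null) {
      // Another thread may have moved the entry after our first read.
      return finished.getOrDefault(id, Status.UNKNOWN);
    }
    if (!future.isDone()) {
      return Status.RUNNING;
    }
    // Publish the terminal status before dropping the future.
    finished.putIfAbsent(id, resolve(future));
    active.remove(id, future);
    return finished.get(id);
  }

  private static Status resolve(Future<?> future) {
    if (future.isCancelled()) {
      return Status.CANCELED;
    }
    try {
      future.get(); // returns immediately because the future is done
      return Status.SUCCESS;
    } catch (ExecutionException e) {
      return Status.FAILED;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return Status.FAILED;
    }
  }
}
```

The finished map grows without bound in this sketch; a real implementation would need an eviction policy or an explicit release call once the caller has consumed the result.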

czy006 · 2026-03-13T08:38:43Z

It looks like IcebergProcessFactory receives available execute engines too early.

In AmoroServiceContainer, availableExecuteEngines(executeEngineManager.installedPlugins()) is called before executeEngineManager.initialize(), so installedPlugins() is still empty at that point. As a result, IcebergProcessFactory.localEngine is never set.

Later, when snapshot expiration is triggered, triggerExpireSnapshot() returns Optional.empty() because localEngine == null, so no SnapshotsExpiringProcess is ever created or submitted.

In other words, the new expire-snapshots path is effectively disabled due to initialization order. We probably need to initialize execute engines before injecting them into process factories, or re-inject them after engine initialization.
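The ordering problem can be reproduced with a toy model. ToyEngineManager and ToyProcessFactory below are hypothetical stand-ins (plain strings instead of real engine plugins), and only the method names initialize, installedPlugins, and availableExecuteEngines are borrowed from the description above; their signatures here are assumed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ToyEngineManager {
  private final List<String> installed = new ArrayList<>();

  // Plugins only become visible after initialization.
  void initialize() {
    installed.add("local");
  }

  // Returns a snapshot, so later initialization does not affect earlier callers.
  List<String> installedPlugins() {
    return Collections.unmodifiableList(new ArrayList<>(installed));
  }
}

final class ToyProcessFactory {
  private String localEngine; // stays null if no local engine was injected

  void availableExecuteEngines(List<String> engines) {
    localEngine = engines.contains("local") ? "local" : null;
  }

  // Mirrors the "localEngine == null means nothing is triggered" behavior.
  boolean canTrigger() {
    return localEngine != null;
  }
}
```

Injecting before initialize() leaves canTrigger() false for the lifetime of the factory, while calling initialize() first (or re-injecting afterwards) makes it true.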

github-actions bot added the module:ams-server, type:infra, type:build, and module:common labels Mar 5, 2026

czy006 self-requested a review March 12, 2026 13:42

[Improvement] Refactor SnapshotExpiring inline executor with ProcessF…

b5e817f

…actory Co-Authored-By: Aime <aime@bytedance.com> Change-Id: Idfac8a56427baccaeeca27e8f71719d476d7839a

baiyangtx force-pushed the upstream/SnapshotsExpiring-processFactory branch from 709fc06 to b5e817f Compare March 12, 2026 13:50

baiyangtx marked this pull request as ready for review March 12, 2026 13:51

zhangyongxiang.alpha added 2 commits March 13, 2026 15:47

Refactor

902cb4e

fixed

3f7310f

czy006 reviewed Mar 13, 2026


zhangyongxiang.alpha added 5 commits March 13, 2026 16:38

check

167f8f3

fix executeEngineManager

2128282

spotless

ec45556

fix ActionCoordinatorScheduler

3fe4764

fix ActionCoordinatorScheduler

7330b17

baiyangtx wants to merge 8 commits into apache:master from baiyangtx:upstream/SnapshotsExpiring-processFactory

j1wonpark Mar 15, 2026

3 participants
