[Feature] Add remote maintainer framework for Paimon tables #4068
This commit introduces a framework for executing Paimon table maintenance operations (snapshot expiration, orphan file cleanup) remotely on Spark optimizers, following the existing Optimizer pattern.

Changes:
- Add MaintainerInput/Output interfaces and base implementations
- Add MaintainerExecutor/Factory interfaces for remote execution
- Create amoro-optimizer-paimon-spark module with SparkMaintainerExecutor
- Implement PaimonSnapshotExpire* components for snapshot expiration
- Add placeholder SparkOptimizer for future Paimon optimizing support

Co-Authored-By: Claude (glm-4.7) <noreply@anthropic.com>
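To make the input/executor/output split concrete, here is a minimal sketch of how such a contract could fit together. Only the interface names (MaintainerInput, MaintainerOutput, MaintainerExecutor, and a factory) are taken from the description above. Every method signature, the `SnapshotExpireInput`/`SnapshotExpireOutput` classes, the `retainLast` parameter, and the in-memory executor are hypothetical stand-ins. They are not the code in this PR, and they make no Spark or Paimon calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MaintainerSketch {

    // Interface names follow the PR description; all method signatures are assumptions.
    interface MaintainerInput { String tableIdentifier(); }
    interface MaintainerOutput { String summary(); }

    interface MaintainerExecutor<I extends MaintainerInput, O extends MaintainerOutput> {
        O execute(I input);
    }

    interface MaintainerExecutorFactory<I extends MaintainerInput, O extends MaintainerOutput> {
        MaintainerExecutor<I, O> createExecutor();
    }

    // Hypothetical input: snapshot IDs ordered oldest to newest, plus how many to keep.
    static final class SnapshotExpireInput implements MaintainerInput {
        final String table;
        final List<Long> snapshotIds;
        final int retainLast;

        SnapshotExpireInput(String table, List<Long> snapshotIds, int retainLast) {
            if (retainLast < 0) {
                throw new IllegalArgumentException("retainLast must be >= 0");
            }
            this.table = table;
            this.snapshotIds = snapshotIds;
            this.retainLast = retainLast;
        }

        @Override public String tableIdentifier() { return table; }
    }

    // Hypothetical output: the snapshot IDs that were selected for expiration.
    static final class SnapshotExpireOutput implements MaintainerOutput {
        final List<Long> expiredSnapshotIds;

        SnapshotExpireOutput(List<Long> expiredSnapshotIds) {
            this.expiredSnapshotIds = expiredSnapshotIds;
        }

        @Override public String summary() {
            return "expired " + expiredSnapshotIds.size() + " snapshot(s)";
        }
    }

    // In-memory stand-in for a remote executor: it only selects the oldest snapshots.
    static final class InMemorySnapshotExpireExecutor
            implements MaintainerExecutor<SnapshotExpireInput, SnapshotExpireOutput> {
        @Override
        public SnapshotExpireOutput execute(SnapshotExpireInput input) {
            int expireCount = Math.max(0, input.snapshotIds.size() - input.retainLast);
            List<Long> expired = new ArrayList<>(input.snapshotIds.subList(0, expireCount));
            return new SnapshotExpireOutput(expired);
        }
    }

    // Wires factory -> executor -> output, mirroring the assumed call flow.
    static SnapshotExpireOutput expire(String table, List<Long> snapshotIds, int retainLast) {
        MaintainerExecutorFactory<SnapshotExpireInput, SnapshotExpireOutput> factory =
                InMemorySnapshotExpireExecutor::new;
        return factory.createExecutor().execute(
                new SnapshotExpireInput(table, snapshotIds, retainLast));
    }

    public static void main(String[] args) {
        SnapshotExpireOutput out = expire("db.orders", Arrays.asList(1L, 2L, 3L, 4L, 5L), 2);
        System.out.println(out.summary() + ": " + out.expiredSnapshotIds);
    }
}
```

Keeping the input and output as plain data objects means the scheduling side only needs to ship a description of the work, while the executor side owns the engine-specific logic. That is one plausible reading of "following the existing Optimizer pattern", not a confirmed design detail.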
Thanks for working on this. I'm wondering how this maintainer framework works with the current process/external process API. I notice there has been some Paimon-related work on this by @LiangDai-Mars and @baiyangtx. I think it's the right time to have a discussion on this. Curious what people think, @zhoujinsong. By the way, I'm open to having a new framework if it could provide better extension to multiple formats.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@amoro.apache.org list. Thank you for your contributions.
This PR introduces a framework for executing Paimon table maintenance operations (snapshot expiration, orphan file cleanup) remotely on Spark optimizers.
Changes:
Key Design: