
Refactor SparkSubmitOperator resumable job tracking backends#68543

Open
onlyarnav wants to merge 1 commit into apache:main from onlyarnav:refactor-spark-submit-resumable-backends

Conversation

@onlyarnav


Decouples the three resumable deployment backends in SparkSubmitOperator (Spark standalone driver-status tracking, YARN cluster mode, and Kubernetes driver-pod tracking) into separate strategy classes.

Problem

Previously, each method in SparkSubmitOperator's ResumableJobMixin implementation (submit_job, get_job_status, is_job_active, is_job_succeeded, poll_until_complete, on_kill) branched inline on the active deployment backend. Backend-specific logic was therefore spread across all six methods: following a single backend meant reading every method, and adding a new backend meant editing each one.
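For contrast, here is a minimal sketch of the inline-branching shape described above. `InlineBranchingOperator`, its helper methods, and the string return values are hypothetical, and detecting the backend from the `master` URL prefix is an assumption; a real implementation would query the cluster instead of returning strings.

```python
class InlineBranchingOperator:
    """Hypothetical 'before' shape: every mixin method repeats the backend branch."""

    def __init__(self, master: str) -> None:
        self.master = master

    # Assumed backend detection from the master URL prefix.
    def _is_yarn(self) -> bool:
        return self.master.startswith("yarn")

    def _is_kubernetes(self) -> bool:
        return self.master.startswith("k8s://")

    def get_job_status(self, job_id: str) -> str:
        if self._is_yarn():
            return f"yarn: report for {job_id}"
        elif self._is_kubernetes():
            return f"kubernetes: pod phase for {job_id}"
        else:
            return f"standalone: driver state for {job_id}"

    def on_kill(self, job_id: str) -> str:
        # The same three-way branch again; a fourth backend means editing every method.
        if self._is_yarn():
            return f"yarn: kill {job_id}"
        elif self._is_kubernetes():
            return f"kubernetes: delete pod {job_id}"
        else:
            return f"standalone: kill driver {job_id}"
```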

Solution

  1. Introduced a Strategy pattern:
    • SparkSubmitResumableBackend is the abstract base class that defines the backend interface.
    • YarnSparkSubmitBackend, KubernetesSparkSubmitBackend, and StandaloneSparkSubmitBackend encapsulate backend-specific logic.
  2. Added a cached _resumable_backend property to SparkSubmitOperator that resolves the backend exactly once, lazily on first access, and reuses it afterwards.
  3. Delegated every mixin method in SparkSubmitOperator to the active backend strategy, keeping backend-specific branching out of the operator.
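The strategy split can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the class names `SparkSubmitResumableBackend`, `YarnSparkSubmitBackend`, `KubernetesSparkSubmitBackend`, `StandaloneSparkSubmitBackend`, and the `_resumable_backend` property come from the description above, but the method signatures, the in-memory `statuses` dict standing in for the cluster APIs, the state strings, and backend selection by `master` URL prefix are all assumptions. `ToySparkSubmitOperator` is a hypothetical stand-in for the operator, and only a subset of the mixin methods is shown (no `poll_until_complete`).

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from functools import cached_property


class SparkSubmitResumableBackend(ABC):
    """Strategy interface; method names follow the PR, signatures are assumed."""

    ACTIVE_STATES: frozenset[str] = frozenset()  # illustrative raw states
    SUCCESS_STATES: frozenset[str] = frozenset()

    def __init__(self, statuses: dict[str, str]) -> None:
        self._statuses = statuses  # stand-in for the cluster API: job id -> raw state

    @abstractmethod
    def submit_job(self) -> str: ...  # launch the job, return a backend-specific id

    @abstractmethod
    def on_kill(self, job_id: str) -> str: ...  # stop the job, describe the action

    def get_job_status(self, job_id: str) -> str:
        return self._statuses[job_id]

    def is_job_active(self, job_id: str) -> bool:
        return self.get_job_status(job_id) in self.ACTIVE_STATES

    def is_job_succeeded(self, job_id: str) -> bool:
        return self.get_job_status(job_id) in self.SUCCESS_STATES


class YarnSparkSubmitBackend(SparkSubmitResumableBackend):
    ACTIVE_STATES = frozenset({"ACCEPTED", "RUNNING"})
    SUCCESS_STATES = frozenset({"SUCCEEDED"})

    def submit_job(self) -> str:
        return "application_0001"

    def on_kill(self, job_id: str) -> str:
        return f"yarn: kill application {job_id}"


class KubernetesSparkSubmitBackend(SparkSubmitResumableBackend):
    ACTIVE_STATES = frozenset({"Pending", "Running"})
    SUCCESS_STATES = frozenset({"Succeeded"})

    def submit_job(self) -> str:
        return "spark-driver-pod"

    def on_kill(self, job_id: str) -> str:
        return f"kubernetes: delete driver pod {job_id}"


class StandaloneSparkSubmitBackend(SparkSubmitResumableBackend):
    ACTIVE_STATES = frozenset({"SUBMITTED", "RUNNING", "RELAUNCHING"})
    SUCCESS_STATES = frozenset({"FINISHED"})

    def submit_job(self) -> str:
        return "driver-0001"

    def on_kill(self, job_id: str) -> str:
        return f"standalone: kill driver {job_id}"


class ToySparkSubmitOperator:
    """Hypothetical stand-in showing only the delegation, not Airflow's operator."""

    def __init__(self, master: str, statuses: dict[str, str]) -> None:
        self.master = master
        self._statuses = statuses
        self.backend_resolutions = 0  # counts how often selection actually runs

    @cached_property
    def _resumable_backend(self) -> SparkSubmitResumableBackend:
        # Runs on first access only; the result is cached on the instance.
        self.backend_resolutions += 1
        if self.master.startswith("yarn"):
            return YarnSparkSubmitBackend(self._statuses)
        if self.master.startswith("k8s://"):
            return KubernetesSparkSubmitBackend(self._statuses)
        if self.master.startswith("spark://"):
            return StandaloneSparkSubmitBackend(self._statuses)
        raise ValueError(f"no resumable backend for master {self.master!r}")

    # Each mixin method is a one-line delegation to the selected strategy.
    def submit_job(self) -> str:
        return self._resumable_backend.submit_job()

    def get_job_status(self, job_id: str) -> str:
        return self._resumable_backend.get_job_status(job_id)

    def is_job_active(self, job_id: str) -> bool:
        return self._resumable_backend.is_job_active(job_id)

    def is_job_succeeded(self, job_id: str) -> bool:
        return self._resumable_backend.is_job_succeeded(job_id)

    def on_kill(self, job_id: str) -> str:
        return self._resumable_backend.on_kill(job_id)
```

In this sketch, adding a fourth backend means one new subclass plus one branch in `_resumable_backend`; the delegating methods on the operator stay untouched, and `functools.cached_property` ensures the selection logic runs only once per operator instance.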

closes: #68505


Was generative AI tooling used to co-author this PR?
  • Yes — Claude Code (Opus 4.8)



Development

Successfully merging this pull request may close these issues.

Refactor SparkSubmitOperator resumable backends into separate methods/classes
