Summary
The SparkSubmitOperator ResumableJobMixin implementation now supports three
deployment backends (Spark standalone driver-status tracking, YARN cluster mode,
and Kubernetes driver-pod tracking). Each mixin method branches on the backend
inline, so per-backend logic is scattered across many methods instead of living
in one place per backend. This issue tracks separating the per-backend logic.
Background
Resumability for SparkSubmitOperator landed incrementally:
- #67118: standalone Spark ResumableJobMixin
During review of #68067, the refactor was raised as a non-blocking idea and the
author agreed to follow up after the 3.3.0 release
(#68067 (comment)).
Problem
In providers/apache/spark/src/airflow/providers/apache/spark/operators/spark_submit.py,
the ResumableJobMixin methods each carry their own backend branching:
- submit_job
- get_job_status
- is_job_active
- is_job_succeeded
- poll_until_complete
- on_kill
Every method repeats the same chain: if self._hook._is_yarn_cluster_mode: ... if self._hook._is_kubernetes: ... else (standalone).
A single backend's behaviour is therefore spread across six methods, which makes
the flow hard to follow, easy to break when adding a backend, and awkward to test
in isolation.
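To make the shape of the problem concrete, here is a deliberately reduced sketch of the pattern described above. It is not the code in spark_submit.py: FakeHook, BranchingOperator and the returned strings are hypothetical stand-ins, and only the two flag names come from this issue.

```python
from dataclasses import dataclass


@dataclass
class FakeHook:
    """Hypothetical stand-in for the hook; only the two flags from this issue."""

    _is_yarn_cluster_mode: bool = False
    _is_kubernetes: bool = False


class BranchingOperator:
    """Illustrative only: every method re-derives the backend on its own."""

    def __init__(self, hook: FakeHook) -> None:
        self._hook = hook

    def get_job_status(self) -> str:
        if self._hook._is_yarn_cluster_mode:
            return "yarn-status"  # placeholder for YARN-specific logic
        if self._hook._is_kubernetes:
            return "k8s-status"  # placeholder for driver-pod logic
        return "standalone-status"  # placeholder for driver-status logic

    def on_kill(self) -> str:
        # The same three-way chain appears again, so one backend's
        # behaviour ends up split across every method.
        if self._hook._is_yarn_cluster_mode:
            return "yarn-kill"
        if self._hook._is_kubernetes:
            return "k8s-kill"
        return "standalone-kill"
```

Adding a fourth backend in this shape means touching every method, which is the maintenance cost this issue is about.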
Proposed change
Separate each backend's logic so it is cohesive - for example a per-backend
strategy/handler class (standalone / YARN / K8s) implementing a common interface
(submit_job, get_job_status, is_job_active, is_job_succeeded,
poll_until_complete, on_kill), with the operator selecting the handler based
on deploy mode and tracking flags. A lighter alternative is grouping each
backend's branch into dedicated private methods. Decide between the two during
design.
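One possible shape for the strategy/handler option is sketched below, as a starting point for the design discussion rather than a proposed implementation. All class names (BackendHandler, StandaloneHandler, YarnHandler, KubernetesHandler, OperatorSketch, FakeHook), the select_handler function, the in-memory status store and the status strings are assumptions made for illustration; only the method names and the two hook flags come from this issue. poll_until_complete is left out to keep the sketch short.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class FakeHook:
    """Hypothetical stand-in for the hook; only the two flags from this issue."""

    _is_yarn_cluster_mode: bool = False
    _is_kubernetes: bool = False


class BackendHandler(ABC):
    """Hypothetical common interface, one subclass per backend."""

    @abstractmethod
    def submit_job(self) -> str: ...

    @abstractmethod
    def get_job_status(self, job_id: str) -> str: ...

    @abstractmethod
    def on_kill(self, job_id: str) -> None: ...

    # Derived checks live in one place instead of branching per backend.
    def is_job_active(self, job_id: str) -> bool:
        return self.get_job_status(job_id) == "RUNNING"

    def is_job_succeeded(self, job_id: str) -> bool:
        return self.get_job_status(job_id) == "SUCCEEDED"


class _InMemoryHandler(BackendHandler):
    """Fake backend so the sketch runs; real handlers would call the hook."""

    prefix = "job"

    def __init__(self, hook: FakeHook) -> None:
        self.hook = hook
        self._statuses: dict[str, str] = {}

    def submit_job(self) -> str:
        job_id = f"{self.prefix}-{len(self._statuses) + 1}"
        self._statuses[job_id] = "RUNNING"  # placeholder status string
        return job_id

    def get_job_status(self, job_id: str) -> str:
        return self._statuses[job_id]

    def on_kill(self, job_id: str) -> None:
        self._statuses[job_id] = "KILLED"  # placeholder status string


class StandaloneHandler(_InMemoryHandler):
    prefix = "standalone"


class YarnHandler(_InMemoryHandler):
    prefix = "yarn"


class KubernetesHandler(_InMemoryHandler):
    prefix = "k8s"


def select_handler(hook: FakeHook) -> BackendHandler:
    """The only place the backend flags are inspected."""
    if hook._is_yarn_cluster_mode:
        return YarnHandler(hook)
    if hook._is_kubernetes:
        return KubernetesHandler(hook)
    return StandaloneHandler(hook)


class OperatorSketch:
    """Selects a handler once, then delegates every mixin method to it."""

    def __init__(self, hook: FakeHook) -> None:
        self._handler = select_handler(hook)

    def submit_job(self) -> str:
        return self._handler.submit_job()

    def is_job_active(self, job_id: str) -> bool:
        return self._handler.is_job_active(job_id)

    def on_kill(self, job_id: str) -> None:
        self._handler.on_kill(job_id)
```

With this shape each handler can be instantiated and tested without an operator, and the operator's public methods keep their current names. The lighter alternative (dedicated private methods per backend) would keep the branching in the operator but move each branch body out of line.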
Acceptance criteria
- Per-backend logic is cohesive (one class or one method group per backend), not
interleaved across the mixin methods.
- Backend selection happens once instead of being re-derived in every method.
- Existing behaviour is unchanged; current tests pass and per-backend logic is
unit-testable in isolation.
- No public API change to SparkSubmitOperator.
Notes
- Non-breaking, internal refactor -- target after the 3.3.0 release.