[FLINK-38915][Kubernetes Operator] In-place suspension for FlinkBlueGreenDeployment #14

Open: james-kan-shopify wants to merge 8 commits into main from jk.bg-in-place-restarts

@james-kan-shopify james-kan-shopify commented Jan 14, 2026

What is the purpose of the change

Tackles: https://issues.apache.org/jira/browse/FLINK-38915

Improve blue/green suspend/resume behavior: allow in-place suspension/resume without spawning new deployments, propagate spec changes while suspended, block suspend during transitions, and fix BG status sync bugs.

Brief change log

  • Add SUSPEND/RESUME diff detection and in-place handling in BlueGreenDeploymentService. If the pipeline was suspended on blue, it is resumed on blue when the state is set back to running.
  • Block suspend requests during blue/green transitions until the transition completes; the suspend is then executed after the transition.
  • Block initial deployment when job.state=SUSPENDED.
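
To illustrate the three rules in the change log, here is a minimal standalone sketch of the decision logic. It is not the code from BlueGreenDeploymentService: the class, enum, and method names are hypothetical, and the real diff detection compares full specs rather than two job states.

```java
// Hypothetical sketch: names below are illustrative, not the operator's API.
public class InPlaceSuspendSketch {
    enum JobState { RUNNING, SUSPENDED }
    enum DiffType { IGNORE, SUSPEND, RESUME }
    enum Action {
        NONE, DEPLOY, SUSPEND_IN_PLACE, RESUME_IN_PLACE,
        DEFER_SUSPEND, REJECT_INITIAL_DEPLOYMENT
    }

    // Classify the change between the last reconciled job state and the desired one.
    static DiffType classify(JobState lastReconciled, JobState desired) {
        if (lastReconciled == JobState.RUNNING && desired == JobState.SUSPENDED) {
            return DiffType.SUSPEND;
        }
        if (lastReconciled == JobState.SUSPENDED && desired == JobState.RUNNING) {
            return DiffType.RESUME;
        }
        return DiffType.IGNORE;
    }

    // lastReconciled == null models "nothing has been deployed yet".
    static Action decide(JobState lastReconciled, JobState desired, boolean transitionInProgress) {
        if (lastReconciled == null) {
            // Initial deployment is blocked when job.state=SUSPENDED.
            return desired == JobState.SUSPENDED
                    ? Action.REJECT_INITIAL_DEPLOYMENT
                    : Action.DEPLOY;
        }
        switch (classify(lastReconciled, desired)) {
            case SUSPEND:
                // Suspend is blocked while a blue/green transition is running.
                return transitionInProgress ? Action.DEFER_SUSPEND : Action.SUSPEND_IN_PLACE;
            case RESUME:
                // Resume happens on the same deployment that was suspended.
                return Action.RESUME_IN_PLACE;
            default:
                return Action.NONE;
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(JobState.RUNNING, JobState.SUSPENDED, false));
        System.out.println(decide(JobState.RUNNING, JobState.SUSPENDED, true));
        System.out.println(decide(null, JobState.SUSPENDED, false));
    }
}
```

In this sketch a deferred suspend is simply reported as DEFER_SUSPEND; how the operator re-evaluates it after the transition is a separate concern.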

Verifying this change

This change added tests and can be verified as follows:

  • FlinkBlueGreenDeploymentControllerTest: suspend/resume in-place, suspend during transition blocked, initial suspended rejection.
  • FlinkBlueGreenDeploymentSpecDiffTest: SUSPEND/RESUME diff detection.

Does this pull request potentially affect one of the following parts:

  • Dependencies: no
  • Public API (CustomResourceDescriptors): no
  • Core observer or reconciler logic: yes (blue/green suspend/resume paths, status sync)

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? not applicable

@james-kan-shopify james-kan-shopify changed the title from "Draft WIP" to "[FLINK-38915][Kubernetes Operator] In-place suspension for BlueGreenDeployment" Jan 15, 2026
@james-kan-shopify james-kan-shopify changed the title from "[FLINK-38915][Kubernetes Operator] In-place suspension for BlueGreenDeployment" to "[FLINK-38915][Kubernetes Operator] In-place suspension for FlinkBlueGreenDeployment" Jan 15, 2026

@drossos drossos left a comment

Some questions about the in-place patching methods we use, plus general comments. Going to stress test more in the sandbox.

// Reschedule to process any pending spec changes (e.g., suspend requested during
// transition)
return patchStatusUpdateControl(context, nextState, JobStatus.RUNNING, null)
        .rescheduleAfter(0);

Why did we add this part?

Author

rescheduleAfter(0) forces another reconciliation right away. After a transition completes, that reconciliation applies any suspend that was requested during the transition but not executed, so the suspend set on the parent is still honoured down to the child.
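
As a rough illustration of that flow, the sketch below simulates a reconcile loop in plain Java. Everything in it is hypothetical (the Resource class, the Phase enum, and a return value of 0 standing in for an immediate reschedule); it does not use the operator SDK and is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of "defer suspend during transition, apply on the next pass".
public class RescheduleSketch {
    enum Phase { TRANSITIONING, ACTIVE, SUSPENDED }

    static final class Resource {
        Phase phase = Phase.TRANSITIONING;
        boolean suspendRequested; // desired state taken from the parent spec
    }

    // One reconcile pass. Returns the delay in ms before the next pass, or -1 for none.
    static long reconcile(Resource r, List<String> log) {
        switch (r.phase) {
            case TRANSITIONING:
                // Suspend requests are not acted on here; finish the transition first.
                r.phase = Phase.ACTIVE;
                log.add("transition finished");
                return 0; // reschedule immediately so pending spec changes are picked up
            case ACTIVE:
                if (r.suspendRequested) {
                    r.phase = Phase.SUSPENDED;
                    log.add("suspended in place");
                }
                return -1;
            default:
                return -1;
        }
    }

    // Drive reconcile passes until no further reschedule is requested.
    static List<String> run(Resource r) {
        List<String> log = new ArrayList<>();
        long delay = reconcile(r, log);
        while (delay >= 0) {
            delay = reconcile(r, log);
        }
        return log;
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        r.suspendRequested = true; // suspend arrives while the transition is running
        System.out.println(run(r) + " -> " + r.phase);
    }
}
```

Without the immediate reschedule in the TRANSITIONING branch, the deferred suspend in this model would wait until some other event triggered the next pass.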

@drossos drossos left a comment

LGTM 👍 Let's flip to ready-to-review here, then clean up the commits and open this as an OSS PR. Great stuff.

@james-kan-shopify james-kan-shopify marked this pull request as ready for review January 16, 2026 23:29
@james-kan-shopify james-kan-shopify force-pushed the jk.bg-in-place-restarts branch 5 times, most recently from eb4717a to a636429 Compare January 21, 2026 03:32

@drossos drossos left a comment

Should have done this on last review, but LGTM 👍

5 participants