[FLINK-38915][Kubernetes Operator] In-place suspension for FlinkBlueGreenDeployment #14
Open
james-kan-shopify wants to merge 8 commits into
…ob in native mode
…l for suspend/upgrade
drossos
reviewed
Jan 16, 2026
drossos
left a comment
Some questions about the in-place patching methods we use, plus general comments. Going to stress test more in the sandbox.
// Reschedule to process any pending spec changes (e.g., suspend requested during
// transition)
return patchStatusUpdateControl(context, nextState, JobStatus.RUNNING, null)
        .rescheduleAfter(0);
Author
`rescheduleAfter(0)` forces another reconciliation right away. If a suspend was requested during the transition but not acted on, that follow-up reconciliation applies it once the transition finishes, so a suspend set on the parent is still honoured down to the child.
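To make the reply above concrete, here is a minimal, self-contained sketch of the idea. It is a toy model, not the operator's code: `RescheduleSketch`, `BgState`, `DesiredJobState`, and `Result` are hypothetical names, and the `requeueDelayMs` field only stands in for what `rescheduleAfter(0)` requests in the diff. The assumption is that a suspend arriving mid-transition is left pending and picked up by the immediately requeued pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these types are hypothetical, not the operator's real classes. */
final class RescheduleSketch {
    enum BgState { TRANSITIONING_TO_GREEN, ACTIVE_GREEN, SUSPENDED_GREEN }
    enum DesiredJobState { RUNNING, SUSPENDED }

    /** Outcome of one reconcile pass; requeueDelayMs == null means no explicit requeue. */
    static final class Result {
        final BgState state;
        final Long requeueDelayMs;

        Result(BgState state, Long requeueDelayMs) {
            this.state = state;
            this.requeueDelayMs = requeueDelayMs;
        }
    }

    static Result reconcile(BgState current, DesiredJobState desired) {
        switch (current) {
            case TRANSITIONING_TO_GREEN:
                // A suspend is not acted on mid-transition. Finish the transition and
                // requeue with zero delay so the pending spec change is seen next pass.
                return new Result(BgState.ACTIVE_GREEN, 0L);
            case ACTIVE_GREEN:
                // Stable state: a pending suspend is now applied in place.
                return desired == DesiredJobState.SUSPENDED
                        ? new Result(BgState.SUSPENDED_GREEN, null)
                        : new Result(BgState.ACTIVE_GREEN, null);
            default:
                // SUSPENDED_GREEN: resume on the same colour when RUNNING is requested.
                return desired == DesiredJobState.RUNNING
                        ? new Result(BgState.ACTIVE_GREEN, null)
                        : new Result(BgState.SUSPENDED_GREEN, null);
        }
    }

    /** Drives reconcile passes until no requeue is requested; returns the visited states. */
    static List<BgState> run(BgState start, DesiredJobState desired) {
        List<BgState> trace = new ArrayList<>();
        BgState state = start;
        Result result;
        do {
            result = reconcile(state, desired);
            state = result.state;
            trace.add(state);
        } while (result.requeueDelayMs != null);
        return trace;
    }

    public static void main(String[] args) {
        // prints [ACTIVE_GREEN, SUSPENDED_GREEN]
        System.out.println(run(BgState.TRANSITIONING_TO_GREEN, DesiredJobState.SUSPENDED));
    }
}
```

Without the zero-delay requeue in the transition branch, the loop in this model would stop at `ACTIVE_GREEN` and the suspend would wait for whatever triggers the next reconciliation.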
drossos
reviewed
Jan 16, 2026
drossos
left a comment
LGTM 👍 Let's flip to ready-to-review here, then clean up the commits and get this up as an OSS PR. Great stuff.
eb4717a to
a636429
drossos
approved these changes
Jan 23, 2026
drossos
left a comment
Should have done this on last review, but LGTM 👍
Draft
…estartSavepointNonce Co-authored-by: Daniel Rossos <daniel.rossos@shopify.com>
a636429 to
4847dfd
What is the purpose of the change
Tackles: https://issues.apache.org/jira/browse/FLINK-38915
Improve blue/green suspend/resume behavior: allow in-place suspension/resume without spawning new deployments, propagate spec changes while suspended, block suspend during transitions, and fix BG status sync bugs.
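One way to read the behaviours listed above is as a single decision over the last reconciled job state, the requested job state, and whether a blue/green transition is in flight. The sketch below is an assumption-laden illustration of that reading, not the PR's implementation: `SuspendDecisionSketch`, `JobState`, `Action`, and `decide` are hypothetical names, and "block suspend during transitions" is modelled here as deferring the suspend until the transition completes.

```java
/** Hypothetical decision table; names and rules are illustrative assumptions. */
final class SuspendDecisionSketch {
    enum JobState { RUNNING, SUSPENDED }

    enum Action {
        REJECT_INITIAL_SUSPEND, // first deployment requested as SUSPENDED
        DEFER_SUSPEND,          // suspend requested while a transition is in flight
        SUSPEND_IN_PLACE,       // suspend the active child, no new deployment
        RESUME_IN_PLACE,        // resume on the colour that was suspended
        NO_SUSPEND_CHANGE       // nothing suspend-related to do
    }

    /**
     * @param lastReconciled job state from the last reconciled spec, or null if
     *                       nothing has been deployed yet
     * @param desired        job state requested in the current spec
     * @param transitioning  true while a blue/green transition is in progress
     */
    static Action decide(JobState lastReconciled, JobState desired, boolean transitioning) {
        if (lastReconciled == null) {
            // Nothing exists to suspend, so an initial SUSPENDED request is rejected.
            return desired == JobState.SUSPENDED
                    ? Action.REJECT_INITIAL_SUSPEND
                    : Action.NO_SUSPEND_CHANGE;
        }
        if (lastReconciled == desired) {
            return Action.NO_SUSPEND_CHANGE;
        }
        if (desired == JobState.SUSPENDED) {
            // RUNNING -> SUSPENDED: only act once the deployment is stable.
            return transitioning ? Action.DEFER_SUSPEND : Action.SUSPEND_IN_PLACE;
        }
        // SUSPENDED -> RUNNING: resume the same child rather than spawning a new one.
        return Action.RESUME_IN_PLACE;
    }

    public static void main(String[] args) {
        System.out.println(decide(JobState.RUNNING, JobState.SUSPENDED, false));
        System.out.println(decide(JobState.RUNNING, JobState.SUSPENDED, true));
        System.out.println(decide(JobState.SUSPENDED, JobState.RUNNING, false));
        System.out.println(decide(null, JobState.SUSPENDED, false));
    }
}
```

Keeping the decision in one pure function makes each case easy to cover in a unit test, which lines up with the SUSPEND/RESUME diff-detection tests mentioned in the verification notes.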
Brief change log
- …BlueGreenDeploymentService. (This means if suspension was done on blue, the pipeline will be resumed on blue when state is set back to running).
- …job.state=SUSPENDED.
Verifying this change
This change added tests and can be verified as follows:
- FlinkBlueGreenDeploymentControllerTest: suspend/resume in-place, suspend during transition blocked, initial suspended rejection.
- FlinkBlueGreenDeploymentSpecDiffTest: SUSPEND/RESUME diff detection.
Does this pull request potentially affect one of the following parts:
- …CustomResourceDescriptors): no
Documentation