@schongloo commented Dec 5, 2025

What is the purpose of the change

This draft PR presents the initial idea for Advanced Coordination for Blue/Green Deployments.

FLIP-503 introduced the Basic Blue/Green Deployment functionality to the Flink K8s Operator. That mode is straightforward: the operator simply transitions to the second deployment once it is considered stable. This Advanced version introduces "record-level" coordination between the two deployments, so that no data is duplicated and exactly-once semantics are preserved while the transition stays smooth.
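One way to picture "record-level" coordination is a single agreed cutover point. The following is a minimal sketch, assuming both jobs share one cutover timestamp (for example, a watermark value published through the ConfigMap): Blue keeps records strictly before the cutover and Green keeps the rest, so every record is emitted by exactly one deployment. The class, enum, and method names are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: splitting records between Blue and Green at a cutover timestamp. */
final class CutoverPartitionSketch {

    enum Role { BLUE, GREEN }

    /** Returns the deployment that owns a record with the given event timestamp. */
    static Role owner(long eventTimestamp, long cutoverTimestamp) {
        // Half-open ranges: Blue owns timestamps < cutover, Green owns timestamps >= cutover.
        return eventTimestamp < cutoverTimestamp ? Role.BLUE : Role.GREEN;
    }

    /** Keeps only the timestamps that the given deployment should emit. */
    static List<Long> emittedBy(Role role, List<Long> timestamps, long cutoverTimestamp) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestamps) {
            if (owner(ts, cutoverTimestamp) == role) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> input = List.of(100L, 150L, 199L, 200L, 250L);
        long cutover = 200L;
        System.out.println("Blue emits:  " + emittedBy(Role.BLUE, input, cutover));  // [100, 150, 199]
        System.out.println("Green emits: " + emittedBy(Role.GREEN, input, cutover)); // [200, 250]
    }
}
```

Because the ranges are half-open, a record timestamped exactly at the cutover goes to Green only; the guarantee holds only if both deployments observe the same cutover value.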

This functionality is NOT yet ready for general use: some edge cases have only simple workarounds, and others remain unaddressed. For example:

  • With a Kafka source, we cannot get a full copy of the traffic onto the second deployment simply by leveraging the state from the other deployment's savepoint.
  • On the client side, the GateProcessFunction simplifies communication by writing to the ConfigMap from only one subtask; however, if no data flows through that subtask index, the pipeline sends no communication at all.
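The second edge case follows directly from electing a single writer. A minimal sketch, assuming the writer is subtask index 0 and modeling the ConfigMap as an in-memory `Map<String, String>` (the key `lastWatermark` and the class name are hypothetical): only that subtask reports progress, and only when a record actually passes through it, so an idle subtask 0 leaves the ConfigMap untouched.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a single designated subtask reporting progress to a shared map. */
final class DesignatedWriterSketch {

    static final int WRITER_SUBTASK = 0;                // assumption: subtask 0 is the elected writer
    static final String PROGRESS_KEY = "lastWatermark"; // hypothetical ConfigMap key

    private final int subtaskIndex;
    private final Map<String, String> configMap;        // stand-in for the Kubernetes ConfigMap data

    DesignatedWriterSketch(int subtaskIndex, Map<String, String> configMap) {
        this.subtaskIndex = subtaskIndex;
        this.configMap = configMap;
    }

    /** Called once per record; only the designated subtask writes, and only when it sees a record. */
    void onRecord(long currentWatermark) {
        if (subtaskIndex == WRITER_SUBTASK) {
            configMap.put(PROGRESS_KEY, Long.toString(currentWatermark));
        }
    }

    public static void main(String[] args) {
        Map<String, String> configMap = new HashMap<>();
        DesignatedWriterSketch writer = new DesignatedWriterSketch(0, configMap);
        DesignatedWriterSketch other = new DesignatedWriterSketch(1, configMap);

        // All traffic lands on subtask 1: nothing is reported.
        other.onRecord(500L);
        System.out.println("after traffic on subtask 1 only: " + configMap);

        // Once subtask 0 sees a record, progress becomes visible.
        writer.onRecord(600L);
        System.out.println("after a record on subtask 0: " + configMap);
    }
}
```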

The main goals of this Draft PR are:

  • For the community to take a quick look at the current functionality, already mentioned at the Flink Forward 2025 conference
  • To gather feedback and suggestions for improvement

Brief change log

  • Introduced a new TransitionMode in the FlinkDeploymentTemplateSpec (BASIC/ADVANCED)
  • In ADVANCED mode, the operator declares a ConfigMap to drive and communicate with each "pair" of Blue/Green Flink jobs
  • New GateProcessFunction abstract class, which defines the overall contract for implementing record-level filtering.
  • Client-side pipelines communicate with the Blue/Green Controller through the aforementioned ConfigMap using a new concrete WatermarkGateProcessFunction, which in turn controls the flow of records.
  • New module flink-kubernetes-operator-bluegreen-client, a distributable artifact with
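To illustrate how an abstract gate contract and a concrete watermark-driven gate could fit together, here is a minimal sketch in plain Java rather than Flink's `ProcessFunction` API. It assumes the controller publishes a cutover timestamp under a hypothetical ConfigMap key `cutoverTimestamp` (the ConfigMap is modeled as a `Map<String, String>`): until the value appears, Blue keeps flowing and Green holds its records; once it appears, held and new records are emitted or dropped according to ownership. `GateSketch` and `WatermarkGateSketch` are illustrative names, not the PR's `GateProcessFunction` or `WatermarkGateProcessFunction`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;
import java.util.function.ToLongFunction;

/** Hypothetical gate contract: per record, decide whether to emit, hold, or drop it. */
abstract class GateSketch<T> {

    private final List<T> held = new ArrayList<>();

    /** True once the controller has published enough information to decide. */
    protected abstract boolean decided();

    /** Whether this deployment keeps emitting while no decision is available. */
    protected abstract boolean passThroughWhileUndecided();

    /** Once decided, whether this deployment owns (and therefore emits) the record. */
    protected abstract boolean owns(T record);

    /** Processes one record and returns the records to forward downstream now. */
    final List<T> process(T record) {
        List<T> out = new ArrayList<>();
        if (!decided()) {
            if (passThroughWhileUndecided()) {
                out.add(record);
            } else {
                held.add(record);
            }
            return out;
        }
        // Decision available: flush held records it owns, then handle the new one.
        for (T h : held) {
            if (owns(h)) {
                out.add(h);
            }
        }
        held.clear();
        if (owns(record)) {
            out.add(record);
        }
        return out;
    }
}

/** Hypothetical watermark-style gate reading a cutover timestamp from a ConfigMap stand-in. */
final class WatermarkGateSketch<T> extends GateSketch<T> {

    static final String CUTOVER_KEY = "cutoverTimestamp"; // hypothetical ConfigMap key

    private final boolean isBlue;
    private final Map<String, String> configMap;          // stand-in for the ConfigMap data
    private final ToLongFunction<T> timestampOf;

    WatermarkGateSketch(boolean isBlue, Map<String, String> configMap, ToLongFunction<T> timestampOf) {
        this.isBlue = isBlue;
        this.configMap = configMap;
        this.timestampOf = timestampOf;
    }

    private OptionalLong cutover() {
        String value = configMap.get(CUTOVER_KEY);
        return value == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(value));
    }

    @Override
    protected boolean decided() {
        return cutover().isPresent();
    }

    @Override
    protected boolean passThroughWhileUndecided() {
        return isBlue; // Blue is live and keeps flowing; Green holds records.
    }

    @Override
    protected boolean owns(T record) {
        long ts = timestampOf.applyAsLong(record);
        long cut = cutover().getAsLong();
        return isBlue ? ts < cut : ts >= cut;
    }
}
```

Two simplifications are deliberate: the held buffer is unbounded, and letting Blue pass records through while undecided only avoids duplicates if the controller picks a cutover no earlier than anything Blue has already emitted.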

Verifying this change

This change added tests and can be verified as follows:

  • Added a set of unit tests in WatermarkGateProcessFunctionTest

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., are there any changes to the CustomResourceDescriptors: (yes / no)
  • Core observer or reconciler logic that is regularly executed: (yes / no)

Documentation

