CFP-41292 : xDS-controlled L4 LoadBalancer#75

Draft
tsotne95 wants to merge 1 commit into cilium:main from tsotne95:xds-l4-lb-cfp

Conversation

@tsotne95

@tsotne95 tsotne95 commented Aug 4, 2025

Add new CFP for xDS-controlled L4 LoadBalancer

@xmulligan
Member

Cc: @cilium/sig-datapath

@xmulligan
Member

@tsotne95 is this related to #14 at all?

@tsotne95
Author

tsotne95 commented Aug 5, 2025

Hi,
I wasn't aware that we had a separate CFP, but it's definitely a continuation of the experimental xDS client (34484) work.

cc: @bowei

@xmulligan xmulligan changed the title feat(cfp): Add new CFP for xDS-controlled L4 LoadBalancer CFP-41292 : xDS-controlled L4 LoadBalancer Aug 20, 2025
Signed-off-by: Tsotne Chakhvadze <tsotne@google.com>
@joamaki

joamaki commented Aug 21, 2025

It would be great to see some more thinking behind how this would be usable by the community. In other words, what would be the plan for providing a production-ready xDS server for this (as brought up in the earlier CFP #14), and how would it be tested, maintained, and documented over the long term? Having a clear longer-term plan would reduce the risk that this ends up staying an experimental feature that is hard or impossible to use.

Comment on lines +135 to +137
### MCS Service Importer Integration

Once this feature is stable, the MCS (Multi-Cluster Services) service importer can be refactored to use this xDS client as its backend for L4 service discovery, rather than directly programming services itself. This would unify the standalone and multi-cluster L4 LB implementation paths, reducing complexity and improving maintainability.
Member

Could you expand on the ways you would be able to refactor MCS support with xDS? Are you talking about Cilium ClusterMesh, or third-party implementations (like GKE, I'm assuming) that would use this to do their own non-ClusterMesh MCS implementation?

Member

@tsotne95 -- I think this refers to the Google-specific MCS implementation, which is a detail that doesn't make sense for an open-source upstream proposal. I know it makes sense for our systems, but it's probably not very relevant here?

Author

Using xDS for the MCS flow is meant to streamline any Multi-Cluster Services implementation in Cilium - not just the Google/GKE variant. The current upstream MCS controller (pkg/clustermesh/mcsapi) constructs ServiceImports and then directly programs derived Services and backends to drive Cilium’s datapath. Once the xDS-based L4 LB is available, that controller could instead translate the ServiceImport state into Cluster and ClusterLoadAssignment resources and feed them through the same xDS client used by standalone L4 LB. This removes the special-case service-programming logic in the MCS code path and lets both ClusterMesh and third‑party MCS solutions reuse a single, well-documented control-plane interface.
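To make the shape of that translation concrete, here is a minimal, hypothetical sketch. None of these types are Cilium's or go-control-plane's actual APIs (real code would emit `cluster.Cluster` and `endpoint.ClusterLoadAssignment` protos); they are plain-Go stand-ins showing how aggregated ServiceImport state maps onto a CDS/EDS pair:

```go
// Hypothetical sketch: simplified stand-ins for the xDS resources an MCS
// translator would emit, instead of a derived Service + mirrored slices.
package main

import "fmt"

// Endpoint is a stand-in for an EDS endpoint (address:port of a backend).
type Endpoint struct {
	Address string
	Port    uint32
}

// ClusterLoadAssignment is a stand-in for the EDS resource carrying backends.
type ClusterLoadAssignment struct {
	ClusterName string
	Endpoints   []Endpoint
}

// Cluster is a stand-in for the CDS resource naming the service and its VIP.
type Cluster struct {
	Name string // e.g. "<namespace>/<service-import-name>"
	VIP  string // clusterset IP the agent would program into eBPF LB state
}

// ServiceImportState is a stand-in for aggregated ServiceImport state:
// the allocated VIP plus backends merged from all exporting clusters.
type ServiceImportState struct {
	Namespace, Name string
	ClusterSetIP    string
	Backends        []Endpoint
}

// translate converts one ServiceImport into the CDS/EDS pair that the
// agent's xDS client would consume through the same pipeline as
// standalone L4 LB resources.
func translate(si ServiceImportState) (Cluster, ClusterLoadAssignment) {
	name := si.Namespace + "/" + si.Name
	return Cluster{Name: name, VIP: si.ClusterSetIP},
		ClusterLoadAssignment{ClusterName: name, Endpoints: si.Backends}
}

func main() {
	c, cla := translate(ServiceImportState{
		Namespace:    "default",
		Name:         "checkout",
		ClusterSetIP: "10.96.0.42",
		Backends: []Endpoint{
			{Address: "10.0.1.5", Port: 8080}, // backend in cluster A
			{Address: "10.1.2.9", Port: 8080}, // backend in cluster B
		},
	})
	fmt.Println(c.Name, c.VIP, len(cla.Endpoints))
}
```

The point of the sketch is only that the translator is a pure function from ServiceImport state to xDS resources; the agent side never sees Kubernetes types.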

Member

Ah ok -- that context really helps.

Member

@MrFreezeex MrFreezeex Aug 26, 2025

Thanks for the additional context!

One useful thing about the current implementation is that implementing ServiceImport via a derived Service inherently brings an additional feature: users can target the derived Service with a third-party controller that doesn't directly support the MCS-API. One useful example is the Prometheus operator.

I also reckon that since https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/ipallocator/ipallocator.go exists, it seems possible to allocate an IP, which removes the additional complexity of allocating service IPs for a native implementation. But what's the advantage of an xDS implementation vs. having the agent watch ServiceImport directly anyway?

Also, FYI, there's all the logic to sync the backends across clusters, which is tied to the global derived Service; this is definitely not an easy task...

Author

Thanks for the questions. The CFP’s primary goal is a standalone xDS‑controlled L4 load balancer. The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert.

As I see it, Cilium’s upstream MCS support has dedicated controllers that synthesize a “derived” Service and mirrored EndpointSlices to drive the datapath. The mcsAPIServiceReconciler creates a new Service annotated as global so ClusterMesh logic programs the VIP and backends. A companion mcsAPIEndpointSliceMirrorReconciler copies each local EndpointSlice into that derived Service. (https://github.com/cilium/cilium/blob/main/pkg/clustermesh/mcsapi/README.md)
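For reference, the derived Service those reconcilers synthesize looks roughly like this (a sketch only; the name and annotation details are illustrative, not copied from the implementation):

```yaml
# Illustrative sketch of a derived Service created by the MCS reconciler.
apiVersion: v1
kind: Service
metadata:
  name: derived-checkout        # hypothetical name derived from the ServiceImport
  namespace: default
  annotations:
    service.cilium.io/global: "true"  # marks it global so ClusterMesh programs VIP + remote backends
spec:
  ports:
    - port: 8080
```

Mirrored EndpointSlices from each exporting cluster are then attached to this Service, which is the machinery the xDS path would make unnecessary.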

Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources instead of constructing derived Services and mirrored slices. The agent’s xDS client would program VIPs and endpoints directly, so the mentioned controllers above could be removed or greatly simplified.

Member

@MrFreezeex MrFreezeex Aug 26, 2025

The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert. [...] Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources

Yes, sure, I got that, but the CFP cites MCS several times as a possible future benefit, and I am a bit skeptical about whether implementing MCS that way is actually a good choice.

My main question is: assuming we want a native implementation of MCS-API that does not create a derived Service (which is already a big question mark), why would that be easier to do with an xDS server vs. just making the cilium-agent start watching the ServiceImport resources and associated EndpointSlices? I don't really get why we would want to rely on xDS here. It seems to go against your goal of "unify different L4 service discovery mechanisms" if, for instance, load balancing for Services still relies on watching the kube-apiserver directly (plus the ClusterMesh control planes for Global Services not directly related to MCS), while MCS would instead involve some xDS server rather than watching the kube-apiserver and the ClusterMesh control planes of other clusters.

Author

First of all, I'm pretty sure we don't know each other, so I don't like your tone.

I want to start by saying that I don’t share your view here. The suggestion to simply have the agent watch ServiceImport/EndpointSlice directly misses the larger architectural picture and, in my opinion, works against the goal of simplifying and unifying service discovery in Cilium.

  1. One agent ingestion path, not three.
    Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.

  2. Protocol semantics you'd otherwise re-implement.
    xDS gives you ACK/NACK, versioning, and delta (incremental) updates. Those are valuable under churn (endpoint scale-out/scale-in across clusters) and during rollouts. Doing raw informer watches in every agent means you have to hand-roll back-pressure and consistency logic (or accept more flapping). With xDS, these semantics are built-in and widely tested.

  3. Less load on the kube-apiserver.
    With watch directly in every agent, N agents watch MCS objects and each performs similar merges. With xDS, a central (or sharded) translator watches ServiceImport/EndpointSlice once, computes the desired VIP/backends, and fans out compact deltas to agents. This reduces apiserver fan-out and gives you a place to shard/cache if a fleet grows.

  4. Cleaner separation of concerns.
    The producer (MCS translator) knows Kubernetes types; the consumer (agent) only knows Clusters and Endpoints (CDS/EDS). That decouples Kubernetes schema evolution from the agent. It also keeps the agent's L4 logic identical whether the source is a local Service, a ClusterMesh export, or MCS.

  5. Extensibility you’ll need later.
    The moment you want per-cluster weights, locality hints, failover policies, or to ingest services from non-Kubernetes registries, you don’t touch the agent—only emit the same xDS resources. That’s exactly why xDS exists across service-mesh ecosystems.

I believe this makes the reasoning behind my proposal clear. I don't intend to keep going in circles on this point - my position is that xDS provides the unification and scalability properties we want, while duplicating watchers in the agent does not.

Member

@MrFreezeex MrFreezeex Aug 26, 2025

First of all, I'm pretty sure we don't know each other, so I don't like your tone.

Sorry if it appeared that way. I am genuinely interested in how what you are proposing might (since it's a future goal) impact the ClusterMesh area, and what you wrote here is very helpful for understanding what this is about.

One agent ingestion path, not three.
Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.

Ah ok, this was the main point I was missing! The CFP seems to suggest that the xDS server would be an optional thing, though, but I am assuming that ideally you would want this to become the primary way to do load balancing in Cilium and eventually sunset the per-agent Service/EndpointSlice watchers?

Member

These are very good questions!

I think one of the motivators was the feature request by some users at the Dev Summit for integration via xDS -- as this is a common control plane used in many places (I think this statement is still true excluding my own biases :-) ). It makes Cilium an even more useful infrastructure building block, as it brings capabilities that would be hard to replicate in K8s API infra, such as dynamic load-balancing weights.

That being said, I agree that we don't want to create a partial solution in the project, so some e2e solution will need to exist. I think the scope of such a thing will require discussion. My opinion is that the server should be mostly off-the-shelf and meet the basic user journeys, rather than trying to build an expansive feature, at least at this point.

Whether or not MCS will reuse this mechanism -- I do know there are some drawbacks to the derived Service approach in that it causes extra load on the API server.

Regarding how we support this in the project -- I view this as completing the experimental xDS client that exists today. We should have strict requirements that if this doesn't get usage, we can remove it entirely to avoid paying a maintenance cost.

@bowei
Member

bowei commented Oct 31, 2025

@tsotne95 -- if you aren't planning on working on this one, I would park this proposal for the time being.

@tsotne95
Author

tsotne95 commented Nov 3, 2025

@tsotne95 -- if you aren't planning on working on this one, I would park this proposal for the time being.

Thanks for checking in. Our immediate priorities have shifted, but it's definitely still on my radar, and I'll come back to it as soon as I can (this year).

@pchaigno pchaigno marked this pull request as draft November 3, 2025 10:48