CFP-41292: xDS-controlled L4 LoadBalancer #75
Cc: @cilium/sig-datapath
Signed-off-by: Tsotne Chakhvadze <tsotne@google.com>
It would be great to see more thinking about how this would be usable by the community. In other words, what is the plan for providing a production-ready xDS server for this (as brought up in the earlier CFP #14), and how would it be tested, maintained, and documented over the long term? Having a clear longer-term plan would reduce the risk that this ends up staying an experimental feature that is hard or impossible to use.
### MCS Service Importer Integration

Once this feature is stable, the MCS (Multi-Cluster Services) service importer can be refactored to use this xDS client as its backend for L4 service discovery, rather than directly programming services itself. This would unify the standalone and multi-cluster L4 LB implementation paths, reducing complexity and improving maintainability.
Could you expand more on in what ways would you be able to refactor MCS support with XDS? Are you talking about Cilium ClusterMesh or third party implementations (like GKE I'm assuming) that would use this to do their own non ClusterMesh MCS implementation?
@tsotne95 -- I think this refers to the Google-specific MCS implementation, which is a detail that doesn't make sense for an open-source upstream proposal. I know it makes sense for our systems, but it's probably not very relevant here?
Using xDS for the MCS flow is meant to streamline any Multi-Cluster Services implementation in Cilium - not just the Google/GKE variant. The current upstream MCS controller (pkg/clustermesh/mcsapi) constructs ServiceImports and then directly programs derived Services and backends to drive Cilium’s datapath. Once the xDS-based L4 LB is available, that controller could instead translate the ServiceImport state into Cluster and ClusterLoadAssignment resources and feed them through the same xDS client used by standalone L4 LB. This removes the special-case service-programming logic in the MCS code path and lets both ClusterMesh and third‑party MCS solutions reuse a single, well-documented control-plane interface.
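To make the shape of that translation concrete, here is a minimal, language-neutral sketch in Python (Cilium itself is written in Go; the function name and inputs here are hypothetical, while the field names follow the JSON form of the `envoy.config.cluster.v3.Cluster` and `envoy.config.endpoint.v3.ClusterLoadAssignment` protos):

```python
# Hypothetical sketch: translate aggregated ServiceImport state into the
# JSON form of xDS Cluster (CDS) and ClusterLoadAssignment (EDS) resources.
# `service_import_to_xds` is illustrative, not a Cilium API.

def service_import_to_xds(name, namespace, backends):
    """backends: list of (ip, port) tuples aggregated across clusters."""
    cluster_name = f"{namespace}/{name}"
    cluster = {
        "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
        "name": cluster_name,
        "type": "EDS",  # endpoints are delivered separately via EDS
    }
    cla = {
        "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
        "cluster_name": cluster_name,
        "endpoints": [{
            "lb_endpoints": [
                {"endpoint": {"address": {"socket_address": {
                    "address": ip, "port_value": bport}}}}
                for ip, bport in backends
            ],
        }],
    }
    return cluster, cla

# Backends from two clusters collapse into one EDS resource.
cluster, cla = service_import_to_xds(
    "echo", "default",
    [("10.0.1.5", 8080), ("10.1.2.7", 8080)])
```

The point is that the merge across clusters happens once in the controller; the agent only ever sees the resulting Cluster/ClusterLoadAssignment pair, whatever the source.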
Ah ok -- that context really helps.
Thanks for the additional context!
One useful thing about the current implementation is that implementing ServiceImport via a derived Service inherently brings an additional feature: users can target the derived Service with a third-party controller that doesn't directly support the MCS-API, the Prometheus operator being one useful example.
I also reckon that, since https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/ipallocator/ipallocator.go exists, it seems possible to allocate an IP, which removes the additional complexity of allocating service IPs for a native implementation. But what's the advantage of an xDS implementation over having the agent watch ServiceImport directly anyway?
Also, FYI, there's all the logic to sync the backends across clusters, which is tied to the global derived Service; this is definitely not an easy task...
Thanks for the questions. The CFP’s primary goal is a standalone xDS‑controlled L4 load balancer. The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert.
As I see it, Cilium's upstream MCS support has dedicated controllers that synthesize a "derived" Service and mirrored EndpointSlices to drive the datapath. The mcsAPIServiceReconciler creates a new Service annotated as global so that ClusterMesh logic programs the VIP and backends. A companion mcsAPIEndpointSliceMirrorReconciler copies each local EndpointSlice into that derived Service. (https://github.com/cilium/cilium/blob/main/pkg/clustermesh/mcsapi/README.md)
Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources instead of constructing derived Services and mirrored slices. The agent's xDS client would program VIPs and endpoints directly, so the controllers mentioned above could be removed or greatly simplified.
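For contrast with the derived-Service path described above, a rough sketch of what such a reconciler produces today might look like the following (Python purely for illustration; the function and the derived-name scheme are hypothetical, though `service.cilium.io/global` is Cilium's real global-service annotation):

```python
# Hypothetical sketch of the derived-Service approach: a reconciler
# synthesizes a new Service from a ServiceImport and annotates it as
# global so that ClusterMesh programs the VIP and backends.
# The "derived-" name prefix here is illustrative, not Cilium's exact scheme.

def derive_service(service_import):
    meta = service_import["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": f"derived-{meta['name']}",
            "namespace": meta["namespace"],
            # Real Cilium annotation marking a service as ClusterMesh-global.
            "annotations": {"service.cilium.io/global": "true"},
        },
        "spec": {"ports": service_import["spec"]["ports"]},
    }

svc = derive_service({
    "metadata": {"name": "echo", "namespace": "default"},
    "spec": {"ports": [{"port": 8080}]},
})
```

With the xDS path, this synthetic object (and the mirrored EndpointSlices that accompany it) would no longer need to exist in the apiserver at all.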
The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert. [...] Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources
Yes, sure, I got that, but the CFP cites MCS several times as a possible future benefit, and I am a bit skeptical about whether implementing MCS that way is actually a good choice.
My main question is: assuming we want a native implementation of the MCS-API that does not create a derived Service (which is already a big question mark), why would that be easier to do with an xDS server than by just making the cilium-agent watch the ServiceImport resources and associated EndpointSlices? I don't really get why we would want to rely on xDS here. It seems to go against your goal of "unify different L4 service discovery mechanisms" if, for instance, load balancing for Services still relies on watching the kube-apiserver directly (plus the ClusterMesh control planes for global Services not directly related to MCS), while MCS would instead involve an xDS server rather than watching the kube-apiserver and the ClusterMesh control planes of the other clusters.
First of all, I'm pretty sure we don't know each other, so I don't like your tone.
I want to start by saying that I don’t share your view here. The suggestion to simply have the agent watch ServiceImport/EndpointSlice directly misses the larger architectural picture and, in my opinion, works against the goal of simplifying and unifying service discovery in Cilium.
- **One agent ingestion path, not three.** Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.
- **Protocol semantics you'd otherwise re-implement.** xDS gives you ACK/NACK, versioning, and delta (incremental) updates. Those are valuable under churn (endpoint scale-out/scale-in across clusters) and during rollouts. Doing raw informer watches in every agent means you have to home-grow back-pressure and consistency logic (or accept more flapping). With xDS, these semantics are built in and widely tested.
- **Less load on the kube-apiserver.** With direct watches in every agent, N agents watch MCS objects and each performs similar merges. With xDS, a central (or sharded) translator watches ServiceImport/EndpointSlice once, computes the desired VIP/backends, and fans out compact deltas to agents. This reduces apiserver fan-out and gives you a place to shard/cache if a fleet grows.
- **Cleaner separation of concerns.** The producer (MCS translator) knows Kubernetes types; the consumer (agent) only knows Clusters and Endpoints (CDS/EDS). That decouples Kubernetes schema evolution from the agent. It also keeps the agent's L4 logic identical whether the source is a local Service, a ClusterMesh export, or MCS.
- **Extensibility you'll need later.** The moment you want per-cluster weights, locality hints, failover policies, or to ingest services from non-Kubernetes registries, you don't touch the agent; you only emit the same xDS resources. That's exactly why xDS exists across service-mesh ecosystems.
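The ACK/NACK semantics mentioned above come from the xDS protocol's version/nonce handshake: each DiscoveryResponse carries a `version_info` and `nonce`, and the client echoes them back in its next DiscoveryRequest, advancing the version on success or keeping the old one plus an `error_detail` on failure. A simplified client-side sketch (Python for illustration; in reality this rides a gRPC stream of DiscoveryRequest/DiscoveryResponse messages):

```python
# Simplified sketch of state-of-the-world xDS ACK/NACK handling on the client.
# Field names (version_info, nonce, response_nonce, error_detail) follow the
# xDS DiscoveryRequest/DiscoveryResponse protos; the dict plumbing is illustrative.

def handle_response(state, response, apply_fn):
    """state: dict holding the last ACKed 'version'. Returns the next request."""
    try:
        apply_fn(response["resources"])          # e.g. program eBPF LB state
        state["version"] = response["version_info"]
        return {  # ACK: echo the nonce, advance to the new version
            "version_info": response["version_info"],
            "response_nonce": response["nonce"],
        }
    except Exception as err:
        return {  # NACK: echo the nonce, keep the old version, report the error
            "version_info": state["version"],
            "response_nonce": response["nonce"],
            "error_detail": {"message": str(err)},
        }

state = {"version": "1"}

# A valid update is applied and ACKed.
good = {"version_info": "2", "nonce": "n1", "resources": []}
ack = handle_response(state, good, lambda resources: None)

# An invalid update is rejected: the server learns via the NACK and can retry.
bad = {"version_info": "3", "nonce": "n2", "resources": []}
def reject(resources):
    raise ValueError("invalid resource")
nack = handle_response(state, bad, reject)
```

This is the back-pressure and consistency logic that per-agent informer watches would otherwise have to reinvent.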
I believe this makes the reasoning behind my proposal clear. I don't intend to keep going in circles on this point - my position is that xDS provides the unification and scalability properties we want, while duplicating watchers in the agent does not.
First of all, I'm pretty sure we don't know each other, so I don't like your tone.
Sorry if it appeared that way. I am genuinely interested in how what you are proposing might (since it's a future goal) impact the ClusterMesh area, and what you wrote here is very helpful for understanding what this is about.
One agent ingestion path, not three.
Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.
Ah, ok, this was the main point that I was missing! The CFP seems to suggest that the xDS server would be an optional thing, though; I assume that ideally you would want this to become the primary way to do load balancing in Cilium and eventually sunset the per-agent Service/EndpointSlice watchers?
These are very good questions!
I think one of the motivators was a feature request by some users at the Dev Summit for integration via xDS, as this is a common control plane used in many places (I think this statement is still true excluding my own biases :-) ). It makes Cilium an even more useful infrastructure building block, as it brings capabilities that would be hard to replicate in K8s API infra, such as dynamic load-balancing weights.
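To illustrate the dynamic-weight point: per-endpoint weights are a first-class EDS field (`load_balancing_weight` on each LbEndpoint), which the plain Kubernetes Service API cannot express. A hypothetical sketch of such a resource in its JSON form (the helper name and values are illustrative; the field names follow the envoy.config.endpoint.v3 protos):

```python
# Hypothetical sketch: an EDS ClusterLoadAssignment carrying per-endpoint
# load-balancing weights, e.g. to shift traffic 90/10 across two clusters.

def weighted_assignment(cluster_name, weighted_backends):
    """weighted_backends: list of (ip, port, weight) tuples."""
    return {
        "cluster_name": cluster_name,
        "endpoints": [{
            "lb_endpoints": [
                {
                    "endpoint": {"address": {"socket_address": {
                        "address": ip, "port_value": port}}},
                    # Relative weight used by the load balancer; updating it
                    # requires only a new EDS push, no apiserver object change.
                    "load_balancing_weight": weight,
                }
                for ip, port, weight in weighted_backends
            ],
        }],
    }

cla = weighted_assignment(
    "default/echo",
    [("10.0.1.5", 8080, 90), ("10.1.2.7", 8080, 10)])
```

Because the weights live in the xDS resource, a control plane can rebalance traffic dynamically (e.g. based on load or failover) by pushing a new assignment to the agents.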
That being said, I agree that we don't want to create a partial solution in the project, so some end-to-end solution will need to exist. I think the scope of such a thing will require discussion. My opinion is that the server should be mostly off-the-shelf and meet the basic user journeys, rather than trying to build an expansive feature, at least at this point.
Whether or not MCS will reuse this mechanism -- I do know there are some drawbacks to the derived Service approach in that it causes extra load on the API server.
Regarding how we support this in the project -- I view this as completing the experimental xDS client that exists today. We should have a strict requirement that if this doesn't get usage, we remove it entirely to avoid paying a maintenance cost.
@tsotne95 -- if you aren't planning on working on this one, I would park this proposal for the time being.
Thanks for checking in. Our immediate priorities have shifted, but it's definitely still on my radar, and I'll come back to it as soon as I can (this year).
Add new CFP for xDS-controlled L4 LoadBalancer