CFP-41292: xDS-controlled L4 LoadBalancer #75
Cc: @cilium/sig-datapath
Signed-off-by: Tsotne Chakhvadze <tsotne@google.com>
It would be great to see more thinking about how this would be usable by the community. In other words, what is the plan for providing a production-ready xDS server for this (as brought up in the earlier CFP #14), and how would it be tested, maintained, and documented over the long term? Having a clear longer-term plan would reduce the risk that this ends up staying an experimental feature that is hard or impossible to use.
### MCS Service Importer Integration

Once this feature is stable, the MCS (Multi-Cluster Services) service importer can be refactored to use this xDS client as its backend for L4 service discovery, rather than directly programming services itself. This would unify the standalone and multi-cluster L4 LB implementation paths, reducing complexity and improving maintainability.
Could you expand more on in what ways would you be able to refactor MCS support with XDS? Are you talking about Cilium ClusterMesh or third party implementations (like GKE I'm assuming) that would use this to do their own non ClusterMesh MCS implementation?
@tsotne95 -- I think this refers to the Google-specific MCS implementation, which is a detail that doesn't make sense for an open-source upstream proposal. I know it makes sense for our systems, but it's probably not very relevant here?
Using xDS for the MCS flow is meant to streamline any Multi-Cluster Services implementation in Cilium - not just the Google/GKE variant. The current upstream MCS controller (pkg/clustermesh/mcsapi) constructs ServiceImports and then directly programs derived Services and backends to drive Cilium’s datapath. Once the xDS-based L4 LB is available, that controller could instead translate the ServiceImport state into Cluster and ClusterLoadAssignment resources and feed them through the same xDS client used by standalone L4 LB. This removes the special-case service-programming logic in the MCS code path and lets both ClusterMesh and third‑party MCS solutions reuse a single, well-documented control-plane interface.
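To make the shape of that translation concrete, here is a minimal, language-neutral sketch in Python (Cilium itself is written in Go; the function name and inputs here are hypothetical, while the field names follow the JSON form of the `envoy.config.cluster.v3.Cluster` and `envoy.config.endpoint.v3.ClusterLoadAssignment` protos):

```python
# Hypothetical sketch: translate aggregated ServiceImport state into the
# JSON form of xDS Cluster (CDS) and ClusterLoadAssignment (EDS) resources.
# `service_import_to_xds` is illustrative, not a Cilium API.

def service_import_to_xds(name, namespace, backends):
    """backends: list of (ip, port) tuples aggregated across clusters."""
    cluster_name = f"{namespace}/{name}"
    cluster = {
        "@type": "type.googleapis.com/envoy.config.cluster.v3.Cluster",
        "name": cluster_name,
        "type": "EDS",  # endpoints are delivered separately via EDS
    }
    cla = {
        "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
        "cluster_name": cluster_name,
        "endpoints": [{
            "lb_endpoints": [
                {"endpoint": {"address": {"socket_address": {
                    "address": ip, "port_value": bport}}}}
                for ip, bport in backends
            ],
        }],
    }
    return cluster, cla

# Backends from two clusters collapse into one EDS resource.
cluster, cla = service_import_to_xds(
    "echo", "default",
    [("10.0.1.5", 8080), ("10.1.2.7", 8080)])
```

The point is that the merge across clusters happens once in the controller; the agent only ever sees the resulting Cluster/ClusterLoadAssignment pair, whatever the source.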
Ah ok -- that context really helps.
Thanks for the additional context!
One useful thing about the current implementation is that implementing ServiceImport via a derived Service inherently brings an additional feature: users can target the derived Service with a third-party controller that doesn't directly support the MCS-API, the Prometheus operator being one useful example.
I also reckon that, since https://github.com/kubernetes/kubernetes/blob/master/pkg/registry/core/service/ipallocator/ipallocator.go exists, it seems possible to allocate an IP, which removes the additional complexity of allocating service IPs for a native implementation. But what's the advantage of an xDS implementation over having the agent watch ServiceImport directly anyway?
Also, FYI, there's all the logic to sync the backends across clusters, which is tied to the global derived Service; this is definitely not an easy task...
Thanks for the questions. The CFP’s primary goal is a standalone xDS‑controlled L4 load balancer. The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert.
As I see it, Cilium's upstream MCS support has dedicated controllers that synthesize a "derived" Service and mirrored EndpointSlices to drive the datapath. The mcsAPIServiceReconciler creates a new Service annotated as global so that ClusterMesh logic programs the VIP and backends. A companion mcsAPIEndpointSliceMirrorReconciler copies each local EndpointSlice into that derived Service. (https://github.com/cilium/cilium/blob/main/pkg/clustermesh/mcsapi/README.md)
Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources instead of constructing derived Services and mirrored slices. The agent's xDS client would program VIPs and endpoints directly, so the controllers mentioned above could be removed or greatly simplified.
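For contrast with the derived-Service path described above, a rough sketch of what such a reconciler produces today might look like the following (Python purely for illustration; the function and the derived-name scheme are hypothetical, though `service.cilium.io/global` is Cilium's real global-service annotation):

```python
# Hypothetical sketch of the derived-Service approach: a reconciler
# synthesizes a new Service from a ServiceImport and annotates it as
# global so that ClusterMesh programs the VIP and backends.
# The "derived-" name prefix here is illustrative, not Cilium's exact scheme.

def derive_service(service_import):
    meta = service_import["metadata"]
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": f"derived-{meta['name']}",
            "namespace": meta["namespace"],
            # Real Cilium annotation marking a service as ClusterMesh-global.
            "annotations": {"service.cilium.io/global": "true"},
        },
        "spec": {"ports": service_import["spec"]["ports"]},
    }

svc = derive_service({
    "metadata": {"name": "echo", "namespace": "default"},
    "spec": {"ports": [{"port": 8080}]},
})
```

With the xDS path, this synthetic object (and the mirrored EndpointSlices that accompany it) would no longer need to exist in the apiserver at all.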
The MCS discussion is meant to show one possible follow‑on benefit, not to imply that the feature only targets MCS or GKE. But yes, I’m not an MCS expert. [...] Once the xDS L4 LB is in place, the MCS controller could translate aggregated ServiceImport state into standard Cluster and ClusterLoadAssignment resources
Yes, sure, I got that, but the CFP cites MCS several times as a possible future benefit, and I am a bit skeptical about whether implementing MCS that way is actually a good choice.
My main question is: assuming we want a native implementation of the MCS-API that does not create a derived Service (which is already a big question mark), why would that be easier to do with an xDS server than by just making the cilium-agent watch the ServiceImport resources and associated EndpointSlices? I don't really get why we would want to rely on xDS here. It seems to go against your goal of "unify different L4 service discovery mechanisms" if, for instance, load balancing for Services still relies on watching the kube-apiserver directly (plus the ClusterMesh control planes for global Services not directly related to MCS), while MCS would instead involve an xDS server rather than watching the kube-apiserver and the ClusterMesh control planes of the other clusters.
First of all, I'm pretty sure we don't know each other, so I don't like your tone.
I want to start by saying that I don’t share your view here. The suggestion to simply have the agent watch ServiceImport/EndpointSlice directly misses the larger architectural picture and, in my opinion, works against the goal of simplifying and unifying service discovery in Cilium.
- **One agent ingestion path, not three.** Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.
- **Protocol semantics you'd otherwise re-implement.** xDS gives you ACK/NACK, versioning, and delta (incremental) updates. Those are valuable under churn (endpoint scale-out/scale-in across clusters) and during rollouts. Doing raw informer watches in every agent means you have to home-grow back-pressure and consistency logic (or accept more flapping). With xDS, these semantics are built in and widely tested.
- **Less load on the kube-apiserver.** With direct watches in every agent, N agents watch MCS objects and each performs similar merges. With xDS, a central (or sharded) translator watches ServiceImport/EndpointSlice once, computes the desired VIP/backends, and fans out compact deltas to agents. This reduces apiserver fan-out and gives you a place to shard/cache if a fleet grows.
- **Cleaner separation of concerns.** The producer (MCS translator) knows Kubernetes types; the consumer (agent) only knows Clusters and Endpoints (CDS/EDS). That decouples Kubernetes schema evolution from the agent. It also keeps the agent's L4 logic identical whether the source is a local Service, a ClusterMesh export, or MCS.
- **Extensibility you'll need later.** The moment you want per-cluster weights, locality hints, failover policies, or to ingest services from non-Kubernetes registries, you don't touch the agent; you only emit the same xDS resources. That's exactly why xDS exists across service-mesh ecosystems.
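The ACK/NACK semantics mentioned above come from the xDS protocol's version/nonce handshake: each DiscoveryResponse carries a `version_info` and `nonce`, and the client echoes them back in its next DiscoveryRequest, advancing the version on success or keeping the old one plus an `error_detail` on failure. A simplified client-side sketch (Python for illustration; in reality this rides a gRPC stream of DiscoveryRequest/DiscoveryResponse messages):

```python
# Simplified sketch of state-of-the-world xDS ACK/NACK handling on the client.
# Field names (version_info, nonce, response_nonce, error_detail) follow the
# xDS DiscoveryRequest/DiscoveryResponse protos; the dict plumbing is illustrative.

def handle_response(state, response, apply_fn):
    """state: dict holding the last ACKed 'version'. Returns the next request."""
    try:
        apply_fn(response["resources"])          # e.g. program eBPF LB state
        state["version"] = response["version_info"]
        return {  # ACK: echo the nonce, advance to the new version
            "version_info": response["version_info"],
            "response_nonce": response["nonce"],
        }
    except Exception as err:
        return {  # NACK: echo the nonce, keep the old version, report the error
            "version_info": state["version"],
            "response_nonce": response["nonce"],
            "error_detail": {"message": str(err)},
        }

state = {"version": "1"}

# A valid update is applied and ACKed.
good = {"version_info": "2", "nonce": "n1", "resources": []}
ack = handle_response(state, good, lambda resources: None)

# An invalid update is rejected: the server learns via the NACK and can retry.
bad = {"version_info": "3", "nonce": "n2", "resources": []}
def reject(resources):
    raise ValueError("invalid resource")
nack = handle_response(state, bad, reject)
```

This is the back-pressure and consistency logic that per-agent informer watches would otherwise have to reinvent.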
I believe this makes the reasoning behind my proposal clear. I don't intend to keep going in circles on this point - my position is that xDS provides the unification and scalability properties we want, while duplicating watchers in the agent does not.
First of all, I'm pretty sure we don't know each other, so I don't like your tone.
Sorry if it appeared that way. I am genuinely interested in how what you are proposing might (since it's a future goal) impact the ClusterMesh area, and what you wrote here is very helpful for understanding what this is about.
One agent ingestion path, not three.
Today the agent already has a control-plane surface for Envoy/xDS (used for L7 and control components). If we translate ServiceImport/EndpointSlice to CDS/EDS once (in a controller), the agent consumes the same schema for all L4 sources (local Services, ClusterMesh/global services, MCS). That means one programming pipeline to eBPF LB state instead of separate Service, MCS, and ClusterMesh codepaths in the agent.
Ah, ok, this was the main point that I was missing! The CFP seems to suggest that the xDS server would be an optional thing, though; I assume that ideally you would want this to become the primary way to do load balancing in Cilium and eventually sunset the per-agent Service/EndpointSlice watchers?
These are very good questions!
I think one of the motivators was a feature request by some users at the Dev Summit for integration via xDS, as this is a common control plane used in many places (I think this statement is still true excluding my own biases :-) ). It makes Cilium an even more useful infrastructure building block, as it brings capabilities that would be hard to replicate in K8s API infra, such as dynamic load-balancing weights.
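To illustrate the dynamic-weight point: per-endpoint weights are a first-class EDS field (`load_balancing_weight` on each LbEndpoint), which the plain Kubernetes Service API cannot express. A hypothetical sketch of such a resource in its JSON form (the helper name and values are illustrative; the field names follow the envoy.config.endpoint.v3 protos):

```python
# Hypothetical sketch: an EDS ClusterLoadAssignment carrying per-endpoint
# load-balancing weights, e.g. to shift traffic 90/10 across two clusters.

def weighted_assignment(cluster_name, weighted_backends):
    """weighted_backends: list of (ip, port, weight) tuples."""
    return {
        "cluster_name": cluster_name,
        "endpoints": [{
            "lb_endpoints": [
                {
                    "endpoint": {"address": {"socket_address": {
                        "address": ip, "port_value": port}}},
                    # Relative weight used by the load balancer; updating it
                    # requires only a new EDS push, no apiserver object change.
                    "load_balancing_weight": weight,
                }
                for ip, port, weight in weighted_backends
            ],
        }],
    }

cla = weighted_assignment(
    "default/echo",
    [("10.0.1.5", 8080, 90), ("10.1.2.7", 8080, 10)])
```

Because the weights live in the xDS resource, a control plane can rebalance traffic dynamically (e.g. based on load or failover) by pushing a new assignment to the agents.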
That being said, I agree that we don't want to create a partial solution in the project, so some end-to-end solution will need to exist. I think the scope of such a thing will require discussion. My opinion is that the server should be mostly off-the-shelf and meet the basic user journeys, rather than trying to build an expansive feature, at least at this point.
Whether or not MCS will reuse this mechanism -- I do know there are some drawbacks to the derived Service approach in that it causes extra load on the API server.
Regarding how we support this in the project -- I view this as completing the experimental xDS client that exists today. We should have a strict requirement that if this doesn't get usage, we remove it entirely to avoid paying a maintenance cost.
@tsotne95 -- if you aren't planning on working on this one, I would park this proposal for the time being.
Thanks for checking in. Our immediate priorities have shifted, but it's definitely still on my radar, and I'll come back to it as soon as I can (this year).
Add new CFP for xDS-controlled L4 LoadBalancer