fix(cilium): add dynamic api server endpoint configuration #12624
base: master
Conversation
Welcome @r3m8!
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: r3m8. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
Hi @r3m8. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
tico88612 left a comment:
Can you carefully review the existing implementation and integrate it with values.yaml? I don't think we need to maintain another task.
I will take the time to check this implementation.
I think I've found the root cause of the issue. By default, it looks like the cilium DaemonSet and the cilium-operator Deployment pull the API server endpoint from the Cilium ConfigMap, so the KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT values are fixed at install time. I think forcing the definition to 127.0.0.1 is the best solution given the current setup on both workers and control planes. Priority order for KUBERNETES_SERVICE_HOST (and similarly for KUBERNETES_SERVICE_PORT): the `cilium_config_extra_vars` override if set, then `loadbalancer_apiserver` if defined, then 127.0.0.1.
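For context, a rough sketch of the environment block those workloads end up carrying; the addresses below are illustrative, not taken from this PR:

```yaml
# Fragment of the cilium DaemonSet / cilium-operator Deployment pod spec.
# Without the fix, the host is rendered once at install time and points at the
# first control plane, so losing that node breaks API connectivity cluster-wide.
env:
  - name: KUBERNETES_SERVICE_HOST
    value: "10.0.0.11"    # first control plane IP (illustrative)
  - name: KUBERNETES_SERVICE_PORT
    value: "6443"
```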
I tested both implementations, and they work.
It's probably better to put it in the role vars/ folder instead. (set_fact is rarely justified; even though we have pervasive usage in Kubespray, most of it should go away.) (Not expressing an opinion on the rest, haven't had the time to review more thoroughly.)
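As a rough sketch of that suggestion (the path and variable name are hypothetical, not from the PR), the value could live in the role's vars/ instead of being computed with set_fact at runtime:

```yaml
# roles/network_plugin/cilium/vars/main.yml (hypothetical) — evaluated lazily by
# Ansible, so no dedicated set_fact task is needed.
cilium_apiserver_host: "{{ loadbalancer_apiserver.address | default('127.0.0.1') }}"
```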
Force-pushed from b96b763 to 31315df.
Done. I've successfully rebased on upstream. Pre-commit tests are passing, I've squashed into 1 commit and properly renamed the PR.
The config extra vars stuff is passed as a second values file to the Cilium CLI, right?
In that case you don't need to check it in the default value, since it will be overridden at the Helm level and the value rendered in the first values file will be irrelevant.
Also, I think you could use `kube_apiserver_global_endpoint` or whatever the name is; check how a similar thing is done in #12598.
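A quick illustration of the override semantics being described (file names, chart reference and values are hypothetical): when two values files are passed, the later one wins for overlapping keys, so the expression rendered in the first file does not need to account for `cilium_config_extra_vars`:

```yaml
# values.yaml — rendered by the role (first values file)
k8sServiceHost: 127.0.0.1
k8sServicePort: 6443
---
# values-extra.yaml — rendered from cilium_config_extra_vars (second values file)
k8sServiceHost: apiserver.example.com
# e.g. `helm template cilium cilium/cilium -f values.yaml -f values-extra.yaml`
# ends up with k8sServiceHost=apiserver.example.com — the later file overrides.
```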
Force-pushed from f9d93ed to 1f225bd.
Yes, Helm will override via the second values file anyway.
I agree. I replicated Calico's current configuration. I was able to launch the deployment, and it works.
In this case, do you still prefer to use role vars/ folders, or can we place this directly in templates/values.yaml.j2 like this? Calico already uses this pattern.
Yeah, eschew the variables; the indirection only adds complexity for a simple expression like this.
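A minimal sketch of what that direct expression in `templates/values.yaml.j2` could look like, mirroring the Calico-style pattern (the exact Jinja expression and default port are illustrative, not the merged code):

```yaml
# templates/values.yaml.j2 — point Cilium at the API server endpoint directly,
# no intermediate variables or set_fact.
k8sServiceHost: "{{ loadbalancer_apiserver.address | default('127.0.0.1') }}"
k8sServicePort: "{{ loadbalancer_apiserver.port | default(6443) }}"
```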
It's done, we're good.
/ok-to-test
This seems sane to me, but I'm not deep in cilium.
/cc I'll review later. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This pull request automatically configures Cilium's API server endpoint for high availability (HA) clusters. It patches the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables in both the cilium-operator Deployment and the cilium DaemonSet to match the cluster's load balancer configuration.

In HA clusters with multiple control planes, Cilium pods would hardcode the first control plane IP, causing complete network failure when that node shuts down. Even when users configured `cilium_config_extra_vars` with the correct endpoint, the environment variables injected at pod creation time were not updated. After Cilium installation/upgrade, the playbook now patches these environment variables automatically.
Configuration priority:
- `cilium_config_extra_vars.k8sServiceHost` (highest priority)
- `loadbalancer_apiserver` if defined
- `127.0.0.1` (default for HA clusters with nginx-proxy)

This makes HA Cilium clusters work correctly out of the box without manual intervention; see the inventory example below.
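For illustration, the highest-priority override could be expressed in the inventory like this (the group_vars file name and endpoint value are hypothetical; only `cilium_config_extra_vars` and its `k8sServiceHost`/`k8sServicePort` keys come from this PR):

```yaml
# group_vars/k8s_cluster/k8s-net-cilium.yml (hypothetical placement)
# Takes precedence over loadbalancer_apiserver and the 127.0.0.1 default.
cilium_config_extra_vars:
  k8sServiceHost: "apiserver.example.com"   # external LB address (illustrative)
  k8sServicePort: 6443
```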
Which issue(s) this PR fixes:
Fixes #12623
Special notes for your reviewer:
Playbook log on Gist
Does this PR introduce a user-facing change?: