Refresh kubeconfig and rancher API token every month #15
Conversation
  labels:
    {{- include "anvilops.commonLabels" . | nindent 4 }}
spec:
  schedule: "0 0 25 * *" # Run at 12:00AM on the 25th of each month
If the kubeconfig expires every 30 days, maybe this should run more frequently, just to be safe?
That would result in extra kubeconfigs though. They wouldn't be mounted in the pod, but they would still be usable until they finally expire after the full 30 days.
Are you thinking we should make sure there's enough time between when the job refreshes the tokens and when they actually expire, in case something goes wrong in the job? We could move the CronJob schedule to the values.yaml.
> Are you thinking we should make sure there's enough time between when the job refreshes the tokens and when they actually expire, in case something goes wrong in the job?
Yes, that was my concern. I found where the 30 days comes from:
- https://ranchermanager.docs.rancher.com/api/api-tokens#kubeconfig-default-token-ttl-minutes
- on Anvil: https://composable.anvil.rcac.purdue.edu/v3/settings/kubeconfig-default-token-ttl-minutes
Good idea to make it customizable.
> That would result in extra kubeconfigs though. They wouldn't be mounted in the pod, but they would still be usable until they finally expire after the full 30 days.
We can consider this resolved if you want, but if I'm understanding correctly, it looks like you can specify a TTL on a Kubeconfig and delete it before it expires using the new API:
- Create: https://ranchermanager.docs.rancher.com/api/api-reference#tag/extCattleIo_v1/operation/createExtCattleIoV1Kubeconfig (TTL in .spec in request body)
- Delete by name: https://ranchermanager.docs.rancher.com/api/api-reference#tag/extCattleIo_v1/operation/deleteExtCattleIoV1Kubeconfig
So a theoretical flow could be: create a new Kubeconfig, update secrets, restart AnvilOps, wait for the rollout to finish, delete the old Kubeconfig. We could do it every week, or have the job run every day and check if we've passed a certain percentage of the kubeconfig-default-token-ttl-minutes and refresh if we have.
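The "refresh once we've passed a certain percentage of the TTL" idea could be sketched as a small pure function. This is only an illustration: the `TokenInfo` shape, the `shouldRefresh` name, and the 50% default threshold are assumptions, not the actual AnvilOps job code.

```typescript
// Hypothetical shape for tracking a kubeconfig token's age. The TTL would
// come from Rancher's kubeconfig-default-token-ttl-minutes setting.
interface TokenInfo {
  createdAt: Date;    // when the kubeconfig was issued
  ttlMinutes: number; // e.g. 43200 minutes = 30 days
}

// Returns true once more than `threshold` of the token's TTL has elapsed,
// so a daily job can decide whether to rotate. Threshold is an assumption.
function shouldRefresh(token: TokenInfo, now: Date, threshold = 0.5): boolean {
  const ageMinutes = (now.getTime() - token.createdAt.getTime()) / 60_000;
  return ageMinutes / token.ttlMinutes > threshold;
}
```

With a 30-day TTL and a 50% threshold, a job running daily would rotate the token around day 15, leaving roughly half the TTL as a safety margin if a run fails.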
Hmm, we could do that. At first I wasn't sure how long to wait before it was safe to delete the old tokens.
I updated the job to delete the old tokens when updatedReplicas === replicas.
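The `updatedReplicas === replicas` check can be expressed as a small predicate over the Deployment status. A minimal sketch, assuming the apps/v1 Deployment status field names; the extra `availableReplicas` check is an assumption added here for safety, not necessarily part of the actual job.

```typescript
// Subset of the Kubernetes apps/v1 Deployment status fields we care about.
interface DeploymentStatus {
  replicas?: number;          // desired replicas
  updatedReplicas?: number;   // replicas running the new pod template
  availableReplicas?: number; // replicas passing readiness
}

// The rollout is finished (and old tokens are safe to delete) once every
// desired replica has been updated and is available.
function rolloutFinished(status: DeploymentStatus): boolean {
  const { replicas = 0, updatedReplicas = 0, availableReplicas = 0 } = status;
  return replicas > 0 && updatedReplicas === replicas && availableReplicas === replicas;
}
```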
- Remove unused KubeConfig from rancher.ts
- Fix exit status for missing rancher information
- Add refreshTokens and refreshSchedule to values.yaml
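An illustrative `values.yaml` fragment for the new keys; the key names come from the commit message above, but the default values shown here are guesses, not the actual chart defaults.

```yaml
# Whether the token-refresh CronJob is created at all (assumed semantics).
refreshTokens: true
# Cron schedule for the refresh job; the original hard-coded schedule ran
# at 12:00AM on the 25th of each month.
refreshSchedule: "0 0 25 * *"
```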
Do you think that we could make the […]? It would make the installation experience a bit easier. I'm good with merging now and we can put it on the backlog if you want.
The secret containing the kubeconfig and refresh information is now required to be named kube-auth. Update documentation on what keys to set when using a kubeconfig. Fix bugs in rotateRancherCredentials.ts
That sounds easier.
A kubeconfig generated from the Rancher API expires in 30 days, while a Rancher token expires in up to 90 days.
Changes
I was thinking of switching to using a ServiceAccount for the downstream cluster, since Kubernetes could automatically rotate the token, and using anvilops_svc only for querying the Rancher API. However, I don't think it's possible to give a ServiceAccount the project-wide permissions needed to manage the sandbox. This approach turned out to be simpler.