
Conversation

skkra0 (Collaborator) commented Jan 19, 2026

A kubeconfig generated from the Rancher API expires after 30 days, while a Rancher token can last up to 90 days.

Changes

  • Add a flag to the values.yaml to enable Rancher integrations.
  • If enabled, create a CronJob that requests a new Rancher token and patches it into the rancher-config secret.
    • If a kubeconfig is provided to AnvilOps via secret, also request a new kubeconfig and patch it in.
    • Finally, restart the deployment (see the sketch after this list).
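
A minimal sketch of that flow, assuming Node 18+ (for global fetch), kubectl available in the job image, and the Rancher v3 token/generateKubeconfig endpoints; the environment variables, kubeconfig secret name, and deployment name are placeholders, not the PR's actual implementation:

```ts
// Sketch of the CronJob's rotation flow, not the exact implementation.
import { execFileSync } from "node:child_process";

const RANCHER_URL = process.env.RANCHER_URL!;     // placeholder
const RANCHER_TOKEN = process.env.RANCHER_TOKEN!; // current token, mounted from the secret

async function rancherPost(path: string, body?: unknown) {
  const res = await fetch(`${RANCHER_URL}${path}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${RANCHER_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Rancher API ${path} failed: ${res.status}`);
  return res.json();
}

// base64-encode a value and patch it into an existing secret
function patchSecret(name: string, key: string, value: string) {
  const data = Buffer.from(value).toString("base64");
  execFileSync("kubectl", [
    "patch", "secret", name,
    "-p", JSON.stringify({ data: { [key]: data } }),
  ]);
}

async function main() {
  // 1. Request a new Rancher token and patch it into rancher-config.
  const token = await rancherPost("/v3/tokens", { description: "anvilops rotation" });
  patchSecret("rancher-config", "token", token.token);

  // 2. If AnvilOps uses a kubeconfig, request a fresh one and patch it in too.
  if (process.env.CLUSTER_ID) {
    const kc = await rancherPost(`/v3/clusters/${process.env.CLUSTER_ID}?action=generateKubeconfig`);
    patchSecret(process.env.KUBECONFIG_SECRET!, "kubeconfig", kc.config);
  }

  // 3. Restart the deployment so pods pick up the rotated credentials.
  execFileSync("kubectl", ["rollout", "restart", "deployment/anvilops"]);
}

main().catch((err) => { console.error(err); process.exit(1); });
```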

I considered switching to a ServiceAccount for the downstream cluster, since Kubernetes could rotate its token automatically, and using anvilops_svc only for querying the Rancher API. However, I don't think it's possible to give a ServiceAccount the project-wide permissions needed to manage the sandbox. This approach turned out to be simpler.

@skkra0 skkra0 requested a review from FluxCapacitor2 January 19, 2026 19:02
labels:
{{- include "anvilops.commonLabels" . | nindent 4 }}
spec:
schedule: "0 0 25 * *" # Run at 12:00AM on the 25th of each month
Collaborator

If the kubeconfig expires every 30 days, maybe this should run more frequently, just to be safe?

skkra0 (Collaborator, Author) commented Jan 19, 2026

That would result in extra kubeconfigs though. They wouldn't be mounted in the pod, but they would still be usable until they finally expire after the full 30 days.
Are you thinking we should make sure there's enough time between when the job refreshes the tokens and when they actually expire, in case something goes wrong in the job? We could move the CronJob schedule to the values.yaml.

Collaborator

> Are you thinking we should make sure there's enough time between when the job refreshes the tokens and when they actually expire, in case something goes wrong in the job?

Yes, that was my concern. I found where the 30 days comes from:

Good idea to make it customizable.

Collaborator

> That would result in extra kubeconfigs though. They wouldn't be mounted in the pod, but they would still be usable until they finally expire after the full 30 days.

We can consider this resolved if you want, but if I'm understanding correctly, it looks like you can specify a TTL on a Kubeconfig and delete it before it expires using the new API:

So a theoretical flow could be: create a new Kubeconfig, update secrets, restart AnvilOps, wait for the rollout to finish, delete the old Kubeconfig. We could do it every week, or have the job run every day and check if we've passed a certain percentage of the kubeconfig-default-token-ttl-minutes and refresh if we have.
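
A sketch of that daily-check idea, assuming the job can learn the kubeconfig's creation time and the kubeconfig-default-token-ttl-minutes value somehow; the 80% threshold and function name are made up for illustration:

```ts
// Decide whether the kubeconfig is due for rotation.
const REFRESH_THRESHOLD = 0.8; // refresh once 80% of the TTL has elapsed

function shouldRefresh(createdAt: Date, ttlMinutes: number, now = new Date()): boolean {
  const ageMinutes = (now.getTime() - createdAt.getTime()) / 60_000;
  return ageMinutes / ttlMinutes >= REFRESH_THRESHOLD;
}

// Example: a kubeconfig created 25 days ago with the default 30-day TTL is
// past 80% of its lifetime, so the daily job would refresh it.
const created = new Date(Date.now() - 25 * 24 * 60 * 60 * 1000);
console.log(shouldRefresh(created, 30 * 24 * 60)); // true
```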

Collaborator Author

Hmm, we could do that. At first I wasn't sure how long to wait before it was safe to delete the old tokens.
I updated the job to delete the old tokens when updatedReplicas === replicas.
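
A minimal sketch of that check, assuming the pre-1.0 @kubernetes/client-node API; the deployment name and namespace are placeholders:

```ts
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault(); // picks up in-cluster config inside the CronJob pod

const apps = kc.makeApiClient(k8s.AppsV1Api);

// Poll until every replica is running the new pod template.
async function waitForRollout(name: string, namespace: string): Promise<void> {
  for (;;) {
    const { body } = await apps.readNamespacedDeployment(name, namespace);
    if (body.status && body.status.updatedReplicas === body.status.replicas) return;
    await new Promise((r) => setTimeout(r, 5_000)); // re-check every 5s
  }
}

async function main() {
  await waitForRollout("anvilops", "anvilops");
  // No old pods remain, so the old tokens can be deleted safely here.
}

main().catch((err) => { console.error(err); process.exit(1); });
```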

- Remove unused KubeConfig from rancher.ts
- Fix exit status for missing rancher information
- Add refreshTokens and refreshSchedule to values.yaml
FluxCapacitor2 (Collaborator)

Do you think that we could make the anvilops.serviceAccount.secretName value and corresponding secret optional if Rancher support is enabled? This job could use the Rancher token to generate a kubeconfig and then use that kubeconfig to update the Rancher token secret and create/update the Kubeconfig secret.

It would make the installation experience a bit easier. I'm good with merging now and we can put it on the backlog if you want.

- The secret containing the kubeconfig and refresh information is now required to be named kube-auth.
- Update documentation on what keys to set when using a kubeconfig.
- Fix bugs in rotateRancherCredentials.ts
skkra0 (Collaborator, Author) commented Jan 20, 2026

That sounds easier.
I added a job that patches in a kubeconfig at install time ("helm.sh/hook": pre-install).
When using a kubeconfig, if we fetch it on install instead of having the administrator provide it, the kubeconfig key doesn't need to be set. The secret is still required in this case, though, because we need the cluster-id and use-cluster-name keys to request a kubeconfig.
We already require some secrets to have specific names, so I changed the kubeconfig-related secret to just be named kube-auth. This secret is now required whenever .anvilops.serviceAccount.useKubeconfig is true.
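
For illustration, a job could validate and read kube-auth roughly like this (a sketch assuming the pre-1.0 @kubernetes/client-node API; the namespace is left as a parameter):

```ts
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const core = kc.makeApiClient(k8s.CoreV1Api);

// cluster-id and use-cluster-name are always required; the kubeconfig key is
// optional because the pre-install hook can fetch one from Rancher.
async function readKubeAuth(namespace: string) {
  const { body } = await core.readNamespacedSecret("kube-auth", namespace);
  const decode = (key: string) => {
    const v = body.data?.[key];
    return v === undefined ? undefined : Buffer.from(v, "base64").toString();
  };
  const clusterId = decode("cluster-id");
  const useClusterName = decode("use-cluster-name");
  if (clusterId === undefined || useClusterName === undefined) {
    throw new Error("kube-auth must contain cluster-id and use-cluster-name");
  }
  return { clusterId, useClusterName, kubeconfig: decode("kubeconfig") };
}
```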

@skkra0 skkra0 merged commit 2dc7f5a into main Jan 21, 2026