OCPBUGS-65626: add service account to guard pod#2076

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:master from ehearne-redhat:add-sa-guard-pod on Mar 11, 2026

Conversation

@ehearne-redhat
Contributor

@ehearne-redhat ehearne-redhat commented Jan 8, 2026

This change adds a bespoke service account to the guard pod. It also handles service account cleanup, and includes additional fields in tests and basic service account testing.

The reason for the change is that we should opt to use a bespoke service account rather than default.

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 8, 2026
@openshift-ci-robot

@ehearne-redhat: This pull request references Jira Issue OCPBUGS-65626, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2026
@ehearne-redhat ehearne-redhat changed the title [WIP] OCPBUGS-65626: add service account to guard pod OCPBUGS-65626: add service account to guard pod Jan 8, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2026
@openshift-ci-robot

@ehearne-redhat: This pull request references Jira Issue OCPBUGS-65626, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @wangke19

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This change adds a bespoke service account to the guard pod. It also handles service account cleanup, and includes additional fields in tests and basic service account testing.

The reason for the change is that we should opt to use a bespoke service account rather than default.

@ehearne-redhat
Contributor Author

@wangke19 @p0lyn0mial this PR is ready to review. Could you please take a look? :)

@p0lyn0mial
Contributor

/assign @ingvagabund

@ingvagabund please take a look at this PR. I think you are the best person to take a look since you created the controller. Thanks!

Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
@ehearne-redhat
Contributor Author

Hey @ingvagabund - thanks so much for your review! I have amended the changes and added unit tests. Tests are passing locally.

Please take a look when you have the chance :)

Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
@ingvagabund
Member

ingvagabund commented Feb 2, 2026

The PR looks good overall. Thank you. Just a few more nits.

@ingvagabund
Member

Can you also please open evidence PRs for the corresponding operators as well (KCM-o, KA-o, KS-o)? That way we can see CI go green and avoid any hidden corner cases.

@ehearne-redhat
Contributor Author

@ingvagabund thanks so much for your review - I have completed the evidence PRs. I'll re-ping for review when the tests come back. :)

@ehearne-redhat
Contributor Author

Hey @ingvagabund, the evidence PRs openshift/cluster-kube-controller-manager-operator#905, openshift/cluster-kube-apiserver-operator#2026, and openshift/cluster-kube-scheduler-operator#610 are now ready to review. I have attached proof of guard pods using their own service accounts in each.

Please take a look when you have the chance. :)

@ingvagabund
Member

Brilliant :) Thank you.
/lgtm

@ingvagabund
Member

@p0lyn0mial for the final approval

@ingvagabund
Member

/lgtm cancel

@ehearne-redhat can you please squash the commits before merging?

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 6, 2026
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 24, 2026
Comment thread pkg/operator/staticpod/controller/guard/manifests/guard-sa.yaml Outdated
Comment thread pkg/operator/staticpod/controller/guard/manifests/guard-pod-sa.yaml
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
Comment thread pkg/operator/staticpod/controller/guard/manifests/guard-pod.yaml
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go Outdated
@ehearne-redhat ehearne-redhat force-pushed the add-sa-guard-pod branch 2 times, most recently from 4cad71f to 5c9bf94 Compare February 25, 2026 13:25
pdbGetter: pdbGetter,
pdbLister: kubeInformersForTargetNamespace.Policy().V1().PodDisruptionBudgets().Lister(),
saGetter: saGetter,
saLister: saLister,
Contributor

let's take the lister from kubeInformersForTargetNamespace

Contributor

then let's add the informer to factory.New().WithInformers() below.

if errors.IsNotFound(err) {
_, _, err = resourceapply.ApplyServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
klog.Errorf("Unable to create service account %v for Guard Pods: %v", serviceAccount.Name, err)
Contributor

won't this err be logged anyway because we use WithSyncDegradedOnError?
if yes, maybe we should skip the additional logging. (applies to the other places)

Contributor Author

OK, I will remove the redundant klog.Errorf from my changes.

_, _, err = resourceapply.ApplyServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
klog.Errorf("Unable to create service account %v for Guard Pods: %v", serviceAccount.Name, err)
return fmt.Errorf("Unable to create service account %v for Guard Pods: %v", serviceAccount.Name, err)
Contributor

also, I think that in Go in general error strings should not be capitalized.

Contributor Author

Yes, this is true. I was just following the convention from the other errors. I will remove those capital starts.

Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
Contributor

@p0lyn0mial p0lyn0mial left a comment

Maybe for consistency we should change the existing code so that it uses lower-case for errors, uses %w to wrap errors, and doesn't log and return the same error. We could do that in a new commit for easier review.

Comment thread pkg/operator/staticpod/controllers.go Outdated
pdbClient := b.kubeClient.PolicyV1()
saClient := b.kubeClient.CoreV1()
operandInformers := b.kubeNamespaceInformers.InformersFor(b.operandNamespace)
saLister := operandInformers.Core().V1().ServiceAccounts().Lister()
Contributor

we take the lister from kubeInformersForTargetNamespace.Core().V1().ServiceAccounts().Lister(), so this is not needed.

Comment thread pkg/operator/staticpod/controllers.go Outdated
podClient,
pdbClient,
saClient,
saLister,
Contributor

we take the lister from kubeInformersForTargetNamespace.Core().V1().ServiceAccounts().Lister(), so this is not needed.

podGetter corev1client.PodsGetter,
pdbGetter policyclientv1.PodDisruptionBudgetsGetter,
saGetter corev1client.ServiceAccountsGetter,
saLister corelisterv1.ServiceAccountLister,
Contributor

we take the lister from kubeInformersForTargetNamespace.Core().V1().ServiceAccounts().Lister(), so this is not needed.

if err == nil {
_, _, err = resourceapply.DeleteServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
klog.Errorf("unable to delete Service Account: %v", err)
Contributor

use %w for err wrapping.

Contributor Author

I'm getting this message when I try to use error wrapping:
k8s.io/klog/v2.Errorf does not support error-wrapping directive %w

Contributor

won't this err be logged anyway because we use WithSyncDegradedOnError?
if yes, maybe we should skip the additional logging. (applies to the other places)

errs = append(errs, err)
}
} else if !apierrors.IsNotFound(err) {
klog.Errorf("unable to get service account %v from lister: %v", serviceAccount.Name, err)
Contributor

use %w for err wrapping.

Contributor Author

Same here
k8s.io/klog/v2.Errorf does not support error-wrapping directive %w

Contributor

won't this err be logged anyway because we use WithSyncDegradedOnError?
if yes, maybe we should skip the additional logging. (applies to the other places)

Contributor

we could use fmt if we want to enhance the context and then add it to the errs

if errors.IsNotFound(err) {
_, _, err = resourceapply.ApplyServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
return fmt.Errorf("Unable to create service account %v for Guard Pods: %v", serviceAccount.Name, err)
Contributor

use %w for err wrapping.

Contributor Author

This one I can do. :)

Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
}
}

func TestGuardServiceAccountManifestStability(t *testing.T) {
Contributor

could this test be updated to something like:

      sa := resourceread.ReadServiceAccountV1OrDie(serviceAccountTemplate)

      expected := &corev1.ServiceAccount{
          ObjectMeta: metav1.ObjectMeta{
              Name:   "guard-sa",
              Labels: map[string]string{"app": "guard"},
          },
      }

      if !equality.Semantic.DeepEqual(sa, expected) {
          t.Fatalf("guard-pod-sa.yaml manifest has changed. " +
              "The controller only creates the SA, it does not update existing ones. " +
              "If the manifest changed, add update logic and update the expected value in this test.")
      }

?
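
The idea behind this suggestion, pinning the decoded manifest to an expected value so that any change forces a review, can be sketched without the Kubernetes libraries. Everything below is an assumption made for illustration: the `serviceAccount` struct is a trimmed-down stand-in for corev1.ServiceAccount, and the JSON constant stands in for the embedded YAML manifest.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// serviceAccount is a hypothetical, trimmed-down stand-in for corev1.ServiceAccount.
type serviceAccount struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels"`
}

// manifest is a hypothetical JSON stand-in for the embedded YAML manifest.
const manifest = `{"name": "guard-sa", "labels": {"app": "guard"}}`

// manifestUnchanged reports whether the decoded manifest still equals the pinned value.
func manifestUnchanged(raw string) (bool, error) {
	var got serviceAccount
	if err := json.Unmarshal([]byte(raw), &got); err != nil {
		return false, fmt.Errorf("unable to decode manifest: %w", err)
	}
	expected := serviceAccount{
		Name:   "guard-sa",
		Labels: map[string]string{"app": "guard"},
	}
	return reflect.DeepEqual(got, expected), nil
}

func main() {
	ok, err := manifestUnchanged(manifest)
	fmt.Println(ok, err)
}
```

In a real test the comparison would fail the test with a message telling the developer to add update logic before changing the pinned value.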

Comment thread pkg/operator/staticpod/controller/guard/guard_controller.go
@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2026

@ehearne-redhat: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-encryption | 1763302 | link | true | /test e2e-aws-encryption |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ehearne-redhat ehearne-redhat force-pushed the add-sa-guard-pod branch 2 times, most recently from 6d44615 to 19a0412 Compare March 3, 2026 11:48
guardServiceAccount *corev1.ServiceAccount
createConditionalFunc func() (bool, bool, error)
withArbiter bool
expectSADeleted bool // true if service account should be deleted (cleanup mode)
Contributor

let's rm the comment, not sure it is helpful.

if errors.IsNotFound(err) {
_, _, err = resourceapply.ApplyServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
return fmt.Errorf("Unable to create service account %v for Guard Pods: %w", serviceAccount.Name, err)
Contributor

isn't the Go convention to start error strings with a lowercase letter?

// The guard pod uses automountServiceAccountToken: false.
// There's nothing meaningful to update. So update logic isn't needed right now.
_, err = c.saLister.ServiceAccounts(serviceAccount.Namespace).Get(serviceAccount.Name)
if errors.IsNotFound(err) {
Contributor

let's use apierrors and remove "k8s.io/apimachinery/pkg/api/errors" from the imports.
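
The create-only flow in the snippet under review (look the account up through the lister, create it on not-found, never update it) can be sketched roughly as follows. This is not the controller's code: `fakeStore` is a hypothetical in-memory stand-in for both the lister and the client, and `errNotFound` stands in for a Kubernetes NotFound API error.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for a Kubernetes NotFound API error.
var errNotFound = errors.New("not found")

// fakeStore is a hypothetical in-memory stand-in for the lister and the client.
type fakeStore struct {
	accounts map[string]bool
	creates  int
}

// get mimics a lister lookup and reports errNotFound for unknown names.
func (s *fakeStore) get(name string) error {
	if !s.accounts[name] {
		return fmt.Errorf("service account %q: %w", name, errNotFound)
	}
	return nil
}

// create mimics a client create call and counts how often it is invoked.
func (s *fakeStore) create(name string) error {
	s.accounts[name] = true
	s.creates++
	return nil
}

// ensureServiceAccount creates the account only when the lookup reports not-found.
func ensureServiceAccount(s *fakeStore, name string) error {
	err := s.get(name)
	switch {
	case err == nil:
		return nil // already present; there is nothing to update
	case errors.Is(err, errNotFound):
		if err := s.create(name); err != nil {
			return fmt.Errorf("unable to create service account %v for guard pods: %w", name, err)
		}
		return nil
	default:
		return fmt.Errorf("unable to get service account %v from lister: %w", name, err)
	}
}

func main() {
	s := &fakeStore{accounts: map[string]bool{}}
	for i := 0; i < 2; i++ {
		if err := ensureServiceAccount(s, "guard-sa"); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("creates:", s.creates)
}
```

Running the sync twice issues a single create, which is the property the create-only approach relies on.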

if err == nil {
_, _, err = resourceapply.DeleteServiceAccount(ctx, c.saGetter, syncCtx.Recorder(), serviceAccount)
if err != nil {
klog.Errorf("unable to delete Service Account: %v", err)
Contributor

the other log msg use "service account" not "Service Account", let's be consistent.

Contributor Author

I believe this error log has been removed based on previous comment above. The consistency has been resolved anyway. :)

This change adds a bespoke service account to the guard pod, and
introduces checks to ensure proper service account cleanup.

Tests are included to cover the behaviour of the service account in
different scenarios. A test is also included to ensure developers
review the code if the service account manifest changes.
@p0lyn0mial
Contributor

/approve

/hold
for the evidence PRs to pass

@openshift-ci openshift-ci bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 6, 2026
@ingvagabund
Member

/lgtm

@ingvagabund
Member

/hold cancel

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 11, 2026
@openshift-ci
Contributor

openshift-ci bot commented Mar 11, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehearne-redhat, ingvagabund, p0lyn0mial

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit ac826d1 into openshift:master Mar 11, 2026
4 checks passed
@openshift-ci-robot

@ehearne-redhat: Jira Issue OCPBUGS-65626: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-65626 has been moved to the MODIFIED state.

Details

In response to this:

This change adds a bespoke service account to the guard pod. It also handles service account cleanup, and includes additional fields in tests and basic service account testing.

The reason for the change is that we should opt to use a bespoke service account rather than default.

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
