Skip to content

CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers#2086

Merged
openshift-merge-bot[bot] merged 2 commits intoopenshift:masterfrom
ardaguclu:kms-mode
Feb 23, 2026
Merged

CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers#2086
openshift-merge-bot[bot] merged 2 commits intoopenshift:masterfrom
ardaguclu:kms-mode

Conversation

@ardaguclu
Copy link
Member

@ardaguclu ardaguclu commented Jan 23, 2026

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit and integration tests.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 23, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Most of the changes are obsessively added unit tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ardaguclu ardaguclu changed the title CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@openshift-ci-robot
Copy link

openshift-ci-robot commented Jan 23, 2026

@ardaguclu: This pull request references CNTRLPLANE-2241 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR adds the implementation proposed in openshift/enhancements#1900.

This PR introduces KMS as a new mode in encryption controllers to be functioning along side with the other modes. Basically, the idea is to track the KMS configuration to detect any configuration changes. So that key_controller can create new Secret to initiate migration. KMS Secret will not contain any confidential data. It will simply store the KMS configuration.

For Tech Preview v1, we will only support static unix domain socket (i.e. unix:///var/run/kmsplugin/kms.sock). Consequently, in this version, we don't support KMS -> KMS migrations.

Ref: Previous attemps:

Notes:

  • The changes here will be backported to 4.21
  • Substantial portion of the changed lines are unit and integration tests.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ardaguclu ardaguclu changed the title WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@ardaguclu ardaguclu changed the title CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 23, 2026
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2026
@ardaguclu ardaguclu force-pushed the kms-mode branch 2 times, most recently from 377c3cb to 0b007f9 Compare January 27, 2026 07:41
@ardaguclu ardaguclu changed the title WIP: CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers CNTRLPLANE-2241: Add KMS encryption mode into library-go encryption controllers Jan 27, 2026
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 27, 2026
@ardaguclu
Copy link
Member Author

ardaguclu commented Jan 27, 2026

/hold
These PRs are prerequisite of this;

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 27, 2026
@ardaguclu
Copy link
Member Author

/uncc @dgrisonnet
/cc @benluddy @bertinatto @p0lyn0mial

@openshift-ci openshift-ci bot requested review from benluddy and bertinatto and removed request for dgrisonnet January 27, 2026 15:54
// Since in v1 we don't support kms -> kms migrations, we can use empty key.
// Because we don't need to compare secrets to detect change.
secret := crypto.ModeToNewKeyFunc[state.KMS]()
keyId := strings.Split(provider.KMS.Name, "-")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still wondering about encapsulation here.
I think other parts of the code won’t even be aware that the name has changed.

Only the code in encryptionconfig/config.go will encode and decode the new name, right?

Do we need to ensure that the name doesn’t already contain "-"?
For simplicity, we could add a comment in the key generation function describing this requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the code in encryptionconfig/config.go will encode and decode the new name, right?

That is correct. Rest of the code only deals with KeyState. They do not even know about the details of EncryptionConfiguration.

Do we need to ensure that the name doesn’t already contain "-"?

Kube-apiserver already fails, if it encounters a provider name without -. Because uniqueness has broken.

For simplicity, we could add a comment in the key generation function describing this requirement.

I've added a comment about the reasons behind this provider name generation mechanism. Please let me know your thoughts. Thanks.

func getKeyIDFromProviderName(providerName string) string {
// We just need to obtain the keyID to find our backed secret
// e.g. "1-secrets"
return strings.Split(providerName, "-")[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How should this handle an unexpected provider name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it may not be possible, it is better to handle the invalid provider name gracefully. I've updated the code. Please let me know your thoughts.

},
})
case state.KMS:
// In order to preserve the uniqueness, we should insert resource name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there going to be side effects from this naming scheme visible in things like metric labels?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func createKMSProviderName(keyID, resource string) string {
// Ideally we should have used keyId simply in kms provider name.
// However, this is an upstream constraint that every provider name must be unique.
// To maintain uniqueness while still allowing access to the keyId, we generate provider name in this format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is at least less strange than smuggling identity names in phony AES-GCM provider configs. Later, I would be interested in coming up with alternatives that let the controllers correlate provider configs to their source secret without smuggling.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. It was error-prone and not nice. Since now we carry KMSConfigurations in backed Secrets, I think we won't need to carry anything in provider_names.

Comment on lines +361 to +362
// Moreover, we don't trigger new key when external reason is changed.
// Because it would lead to duplicate providers which is not allowed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, even if we did roll out a new encryption config in response to the "reason" update, we're not even capable of triggering a KEK rotation unless a specific provider happened to expose that capability independently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. That is correct.

Comment on lines 12 to 15
const (
DefaultEndpoint = "unix:///var/run/kmsplugin/kms.sock"
DefaultTimeout = 10 * time.Second
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a real need to export these constants? IMO, duplicating the values in tests is good because it requires that changes in observable behavior are accompanied by changes to test expectations. And other code (doing related things like setting up volume mounts) should be observing effective values rather than using hardcoded defaults.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason is to be accessed by key_controller. If we move these constants in there, we don't need to export them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just inline these in key_controller?

if currentMode == state.KMS {
ks.KMSConfiguration = &apiserverv1.KMSConfiguration{
APIVersion: "v2",
Name: fmt.Sprintf("%d", keyID), // this will be updated by inserting resource name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that the name will be changed only in the EncryptionConfiguration and will not be visible to the KeyState, is this comment still relevant?

maybe we should add a comment about the expected key format, which must include the keyID (at least for now)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment is not relevant anymore. I've removed it.

ks = state.KeyState{
Key: apiserverconfigv1.Key{
Name: keyId,
// Since in v1 we don't support kms -> kms migrations, we can use empty key.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand this comment.

For KMS the Secret doesn't matter and can/will always be empty - because the actual encryption key material is in the external KMS instance.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In v2, we'll have to compare this field (or KMSConfiguration annotation) to detect configurational changes. But it sounds like this comment creates confusions. So I've removed it.

@ardaguclu ardaguclu force-pushed the kms-mode branch 2 times, most recently from b19c772 to 5a7a2b2 Compare February 17, 2026 07:34
@ardaguclu
Copy link
Member Author

Unrelated?
/test unit

func getKeyIDFromProviderName(providerName string) (string, error) {
// We just need to obtain the keyID to find our backed secret
// e.g. "1_secrets"
parsed := strings.Split(providerName, "_")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use strings.SplitN ?

secret := CreateEncryptionKeySecretWithRawKeyWithMode(targetNS, grs, keyID, emptyKey, "KMS")
kmsConfig := &apiserverconfigv1.KMSConfiguration{
APIVersion: "v2",
Name: fmt.Sprintf("%d_%s", keyID, grs[0].Resource),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be fmt.Sprintf("%d", keyID) ?
the resource suffix is added later.

})
case state.KMS:
// In order to preserve the uniqueness, we should insert resource name
kmsCopy := key.KMSConfiguration.DeepCopy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we check key.KMSConfiguration for nil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this is not possible, it makes sense gracefully handling the empty KMSConfiguration by skipping it with a message. Please let me know your thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that would match the existing pattern. good idea.

}

if ks.KMSConfiguration != nil {
ks, err := json.Marshal(ks.KMSConfiguration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ks shadows the outer ks, we could rename to ksJSON

})
case state.KMS:
if key.KMSConfiguration == nil {
klog.Infof("skipping key %s in KMS mode as its KMSConfiguration is nil", key.Key.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add resource and index too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added resource name in the log.

Index is not the index in EncryptionConfiguration providers. It is an index in the Backed Secrets. We have keyID already as discriminator. Would you still want me to insert index name?

@ardaguclu
Copy link
Member Author

All tests pass on my local. According to the junit results, all the tests passed in the CI jobs too https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_library-go/2086/pull-ci-openshift-library-go-master-unit/2023755244520869888/artifacts/test/artifacts/junit_report.xml. It looks like a transient failure;
/test unit

return providers
}

func createKMSProviderName(keyID, resource string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a comment: as of today there a no restrictions on length for the provider name. the only restriction i found is that we are not allowed to use :

AESGCM Mode = "aesgcm"
SecretBox Mode = "secretbox" // available from the first release, see defaultMode below
Identity Mode = "identity" // available from the first release, see defaultMode below
KMS Mode = "KMS" // only supports KMS v2
Copy link
Member

@bertinatto bertinatto Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all the others are lowercase ("KMS")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this definition comes from https://github.com/openshift/api/blob/43090db3afe48e7dff0f6e39a866474d15cd0db5/config/v1/types_apiserver.go#L234 API. I agree that it should be lowercase, but we can't change it without API changes.

keyMode := state.Mode(s.Annotations[encryptionSecretMode])
switch keyMode {
case state.AESCBC, state.AESGCM, state.SecretBox, state.Identity:
case state.AESCBC, state.AESGCM, state.SecretBox, state.Identity, state.KMS:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need to add a check to test if keyMode is KMS and key.KMSConfiguration is nil?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function (i.e. ToKeyState) is pure conversion from Secret to KeyState without any logic. The caller checks the emptiness of KMSConfiguration in https://github.com/openshift/library-go/pull/2086/changes#diff-9aecf0d61d4bb32f2825a75a37e02e2c1b612c000427f3f576d42853ac8d23aeR212-R215 (and silently ignores it). In my opinion, it would be better to leave the decision to user of KMSConfiguration (i.e. stateToProviders).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a check in here though. Once we add it, since ToKeyState is used by all controllers, any misconfiguration (KMS mode is enabled, but somehow KMSConfiguration is nil) will cause the failure of the controllers. On the other hand, this condition must not happen anyway. I'd like to hear opinions from @p0lyn0mial

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can be defensive and add additional checks. What kind of checks would we like to add ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something like this;

	case state.KMS:
		if key.KMSConfiguration == nil {
			return state.KeyState{}, fmt.Errorf("KMSConfiguration can not be nil, when mode is KMS")
		}
		key.Mode = keyMode

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. Please let me know your opinions.

Copy link
Member

@bertinatto bertinatto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 23, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ardaguclu, bertinatto

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 23, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 23, 2026

@ardaguclu: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants