(feat): Extra Valkey config #44

Merged
sandeepkunusoth merged 46 commits into valkey-io:main from utdrmac:valkeyConfig
Apr 23, 2026

@utdrmac
Contributor

@utdrmac utdrmac commented Jan 9, 2026

This feature request implements the ability for users to supply their own configuration for Valkey as an inline stanza when deploying a new cluster. The user configuration is appended to the required base configuration, which is created as a ConfigMap. Since the operator manages the ConfigMap, any changes to the config are reconciled. Additionally, users cannot override the required cluster settings.

This PR implements the ability for users to supply additional configuration for Valkey. Any provided parameters are appended to the default configuration required by the operator.
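A minimal sketch of the merge described above, assuming the user config arrives as a map of parameter names to values and that the operator keeps a set of protected parameter names. `protectedParams`, `mergeValkeyConfig`, and the listed keys are hypothetical, not the PR's code. For most single-value directives the last occurrence in valkey.conf wins, so plain appending would let a user line override a required setting; this version drops protected keys instead and sorts the rest so the rendered file is stable between reconciles:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// protectedParams is a hypothetical set of settings the operator must own;
// the real list lives in the operator source and is not shown in this thread.
var protectedParams = map[string]bool{
	"cluster-enabled":     true,
	"cluster-config-file": true,
	"port":                true,
}

// mergeValkeyConfig appends user-supplied parameters to the operator's base
// valkey.conf, dropping any parameter the operator reserves. It returns the
// merged config and the names of the rejected parameters.
func mergeValkeyConfig(base string, user map[string]string) (string, []string) {
	// Sort keys so the rendered config, and therefore the ConfigMap, is
	// byte-for-byte stable between reconciles (Go map order is random).
	keys := make([]string, 0, len(user))
	for k := range user {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(base)
	if base != "" && !strings.HasSuffix(base, "\n") {
		b.WriteString("\n") // make sure user lines start on a fresh line
	}

	var rejected []string
	for _, k := range keys {
		// Compare case-insensitively so "Port" cannot slip past "port".
		if protectedParams[strings.ToLower(k)] {
			rejected = append(rejected, k)
			continue
		}
		fmt.Fprintf(&b, "%s %s\n", k, user[k])
	}
	return b.String(), rejected
}

func main() {
	conf, rejected := mergeValkeyConfig("cluster-enabled yes\nport 6379\n", map[string]string{
		"maxmemory":       "256mb",
		"cluster-enabled": "no",
	})
	fmt.Print(conf)
	fmt.Println("rejected:", rejected)
}
```

Returning the rejected names gives the controller something to surface, for example in an event, rather than silently ignoring the user's input.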

Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Comment thread config/samples/v1alpha1_valkeycluster.yaml Outdated
Comment thread internal/controller/valkeycluster_controller.go
Comment thread internal/controller/valkeycluster_controller.go Outdated
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Comment thread api/v1alpha1/valkeycluster_types.go
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Member

@sandeepkunusoth sandeepkunusoth left a comment

LGTM

Member

@sandeepkunusoth sandeepkunusoth left a comment

lint errors and e2e errors

Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Jan 10, 2026

Ugh. Why didn't lint/fmt/vet catch those?

Signed-off-by: utdrmac <matthew.boehm@percona.com>
Collaborator

@jdheyburn jdheyburn left a comment

Minor comments; keen to hear opinions. Otherwise LGTM.

Comment thread api/v1alpha1/valkeycluster_types.go Outdated
Comment thread api/v1alpha1/valkeycluster_types.go Outdated
Comment thread internal/controller/valkeycluster_controller.go
Comment thread internal/controller/valkeycluster_controller.go Outdated
Comment thread internal/controller/valkeycluster_controller.go Outdated
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Jan 12, 2026

e2e still fails due to #43

Comment thread api/v1alpha1/valkeycluster_types.go Outdated
@sandeepkunusoth
Member

sandeepkunusoth commented Jan 12, 2026

> e2e still fails due to #43

No, this is a different issue from #43. It's failing because of the events limitation: maybe, because you added a new emitted event, other events are not coming through. https://github.com/valkey-io/valkey-operator/actions/runs/20936033780/job/60159472854?pr=44

Check the description of PR #37; maybe you can update the events check accordingly. Currently we are not asserting on all events because of the events rate limit.

@utdrmac
Contributor Author

utdrmac commented Jan 12, 2026

I only added 1 new event which isn't triggered during the e2e. When I run e2e on my local kind cluster, I don't get that failure; I get the same failure as #43.

@sandeepkunusoth
Member

sandeepkunusoth commented Jan 12, 2026

> I only added 1 new event which isn't triggered during the e2e. When I run e2e on my local kind cluster, I don't get that failure; I get the same failure as #43.

No, I don't think it's related. If you check the issue description, the failing test case and the logs are different for these two errors. I have never seen the ValkeyCluster CR creation test case fail on the main branch; if we do see that, we need to create a separate issue.

This PR is failing the test "Manager when a ValkeyCluster CR is applied [It] creates a Valkey Cluster deployment".

Issue #43 is with a different test, "Manager when a ValkeyCluster experiences degraded state [It] should detect and recover when a deployment is deleted": https://github.com/valkey-io/valkey-operator/actions/runs/20806446577/job/59761668293

Quick fix: we may need to remove the assertions on some of the events, for example:

g.Expect(output).To(ContainSubstring("ClusterMeet"), "ClusterMeet event should appear in describe")

bjosv
bjosv previously approved these changes Jan 14, 2026
Collaborator

@bjosv bjosv left a comment

LGTM

@bjosv
Collaborator

bjosv commented Jan 15, 2026

Rebasing (or adding a merge commit of main) should fix the e2e test now.

Signed-off-by: utdrmac <matthew.boehm@percona.com>
Comment thread internal/controller/valkeycluster_controller.go
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Tolerations: cluster.Spec.Tolerations,
Exporter: cluster.Spec.Exporter,
Containers: cluster.Spec.Containers,
UsersConfigMapName: getConfigMapName(cluster.Name),
Collaborator

I wonder if UsersConfigMapName implies it is related to Users, in a similar way that UsersACLSecretName does. Or does Users mean the human deploying the operator?

Contributor Author

Users, as in the humans deploying the operator. There are some config params that the operator sets, and others that users would set.

Collaborator

My interpretation was that Users here referred to ACLs. If it is ambiguous, maybe the prefix should be removed?

Comment thread internal/controller/config.go Outdated
// This hash should be updated whenever the contents of either script change, which would
// coincide with operator version bump.
// $ cat internal/controller/scripts/{liveness-check.sh,readiness-check.sh} | sha256sum
scriptsHash = "8531132f52ac311772dfcb45c107c34ab05e719a0df644cc332512277b564346"
Collaborator

Is there not a way we can have this calculated for us?

Contributor Author

The goal was to avoid having the operator (re)calculate the SHA-256 of static file contents every 30s. As for an "outside" solution, I could probably do something similar to how version numbers are done, by injecting the value into the variable at compile time?
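A minimal sketch of the compile-time approach, using the Go linker's `-X importpath.name=value` flag, which only works on a string variable that is uninitialized or initialized to a constant. It assumes the variable lives in package main; for the controller package the `-X` argument would need the full import path, which this thread does not show. The shell pipeline in the comment reuses the `sha256sum` command quoted in the diff and assumes bash for the `{a,b}` brace expansion:

```go
package main

import "fmt"

// scriptsHash is left unset in source and injected at link time, e.g. (bash):
//
//	go build -ldflags "-X main.scriptsHash=$(cat internal/controller/scripts/{liveness-check.sh,readiness-check.sh} | sha256sum | cut -d' ' -f1)"
//
// For a non-main package, replace "main" with that package's full import path.
var scriptsHash string

func main() {
	if scriptsHash == "" {
		// Built without the -X flag (e.g. a plain `go run`); say so loudly
		// rather than silently using an empty hash.
		fmt.Println("scriptsHash not set at build time")
		return
	}
	fmt.Println("scriptsHash:", scriptsHash)
}
```

The trade-off is that the hash is then only as fresh as the build command, so it would have to live in the Makefile target rather than in the source file.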

Collaborator

How do other operators do this? Could we perform a faster check before the sha256?

has_changed = false
if len(current) != len(desired):
  has_changed = true
elif sha256(current) != sha256(desired):
  has_changed = true
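The pseudocode above translates to Go roughly as follows; `hasChanged` is a hypothetical helper name. The length comparison is a cheap early exit, and `sha256.Sum256` returns a `[32]byte` array, so two digests can be compared directly with `!=`:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hasChanged reports whether desired differs from current. It compares
// lengths first and only hashes when the lengths are equal.
func hasChanged(current, desired []byte) bool {
	if len(current) != len(desired) {
		return true
	}
	// Sum256 returns a fixed-size array, which is comparable with !=.
	return sha256.Sum256(current) != sha256.Sum256(desired)
}

func main() {
	fmt.Println(hasChanged([]byte("port 6379\n"), []byte("port 6380\n"))) // true
	fmt.Println(hasChanged([]byte("a"), []byte("a")))                     // false
}
```

When both byte slices are already in memory, `bytes.Equal` answers the same question without hashing at all; a digest mainly pays off when only the hash is stored, for example as an annotation on a pod template.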

Contributor Author

I can speak for 3 of Percona's operators (mysql, pxc, mongo): the liveness, readiness, and health check scripts are part of the operator Docker image. The image is used as an init container, where a separate entrypoint copies the scripts into the running pod. The init container finishes, then the main container, using the same image with the default entrypoint, launches as a regular container and the scripts are there to be used.
This has some benefits: 1) no checksum calculations, 2) nothing to 'embed', and 3) script upgrades ship automatically with each new release/version of the operator.
Once we decide whether we will make an operator-specific image (i.e., for including modules), perhaps we could use a similar model/process?

Collaborator

That could work, but until we decide on that - could we instead calculate the sha256sum on operator boot? Given that the scripts would be coupled with the operator version, we would then be able to avoid calculating the sha on every reconcile. How does that sound?

Contributor Author

Sounds doable!

Comment thread internal/controller/config.go Outdated
var scripts embed.FS

func getConfigMapName(clusterName string) string {
return clusterName + "-config"
Collaborator

Can we prepend valkey- to this? I'm thinking of possible name conflicts, and this would minimise them.

Contributor Author

Yup. I'll push that shortly.
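With the suggested prefix, the helper could look like the sketch below. It keeps the signature from the diff; the name format the PR finally settled on may differ, since a later comment refers to a GetServerConfigMapName helper:

```go
package main

import "fmt"

// getConfigMapName returns the name of the operator-managed ConfigMap for a
// cluster. The "valkey-" prefix follows the review suggestion and lowers the
// chance of clashing with a ConfigMap the user created themselves.
func getConfigMapName(clusterName string) string {
	return "valkey-" + clusterName + "-config"
}

func main() {
	fmt.Println(getConfigMapName("my-cluster")) // valkey-my-cluster-config
}
```

ConfigMap names must be valid DNS subdomain names of at most 253 characters, so the 14 characters added by `valkey-` and `-config` are worth keeping in mind for very long cluster names.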


// Create or update a default valkey.conf
// If additional config is provided, append to the default map
func (r *ValkeyClusterReconciler) upsertConfigMap(ctx context.Context, cluster *valkeyiov1alpha1.ValkeyCluster) error {
Collaborator

I've started refactoring other upserts to use controllerutil.CreateOrUpdate, could the same be done here for consistency? It might help reduce some lines too.

Contributor Author

Probably. I will look into it.

Collaborator

I wonder if this would help with the script-hash that is being set in the PR today.
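`controllerutil.CreateOrUpdate` from controller-runtime fetches the object, runs a caller-supplied mutate function on it, and only issues an update when the mutated object differs from what was fetched, reporting `created`, `updated`, or `unchanged`. The stdlib-only sketch below imitates those semantics with an in-memory map standing in for the API server, to show why an unchanged ConfigMap produces no write. The `store` type and `createOrUpdate` function are illustrative, not controller-runtime's API, and `maps.Clone`/`maps.Equal` need Go 1.21 or later:

```go
package main

import (
	"fmt"
	"maps"
)

// OperationResult mirrors the three outcomes CreateOrUpdate reports.
type OperationResult string

const (
	ResultCreated   OperationResult = "created"
	ResultUpdated   OperationResult = "updated"
	ResultUnchanged OperationResult = "unchanged"
)

// store is a stand-in for the API server: ConfigMap name -> data.
type store map[string]map[string]string

// createOrUpdate applies mutate to the current object (or to an empty one if
// it does not exist) and writes back only if something actually changed.
func createOrUpdate(s store, name string, mutate func(data map[string]string)) OperationResult {
	existing, ok := s[name]
	if !ok {
		data := map[string]string{}
		mutate(data)
		s[name] = data
		return ResultCreated
	}
	desired := maps.Clone(existing) // mutate a copy so we can compare
	mutate(desired)
	if maps.Equal(existing, desired) {
		return ResultUnchanged // no write issued
	}
	s[name] = desired
	return ResultUpdated
}

func main() {
	s := store{}
	render := func(conf string) func(map[string]string) {
		return func(d map[string]string) { d["valkey.conf"] = conf }
	}
	fmt.Println(createOrUpdate(s, "valkey-demo-config", render("maxmemory 256mb\n"))) // created
	fmt.Println(createOrUpdate(s, "valkey-demo-config", render("maxmemory 256mb\n"))) // unchanged
	fmt.Println(createOrUpdate(s, "valkey-demo-config", render("maxmemory 512mb\n"))) // updated
}
```

The real function takes a context, a `client.Client`, the object, and a mutate function, returns `(controllerutil.OperationResult, error)`, and rejects a mutate function that changes the object's name or namespace.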

Comment thread internal/controller/config.go
Comment thread internal/controller/config.go
Comment thread internal/controller/config.go Outdated

// Build the config
var configBuilder strings.Builder
configBuilder.Grow(len(specConfig) * averageParameterLength)
Collaborator

I wonder if the use of Grow here is premature optimisation; would it simplify the logic if we removed it?

Contributor Author

It would simplify the logic by 1 line. I read that this pre-allocation was a simple memory optimization if the size of the eventual string was somewhat known.
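To weigh the two positions, the sketch below builds the same output with and without a capacity hint and counts heap allocations with `testing.AllocsPerRun`. `render` and the value 32 for `averageParameterLength` are assumptions; only the constant's name comes from the diff:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// averageParameterLength is a guessed size for one "name value" line; the
// value 32 is an assumption for this sketch.
const averageParameterLength = 32

// render joins config lines with newlines. When grow is true it pre-sizes the
// builder from an estimate, in the same way as the configBuilder.Grow call
// under review; the resulting string is identical either way.
func render(lines []string, grow bool) string {
	var b strings.Builder
	if grow {
		b.Grow(len(lines) * averageParameterLength)
	}
	for _, l := range lines {
		b.WriteString(l)
		b.WriteByte('\n')
	}
	return b.String()
}

var sink string // keeps results alive so the calls are not optimised away

func main() {
	lines := []string{"maxmemory 256mb", "maxmemory-policy allkeys-lru", "appendonly yes"}
	with := testing.AllocsPerRun(100, func() { sink = render(lines, true) })
	without := testing.AllocsPerRun(100, func() { sink = render(lines, false) })
	fmt.Printf("allocs with Grow: %.0f, without: %.0f\n", with, without)
}
```

When the estimate covers the whole output, Grow typically leaves a single allocation, while the unhinted builder reallocates as lines are appended. For a config of a few dozen lines the saving is small in absolute terms, so both the author's and the reviewer's positions hold.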

utdrmac added 4 commits April 6, 2026 20:05
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Apr 9, 2026

Just an update: I'm working on the minor changes requested above, but I'm getting Go crashes when building the Docker image. make build works just fine, but make docker-build does not. I'm investigating what has changed recently, or whether it's something on my end. EDIT: Fixed with an erase-all / fresh install (turned it off and back on).

@jdheyburn
Collaborator

I've deployed this locally, and it works perfectly! @utdrmac there are some merge conflicts from the latest PR I merged, which was @hieu2102's PR adding the operator and exporter users. If you clear these up, then I think we can get this approved and merged.

Copy link
Copy Markdown
Collaborator

@bjosv bjosv left a comment

LGTM, just added a nit for the record.
Just a rebase is needed; the open comments can then be addressed in a separate PR.

// buildValkeyNodeConfigMap builds a ConfigMap containing the embedded liveness
// and readiness probe scripts, plus an empty valkey.conf.
// The ConfigMap is named after valkeyNodeResourceName(node).
// The ConfigMap is named via config.go:getConfigMapName(node).
Collaborator

nit: remove this line or change it to GetServerConfigMapName; getConfigMapName was maybe used in a previous iteration.

utdrmac added 3 commits April 20, 2026 18:49
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Signed-off-by: utdrmac <matthew.boehm@percona.com>
@utdrmac
Contributor Author

utdrmac commented Apr 20, 2026

@sandeepkunusoth please check my merge with your latest TLS changes, as the config has moved into a separate file.

Comment thread internal/controller/config.go Outdated
Signed-off-by: utdrmac <matthew.boehm@percona.com>
Comment thread internal/controller/config.go Outdated
Comment thread internal/controller/config.go
Comment thread Makefile Outdated
Signed-off-by: utdrmac <matthew.boehm@percona.com>
…eyConfig

Signed-off-by: Joseph Heyburn <jdheyburn@gmail.com>
@jdheyburn
Collaborator

@utdrmac I rebased this branch onto main; it looks like it's causing some e2e tests to fail. Would you mind taking a look?

@sandeepkunusoth sandeepkunusoth merged commit 4f3e468 into valkey-io:main Apr 23, 2026
8 checks passed
@utdrmac
Contributor Author

utdrmac commented Apr 24, 2026

🎉 Thanks everyone for helping, commenting, and testing!

Labels

None yet

Projects

None yet

4 participants