IR-350: make metrics server tls configuration file-based#1297
IR-350: make metrics server tls configuration file-based#1297openshift-merge-bot[bot] merged 2 commits intoopenshift:mainfrom
Conversation
|
@ricardomaraschini: This pull request references IR-350 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
a36fb9c to
783b2c7
Compare
|
nicely done! /retest-required |
783b2c7 to
f4d7200
Compare
|
/hold This PR depends on openshift/cluster-version-operator#1322 |
| include.release.openshift.io/self-managed-high-availability: "true" | ||
| release.openshift.io/create-only: "true" | ||
| include.release.openshift.io/single-node-developer: "true" | ||
| config.openshift.io/inject-tls: "true" |
There was a problem hiding this comment.
Currently still WIP annotation
|
/retest |
f4d7200 to
8735826
Compare
|
/retest |
4a1f253 to
9089e15
Compare
9089e15 to
c6f82c1
Compare
|
We want to use this to test openshift/cluster-version-operator#1322 so let's unhold it. /unhold |
|
Nevermind, this may cause IPv6 issues. Putting it back on hold. /hold |
|
@kaleemsiddiqu Can we try to test this one on IPv6 ? What we want to guarantee is that this will deploy on an IPv6 cluster and that the metrics endpoint on the operator works. I did some investigations and I believe we are good with this one but we better to be 100% sure. |
|
/retest |
1 similar comment
|
/retest |
|
/retest |
6f53820 to
c87fa26
Compare
WalkthroughAdds controller configuration file support and mounts, refactors metrics to a TLS-capable Server using HTTPServingInfo, updates manifests to provide a ConfigMap, and adjusts tests and operator test helpers to accommodate configurable startup and TLS-enabled metrics testing. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.11.3)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/cluster-image-registry-operator/main.go`:
- Around line 61-63: Replace the permissive YAML unmarshal with a strict one so
unknown config fields fail fast: change the call site where
kubeyaml.Unmarshal(content, &config) is used (the config variable in main.go) to
use kubeyaml.UnmarshalStrict(content, &config) and handle the returned error the
same way so typos like bindAddress/minTLSVersion/cipherSuites are rejected;
ensure the import/usage matches the YAML package's UnmarshalStrict signature.
- Around line 45-60: The config default is IPv4-only and partial unmarshalling
can clear BindAddress; before reading/unmarshalling create and use a reusable
default GenericControllerConfig with ServingInfo.ServingInfo.BindAddress set to
the protocol-agnostic ":60000" (instead of "0.0.0.0:60000"), then if path == ""
return that default, and when unmarshaling into config (the variable config
*configv1.GenericControllerConfig) start from that default so unspecified fields
(like ServingInfo.ServingInfo.BindAddress) are preserved; update the code around
the path check, the creation of config, and the unmarshalling logic to use this
default instance.
In `@pkg/metrics/server.go`:
- Around line 68-75: Server.Run() currently starts ListenAndServeTLS in a
background goroutine and only logs errors, so callers cannot detect
bind/cert/port failures; modify Server.Run (and Server struct) to provide a
startup readiness mechanism: add a channel or WaitGroup/err return (e.g., a
readyChan or Start() that returns error) and have the goroutine send nil on
success or the startup error on failure (capture the error returned by
s.httpServer.ListenAndServeTLS). Update callers/tests to wait on the readiness
channel (or check the returned error) instead of sleeping so startup failures
are propagated deterministically. Ensure symbol references: Server.Run,
s.httpServer.ListenAndServeTLS, and any new ready channel/Start method you add.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: dbe45a7d-6e5e-4753-ab15-44e0c315151a
📒 Files selected for processing (11)
cmd/cluster-image-registry-operator/main.gocmd/cluster-image-registry-operator/main_test.gomanifests/07-operator-config.yamlmanifests/07-operator-ibm-cloud-managed.yamlmanifests/07-operator.yamlpkg/metrics/server.gopkg/metrics/server_test.gopkg/operator/bootstrap.gopkg/operator/bootstrap_test.gopkg/operator/controller_test.gotest/e2e/emptydir_test.go
updates the metrics server to read tls and serving configuration from a controller config file instead of using hardcoded values. this allows for dynamic configuration of bind address, minimum tls version, and cipher suites. this new flag is optional for now. by making it optional hypershift tests pass. to make it required a patch in hypershift is needed (to be done).
the aws bootstrap test was flaky, the test has a concurency issue with the operator sync(). this commit fixes.
c87fa26 to
1106102
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
cmd/cluster-image-registry-operator/main.go (1)
60-61:⚠️ Potential issue | 🟠 MajorReject unknown controller-config fields.
This parser now controls TLS policy, but permissive YAML unmarshalling will silently ignore typos in
bindAddress,minTLSVersion, orcipherSuitesand fall back to defaults.kubeyaml.UnmarshalStrictis safer here.Suggested change
- if err := kubeyaml.Unmarshal(content, config); err != nil { + if err := kubeyaml.UnmarshalStrict(content, config); err != nil { return nil, fmt.Errorf("failed to unmarshal config content: %w", err) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster-image-registry-operator/main.go` around lines 60 - 61, The config unmarshalling uses kubeyaml.Unmarshal which permissively ignores unknown fields; replace the call to kubeyaml.Unmarshal(content, config) with kubeyaml.UnmarshalStrict(content, config) and return the resulting error (same fmt.Errorf wrapping) so unknown/typoed controller-config fields (e.g., bindAddress, minTLSVersion, cipherSuites) cause a hard failure; update any imports or error handling if needed to accommodate UnmarshalStrict.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/metrics/server.go`:
- Around line 70-72: Run is currently listening on servingInfo.BindAddress but
the Kubernetes manifests still hardcode port 60000 for the metrics
container/service, so if BindAddress changes scraping will break; fix this by
plumbing the configured port through the deployment/service instead of
hardcoding 60000: extract the port from servingInfo.BindAddress (where listener
is created with net.Listen in Run), expose that port to the pod via the metrics
container's args or an env var, and update the Deployment/Service templates to
reference that value for containerPort and service port (or alternatively
enforce BindAddress to always use 60000); update the code that builds the
manifest/templates to substitute the configured port into the metrics container
and service spec so they remain in sync with servingInfo.BindAddress.
---
Duplicate comments:
In `@cmd/cluster-image-registry-operator/main.go`:
- Around line 60-61: The config unmarshalling uses kubeyaml.Unmarshal which
permissively ignores unknown fields; replace the call to
kubeyaml.Unmarshal(content, config) with kubeyaml.UnmarshalStrict(content,
config) and return the resulting error (same fmt.Errorf wrapping) so
unknown/typoed controller-config fields (e.g., bindAddress, minTLSVersion,
cipherSuites) cause a hard failure; update any imports or error handling if
needed to accommodate UnmarshalStrict.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6397d9ef-5df8-49e4-b48c-43bf88df88cd
📒 Files selected for processing (11)
cmd/cluster-image-registry-operator/main.gocmd/cluster-image-registry-operator/main_test.gomanifests/07-operator-config.yamlmanifests/07-operator-ibm-cloud-managed.yamlmanifests/07-operator.yamlpkg/metrics/server.gopkg/metrics/server_test.gopkg/operator/bootstrap.gopkg/operator/bootstrap_test.gopkg/operator/controller_test.gotest/e2e/emptydir_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
- test/e2e/emptydir_test.go
- pkg/operator/controller_test.go
- cmd/cluster-image-registry-operator/main_test.go
- pkg/operator/bootstrap.go
- pkg/operator/bootstrap_test.go
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
@ricardomaraschini: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/label px-approved |
|
/verified by @gangwgr |
|
@gangwgr: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@ricardomaraschini: This pull request references IR-350 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: flavianmissi, ricardomaraschini The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Even though this depends the openshift/cluster-version-operator#1322 it should not cause any issues without the latter. Let's get it in. /unhold |
updates the metrics server to read tls and serving configuration from a controller config file instead of using hardcoded values. this allows for dynamic configuration of bind address, minimum tls version, and cipher suites.