
IR-350: make metrics server tls configuration file-based #1297

Merged
openshift-merge-bot[bot] merged 2 commits into openshift:main from ricardomaraschini:ir-350-operator
Mar 19, 2026

Conversation

@ricardomaraschini
Contributor

Updates the metrics server to read TLS and serving configuration from a controller config file instead of using hardcoded values. This allows for dynamic configuration of the bind address, minimum TLS version, and cipher suites.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Feb 20, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented Feb 20, 2026

@ricardomaraschini: This pull request references IR-350 which is a valid jira issue.

Details

In response to this:

updates the metrics server to read tls and serving configuration from a controller config file instead of using hardcoded values. this allows for dynamic configuration of bind address, minimum tls version, and cipher suites.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested a review from flavianmissi February 20, 2026 11:31
@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 20, 2026
@ricardomaraschini ricardomaraschini force-pushed the ir-350-operator branch 2 times, most recently from a36fb9c to 783b2c7 Compare February 20, 2026 11:59
@flavianmissi
Member

nicely done!

/retest-required
/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2026
@openshift-ci openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 23, 2026
@ricardomaraschini
Contributor Author

/hold

This PR depends on openshift/cluster-version-operator#1322

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2026
include.release.openshift.io/self-managed-high-availability: "true"
release.openshift.io/create-only: "true"
include.release.openshift.io/single-node-developer: "true"
config.openshift.io/inject-tls: "true"
Member

Currently still WIP annotation

Comment thread pkg/metrics/server.go Outdated
@ricardomaraschini
Contributor Author

/retest

@ricardomaraschini
Contributor Author

/retest

@ricardomaraschini
Contributor Author

We want to use this to test openshift/cluster-version-operator#1322 so let's unhold it.

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2026
@ricardomaraschini
Contributor Author

Never mind, this may cause IPv6 issues. Putting it back on hold.

/hold

@openshift-ci openshift-ci Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 26, 2026
@ricardomaraschini
Contributor Author

@kaleemsiddiqu Can we try to test this one on IPv6? What we want to guarantee is that this will deploy on an IPv6 cluster and that the metrics endpoint on the operator works.

I did some investigation and I believe we are good with this one, but we had better be 100% sure.

Comment thread manifests/07-operator-config.yaml
@ricardomaraschini
Contributor Author

/retest

1 similar comment
@ricardomaraschini
Contributor Author

/retest

@ricardomaraschini
Contributor Author

/retest

@coderabbitai

coderabbitai Bot commented Mar 17, 2026

Walkthrough

Adds controller configuration file support and mounts, refactors metrics to a TLS-capable Server using HTTPServingInfo, updates manifests to provide a ConfigMap, and adjusts tests and operator test helpers to accommodate configurable startup and TLS-enabled metrics testing.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI & controller config: cmd/cluster-image-registry-operator/main.go, cmd/cluster-image-registry-operator/main_test.go | Adds --config flag and readAndParseControllerConfig(path) to load/validate YAML controller config (default bind :60000). Initializes metrics server from config.ServingInfo; tests cover valid, partial, invalid, and missing-file cases. |
| Manifests: operator config & mounts: manifests/07-operator-config.yaml, manifests/07-operator.yaml, manifests/07-operator-ibm-cloud-managed.yaml | Adds image-registry-operator-config ConfigMap and mounts it into operator pods; adds --files/--config args pointing to /var/run/configmaps/image-registry-operator-config/config.yaml. |
| Metrics server implementation & tests: pkg/metrics/server.go, pkg/metrics/server_test.go | Replaces port-based RunServer with Server type (NewServer, Run, Stop) that loads TLS certs and applies configv1.HTTPServingInfo (BindAddress, MinTLSVersion, CipherSuites). Tests updated to generate temp TLS certs and exercise TLS negotiation and handshake scenarios. |
| Operator bootstrap error handling: pkg/operator/bootstrap.go | Wraps returned errors in Bootstrap() with %w to preserve error chains for registry resources, platform storage, and registry config creation. |
| Operator test helpers / controller tests: pkg/operator/bootstrap_test.go, pkg/operator/controller_test.go | Changes start test helper signature to accept a boolean (startRunLoop / enable) to allow starting informers without launching reconciliation; updates call sites and tests accordingly. |
| E2E tests: test/e2e/emptydir_test.go | Updates log-file watch assertions to include /var/run/configmaps/image-registry-operator-config/config.yaml. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cmd/cluster-image-registry-operator/main.go`:
- Around line 61-63: Replace the permissive YAML unmarshal with a strict one so
unknown config fields fail fast: change the call site where
kubeyaml.Unmarshal(content, &config) is used (the config variable in main.go) to
use kubeyaml.UnmarshalStrict(content, &config) and handle the returned error the
same way so typos like bindAddress/minTLSVersion/cipherSuites are rejected;
ensure the import/usage matches the YAML package's UnmarshalStrict signature.
- Around line 45-60: The config default is IPv4-only and partial unmarshalling
can clear BindAddress; before reading/unmarshalling create and use a reusable
default GenericControllerConfig with ServingInfo.ServingInfo.BindAddress set to
the protocol-agnostic ":60000" (instead of "0.0.0.0:60000"), then if path == ""
return that default, and when unmarshaling into config (the variable config
*configv1.GenericControllerConfig) start from that default so unspecified fields
(like ServingInfo.ServingInfo.BindAddress) are preserved; update the code around
the path check, the creation of config, and the unmarshalling logic to use this
default instance.
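
The "start from a default instance" idea in the comment above can be sketched as follows. This is only an illustration under assumptions: it uses the standard library's `encoding/json` as a stand-in for the YAML decoder, and `controllerConfig`, `servingInfo`, `defaultConfig`, and `parseConfig` are hypothetical names rather than the types used in main.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// servingInfo and controllerConfig are hypothetical, simplified stand-ins
// for the controller config types discussed in the review.
type servingInfo struct {
	BindAddress   string `json:"bindAddress"`
	MinTLSVersion string `json:"minTLSVersion"`
}

type controllerConfig struct {
	ServingInfo servingInfo `json:"servingInfo"`
}

// defaultConfig returns a fresh default with a protocol-agnostic bind
// address (no explicit IPv4 or IPv6 host).
func defaultConfig() controllerConfig {
	return controllerConfig{ServingInfo: servingInfo{BindAddress: ":60000"}}
}

// parseConfig decodes on top of the defaults, so fields that the input
// does not mention keep their default values.
func parseConfig(content []byte) (controllerConfig, error) {
	cfg := defaultConfig()
	if len(content) == 0 {
		return cfg, nil
	}
	if err := json.Unmarshal(content, &cfg); err != nil {
		return controllerConfig{}, fmt.Errorf("failed to unmarshal config content: %w", err)
	}
	return cfg, nil
}

func main() {
	// Only minTLSVersion is set; bindAddress falls back to the default.
	cfg, err := parseConfig([]byte(`{"servingInfo":{"minTLSVersion":"VersionTLS13"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.ServingInfo.BindAddress, cfg.ServingInfo.MinTLSVersion)
}
```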

In `@pkg/metrics/server.go`:
- Around line 68-75: Server.Run() currently starts ListenAndServeTLS in a
background goroutine and only logs errors, so callers cannot detect
bind/cert/port failures; modify Server.Run (and Server struct) to provide a
startup readiness mechanism: add a channel or WaitGroup/err return (e.g., a
readyChan or Start() that returns error) and have the goroutine send nil on
success or the startup error on failure (capture the error returned by
s.httpServer.ListenAndServeTLS). Update callers/tests to wait on the readiness
channel (or check the returned error) instead of sleeping so startup failures
are propagated deterministically. Ensure symbol references: Server.Run,
s.httpServer.ListenAndServeTLS, and any new ready channel/Start method you add.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dbe45a7d-6e5e-4753-ab15-44e0c315151a

📥 Commits

Reviewing files that changed from the base of the PR and between e000262 and c87fa26.

📒 Files selected for processing (11)
  • cmd/cluster-image-registry-operator/main.go
  • cmd/cluster-image-registry-operator/main_test.go
  • manifests/07-operator-config.yaml
  • manifests/07-operator-ibm-cloud-managed.yaml
  • manifests/07-operator.yaml
  • pkg/metrics/server.go
  • pkg/metrics/server_test.go
  • pkg/operator/bootstrap.go
  • pkg/operator/bootstrap_test.go
  • pkg/operator/controller_test.go
  • test/e2e/emptydir_test.go

Comment thread cmd/cluster-image-registry-operator/main.go Outdated
Comment thread cmd/cluster-image-registry-operator/main.go Outdated
Comment thread pkg/metrics/server.go Outdated
updates the metrics server to read tls and serving configuration from a
controller config file instead of using hardcoded values. this allows
for dynamic configuration of bind address, minimum tls version, and
cipher suites.

this new flag is optional for now. by making it optional hypershift
tests pass. to make it required a patch in hypershift is needed (to be
done).

the aws bootstrap test was flaky: the test has a concurrency issue with
the operator sync(). this commit fixes it.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
cmd/cluster-image-registry-operator/main.go (1)

60-61: ⚠️ Potential issue | 🟠 Major

Reject unknown controller-config fields.

This parser now controls TLS policy, but permissive YAML unmarshalling will silently ignore typos in bindAddress, minTLSVersion, or cipherSuites and fall back to defaults. kubeyaml.UnmarshalStrict is safer here.

Suggested change
-	if err := kubeyaml.Unmarshal(content, config); err != nil {
+	if err := kubeyaml.UnmarshalStrict(content, config); err != nil {
 		return nil, fmt.Errorf("failed to unmarshal config content: %w", err)
 	}
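
The difference between permissive and strict decoding can be illustrated with the standard library alone. The sketch below uses `encoding/json` with `DisallowUnknownFields` as an analogue of the strict YAML decoding suggested above; it does not call `kubeyaml`, and `servingInfo`, `decodeLenient`, and `decodeStrict` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// servingInfo is a hypothetical one-field stand-in for the real config.
type servingInfo struct {
	BindAddress string `json:"bindAddress"`
}

// decodeLenient ignores keys that do not match any field.
func decodeLenient(content []byte) (servingInfo, error) {
	var info servingInfo
	err := json.Unmarshal(content, &info)
	return info, err
}

// decodeStrict fails when the input contains a key with no matching field.
func decodeStrict(content []byte) (servingInfo, error) {
	var info servingInfo
	dec := json.NewDecoder(bytes.NewReader(content))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&info); err != nil {
		return servingInfo{}, fmt.Errorf("failed to unmarshal config content: %w", err)
	}
	return info, nil
}

func main() {
	// "bindAdress" is a deliberate typo.
	typo := []byte(`{"bindAdress": ":60000"}`)

	info, err := decodeLenient(typo)
	fmt.Printf("lenient: bindAddress=%q err=%v\n", info.BindAddress, err)

	_, err = decodeStrict(typo)
	fmt.Println("strict rejected typo:", err != nil)
}
```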
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/cluster-image-registry-operator/main.go` around lines 60 - 61, The config
unmarshalling uses kubeyaml.Unmarshal which permissively ignores unknown fields;
replace the call to kubeyaml.Unmarshal(content, config) with
kubeyaml.UnmarshalStrict(content, config) and return the resulting error (same
fmt.Errorf wrapping) so unknown/typoed controller-config fields (e.g.,
bindAddress, minTLSVersion, cipherSuites) cause a hard failure; update any
imports or error handling if needed to accommodate UnmarshalStrict.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/metrics/server.go`:
- Around line 70-72: Run is currently listening on servingInfo.BindAddress but
the Kubernetes manifests still hardcode port 60000 for the metrics
container/service, so if BindAddress changes scraping will break; fix this by
plumbing the configured port through the deployment/service instead of
hardcoding 60000: extract the port from servingInfo.BindAddress (where listener
is created with net.Listen in Run), expose that port to the pod via the metrics
container's args or an env var, and update the Deployment/Service templates to
reference that value for containerPort and service port (or alternatively
enforce BindAddress to always use 60000); update the code that builds the
manifest/templates to substitute the configured port into the metrics container
and service spec so they remain in sync with servingInfo.BindAddress.
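
Extracting the port from a bind address, as suggested above, can be done with `net.SplitHostPort`. The helper below is a hypothetical sketch (`portFromBindAddress` is not a function from this PR) showing that the IPv4, IPv6, and host-less forms all yield the same port.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// portFromBindAddress returns the numeric port of a host:port bind address.
func portFromBindAddress(bindAddress string) (int, error) {
	_, portStr, err := net.SplitHostPort(bindAddress)
	if err != nil {
		return 0, fmt.Errorf("invalid bind address %q: %w", bindAddress, err)
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return 0, fmt.Errorf("invalid port %q in bind address %q", portStr, bindAddress)
	}
	return port, nil
}

func main() {
	for _, addr := range []string{":60000", "0.0.0.0:60000", "[::]:60000"} {
		port, err := portFromBindAddress(addr)
		fmt.Println(addr, port, err)
	}
}
```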


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6397d9ef-5df8-49e4-b48c-43bf88df88cd

📥 Commits

Reviewing files that changed from the base of the PR and between c87fa26 and 1106102.

📒 Files selected for processing (11)
  • cmd/cluster-image-registry-operator/main.go
  • cmd/cluster-image-registry-operator/main_test.go
  • manifests/07-operator-config.yaml
  • manifests/07-operator-ibm-cloud-managed.yaml
  • manifests/07-operator.yaml
  • pkg/metrics/server.go
  • pkg/metrics/server_test.go
  • pkg/operator/bootstrap.go
  • pkg/operator/bootstrap_test.go
  • pkg/operator/controller_test.go
  • test/e2e/emptydir_test.go
🚧 Files skipped from review as they are similar to previous changes (5)
  • test/e2e/emptydir_test.go
  • pkg/operator/controller_test.go
  • cmd/cluster-image-registry-operator/main_test.go
  • pkg/operator/bootstrap.go
  • pkg/operator/bootstrap_test.go

Comment thread pkg/metrics/server.go
@ricardomaraschini
Contributor Author

/retest

2 similar comments
@ricardomaraschini
Contributor Author

/retest

@ricardomaraschini
Contributor Author

/retest

@openshift-ci
Contributor

openshift-ci Bot commented Mar 18, 2026

@ricardomaraschini: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ricardomaraschini
Contributor Author

/label px-approved
/label docs-approved

@openshift-ci openshift-ci Bot added px-approved Signifies that Product Support has signed off on this PR docs-approved Signifies that Docs has signed off on this PR labels Mar 19, 2026
@gangwgr
Contributor

gangwgr commented Mar 19, 2026

/verified by @gangwgr

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 19, 2026
@openshift-ci-robot
Contributor

@gangwgr: This PR has been marked as verified by @gangwgr.

Details

In response to this:

/verified by @gangwgr

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@ricardomaraschini
Contributor Author

/jira refresh

@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 19, 2026

@ricardomaraschini: This pull request references IR-350 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@flavianmissi
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 19, 2026
@openshift-ci
Contributor

openshift-ci Bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flavianmissi, ricardomaraschini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [flavianmissi,ricardomaraschini]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ricardomaraschini
Contributor Author

Even though this depends on openshift/cluster-version-operator#1322, it should not cause any issues without the latter. Let's get it in.

/unhold

@openshift-ci openshift-ci Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 19, 2026
@openshift-merge-bot openshift-merge-bot Bot merged commit 3b0b821 into openshift:main Mar 19, 2026
14 checks passed
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. docs-approved Signifies that Docs has signed off on this PR jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. px-approved Signifies that Product Support has signed off on this PR verified Signifies that the PR passed pre-merge verification criteria

5 participants