CNTRLPLANE-1137: Add support for external CNI provider Cilium#7423
openshift-merge-bot[bot] merged 4 commits into openshift:main
Conversation
@wewang58: This pull request references CNTRLPLANE-1137 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.
Skipping CI for Draft Pull Request. |
Walkthrough: Adds Cilium v1.14.5 manifests and E2E support: CRD and deployment manifests, connectivity workloads, test utilities for installing/validating Cilium, framework wiring for an external CNI provider and asset reader, and a new Cilium connectivity test plus small test harness updates.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
@heliubj18 I added all the templates to assets, can you help review it? Thanks.
Actionable comments posted: 2
🧹 Nitpick comments (6)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)
61-64: Consider adding resource limits for completeness. The deployment specifies resource requests but no limits. While acceptable for e2e test infrastructure, adding limits would prevent unbounded resource consumption if the operator misbehaves.
🔎 Proposed enhancement
```diff
   resources:
     requests:
       cpu: 100m
       memory: 512Mi
+    limits:
+      cpu: 500m
+      memory: 1Gi
```
test/e2e/util/util.go (1)
4198-4211: Consider adding a timeout for HTTP requests.
`http.Get` uses the default HTTP client, which has no timeout. In flaky network conditions, this could cause tests to hang indefinitely.
🔎 Proposed fix to add HTTP timeout
```diff
+	httpClient := &http.Client{
+		Timeout: 30 * time.Second,
+	}
+
 	if strings.HasPrefix(pathOrURL, "http://") || strings.HasPrefix(pathOrURL, "https://") {
 		// Download YAML content
-		resp, err := http.Get(pathOrURL)
+		resp, err := httpClient.Get(pathOrURL)
 		if err != nil {
 			return fmt.Errorf("failed to download manifest from %s: %w", pathOrURL, err)
 		}
```
test/e2e/util/cilium.go (4)
161-161: Use `CiliumNamespace` constant for consistency. The constant `CiliumNamespace` is defined but a hardcoded string `"cilium"` is used here.
🔎 Proposed fix
```diff
-	err := guestClient.Get(ctx, crclient.ObjectKey{Name: "cilium"}, ciliumNs)
+	err := guestClient.Get(ctx, crclient.ObjectKey{Name: CiliumNamespace}, ciliumNs)
```
320-333: Consider adding parentheses for clarity. The condition logic is correct due to Go's operator precedence (`&&` before `||`), but explicit parentheses would improve readability.
🔎 Proposed fix
```diff
-		if ds.Name == "cilium" || strings.HasPrefix(ds.Name, "cilium-") && !strings.Contains(ds.Name, "operator") {
+		if ds.Name == "cilium" || (strings.HasPrefix(ds.Name, "cilium-") && !strings.Contains(ds.Name, "operator")) {
```
470-493: Fixed sleep duration may be fragile. Using `time.Sleep` with a fixed duration doesn't adapt to varying cluster conditions. Consider polling for specific success indicators if the connectivity test provides them, or document why a fixed duration is necessary.
556-558: Redundant helper function - use `ptr.To` instead. The `ptrBool` helper is redundant since `k8s.io/utils/ptr` is already imported (line 28) and `ptr.To[bool](false)` or simply `ptr.To(false)` can be used.
🔎 Proposed fix
Remove the `ptrBool` function and update the call site:
```diff
-	AllowPrivilegeEscalation: ptrBool(false),
+	AllowPrivilegeEscalation: ptr.To(false),
```
And remove lines 556-558:
```diff
-func ptrBool(b bool) *bool {
-	return &b
-}
```
📜 Review details
📒 Files selected for processing (20)
- test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
- test/e2e/cilium_network_policy_test.go
- test/e2e/e2e_test.go
- test/e2e/util/cilium.go
- test/e2e/util/hypershift_framework.go
- test/e2e/util/options.go
- test/e2e/util/util.go
🪛 Checkov (3.2.334)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
[medium] 1-75: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-75: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 1-75: Containers should not share the host network namespace
(CKV_K8S_19)
test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
[medium] 3-58: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-58: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 59-115: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 59-115: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 116-180: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 116-180: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 116-180: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 181-236: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 181-236: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 237-292: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 237-292: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 293-338: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 293-338: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 339-394: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 339-394: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 395-450: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 395-450: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 451-516: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 451-516: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 517-582: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 517-582: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 583-649: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 583-649: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 583-649: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 650-716: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 650-716: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 650-716: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 717-782: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 717-782: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 783-848: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 783-848: Minimize the admission of root containers
(CKV_K8S_23)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
[medium] 1-82: Minimize wildcard use in Roles and ClusterRoles
(CKV_K8S_49)
🪛 golangci-lint (2.5.0)
test/e2e/util/cilium.go
[error] 228-228: : # github.com/openshift/hypershift/sync-global-pullsecret [github.com/openshift/hypershift/sync-global-pullsecret.test]
sync-global-pullsecret/sync-global-pullsecret_test.go:228:23: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:234:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:247:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:257:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:270:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:283:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:296:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:309:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:327:12: undefined: NewMockdbusConn
(typecheck)
🔇 Additional comments (26)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml (1)
1-13: LGTM! The RoleBinding correctly binds the cilium-olm ServiceAccount to the cilium-olm Role within the cilium namespace. The manifest structure is valid and follows Kubernetes RBAC conventions.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml (1)
1-13: LGTM! The RoleBinding correctly enables leader election for the cilium-olm ServiceAccount by binding it to the leader-election Role within the cilium namespace. The manifest structure is valid.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml (1)
1-12: LGTM! Note AI summary discrepancy. The ClusterRoleBinding correctly references the ClusterRole `cilium-cilium-olm` (defined in cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml) and binds it to the cilium-olm ServiceAccount. The manifest structure is valid.
Note: The AI summary incorrectly states this binds ClusterRole "cilium-cilium", but the code correctly references "cilium-cilium-olm".
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml (1)
1-11: LGTM! The Namespace configuration is appropriate for a system-level CNI namespace:
- Empty node-selector allows flexible pod scheduling
- Cluster logging and monitoring labels enable observability
- Run-level "0" indicates this is an early-lifecycle system namespace
The manifest structure is valid and follows OpenShift conventions.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml (1)
14-28: This ClusterRole is sourced from the upstream isovalent/olm-for-cilium project and follows the expected pattern for Helm operators, which require broad RBAC permissions to orchestrate application deployment and lifecycle management. The broad CRUD permissions on RBAC resources (roles, rolebindings, clusterroles, clusterrolebindings) are necessary for cilium-olm to create and manage Cilium's own RBAC requirements during cluster setup. Since this is a test fixture derived from upstream, the scope and permissions are already vetted by the Cilium project.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml (1)
1-7: LGTM! The ServiceAccount manifest follows standard Kubernetes conventions and is correctly structured for use by the cilium-olm deployment.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml (1)
1-14: LGTM! The Service manifest is correctly structured with appropriate port mapping (443→9443) for the cilium-olm component. The selector properly targets pods with the `name: cilium-olm` label.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml (1)
1-24: LGTM! The leader election Role follows standard Kubernetes patterns with appropriate permissions for managing ConfigMap-based leader election leases and creating events.
test/e2e/util/options.go (2)
94-95: LGTM! The `ExternalCNIProvider` field is properly documented and follows the existing pattern for optional feature flags in the Options struct.
206-212: LGTM! The `ExternalCNIProvider` field is correctly propagated to `PlatformAgnosticOptions`, maintaining consistency with how other platform-specific options are handled in `DefaultClusterOptions`.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml (1)
1-82: LGTM - Standard Cilium operator RBAC. The wildcard permissions on `cilium.io` resources and CRDs are expected for a CNI operator that needs to manage its own custom resources. The remaining rules are appropriately scoped. This aligns with upstream Cilium installation patterns and is acceptable for e2e test infrastructure.
test/e2e/cilium_network_policy_test.go (1)
27-52: LGTM - Well-structured test with proper skip conditions and cleanup. The test correctly:
- Runs in parallel
- Skips appropriately when not on Azure platform or when Cilium isn't the external CNI provider
- Performs ARO HCP environment check inside the test harness
- Defers cleanup of connectivity test resources
test/e2e/util/hypershift_framework.go (1)
51-52: LGTM - New fields extend platform options appropriately. The `ExternalCNIProvider` field follows the existing pattern established by `ExtOIDCConfig`.
test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml (1)
1-41: LGTM - Standard CiliumConfig CRD definition. The CRD follows proper structure with appropriate use of `x-kubernetes-preserve-unknown-fields` for the operator-managed spec and status fields.
test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml (2)
1-57: Note: Unusual YAML document ordering is valid but unconventional. The `apiVersion` and `kind` fields appear at the end of each document rather than the beginning. While valid YAML (field order doesn't matter), this is atypical and may confuse readers. Since this is marked as auto-generated from Cilium upstream, modifying it may complicate future updates.
116-180: Host networking is intentional for connectivity test scenarios. The `hostNetwork: true` in echo-b-host and similar deployments is required for testing host-to-pod and cross-node connectivity scenarios. The static analysis warnings about hostNetwork and security contexts can be safely ignored for these e2e test assets.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml (1)
1-12: LGTM - Properly binds ClusterRole to ServiceAccount. The binding correctly references the `cilium-cilium` ClusterRole and the `cilium-olm` ServiceAccount in the `cilium` namespace, consistent with other manifests in this PR.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)
68-72: Host networking and broad tolerations are appropriate for CNI operators. The `hostNetwork: true` is required for CNI operators to configure the node's network stack. The broad toleration (operator: Exists) ensures the operator can run on control plane nodes if needed. Static analysis warnings can be safely ignored for this use case.
test/e2e/e2e_test.go (1)
93-93: LGTM! The new CLI flag follows the established pattern for e2e test options and integrates cleanly with the existing flag definitions.
test/e2e/util/util.go (3)
4162-4175: LGTM! The helper function follows the established gomega Eventually pattern used elsewhere in this file and provides a clean interface for waiting on deployment readiness.
4177-4190: LGTM! The helper function correctly implements the wait pattern for DaemonSet readiness, consistent with `WaitForDeploymentAvailable`.
4220-4264: LGTM! The YAML decoding and Server-Side Apply logic is well-implemented with proper handling of multi-document YAMLs, namespace scoping via RESTMapper, and clear error messages.
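For readers unfamiliar with the pattern being praised here, a minimal sketch of what a Server-Side Apply helper for decoded YAML documents typically looks like with controller-runtime; the function name and the field-manager string are illustrative assumptions, not the PR's actual code.
```go
package util

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/util/yaml"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// applyManifestDoc decodes one YAML document into an unstructured object and applies it
// with Server-Side Apply, so repeated test runs converge on the same desired state.
func applyManifestDoc(ctx context.Context, c crclient.Client, doc []byte) error {
	obj := &unstructured.Unstructured{}
	if err := yaml.Unmarshal(doc, obj); err != nil {
		return fmt.Errorf("failed to decode manifest: %w", err)
	}
	// Server-Side Apply: the API server merges this object into any existing one,
	// using "e2e-test" as the field manager and taking ownership of conflicting fields.
	if err := c.Patch(ctx, obj, crclient.Apply, crclient.FieldOwner("e2e-test"), crclient.ForceOwnership); err != nil {
		return fmt.Errorf("failed to apply %s %s/%s: %w", obj.GetKind(), obj.GetNamespace(), obj.GetName(), err)
	}
	return nil
}
```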
test/e2e/util/cilium.go (4)
33-63: LGTM! Constants are well-organized with clear naming and documentation. Reading `CiliumVersion` from the environment allows flexibility in CI configurations.
65-147: LGTM! The manifest helper functions are well-structured with appropriate PodSecurity labels and SCC capabilities for Cilium. The local/remote manifest fallback pattern provides good flexibility for testing.
502-528: LGTM! Good defensive coding with sensible defaults and graceful error handling. The IPv4 detection logic via `strings.Contains(clusterNet.CIDR, ".")` is simple and effective.
560-629: LGTM! The `CreateCiliumConfig` function creates a comprehensive Cilium configuration with appropriate settings for OpenShift integration. Debug mode being enabled is reasonable for e2e test observability.
```yaml
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - '*'
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - configmaps
  - secrets
  - services
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - deployments
  - daemonsets
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  verbs:
  - '*'
```
Review the use of wildcard RBAC permissions.
Multiple rules use wildcard verbs (*), granting full control over secrets, serviceaccounts, configmaps, services, deployments, daemonsets, and servicemonitors within the cilium namespace. While operators typically require elevated permissions, wildcards are overly permissive and violate the principle of least privilege.
Consider explicitly listing only the required verbs (e.g., get, list, watch, create, update, patch, delete) rather than using *. This reduces the attack surface if the operator is compromised.
Note: secrets appears in two separate rules (lines 42 and 50), which is redundant.
🤖 Prompt for AI Agents
In
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
around lines 39–66, several RBAC rules use wildcard verbs ("*") and secrets is
duplicated; replace each "*" with the minimal explicit verbs the operator
actually needs (for example get, list, watch, create, update, patch, delete as
applicable per resource) and remove the duplicate secrets entry by consolidating
permissions into a single rule for secrets; ensure each apiGroup/resources block
only lists the precise verbs required for those resources to follow
least-privilege.
```go
guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
InstallCilium(t, context.Background(), guestClient, hostedCluster)
// wait hosted cluster ready
WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
```
Use h.ctx instead of context.Background() for proper cancellation propagation.
The test framework already has h.ctx which is properly configured with cancellation. Using context.Background() here means these operations won't respect test cancellation or timeouts, potentially leaving resources orphaned if the test is aborted.
🔎 Proposed fix
```diff
 		if opts.NodePoolReplicas == 0 {
 			t.Fatal("NodePool replicas must be positive for Cilium to install.")
 		}
-		guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
-		InstallCilium(t, context.Background(), guestClient, hostedCluster)
+		guestClient := WaitForGuestClient(t, h.ctx, h.client, hostedCluster)
+		InstallCilium(t, h.ctx, guestClient, hostedCluster)
 		// wait hosted cluster ready
-		WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
-		WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
-		ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
+		WaitForNReadyNodes(t, h.ctx, guestClient, opts.NodePoolReplicas, platform)
+		WaitForImageRollout(t, h.ctx, h.client, hostedCluster)
+		ValidateHostedClusterConditions(t, h.ctx, h.client, hostedCluster, true, 10*time.Minute)
 	}
```
🤖 Prompt for AI Agents
In test/e2e/util/hypershift_framework.go around lines 150 to 155, several calls
use context.Background() which bypasses the framework's cancellation; replace
each context.Background() with h.ctx (i.e., use h.ctx for WaitForGuestClient,
InstallCilium, WaitForNReadyNodes, WaitForImageRollout, and
ValidateHostedClusterConditions) so the operations inherit the test's context
and proper cancellation/timeouts.
Actionable comments posted: 1
Fix all issues with AI Agents 🤖
In @test/e2e/util/util.go:
- Around line 4198-4211: The HTTP download uses the default client (http.Get)
without a timeout which can hang; replace the http.Get(pathOrURL) call with a
dedicated http.Client having a reasonable Timeout (e.g., 10s) and call
client.Get(pathOrURL) instead, then keep the existing resp.Body.Close, status
code check and io.ReadAll handling; locate the download block around the
http.Get usage (the pathOrURL/http.Get/yamlContent section) and swap in the
timeout-capable client to ensure the request fails fast on unresponsive hosts.
♻️ Duplicate comments (1)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml (1)
39-66: Wildcard RBAC and duplicate `secrets` entry already flagged. This concern was raised in a previous review. The wildcard verbs and duplicate `secrets` resource (Lines 42 and 50) should be addressed per the earlier feedback.
🧹 Nitpick comments (7)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml (1)
17-72: Consider adding a security context to restrict privilege escalation. The deployment lacks a `securityContext` to explicitly disable privilege escalation. While `hostNetwork: true` and the broad toleration are necessary for CNI operators, you can still harden the container:
```yaml
containers:
- name: operator
  securityContext:
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
```
Since this is test infrastructure for e2e validation, this is optional but would align with security best practices flagged by static analysis.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml (1)
5-17: Wildcard permissions on CRDs are broader than necessary. The wildcard verbs (`*`) on `cilium.io/*` resources are acceptable for the Cilium operator. However, the wildcard on `customresourcedefinitions` (Lines 12-17) grants the ability to create/delete any CRD in the cluster, not just Cilium's. For test infrastructure this is acceptable, but in production environments consider restricting to specific CRD names or limiting verbs to what's actually needed (e.g., `get`, `list`, `watch`, `create`, `update`, `patch`).
test/e2e/util/util.go (2)
4162-4174: Consider strengthening the readiness check. The current check only verifies that `ReadyReplicas > 0`, which may not ensure the Deployment is fully ready. If the Deployment has multiple replicas, this could return success prematurely.
🔎 Proposed improvement
```diff
 func WaitForDeploymentAvailable(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
 	g := NewWithT(t)
 	t.Logf("Waiting for deployment %s/%s to be ready", namespace, name)
 	g.Eventually(func() bool {
 		deployment := &appsv1.Deployment{}
 		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, deployment)
 		if err != nil {
 			t.Logf("Failed to get deployment %s/%s: %v", namespace, name, err)
 			return false
 		}
-		return deployment.Status.ReadyReplicas > 0
+		// Check that all desired replicas are ready
+		return deployment.Status.ReadyReplicas > 0 &&
+			deployment.Status.ReadyReplicas == deployment.Status.Replicas &&
+			deployment.Status.ObservedGeneration >= deployment.Generation
 	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("deployment %s/%s should be ready", namespace, name))
 }
```
4177-4189: Consider strengthening the DaemonSet readiness check. The current check only verifies that `NumberReady > 0`, which doesn't ensure the DaemonSet has pods ready on all desired nodes.
🔎 Proposed improvement
```diff
 func WaitForDaemonSetReady(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
 	g := NewWithT(t)
 	t.Logf("Waiting for DaemonSet %s/%s to be ready", namespace, name)
 	g.Eventually(func() bool {
 		ds := &appsv1.DaemonSet{}
 		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, ds)
 		if err != nil {
 			t.Logf("Failed to get DaemonSet %s/%s: %v", namespace, name, err)
 			return false
 		}
-		return ds.Status.NumberReady > 0
+		// Check that all desired pods are ready and the rollout is complete
+		return ds.Status.NumberReady > 0 &&
+			ds.Status.NumberReady == ds.Status.DesiredNumberScheduled &&
+			ds.Status.ObservedGeneration >= ds.Generation
 	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("DaemonSet %s/%s should be ready", namespace, name))
 }
```
test/e2e/util/cilium.go (3)
259-307: Complex Eventually block with nested updates - consider simplifying. The Eventually block handles the case where the Cilium operator might overwrite the CiliumConfig, but the nested update logic makes it harder to debug failures. Consider extracting this to a helper function or adding more structured logging.
This is not blocking, but could improve maintainability for debugging test failures.
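If the nested update were extracted, client-go's `retry.RetryOnConflict` is one conventional shape for it. This is a sketch only: the helper name, and the assumption that the CiliumConfig is handled as an unstructured `cilium.io/v1alpha1` object, are mine rather than the PR's.
```go
package util

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/util/retry"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// updateCiliumConfigSpec re-reads the CiliumConfig and reapplies the desired spec,
// retrying on conflicts instead of burying the read-modify-write inside an Eventually block.
func updateCiliumConfigSpec(ctx context.Context, c crclient.Client, key crclient.ObjectKey, desiredSpec map[string]interface{}) error {
	return retry.RetryOnConflict(retry.DefaultRetry, func() error {
		cfg := &unstructured.Unstructured{}
		cfg.SetGroupVersionKind(schema.GroupVersionKind{Group: "cilium.io", Version: "v1alpha1", Kind: "CiliumConfig"})
		if err := c.Get(ctx, key, cfg); err != nil {
			return err
		}
		if err := unstructured.SetNestedMap(cfg.Object, desiredSpec, "spec"); err != nil {
			return err
		}
		// Update fails with a conflict if the operator wrote in between; RetryOnConflict re-reads and retries.
		return c.Update(ctx, cfg)
	})
}
```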
470-493: Replace fixed sleep with polling for test completion. The current implementation waits a fixed 60 seconds (line 474) and then only checks if pods are still running. This approach has limitations:
- Fixed waits are brittle - tests might pass/fail based on timing
- Checking only pod phase misses container-level failures (CrashLoopBackOff, errors)
- Could result in false positives if connectivity tests are still running
Consider polling for test completion status instead (e.g., checking container termination status or using a Job-based approach) and examining container logs for test results.
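A hedged sketch of what such polling could look like with the gomega helpers already used in this file (dot-imported gomega assumed; the function name, namespace argument, and timeouts are illustrative):
```go
package util

import (
	"context"
	"testing"
	"time"

	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// waitForConnectivityWorkloads polls the connectivity-test pods instead of sleeping:
// it fails fast on a Failed pod and succeeds once every pod is Running or Succeeded.
func waitForConnectivityWorkloads(t *testing.T, ctx context.Context, c crclient.Client, namespace string) {
	g := NewWithT(t)
	g.Eventually(func(g Gomega) {
		podList := &corev1.PodList{}
		g.Expect(c.List(ctx, podList, crclient.InNamespace(namespace))).To(Succeed())
		g.Expect(podList.Items).NotTo(BeEmpty(), "no connectivity-test pods found yet")
		for _, pod := range podList.Items {
			// A Failed pod is a hard failure; Running and Succeeded (completed Job pods) both count as healthy.
			g.Expect(pod.Status.Phase).NotTo(Equal(corev1.PodFailed), "pod %s failed", pod.Name)
			g.Expect(pod.Status.Phase).To(BeElementOf(corev1.PodRunning, corev1.PodSucceeded), "pod %s not settled yet", pod.Name)
		}
	}, 10*time.Minute, 10*time.Second).Should(Succeed())
}
```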
556-558: Use existing pointer helper instead of custom function. The repository already uses `k8s.io/utils/ptr` (imported as `ptr` in other files). Consider using `ptr.To[bool](b)` instead of maintaining a custom `ptrBool` function.
🔎 Proposed change
```diff
-func ptrBool(b bool) *bool {
-	return &b
-}
```
And update the usage at line 386:
```diff
-	AllowPrivilegeEscalation: ptrBool(false),
+	AllowPrivilegeEscalation: ptr.To(false),
```
📜 Review details
📒 Files selected for processing (20)
- test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00005-cilium-olm-role.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
- test/e2e/cilium_network_policy_test.go
- test/e2e/e2e_test.go
- test/e2e/util/cilium.go
- test/e2e/util/hypershift_framework.go
- test/e2e/util/options.go
- test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (10)
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00006-leader-election-rolebinding.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00010-cilium-cilium-olm-clusterrolebinding.yaml
- test/e2e/e2e_test.go
- test/e2e/util/options.go
- test/e2e/util/hypershift_framework.go
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00001-cilium-olm-serviceaccount.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00004-cilium-olm-leader-election-role.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00003-cilium-olm-service.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00008-cilium-cilium-olm-clusterrole.yaml
- test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00000-cilium-namespace.yaml
🪛 Checkov (3.2.334)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00002-cilium-olm-deployment.yaml
[medium] 1-75: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 1-75: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 1-75: Containers should not share the host network namespace
(CKV_K8S_19)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00009-cilium-cilium-clusterrole.yaml
[medium] 1-82: Minimize wildcard use in Roles and ClusterRoles
(CKV_K8S_49)
test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml
[medium] 3-58: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-58: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 59-115: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 59-115: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 116-180: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 116-180: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 116-180: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 181-236: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 181-236: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 237-292: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 237-292: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 293-338: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 293-338: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 339-394: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 339-394: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 395-450: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 395-450: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 451-516: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 451-516: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 517-582: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 517-582: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 583-649: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 583-649: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 583-649: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 650-716: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 650-716: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 650-716: Containers should not share the host network namespace
(CKV_K8S_19)
[medium] 717-782: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 717-782: Minimize the admission of root containers
(CKV_K8S_23)
[medium] 783-848: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 783-848: Minimize the admission of root containers
(CKV_K8S_23)
🪛 golangci-lint (2.5.0)
test/e2e/util/cilium.go
[error] 228-228: : # github.com/openshift/hypershift/sync-global-pullsecret [github.com/openshift/hypershift/sync-global-pullsecret.test]
sync-global-pullsecret/sync-global-pullsecret_test.go:228:23: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:234:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:247:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:257:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:270:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:283:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:296:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:309:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:327:12: undefined: NewMockdbusConn
(typecheck)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Red Hat Konflux / hypershift-release-mce-211-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-operator-main-on-pull-request
- GitHub Check: Red Hat Konflux / hypershift-cli-mce-211-on-pull-request
🔇 Additional comments (7)
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00007-cilium-olm-rolebinding.yaml (1)
1-13: LGTM! Standard RoleBinding correctly binding the `cilium-olm` Role to the `cilium-olm` ServiceAccount in the `cilium` namespace.
test/e2e/assets/cilium/v1.14.5/cluster-network-03-cilium-ciliumconfigs-crd.yaml (1)
1-41: LGTM! Well-structured CRD definition for `CiliumConfig` with appropriate `x-kubernetes-preserve-unknown-fields` for flexible operator configuration and the status subresource enabled.
test/e2e/cilium_network_policy_test.go (2)
43-51: LGTM! Good test structure with proper resource cleanup via `defer`. The nested ARO HCP check correctly differentiates between generic Azure and ARO HCP-specific requirements.
17-25: No changes needed. The comment accurately describes the test implementation. `EnsureCiliumConnectivityTestResources` internally checks Cilium pod readiness as its first step via the `CheckCiliumPodsRunning` subtest before proceeding with subsequent setup operations. The documented steps align with the actual execution flow.
test/e2e/assets/cilium/v1.14.5/connectivity-check.yaml (1)
1-2: Auto-generated test manifest - static analysis findings are expected. This is a standard Cilium connectivity test suite auto-generated by Makefile. The `hostNetwork: true` configurations and lack of explicit security contexts are intentional for testing various networking scenarios (host-to-pod, cross-node, nodeport access). The static analysis warnings about privilege escalation and host networking are false positives in this context.
test/e2e/assets/cilium/v1.14.5/cluster-network-06-cilium-00011-cilium-cilium-clusterrolebinding.yaml (1)
1-12: LGTM! Standard ClusterRoleBinding manifest that correctly binds the cilium-cilium ClusterRole to the cilium-olm ServiceAccount in the cilium namespace.
test/e2e/util/cilium.go (1)
81-114: LGTM - Appropriately permissive SCC for CNI workload. The SecurityContextConstraints are correctly configured for a CNI provider like Cilium, which requires elevated privileges (host network, privileged containers, etc.). The permissions are scoped to specific service accounts in the cilium namespace.
@jparrill @mgencur @heliubj18 Submitted a new PR to make it easier to review. It's almost the same as the old PR, just moving some templates to the assets directory. Can you review again? Thanks.
The AKS test fails with: (waits for OVN but this is Cilium)
```go
		t.Skip("Skipping test because it requires Azure platform")
	}

	if globalOpts.ExternalCNIProvider != "cilium" {
```
Can you extract "cilium" as const and re-use in the other places? It's used a few times in this PR.
@mgencur Thanks for reviewing, I will update the others and test it.
@mgencur Can you take another look at my updated feature code? If anything else needs updating, feel free to add comments. Thanks a lot for reviewing.
```go
flag.BoolVar(&globalOpts.HOInstallationOptions.EnableCIDebugOutput, "e2e.ho-enable-ci-debug-output", false, "Install the HyperShift Operator with extra CI debug output enabled. This is a HyperShift Operator installation option")
flag.StringVar(&globalOpts.HOInstallationOptions.PlatformMonitoring, "e2e.platform-monitoring", "All", "The option for enabling platform cluster monitoring when installing the HyperShift Operator. Valid values are: None, OperatorOnly, All. This is a HyperShift Operator installation option")
flag.BoolVar(&globalOpts.RunUpgradeTest, "upgrade.run-tests", false, "Run HyperShift Operator upgrade test")
flag.StringVar(&globalOpts.ExternalCNIProvider, "e2e.external-cni-provider", "", "the option supported third cni provider: cilium, calico")
```
Does it make sense to mention "calico" here? Maybe something like this would be better: The option supports the following CNI providers: cilium
...and later we could add more if necessary.
```go
CiliumDefaultTimeout      = 10 * time.Minute
CiliumLongTimeout         = 20 * time.Minute
CiliumShortTimeout        = 2 * time.Minute
CiliumDefaultPollInterval = 10 * time.Second
```
On the other PR, @jparrill was requesting those variables to be package private (AFAIK). #7077 (comment)
I'm just bringing this up here to speed up the loop.
```go
if _, err := os.Stat(localPath); err == nil {
	manifests = append(manifests, localPath)
} else {
	manifests = append(manifests, fmt.Sprintf("%s/%s", remoteBase, f))
```
I suppose you included the Cilium YAML files in this PR to prevent downloading them every time (e.g. for disconnected envs). So, since we have the files locally now, why keep this fallback for downloading them?
It is unlikely that some files would be accidentally removed from the repo.
```diff
 func (h *hypershiftTest) before(hostedCluster *hyperv1.HostedCluster, opts *PlatformAgnosticOptions, platform hyperv1.PlatformType) {
 	h.Run("ValidateHostedCluster", func(t *testing.T) {
-		if platform != hyperv1.NonePlatform {
+		if platform != hyperv1.NonePlatform && hostedCluster.Spec.Networking.NetworkType != hyperv1.Other {
```
This is a related comment on the PR that will trigger the test suite: https://github.com/openshift/release/pull/71383/files#r2672253127 . PTAL. I think it's important to set --network-type=Other there.
You mean if EXTERNAL_CNI_PROVIDER is not null, we should add --e2e.external-cni-provider and --network-type=Other in https://github.com/openshift/release/pull/71383/files#diff-36e084ed3de5a17840678199ccb286a3ff6b2e47b9b6ebbbf011a26e43932f5cL106, right? I will update that PR.
I have a CI PR, openshift/release#71383, that depends on this PR.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
test/e2e/util/hypershift_framework.go (1)
132-168: Cilium pre-install hook: fix context usage + avoid exact-node-count hang + don't `Fatal` on NodePoolReplicas==0.
- `context.Background()` drops cancellation/timeouts from the test run.
- `WaitForNReadyNodes` requires an exact node count; using `opts.NodePoolReplicas` can hang if the actual node count differs.
- Comment says NodePoolReplicas can be 0 (e.g., TestNodePool), but current code hard-fails.
Proposed fix
```diff
@@ func (h *hypershiftTest) before(hostedCluster *hyperv1.HostedCluster, opts *PlatformAgnosticOptions, platform hyperv1.PlatformType) {
 	h.Run("ValidateHostedCluster", func(t *testing.T) {
+		ctx := t.Context()
 		if platform != hyperv1.NonePlatform && hostedCluster.Spec.Networking.NetworkType != hyperv1.Other {
 			if opts.AWSPlatform.EndpointAccess == string(hyperv1.Private) {
 				ValidatePrivateCluster(t, h.ctx, h.client, hostedCluster, opts)
 			} else {
 				ValidatePublicCluster(t, h.ctx, h.client, hostedCluster, opts)
 			}
@@
 		if opts.ExternalCNIProvider == CiliumCNIProvider {
@@
 			if !util.IsPrivateHC(hostedCluster) {
-				if opts.NodePoolReplicas == 0 {
-					t.Fatal("NodePool replicas must be positive for Cilium to install.")
-				}
+				if opts.NodePoolReplicas == 0 {
+					t.Log("Skipping Cilium install during initial validation: no worker nodes configured yet (NodePoolReplicas=0).")
+					return
+				}
 				if h.assetReader == nil {
 					t.Fatal("AssetReader is required for Cilium installation. Call WithAssetReader() on the test instance.")
 				}
-				guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster)
-				InstallCilium(t, context.Background(), guestClient, hostedCluster, h.assetReader)
+				guestClient := WaitForGuestClient(t, ctx, h.client, hostedCluster)
+				InstallCilium(t, ctx, guestClient, hostedCluster, h.assetReader)
+
+				// Match node counting behavior used elsewhere (avoid hanging on "got X, want Y").
+				expectedNodes := opts.NodePoolReplicas
+				if platform == hyperv1.AWSPlatform && len(opts.AWSPlatform.Zones) > 0 {
+					expectedNodes = expectedNodes * int32(len(opts.AWSPlatform.Zones))
+				}
 				// wait hosted cluster ready
-				WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform)
-				WaitForImageRollout(t, context.Background(), h.client, hostedCluster)
-				ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute)
+				WaitForNReadyNodes(t, ctx, guestClient, expectedNodes, platform)
+				WaitForImageRollout(t, ctx, h.client, hostedCluster)
+				ValidateHostedClusterConditions(t, ctx, h.client, hostedCluster, true, 10*time.Minute)
 			}
 		}
 	})
 }
```
test/e2e/azure_scheduler_test.go (1)
42-59: Verify import availability for `content.ReadFile` call. Ensure the `content` package is imported in the file; if not, add the appropriate import used by other e2e tests calling `WithAssetReader(content.ReadFile)`.
🤖 Fix all issues with AI agents
In @test/e2e/util/cilium.go:
- Around line 33-48: The code uses CiliumVersion (env) to build asset paths via
MustAsset which will panic if the asset path is invalid; update places that
construct assets (any code calling MustAsset or building paths with
CiliumVersion, e.g., the functions/blocks around the current top-level
CiliumVersion var and the usages later at lines ~137-205) to first guard that
CiliumVersion != "" and fail fast or skip tests when empty; specifically, before
any call to MustAsset or any asset path construction that interpolates
CiliumVersion, add a check that returns an error/skips the test (or uses a safe
lookup that returns an error) instead of calling MustAsset directly so you avoid
a panic.
- Around line 334-486: In EnsureCiliumConnectivityTestResources inside the
"WaitForConnectivityTestCompletion" block update the pod phase validation (the
loop iterating over podList.Items) so it does not treat completed Job pods as
failures: change the failure condition that currently checks pod.Status.Phase !=
corev1.PodRunning to accept corev1.PodSucceeded as a valid terminal success
(e.g., treat pods with Phase Running OR Succeeded as OK), or alternatively skip
pods owned by Job resources; adjust the construction of failedPods accordingly
so only pods not in Running or Succeeded are reported as failures.
In @test/e2e/util/util.go:
- Around line 4162-4190: The current wait helpers only check for >0 readiness
which can hide partial rollouts; update WaitForDeploymentAvailable to fetch the
Deployment, compute desiredReplicas := int32(1) or use *deployment.Spec.Replicas
if set, and assert deployment.Status.ObservedGeneration >= deployment.Generation
AND deployment.Status.AvailableReplicas == desiredReplicas (or ReadyReplicas ==
desiredReplicas), returning true only when fully available; similarly update
WaitForDaemonSetReady to assert ds.Status.ObservedGeneration >= ds.Generation
AND ds.Status.NumberReady == ds.Status.DesiredNumberScheduled (or
NumberAvailable == DesiredNumberScheduled) so partial readiness fails; include
the error and status details in the t.Logf messages for easier debugging and
keep the NewWithT(...).Eventually checks but replace the >0 checks with these
full-availability conditions.
- Around line 4192-4221: The ApplyYAML function uses http.Get which ignores the
provided ctx and has no timeout; change ApplyYAML to create an HTTP request
bound to the passed ctx (e.g., http.NewRequestWithContext(ctx, "GET", pathOrURL,
nil)) and use an http.Client with a reasonable Timeout (or rely on ctx
cancellation) to perform the request, then read resp.Body as before and still
defer resp.Body.Close(); keep the existing error handling for non-200 responses
and ensure the request uses ctx so CI cannot hang indefinitely.
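A sketch of the ctx-bound download the prompt describes; the helper name and the 30-second client timeout are illustrative assumptions (the review itself suggests something like 10s).
```go
package util

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"time"
)

// fetchManifest binds the request to the caller's context and to a client-level timeout,
// so a stalled server cannot hang the test run indefinitely.
func fetchManifest(ctx context.Context, url string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("failed to build request for %s: %w", url, err)
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("failed to download manifest from %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d downloading %s", resp.StatusCode, url)
	}
	return io.ReadAll(resp.Body)
}
```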
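For the earlier pod-phase item on EnsureCiliumConnectivityTestResources, the adjusted failure filter could look roughly like this; a sketch with an assumed helper name, not the PR's code.
```go
package util

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// failedConnectivityPods returns pods that are neither Running nor Succeeded,
// so completed Job pods are no longer misreported as failures.
func failedConnectivityPods(pods []corev1.Pod) []string {
	var failed []string
	for _, pod := range pods {
		switch pod.Status.Phase {
		case corev1.PodRunning, corev1.PodSucceeded:
			// Steady-state workloads and completed Job pods are both acceptable.
		default:
			failed = append(failed, fmt.Sprintf("%s (phase=%s)", pod.Name, pod.Status.Phase))
		}
	}
	return failed
}
```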
🧹 Nitpick comments (3)
test/e2e/util/hypershift_framework.go (1)
42-54: New `ExternalCNIProvider` option is fine, but please document/enforce supported values. Right now anything other than `"cilium"` is silently treated as "off"; consider validating/normalizing input at option parsing time to avoid typos creating confusing runs.
test/e2e/util/util.go (1)
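A possible shape for that validation, run right after flag parsing; the function name and error text are illustrative, and the constant mirrors the CiliumCNIProvider constant added elsewhere in this PR.
```go
package util

import (
	"fmt"
	"strings"
)

// CiliumCNIProvider mirrors the constant this PR adds for the supported provider name.
const CiliumCNIProvider = "cilium"

// normalizeExternalCNIProvider validates --e2e.external-cni-provider once at startup,
// so a typo fails fast instead of silently disabling the Cilium path.
func normalizeExternalCNIProvider(value string) (string, error) {
	provider := strings.ToLower(strings.TrimSpace(value))
	switch provider {
	case "", CiliumCNIProvider:
		return provider, nil
	default:
		return "", fmt.Errorf("unsupported --e2e.external-cni-provider %q (supported: %q)", value, CiliumCNIProvider)
	}
}
```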
4223-4269: Namespace defaulting can use `client.IsObjectNamespaced(...)` (less RESTMapper plumbing). This reduces per-object RESTMapping calls and keeps the code aligned with controller-runtime's client API.
test/e2e/util/cilium.go (1)
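A sketch of the suggested simplification, assuming ApplyYAML works on unstructured objects; controller-runtime exposes IsObjectNamespaced on the client interface, and the helper name here is illustrative.
```go
package util

import (
	"fmt"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// defaultNamespaceIfNeeded sets a fallback namespace only for namespaced kinds,
// letting the client's scheme/RESTMapper answer the scope question in one call.
func defaultNamespaceIfNeeded(c crclient.Client, obj *unstructured.Unstructured, fallback string) error {
	namespaced, err := c.IsObjectNamespaced(obj)
	if err != nil {
		return fmt.Errorf("failed to determine scope of %s: %w", obj.GetKind(), err)
	}
	if namespaced && obj.GetNamespace() == "" {
		obj.SetNamespace(fallback)
	}
	return nil
}
```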
542-615: The `ptrBool` helper is redundant with `k8s.io/utils/ptr`. Optional cleanup: replace `ptrBool(false)` with `ptr.To(false)` and drop the helper.
📜 Review details
📒 Files selected for processing (9)
- test/e2e/autoscaling_test.go
- test/e2e/azure_scheduler_test.go
- test/e2e/cilium_network_policy_test.go
- test/e2e/control_plane_upgrade_test.go
- test/e2e/create_cluster_test.go
- test/e2e/e2e_test.go
- test/e2e/util/cilium.go
- test/e2e/util/hypershift_framework.go
- test/e2e/util/util.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/e2e_test.go
🪛 golangci-lint (2.5.0)
test/e2e/util/cilium.go
[error] 228-228: : # github.com/openshift/hypershift/sync-global-pullsecret [github.com/openshift/hypershift/sync-global-pullsecret.test]
sync-global-pullsecret/sync-global-pullsecret_test.go:228:23: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:234:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:247:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:257:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:270:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:283:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:296:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:309:26: undefined: MockdbusConn
sync-global-pullsecret/sync-global-pullsecret_test.go:327:12: undefined: NewMockdbusConn
(typecheck)
🔇 Additional comments (7)
test/e2e/util/hypershift_framework.go (1)
57-85:WithAssetReader(...)is a clean injection point.
This keeps asset loading out of global state and makes tests easier to compose.test/e2e/util/cilium.go (1)
217-330: > Likely an incorrect or invalid review comment.test/e2e/cilium_network_policy_test.go (1)
5-52: Since the referenced file and packages cannot be verified in the available sources, manual verification by the developer or access to the actual branch/repository containing this code is required to confirm the build-break and validate the suggested fixes.test/e2e/autoscaling_test.go (1)
79-79: LGTM!The asset reader is correctly wired into the test builder chain, consistent with the broader refactoring pattern across e2e tests.
test/e2e/create_cluster_test.go (2)
2393-2394: LGTM!The asset reader integration follows the established pattern. The line break improves readability.
2623-2623: LGTM!Consistent with the asset reader pattern applied across the e2e test suite.
test/e2e/control_plane_upgrade_test.go (1)
76-76: LGTM!The asset reader is properly integrated into the control plane upgrade test, maintaining consistency with other e2e tests in this PR.
```go
var (
	// CiliumVersion is read from CILIUM_VERSION at runtime. When empty, the Cilium tests should skip.
	CiliumVersion = os.Getenv("CILIUM_VERSION")
)

const (
	// CiliumCNIProvider is the name of the Cilium CNI provider.
	CiliumCNIProvider = "cilium"
	// Generic timeouts and intervals for Cilium tests
	CiliumDefaultTimeout           = 10 * time.Minute
	CiliumLongTimeout              = 20 * time.Minute
	CiliumShortTimeout             = 2 * time.Minute
	CiliumDefaultPollInterval      = 10 * time.Second
	CiliumLongPollInterval         = 15 * time.Second
	CiliumConnectivityWaitDuration = 60 * time.Second
)
```
Guard CiliumVersion (env) before building asset paths; MustAsset will panic otherwise.
Proposed fix
func InstallCilium(t *testing.T, ctx context.Context, guestClient crclient.Client, hostedCluster *hyperv1.HostedCluster, reader assets.AssetReader) {
t.Run("InstallCilium", func(t *testing.T) {
+ if CiliumVersion == "" {
+ t.Skip("CILIUM_VERSION is not set; skipping Cilium installation")
+ }
+ if reader == nil {
+ t.Fatalf("asset reader is required for Cilium installation")
+ }
if !azureutil.IsAroHCP() {
t.Skip("test only supported on ARO HCP clusters")
}
@@
for _, filename := range manifestFiles {
manifestPath := fmt.Sprintf("assets/cilium/v%s/%s", CiliumVersion, filename)
t.Logf("Applying manifest from %s", manifestPath)
- yamlContent := assets.MustAsset(reader, manifestPath)
+ yamlContent := assets.MustAsset(reader, manifestPath)
err := ApplyYAMLBytes(ctx, guestClient, yamlContent)
g.Expect(err).ToNot(HaveOccurred(), "failed to apply manifest from %s", manifestPath)
}Also applies to: 137-205
// EnsureCiliumConnectivityTestResources performs the Cilium connectivity tests.
// It returns a cleanup function.
func EnsureCiliumConnectivityTestResources(t *testing.T, ctx context.Context, guestClient crclient.Client, reader assets.AssetReader) func() {
	t.Run("CheckCiliumPodsRunning", func(t *testing.T) {
		g := NewWithT(t)

		t.Log("Check cilium pods to be running")
		g.Eventually(func(g Gomega) {
			podList := &corev1.PodList{}
			err := guestClient.List(ctx, podList, crclient.InNamespace(ciliumNamespace))
			g.Expect(err).NotTo(HaveOccurred(), "failed to list Cilium pods")

			g.Expect(podList.Items).NotTo(BeEmpty(), "no Cilium pods found yet")
			t.Logf("Found %d Cilium pods", len(podList.Items))

			// Check all pods are Ready
			g.Expect(podList.Items).To(HaveEach(
				HaveField("Status.Conditions", ContainElement(
					And(
						HaveField("Type", corev1.PodReady),
						HaveField("Status", corev1.ConditionTrue),
					),
				)),
			), "not all Cilium pods are ready")
		}, CiliumLongTimeout, CiliumDefaultPollInterval).Should(Succeed(), "all Cilium pods should be running")

		t.Log("All Cilium pods are running")
	})

	t.Run("CreateSecurityContextConstraints", func(t *testing.T) {
		g := NewWithT(t)

		t.Log("Creating SecurityContextConstraints for Cilium connectivity test")
		scc := &securityv1.SecurityContextConstraints{
			ObjectMeta: metav1.ObjectMeta{
				Name: "cilium-test",
			},
			AllowHostPorts:           true,
			AllowHostNetwork:         true,
			AllowHostDirVolumePlugin: false,
			AllowHostIPC:             false,
			AllowHostPID:             false,
			AllowPrivilegeEscalation: ptrBool(false),
			AllowPrivilegedContainer: false,
			ReadOnlyRootFilesystem:   false,
			RequiredDropCapabilities: []corev1.Capability{},
			RunAsUser:                securityv1.RunAsUserStrategyOptions{Type: securityv1.RunAsUserStrategyMustRunAsRange},
			SELinuxContext:           securityv1.SELinuxContextStrategyOptions{Type: securityv1.SELinuxStrategyMustRunAs},
			Users:                    []string{fmt.Sprintf("system:serviceaccount:%s:%s", ciliumTestNamespace, ciliumTestServiceAccount)},
		}

		err := guestClient.Create(ctx, scc)
		if err != nil && !apierrors.IsAlreadyExists(err) {
			g.Expect(err).ToNot(HaveOccurred(), "failed to create SCC")
		}

		t.Log("SecurityContextConstraints created successfully")
	})

	t.Run("CreateTestNamespace", func(t *testing.T) {
		g := NewWithT(t)

		t.Log("Creating cilium-test namespace")
		ns := &corev1.Namespace{
			ObjectMeta: metav1.ObjectMeta{
				Name: ciliumTestNamespace,
				Labels: map[string]string{"security.openshift.io/scc.podSecurityLabelSync": "false",
					"pod-security.kubernetes.io/enforce": "privileged",
					"pod-security.kubernetes.io/audit":   "privileged",
					"pod-security.kubernetes.io/warn":    "privileged",
				},
			},
		}

		err := guestClient.Create(ctx, ns)
		if err != nil && !apierrors.IsAlreadyExists(err) {
			g.Expect(err).ToNot(HaveOccurred(), "failed to create namespace")
		}

		t.Log("Test namespace created successfully")
	})

	t.Run("DeployCiliumConnectivityTest", func(t *testing.T) {
		g := NewWithT(t)

		t.Logf("Deploying Cilium connectivity test from version %s", CiliumVersion)

		manifestPath := fmt.Sprintf("assets/cilium/v%s/connectivity-check.yaml", CiliumVersion)
		yamlContent := assets.MustAsset(reader, manifestPath)

		err := ApplyYAMLBytes(ctx, guestClient, yamlContent, ciliumTestNamespace)
		g.Expect(err).ToNot(HaveOccurred(), "failed to apply connectivity test manifest")

		t.Log("Connectivity test manifests applied successfully")
	})

	t.Run("WaitForConnectivityTestPodsReady", func(t *testing.T) {
		g := NewWithT(t)

		t.Log("Waiting for connectivity test pods to be ready")
		g.Eventually(func(g Gomega) {
			podList := &corev1.PodList{}
			err := guestClient.List(ctx, podList, crclient.InNamespace(ciliumTestNamespace))
			g.Expect(err).NotTo(HaveOccurred(), "failed to list pods")
			g.Expect(podList.Items).NotTo(BeEmpty(), "no pods found in cilium-test namespace yet")

			t.Logf("Found %d pods in cilium-test namespace", len(podList.Items))

			// Check all pods are Ready
			g.Expect(podList.Items).To(HaveEach(
				HaveField("Status.Conditions", ContainElement(
					And(
						HaveField("Type", corev1.PodReady),
						HaveField("Status", corev1.ConditionTrue),
					),
				)),
			), "some pods are not ready")
		}, CiliumDefaultTimeout, CiliumLongPollInterval).Should(Succeed(), "all connectivity test pods should be ready")

		t.Log("All connectivity test pods are ready")
	})

	t.Run("WaitForConnectivityTestCompletion", func(t *testing.T) {
		g := NewWithT(t)

		t.Logf("Waiting %v for connectivity tests to run", CiliumConnectivityWaitDuration)
		time.Sleep(CiliumConnectivityWaitDuration)

		t.Log("Verifying all test pods are still running")
		podList := &corev1.PodList{}
		err := guestClient.List(ctx, podList, crclient.InNamespace(ciliumTestNamespace))
		g.Expect(err).ToNot(HaveOccurred(), "should be able to list test pods")

		failedPods := []string{}
		for _, pod := range podList.Items {
			if pod.Status.Phase != corev1.PodRunning {
				failedPods = append(failedPods, fmt.Sprintf("%s (phase: %s)", pod.Name, pod.Status.Phase))
			}
		}

		if len(failedPods) > 0 {
			t.Errorf("Found %d pods not in Running phase: %v", len(failedPods), failedPods)
		} else {
			t.Logf("All %d connectivity test pods are running successfully", len(podList.Items))
		}
	}) // Closes the anonymous function for "WaitForConnectivityTestCompletion"

	t.Log("Cilium connectivity test completed successfully")

	return func() {
		CleanupCiliumConnectivityTestResources(ctx, t, guestClient)
	}
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, find the connectivity-check.yaml file referenced in the code
find . -name "connectivity-check.yaml" -type f 2>/dev/null
Repository: openshift/hypershift
Length of output: 184
🌐 Web query:
Cilium connectivity-check.yaml manifest Jobs Pods expected behavior
💡 Result:
Summary of what connectivity-check.yaml does and the expected behavior
- What the manifest deploys
- A dedicated test namespace (commonly cilium-test) plus multiple Pods/Deployments that act as servers, clients, and host-netns helpers, Services for those servers, and CiliumNetworkPolicy objects used by the tests. It also creates Kubernetes Jobs (or test controller Pods) that orchestrate the connectivity scenarios and verify results. [1][2]
- Expected runtime behavior
- Server Pods start and become Running and Ready (1/1). Services become addressable. Client Pods are created and attempt connections (curl, ping) to the servers across the different scenarios (pod-to-pod, pod-to-service, inter-node, encryption, etc.). Host-netns/helper Pods run tcpdump/host checks where needed. The test Jobs run the sequence of checks and complete with success if all checks pass; their Pods exit successfully (Job completes). If all tests pass, you should see all test Pods Running/Ready during checks and Jobs showing Completed. [1][2][3]
- How to interpret results
- Jobs Completed (exit code 0) = connectivity scenarios passed.
- Failed Jobs or Pods in CrashLoop/Failed = specific scenario failures (look at Job/Pod logs for failing curl/ping/tcpdump actions).
- If Pods never become Ready or tcpdump/host-netns fails, the test will fail. Typical causes: network policies present in the namespace, Cilium not managing pod networking (pods started before Cilium), missing node-to-node ports/firewall rules, or cluster-wide policies interfering. [1][2][4]
- Useful debugging steps
- Run in an isolated namespace with no other NetworkPolicy/Cilium policies.
- kubectl -n <namespace> get pods, get jobs, kubectl logs and kubectl describe on failing pods/jobs.
- Check Cilium agent logs and that node-to-node ports (e.g., VXLAN/Geneve port) and ICMP are allowed if required. Use cilium connectivity test (CLI) for higher-level orchestration which also collects debugging info. [1][3][4]
Sources
[1] Cilium docs — Connectivity tests / troubleshooting.
[2] Cilium example connectivity-check/connectivity-check.yaml (examples & description).
[3] cilium connectivity test CLI docs.
[4] Troubleshooting notes about common failures (Cilium docs).
Connectivity test check incorrectly fails on completed Jobs.
The connectivity-check.yaml manifest deploys test Jobs that complete successfully (Succeeded phase) after all checks pass. However, the code at lines 434–443 reports any pod not in Running phase as a failure, which will incorrectly flag successfully completed test Jobs as failures. Update the pod-phase check to accept Running, Succeeded, or completed states as valid outcomes, or exclude completed Jobs from the failure check.
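If it helps to see the shape of that change, here is a hedged sketch of a corrected failure scan using the same variable naming as the quoted function. It is illustrative only, not the PR's code; completed Job pods report the standard corev1.PodSucceeded phase and are treated as passing.

package util

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// reportConnectivityFailures mirrors the failure scan in the quoted function,
// but treats Succeeded pods (for example, completed Job pods created by
// connectivity-check.yaml) as passing instead of flagging them.
func reportConnectivityFailures(pods []corev1.Pod) []string {
	var failed []string
	for _, pod := range pods {
		switch pod.Status.Phase {
		case corev1.PodRunning, corev1.PodSucceeded:
			// Running servers/clients and completed Job pods are both healthy outcomes.
		default:
			failed = append(failed, fmt.Sprintf("%s (phase: %s)", pod.Name, pod.Status.Phase))
		}
	}
	return failed
}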
// WaitForDeploymentAvailable waits for a deployment to be ready.
func WaitForDeploymentAvailable(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
	g := NewWithT(t)
	t.Logf("Waiting for deployment %s/%s to be ready", namespace, name)
	g.Eventually(func() bool {
		deployment := &appsv1.Deployment{}
		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, deployment)
		if err != nil {
			t.Logf("Failed to get deployment %s/%s: %v", namespace, name, err)
			return false
		}
		return deployment.Status.ReadyReplicas > 0
	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("deployment %s/%s should be ready", namespace, name))
}

// WaitForDaemonSetReady waits for a DaemonSet to be ready.
func WaitForDaemonSetReady(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
	g := NewWithT(t)
	t.Logf("Waiting for DaemonSet %s/%s to be ready", namespace, name)
	g.Eventually(func() bool {
		ds := &appsv1.DaemonSet{}
		err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, ds)
		if err != nil {
			t.Logf("Failed to get DaemonSet %s/%s: %v", namespace, name, err)
			return false
		}
		return ds.Status.NumberReady > 0
	}, timeout, interval).Should(BeTrue(), fmt.Sprintf("DaemonSet %s/%s should be ready", namespace, name))
}
Wait helpers are too weak (> 0 readiness) and can mask partial rollouts.
These helpers are used as “readiness gates”; they should typically assert full availability (observed generation, desired vs ready).
Proposed fix
func WaitForDeploymentAvailable(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
g := NewWithT(t)
t.Logf("Waiting for deployment %s/%s to be ready", namespace, name)
g.Eventually(func() bool {
deployment := &appsv1.Deployment{}
err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, deployment)
if err != nil {
t.Logf("Failed to get deployment %s/%s: %v", namespace, name, err)
return false
}
- return deployment.Status.ReadyReplicas > 0
+ desired := int32(1)
+ if deployment.Spec.Replicas != nil {
+ desired = *deployment.Spec.Replicas
+ }
+ return deployment.Status.ObservedGeneration >= deployment.Generation &&
+ deployment.Status.AvailableReplicas >= desired
}, timeout, interval).Should(BeTrue(), fmt.Sprintf("deployment %s/%s should be ready", namespace, name))
}
@@
func WaitForDaemonSetReady(ctx context.Context, t *testing.T, client crclient.Client, name, namespace string, timeout, interval time.Duration) {
g := NewWithT(t)
t.Logf("Waiting for DaemonSet %s/%s to be ready", namespace, name)
g.Eventually(func() bool {
ds := &appsv1.DaemonSet{}
err := client.Get(ctx, crclient.ObjectKey{Name: name, Namespace: namespace}, ds)
if err != nil {
t.Logf("Failed to get DaemonSet %s/%s: %v", namespace, name, err)
return false
}
- return ds.Status.NumberReady > 0
+ return ds.Status.ObservedGeneration >= ds.Generation &&
+ ds.Status.UpdatedNumberScheduled == ds.Status.DesiredNumberScheduled &&
+ ds.Status.NumberReady == ds.Status.DesiredNumberScheduled
}, timeout, interval).Should(BeTrue(), fmt.Sprintf("DaemonSet %s/%s should be ready", namespace, name))
}
Actionable comments posted: 1
♻️ Duplicate comments (3)
test/e2e/util/hypershift_framework.go (1)
160-165: Useh.ctxinstead ofcontext.Background()for proper cancellation propagation.These calls use
context.Background()which bypasses the test framework's cancellation. This means operations won't respect test timeouts or abort signals, potentially leaving resources orphaned if the test is cancelled.🔧 Proposed fix
- guestClient := WaitForGuestClient(t, context.Background(), h.client, hostedCluster) - InstallCilium(t, context.Background(), guestClient, hostedCluster, h.assetReader) + guestClient := WaitForGuestClient(t, h.ctx, h.client, hostedCluster) + InstallCilium(t, h.ctx, guestClient, hostedCluster, h.assetReader) // wait hosted cluster ready - WaitForNReadyNodes(t, context.Background(), guestClient, opts.NodePoolReplicas, platform) - WaitForImageRollout(t, context.Background(), h.client, hostedCluster) - ValidateHostedClusterConditions(t, context.Background(), h.client, hostedCluster, true, 10*time.Minute) + WaitForNReadyNodes(t, h.ctx, guestClient, opts.NodePoolReplicas, platform) + WaitForImageRollout(t, h.ctx, h.client, hostedCluster) + ValidateHostedClusterConditions(t, h.ctx, h.client, hostedCluster, true, 10*time.Minute)test/e2e/util/cilium.go (2)
33-36: GuardciliumVersionbefore use to prevent panics.
ciliumVersionis read from environment at package init time. IfCILIUM_VERSIONis unset,assets.MustAssetwill panic when constructing paths likeassets/cilium/v/.... The skip at line 141-143 only checksIsAroHCP(), not the version.🔧 Proposed fix - add guard in InstallCilium
func InstallCilium(t *testing.T, ctx context.Context, guestClient crclient.Client, hostedCluster *hyperv1.HostedCluster, reader assets.AssetReader) { t.Run("InstallCilium", func(t *testing.T) { + if ciliumVersion == "" { + t.Skip("CILIUM_VERSION is not set; skipping Cilium installation") + } if !azureutil.IsAroHCP() { t.Skip("test only supported on ARO HCP clusters") }
467-478: Pod phase check incorrectly fails completed Jobs.The connectivity-check.yaml manifest includes Kubernetes Jobs that complete successfully with
Succeededphase. The current check at line 469 reports any pod not inRunningphase as a failure, which incorrectly flags successfully completed test Jobs.🔧 Proposed fix - accept Succeeded phase for Job pods
failedPods := []string{} for _, pod := range podList.Items { - if pod.Status.Phase != corev1.PodRunning { + if pod.Status.Phase != corev1.PodRunning && pod.Status.Phase != corev1.PodSucceeded { failedPods = append(failedPods, fmt.Sprintf("%s (phase: %s)", pod.Name, pod.Status.Phase)) } }
🧹 Nitpick comments (3)
test/e2e/util/cilium.go (1)
542-544: Usek8s.io/utils/ptrinstead of customptrBoolhelper.The codebase already imports
k8s.io/utils/ptr(line 28) and usesptr.To(true)at line 93. Replace this custom helper withptr.To[bool]for consistency.♻️ Proposed fix
-func ptrBool(b bool) *bool { - return &b -}Then update line 376:
- AllowPrivilegeEscalation: ptrBool(false), + AllowPrivilegeEscalation: ptr.To(false),test/e2e/cilium_network_policy_test.go (2)
39-41: Use theciliumCNIProviderconstant instead of string literal.The constant
ciliumCNIProvideris defined incilium.go(line 40) but is unexported. Either export it asCiliumCNIProviderand use it here, or ensure consistency across the codebase.♻️ Suggested approach
In
cilium.go, export the constant:- ciliumCNIProvider = "cilium" + CiliumCNIProvider = "cilium"Then in this file:
- if globalOpts.ExternalCNIProvider != "cilium" { - t.Skipf("skip cilium connection test if e2e.external-cni-provider is not %s", "cilium") + if globalOpts.ExternalCNIProvider != e2eutil.CiliumCNIProvider { + t.Skipf("skip cilium connection test if e2e.external-cni-provider is not %s", e2eutil.CiliumCNIProvider) }
43-46: Redundant platform check - already verified at line 35.The
IsAroHCP()check at line 44 is performed after the Azure platform check at line 35. If the intent is to also verify ARO HCP specifically, consider combining these checks or clarifying why both are needed.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
📒 Files selected for processing (8)
.tekton/hypershift-cli-mce-211-pull-request.yaml
.tekton/hypershift-cli-mce-211-push.yaml
.tekton/hypershift-release-mce-211-pull-request.yaml
.tekton/hypershift-release-mce-211-push.yaml
test/e2e/cilium_network_policy_test.go
test/e2e/e2e_test.go
test/e2e/util/cilium.go
test/e2e/util/hypershift_framework.go
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
- Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
test/e2e/cilium_network_policy_test.go
test/e2e/util/hypershift_framework.go
test/e2e/e2e_test.go
test/e2e/util/cilium.go
🧬 Code graph analysis (1)
test/e2e/util/hypershift_framework.go (3)
test/e2e/util/external_oidc.go (1)
ExtOIDCConfig(53-70)test/e2e/util/util.go (3)
WaitForGuestClient(324-372)WaitForNReadyNodes(444-446)WaitForImageRollout(590-631)test/e2e/util/cilium.go (1)
InstallCilium(139-332)
🔇 Additional comments (8)
.tekton/hypershift-cli-mce-211-push.yaml (1)
40-48: LGTM — consistent with other pipeline updates in this PR..tekton/hypershift-release-mce-211-push.yaml (1)
43-51: LGTM — consistent with other pipeline updates in this PR..tekton/hypershift-release-mce-211-pull-request.yaml (1)
40-48: LGTM — consistent with other pipeline updates in this PR.All four MCE 2.11 Tekton pipeline configurations now consistently reference
pipelines/common.yaml..tekton/hypershift-cli-mce-211-pull-request.yaml (1)
40-48: LGTM — migration to maintained common pipeline.The switch from
pipelines/common_mce_2.11.yamltopipelines/common.yamlis valid. The external pipeline reference (https://github.com/stolostron/konflux-build-catalog/main/pipelines/common.yaml) exists and is a well-maintained Tekton pipeline for multi-arch container builds.test/e2e/util/hypershift_framework.go (1)
82-85: LGTM on the fluent builder pattern.The
WithAssetReadermethod follows idiomatic Go patterns for optional configuration and integrates cleanly with the existing test harness.test/e2e/util/cilium.go (1)
67-116: LGTM on namespace and SCC manifest definitions.The PodSecurity labels and SCC configuration follow OpenShift best practices for privileged workloads like Cilium.
test/e2e/e2e_test.go (1)
93-93: LGTM on the new CLI flag.The flag binding is consistent with other platform-specific flags and the description clearly indicates supported providers.
test/e2e/cilium_network_policy_test.go (1)
49-51: No issues found. The code correctly usescontent.ReadFile, which is a package-level variable declared intest/e2e/assets.goand accessible throughout thee2epackage without an explicit import.
	t.Run("DeployCiliumConnectivityTest", func(t *testing.T) {
		g := NewWithT(t)

		t.Logf("Deploying Cilium connectivity test from version %s", ciliumVersion)

		manifestPath := fmt.Sprintf("assets/cilium/v%s/connectivity-check.yaml", ciliumVersion)
		yamlContent := assets.MustAsset(reader, manifestPath)

		err := ApplyYAMLBytes(ctx, guestClient, yamlContent, ciliumTestNamespace)
		g.Expect(err).ToNot(HaveOccurred(), "failed to apply connectivity test manifest")

		t.Log("Connectivity test manifests applied successfully")
	})
Add ciliumVersion guard before accessing assets in connectivity test.
Similar to InstallCilium, this function uses ciliumVersion at line 421 without validation, which will cause a panic if the environment variable is unset.
🔧 Proposed fix
t.Run("DeployCiliumConnectivityTest", func(t *testing.T) {
g := NewWithT(t)
+ if ciliumVersion == "" {
+ t.Skip("CILIUM_VERSION is not set; skipping connectivity test deployment")
+ }
+
t.Logf("Deploying Cilium connectivity test from version %s", ciliumVersion)
manifestPath := fmt.Sprintf("assets/cilium/v%s/connectivity-check.yaml", ciliumVersion)
Force-pushed from 4770c6d to bba7a39 (Compare)
|
/lgtm |
|
@wewang58 Thanks for the updates! |
|
@wewang58: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/override "Red Hat Konflux / hypershift-operator-main-enterprise-contract / hypershift-operator-main" |
|
@wewang58: wewang58 unauthorized: /override is restricted to Repo administrators, approvers in top level OWNERS file, and the following github teams:openshift: openshift-release-oversight openshift-staff-engineers openshift-sustaining-engineers. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
|
/retest-required |
|
@wewang58 if you rebase, those konflux jobs should pass |
It will lose the approve and lgtm labels, right? I will ask others again. |
|
Let's rebase. I will put lgtm after that. |
Thanks, I will rebase it. |
Why: To enable e2e testing of HyperShift clusters using Cilium as the CNI provider. How: - Added `ExternalCNIProvider` flag and option to the e2e framework. - Implemented logic to install Cilium and skip default network validation when enabled. - Added utility functions `ApplyYAMLFromURL` and wait helpers for Deployments/DaemonSets. - Included Cilium-specific installation logic and network policy tests.
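For readers skimming the conversation, a rough sketch of what an ApplyYAMLFromURL-style helper commonly looks like is shown below. The function name comes from this commit message, but the body, the 30-second timeout, and the create-and-ignore-AlreadyExists behavior are illustrative assumptions, not the merged implementation.

package util

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	k8syaml "k8s.io/apimachinery/pkg/util/yaml"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// applyYAMLFromURL downloads a (possibly multi-document) YAML manifest and
// creates each object in the cluster, ignoring objects that already exist.
func applyYAMLFromURL(ctx context.Context, c crclient.Client, url string) error {
	httpClient := &http.Client{Timeout: 30 * time.Second}
	resp, err := httpClient.Get(url)
	if err != nil {
		return fmt.Errorf("failed to download manifest from %s: %w", url, err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %d fetching %s", resp.StatusCode, url)
	}

	// Decode the YAML stream one document at a time into unstructured objects.
	dec := k8syaml.NewYAMLOrJSONDecoder(resp.Body, 4096)
	for {
		obj := &unstructured.Unstructured{}
		if err := dec.Decode(obj); err != nil {
			if errors.Is(err, io.EOF) {
				return nil // all documents processed
			}
			return fmt.Errorf("failed to decode manifest: %w", err)
		}
		if len(obj.Object) == 0 {
			continue // skip empty YAML documents
		}
		if err := c.Create(ctx, obj); err != nil && !apierrors.IsAlreadyExists(err) {
			return fmt.Errorf("failed to create %s %s: %w", obj.GetKind(), obj.GetName(), err)
		}
	}
}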
…utilities Refactors Cilium test utilities to use support/assets.AssetReader pattern instead of direct file path manipulation. Makes internal constants and helper functions private, and adds WithAssetReader() method to the test framework for proper asset injection.
The previous pipeline 'pipelines/common_mce_2.11.yaml' uses tasks that are now expired/unsupported. Switched to 'pipelines/common.yaml' which is up-to-date.
…plication Consolidate wait functions and use Eventually patterns for more reliable test polling. Signed-off-by: wen wang <wewang@redhat.com>
|
/lgtm |
|
/retest-required |
|
/test e2e-aws-4-21 |
|
@wewang58: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/verified by @wewang58 |
|
@wewang58: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Yay!! |
What this PR does / why we need it:
This PR implements Cilium installation and checks basic network policy scenarios in an AKS environment. The previous PR was closed to keep this one clear.
Which issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-1137
Special notes for your reviewer:
- ExternalCNIProvider flag and option to the e2e framework.
- ApplyYAMLFromURL and wait helpers for Deployments/DaemonSets.
Checklist: