Skip to content

Conversation

@sgalsaleh
Copy link
Member

What this PR does / why we need it:

Which issue(s) this PR fixes:

Does this PR require a test?

Does this PR require a release note?


Does this PR require documentation?

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

This PR has been released (on staging) and is available for download with a embedded-cluster-smoke-test-staging-app license ID.

Online Installer:

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci/appver-dev-a494ae9" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Airgap Installer (may take a few minutes before the airgap bundle is built):

curl "https://staging.replicated.app/embedded/embedded-cluster-smoke-test-staging-app/ci-airgap/appver-dev-a494ae9?airgap=true" -H "Authorization: $EC_SMOKE_TEST_LICENSE_ID" -o embedded-cluster-smoke-test-staging-app-ci.tgz

Happy debugging!

@sgalsaleh sgalsaleh force-pushed the stop-using-kots branch 2 times, most recently from d130bdc to 1b48b5c Compare December 8, 2025 17:00
@sgalsaleh sgalsaleh changed the title Stop using kots Stop using kots to install the app Dec 8, 2025
@sgalsaleh sgalsaleh marked this pull request as ready for review December 8, 2025 19:17
@sgalsaleh sgalsaleh requested a review from emosbaugh December 8, 2025 19:17
return fmt.Errorf("create metadata client: %w", err)
}
m.mcli = mcli
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Potential nil pointer dereference in namespace reconciler

In setupClients(), mcli (metadata client) is only created when restClientGetter != nil. If kubernetesEnvSettings is nil, mcli remains nil. Later, when hostCABundlePath is non-empty, ensureCAConfigmap calls adminconsole.EnsureCAConfigmap which uses mcli.Resource(...) without a nil check. This will cause a nil pointer dereference panic. The ensureCAConfigmap function checks hostCABundlePath but not mcli.

Additional Locations (1)

Fix in Cursor Fix in Web

// Install the app
err := c.appInstallManager.Install(ctx, configValues)
// Install the app with installable charts
err = c.appInstallManager.Install(ctx, installableCharts, appConfigValues, opts.RegistrySettings, opts.HostCABundlePath)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Data race from closure capturing outer err variable

The goroutine at line 115 uses assignment err = ... instead of short declaration err := ..., causing the closure to capture and modify the err variable from the outer InstallApp function scope. This creates a data race since the outer function returns immediately after starting the goroutine (at line 131), while the goroutine continues to write to the captured err variable. This is inconsistent with the pattern used in upgrade.go (line 74) which correctly uses err := to declare a local variable within the goroutine.

Fix in Cursor Fix in Web

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be fixed

Archive: []byte("chart-archive-data"),
Values: map[string]any{"key": "value"},
},
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Test creates InstallableHelmChart with nil CR field

The test data creates InstallableHelmChart structs without setting the CR field, leaving it nil. While this test uses mocks and doesn't exercise the real installHelmChart code path, the getChartDisplayName function calls chart.CR.GetName() without nil checking. If a chart with a nil CR were passed to the real install function, it would panic. The production ExtractInstallableHelmCharts always sets CR, but the test data doesn't reflect this contract.

Fix in Cursor Fix in Web

Values: map[string]kotsv1beta1.ConfigValue{},
},
configValues: types.AppConfigValues{
"key1": {Value: "value1"},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Test for empty config values uses non-empty values

The test case named "handles empty config values" was incorrectly refactored and now passes non-empty config values ("key1": {Value: "value1"}). The original test used an empty map to verify the code handles empty config values correctly. The comment at line 826 still says "Get and verify secret was created even with empty values", confirming the test intent was to test empty values. This test regression means empty config value handling is no longer tested, potentially masking bugs in that code path.

Fix in Cursor Fix in Web

func (r *NamespaceReconciler) reconcileNamespace(ctx context.Context, namespace string) error {
// Create namespace if it doesn't exist
ns := &corev1.Namespace{}
err := r.kcli.Get(ctx, client.ObjectKey{Name: namespace}, ns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you pass the context into kube client functions and cancel it could cause operations to be shutdown abruptly. it seems like the way you are using this you will need a way to gracefully shutdown the reconciler.

if err != nil {
return fmt.Errorf("start namespace reconciler: %w", err)
}
defer nsReconciler.Stop()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a race condition that will cancel the reconciler abruptly without allowing it to finish its operations.

case <-ctx.Done():
return
case <-ticker.C:
r.reconcile(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should you pass a background context here to allow for graceful shutdowns?

// 2. Immediately ensures image pull secrets and other resources in all watched namespaces
// 3. Starts background polling to reconcile namespaces periodically
// Returns a cancellable namespace reconciler instance.
func runNamespaceReconciler(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you split this up into two functions, one to instantiate it and another to run it that runs in the foreground you could easily gracefully shut it down in the caller.

r.logger.WithError(err).Warn("failed to list namespaces")
return
}
namespaces = make([]string, 0, len(nsList.Items))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if a namespace is not yet created that is in the watchedNamespaces list? it will not get created with its image pull secret and config map. would it be better to make this a set and append listed namespaces? it seems like there is a different behavior whether or not there is a * here in the list and im not sure which is correct but i think we should make it consistent.

secret = &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: r.registrySettings.ImagePullSecretName,
Namespace: namespace,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be consistent in our labels so we can keep track of things we install? How about a utility function?

https://github.com/replicatedhq/embedded-cluster/pull/3272/files#diff-f4ba24a618e296ceae22063eef12b28cfe4c76729e40d615fd464a4bededefb8R123-R129

r.logger.Infof(format, args...)
}

return adminconsole.EnsureCAConfigmap(ctx, addonstypes.LogFunc(logFn), r.kcli, r.mcli, namespace, r.hostCABundlePath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couldnt you pass r.logger.Infof here directly as an argument?

}
}

func TestCalculateRegistrySettings(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a test missing for GetRegistrySettings?

Copy link
Member

@emosbaugh emosbaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants