VMDistributed CR: orchestrate VMCluster upgrades#1556
AndrewChubatiuk merged 297 commits into VictoriaMetrics:master
Initially I thought the distributed CR was needed to manage a full distributed setup, but it looks like it only performs version upgrades. In that case, I'm curious: why do we need different CRs for VM, VT and VL?
Yes, so far we're focusing on upgrades. IMO the existing CRs provide sufficient flexibility, and we haven't received requests for other actions yet.
VL and VT don't have agents (yet), so their specs would be different. However, we can reuse the same approach and probably even some helper functions.
Haleygo left a comment:
> Yes, so far we're focusing on upgrades - existing CRs provide sufficient flexibility IMO - and we didn't get a request for other actions so far.
I believe users would expect to be able to modify VMCluster spec values or apply extra flags to the VMClusters.
Also, since `vmclusterSpec.ClusterVersion` is optional, users could specify component versions inside `vmclusterSpec`, which override `vmclusterSpec.ClusterVersion`.
Currently, VMDistributedCluster seems to cover only a limited scenario, in which resources such as VMCluster, VMUser and VMAuth are already defined and configured as needed.
Could you please provide an example of how to configure them to achieve a topology similar to the one described in the victoria-metrics-distributed chart? I expect VMDistributedCluster to be supported there once it is released.
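To make the version precedence described above concrete, here is a minimal Go sketch. `ComponentSpec`, `ClusterSpec`, `effectiveTag` and the version strings are hypothetical simplifications, not the operator's API. It shows a component-level tag, when set, winning over `ClusterVersion`; under that rule, an orchestrator that only bumps `ClusterVersion` would leave a component with its own tag on the old version.

```go
package main

import "fmt"

// ComponentSpec and ClusterSpec are hypothetical, trimmed-down stand-ins
// for the VMCluster spec; the real operator types have many more fields.
type ComponentSpec struct {
	ImageTag string // empty means "not set"
}

type ClusterSpec struct {
	ClusterVersion string
	VMStorage      *ComponentSpec
	VMSelect       *ComponentSpec
	VMInsert       *ComponentSpec
}

// effectiveTag returns the tag a component would run with: a component-level
// tag takes precedence over the cluster-wide ClusterVersion.
func effectiveTag(cs ClusterSpec, c *ComponentSpec) string {
	if c != nil && c.ImageTag != "" {
		return c.ImageTag
	}
	return cs.ClusterVersion
}

func main() {
	spec := ClusterSpec{
		ClusterVersion: "v1.100.0",
		VMSelect:       &ComponentSpec{ImageTag: "v1.99.0"}, // pinned per component
		VMStorage:      &ComponentSpec{},                    // falls back to ClusterVersion
	}
	fmt.Println(effectiveTag(spec, spec.VMSelect))  // v1.99.0
	fmt.Println(effectiveTag(spec, spec.VMStorage)) // v1.100.0
	fmt.Println(effectiveTag(spec, spec.VMInsert))  // v1.100.0 (component not set)
}
```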
Yup, setting generic
Add `config/examples/vmdistributed-with-label-selector.yaml` to demonstrate how to use `labelSelector` in the VMDistributed spec to target an existing VMAgent.
1 issue found across 7 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="config/examples/vmdistributed-overrides-version.yaml">
<violation number="1" location="config/examples/vmdistributed-overrides-version.yaml:53">
P2: This example references `vmcluster-$ZONE`, which doesn’t exist in the manifest and won’t resolve to a real VMCluster name. Since the example is meant to be copy‑paste runnable, use explicit names (or keep per‑zone refs) instead of an unresolved placeholder.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) [FEEDBACK_USED]</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
@cubic-dev-ai review this PR
@AndrewChubatiuk I have started the AI code review. It will take a few minutes to complete.
7 issues found across 73 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/controller/operator/factory/finalize/vmdistributed.go">
<violation number="1" location="internal/controller/operator/factory/finalize/vmdistributed.go:18">
P2: The function comment is misleading. This function **preserves** referenced VMClusters by removing owner references (disowning them), not "removes all objects". Someone reading this comment would expect the opposite behavior.
(Based on your team's feedback about documenting exported structs and public methods with accurate descriptions.) [FEEDBACK_USED]</violation>
</file>
<file name="api/operator/v1alpha1/vmdistributed_types.go">
<violation number="1" location="api/operator/v1alpha1/vmdistributed_types.go:55">
P2: Inconsistent json tag: `CommonZone` is marked `+optional` but lacks `omitempty` in the json tag, unlike all other optional fields in this struct. This causes the empty struct to always be serialized.</violation>
<violation number="2" location="api/operator/v1alpha1/vmdistributed_types.go:443">
P2: The `Validate` method doesn't check for duplicate zone names. Since zone names can be used for `%ZONE%` substitution in commonZone spec, duplicates could cause ambiguous behavior. Consider adding a uniqueness check.</violation>
</file>
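For the `omitempty` findings, a stdlib-only Go sketch of what `encoding/json` does, assuming the API types are serialized with `encoding/json` semantics (the struct names here are hypothetical). `omitempty` drops a nil pointer, but it has no effect on a non-pointer struct field: if `CommonZone` is a struct value, adding the tag alone would still emit `{}`, and making the field a pointer is one way to actually omit it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Inner struct {
	Name string `json:"name,omitempty"`
}

// Hypothetical structs for illustration, not the real VMDistributed types.
type WithoutOmitEmpty struct {
	RemoteWrite *Inner `json:"remoteWrite"`
}

type WithOmitEmpty struct {
	RemoteWrite *Inner `json:"remoteWrite,omitempty"`
}

type StructValueOmitEmpty struct {
	// omitempty does not apply to non-pointer struct fields in encoding/json.
	CommonZone Inner `json:"commonZone,omitempty"`
}

func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(marshal(WithoutOmitEmpty{}))     // {"remoteWrite":null}
	fmt.Println(marshal(WithOmitEmpty{}))        // {}
	fmt.Println(marshal(StructValueOmitEmpty{})) // {"commonZone":{}}
}
```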
<file name="config/examples/vmdistributed-with-label-selector.yaml">
<violation number="1" location="config/examples/vmdistributed-with-label-selector.yaml:6">
P2: This example isn’t self-contained: it references a VMAgent (via labelSelector) and a VMAuth (by name) that aren’t defined in the manifest. The team’s docs feedback requires examples to be complete and runnable as copy‑paste demos. Please add the referenced resources (or inline them) so the example can be applied without extra missing objects.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) [FEEDBACK_USED]</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmcluster.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmcluster.go:88">
P2: Comment is misleading - this function waits for a configurable status, not specifically `UpdateStatusOperational`. Update the comment to accurately reflect the generic behavior.
(Based on your team's feedback about adding accurate comments when logic is non-obvious.) [FEEDBACK_USED]</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmdistributed.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmdistributed.go:241">
P2: Using environment variable to conditionally ignore errors is risky. If `E2E_TEST=true` is accidentally set in production, critical errors would be silently ignored. Consider using build tags, a dedicated test mode configuration, or dependency injection for testability instead.
(Based on your team's feedback about adding comments when logic is non-obvious.) [FEEDBACK_USED]</violation>
</file>
<file name="config/examples/vmdistributed-vmagent.yaml">
<violation number="1" location="config/examples/vmdistributed-vmagent.yaml:4">
P2: The example defines VMClusters separately but references them in VMDistributed using `name`, which the API reserves for creating new clusters. Use `ref` to point to the existing VMCluster resources (or drop the standalone VMCluster manifests) so the example is consistent and runnable.</violation>
</file>
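The duplicate-zone-name check suggested for `Validate` could look roughly like the sketch below. `Zone` and `validateZoneNames` are hypothetical names; the real method would iterate over `spec.zones` and presumably validate other fields as well.

```go
package main

import "fmt"

// Zone is a hypothetical, trimmed-down zone entry; only its name matters here.
type Zone struct {
	Name string
}

// validateZoneNames returns an error if two zones share a name, so that a
// per-zone substitution (such as %ZONE%) always maps to exactly one zone.
func validateZoneNames(zones []Zone) error {
	seen := make(map[string]int, len(zones)) // zone name -> first index seen
	for i, z := range zones {
		if j, ok := seen[z.Name]; ok {
			return fmt.Errorf("spec.zones[%d].name %q duplicates spec.zones[%d].name", i, z.Name, j)
		}
		seen[z.Name] = i
	}
	return nil
}

func main() {
	fmt.Println(validateZoneNames([]Zone{{Name: "us-east"}, {Name: "us-west"}}))
	fmt.Println(validateZoneNames([]Zone{{Name: "us-east"}, {Name: "us-east"}}))
}
```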
P2: This example isn’t self-contained: it references a VMAgent (via labelSelector) and a VMAuth (by name) that aren’t defined in the manifest. The team’s docs feedback requires examples to be complete and runnable as copy‑paste demos. Please add the referenced resources (or inline them) so the example can be applied without extra missing objects.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At config/examples/vmdistributed-with-label-selector.yaml, line 6:
<comment>This example isn’t self-contained: it references a VMAgent (via labelSelector) and a VMAuth (by name) that aren’t defined in the manifest. The team’s docs feedback requires examples to be complete and runnable as copy‑paste demos. Please add the referenced resources (or inline them) so the example can be applied without extra missing objects.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) </comment>
<file context>
@@ -0,0 +1,25 @@
+metadata:
+ name: with-label-selector
+spec:
+ vmagent:
+ labelSelector:
+ matchLabels:
</file context>
@cubic-dev-ai review this PR
@AndrewChubatiuk I have started the AI code review. It will take a few minutes to complete.
3 issues found across 74 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="test/e2e/vmdistributed_test.go">
<violation number="1" location="test/e2e/vmdistributed_test.go:331">
P2: Duplicate Eventually block: This check for VMDistributed becoming operational is identical to the one at lines 281-284 that already succeeded. This redundant code adds unnecessary test execution time and appears to be a copy-paste error or incomplete refactoring.</violation>
</file>
<file name="internal/controller/operator/factory/finalize/vmdistributed.go">
<violation number="1" location="internal/controller/operator/factory/finalize/vmdistributed.go:23">
P1: Potential nil pointer dereference: `zone.VMCluster` can be nil (it's a pointer type with `omitempty`), but the code accesses `zone.VMCluster.Ref` without a nil check. This will panic when a zone has no VMCluster defined. Use the existing `isRefSet()` helper method which safely handles nil receivers.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmagent.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmagent.go:340">
P1: New URLs appended during VMAgent update don't preserve auth config from `writeSpecMap`. When a new cluster is added to an existing VMAgent, any authentication configuration (BasicAuth, OAuth2, TLS, etc.) specified in the CR spec for that URL will be lost. This is inconsistent with the creation case (lines 210-218) and existing URL handling (lines 323-333), both of which check `writeSpecMap`.</violation>
</file>
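The nil-pointer finding relies on a Go property worth spelling out: a method with a pointer receiver can be called on a nil pointer, as long as it checks for nil before dereferencing. A minimal sketch follows; `ObjectRef`, `ZoneCluster` and `Zone` are hypothetical shapes, and this `isRefSet` only mirrors the helper's name, not its actual implementation.

```go
package main

import "fmt"

// ObjectRef, ZoneCluster and Zone are hypothetical shapes matching the
// review comment: a zone holds an optional *ZoneCluster, which may carry a Ref.
type ObjectRef struct {
	Name string
}

type ZoneCluster struct {
	Ref *ObjectRef
}

type Zone struct {
	VMCluster *ZoneCluster // optional; may be nil
}

// isRefSet is safe on a nil *ZoneCluster because it checks the receiver
// before touching any field.
func (zc *ZoneCluster) isRefSet() bool {
	return zc != nil && zc.Ref != nil
}

func main() {
	zones := []Zone{
		{VMCluster: &ZoneCluster{Ref: &ObjectRef{Name: "cluster-a"}}},
		{VMCluster: nil}, // zones[1].VMCluster.Ref would panic here
	}
	for i, z := range zones {
		if z.VMCluster.isRefSet() {
			fmt.Printf("zone %d references %s\n", i, z.VMCluster.Ref.Name)
		} else {
			fmt.Printf("zone %d has no ref\n", i)
		}
	}
}
```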
2 issues found across 6 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/controller/operator/factory/vmdistributed/vmagent.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmagent.go:242">
P1: Bug: Spec comparison always returns equal because it compares after assignment. The line `vmagentObj.Spec = vmagentSpec` assigns the new spec before this comparison, so `vmagentObj.Spec` IS `vmagentSpec` - they will always be DeepEqual. Spec changes to existing VMAgents will never be applied.
Store the original spec before assignment and compare against that, or move the comparison before the assignment.</violation>
</file>
<file name="api/operator/v1alpha1/vmdistributed_types.go">
<violation number="1" location="api/operator/v1alpha1/vmdistributed_types.go:76">
P2: Missing `omitempty` in JSON tag for optional field. The `RemoteWrite` field is marked `+optional` but lacks `omitempty`, causing nil values to serialize as `null` instead of being omitted. This is inconsistent with other optional pointer fields in this file.</violation>
</file>
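The spec-comparison bug is easiest to see side by side. In this sketch, `AgentSpec`, `Agent`, `applySpecBuggy` and `applySpec` are hypothetical stand-ins, and `reflect.DeepEqual` is used to keep it stdlib-only (operator code may use a different equality helper). The buggy variant assigns first and therefore never detects a change; the fixed variant compares before overwriting.

```go
package main

import (
	"fmt"
	"reflect"
)

// AgentSpec and Agent are hypothetical stand-ins for the VMAgent object.
type AgentSpec struct {
	RemoteWriteURLs []string
}

type Agent struct {
	Spec AgentSpec
}

// applySpecBuggy mirrors the reported bug: the spec is assigned first, so
// the comparison that follows always sees equal values.
func applySpecBuggy(obj *Agent, desired AgentSpec) (changed bool) {
	obj.Spec = desired
	return !reflect.DeepEqual(obj.Spec, desired)
}

// applySpec compares the current spec with the desired one before
// overwriting it, so it can report whether an update is needed.
func applySpec(obj *Agent, desired AgentSpec) (changed bool) {
	if reflect.DeepEqual(obj.Spec, desired) {
		return false
	}
	obj.Spec = desired
	return true
}

func main() {
	desired := AgentSpec{RemoteWriteURLs: []string{"http://a", "http://b"}}

	a := &Agent{Spec: AgentSpec{RemoteWriteURLs: []string{"http://a"}}}
	fmt.Println(applySpecBuggy(a, desired)) // false, although the spec changed

	b := &Agent{Spec: AgentSpec{RemoteWriteURLs: []string{"http://a"}}}
	fmt.Println(applySpec(b, desired)) // true
	fmt.Println(applySpec(b, desired)) // false: already up to date
}
```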
@cubic-dev-ai review this PR
@AndrewChubatiuk I have started the AI code review. It will take a few minutes to complete.
4 issues found across 74 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="test/e2e/vmdistributed_test.go">
<violation number="1" location="test/e2e/vmdistributed_test.go:1263">
P2: Inconsistent pattern for checking NotFound errors. Use the same `MatchError(k8serrors.IsNotFound, "IsNotFound")` pattern used elsewhere in this file (lines 56, 1250, 1255, 1270, etc.) for consistency and cleaner code.</violation>
</file>
<file name="config/examples/vmdistributed-vmagent.yaml">
<violation number="1" location="config/examples/vmdistributed-vmagent.yaml:35">
P2: The example manifest references VMClusters that are not defined anywhere in the file, so it is not runnable as a copy‑paste demo. Include sample VMCluster resources (or inline specs under each zone) so the example is self‑contained.</violation>
</file>
<file name="test/e2e/suite/suite.go">
<violation number="1" location="test/e2e/suite/suite.go:160">
P3: Remove the duplicate "VMAUTHDEFAULT" entry in resourceEnvsPrefixes to avoid redundant env var setup and clarify the intended list.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmagent.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmagent.go:203">
P1: Index mismatch: `vmClusters` is sorted by `observedGeneration` in `fetchVMClusters`, but this loop assumes indices correspond to `cr.Spec.Zones` order. This causes URLs to be assigned to wrong remoteWrite configurations, potentially routing writes to incorrect clusters.
Consider either: (1) not sorting vmClusters in `fetchVMClusters`, (2) maintaining a mapping between zone index and cluster, or (3) matching clusters to remoteWrites by a stable identifier rather than index.</violation>
</file>
P1: Index mismatch: vmClusters is sorted by observedGeneration in fetchVMClusters, but this loop assumes indices correspond to cr.Spec.Zones order. This causes URLs to be assigned to wrong remoteWrite configurations, potentially routing writes to incorrect clusters.
Consider either: (1) not sorting vmClusters in fetchVMClusters, (2) maintaining a mapping between zone index and cluster, or (3) matching clusters to remoteWrites by a stable identifier rather than index.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/operator/factory/vmdistributed/vmagent.go, line 203:
<comment>Index mismatch: `vmClusters` is sorted by `observedGeneration` in `fetchVMClusters`, but this loop assumes indices correspond to `cr.Spec.Zones` order. This causes URLs to be assigned to wrong remoteWrite configurations, potentially routing writes to incorrect clusters.
Consider either: (1) not sorting vmClusters in `fetchVMClusters`, (2) maintaining a mapping between zone index and cluster, or (3) matching clusters to remoteWrites by a stable identifier rather than index.</comment>
<file context>
@@ -0,0 +1,274 @@
+ return nil, fmt.Errorf("failed to unmarshal.spec.zones[*].remoteWrite of VMDistributed=%s/%s: %w", cr.Name, cr.Namespace, err)
+ }
+
+ for i := range vmClusters {
+ vmagentSpec.RemoteWrite[i].URL = remoteWriteURL(vmClusters[i])
+ }
</file context>
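Option (3) from the comment above, matching clusters to remote writes by a stable identifier, can be sketched as follows. `Cluster`, `RemoteWrite`, `assignURLs` and this `remoteWriteURL` (with its `.example` URL format) are hypothetical, and the rule that there is exactly one remoteWrite entry per zone is an assumption. Sorting the clusters no longer matters because each zone looks up its cluster by name.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster and RemoteWrite are hypothetical, minimal stand-ins.
type Cluster struct {
	Name               string
	ObservedGeneration int64
}

type RemoteWrite struct {
	URL string
}

// remoteWriteURL is a hypothetical URL builder used only for illustration.
func remoteWriteURL(c Cluster) string {
	return fmt.Sprintf("http://%s.example/write", c.Name)
}

// assignURLs sets rws[i].URL from the cluster configured for zone i.
// Assumption: one remoteWrite entry per zone, in zone order.
func assignURLs(zoneClusters []string, clusters []Cluster, rws []RemoteWrite) error {
	if len(rws) != len(zoneClusters) {
		return fmt.Errorf("got %d remoteWrite entries for %d zones", len(rws), len(zoneClusters))
	}
	byName := make(map[string]Cluster, len(clusters))
	for _, c := range clusters {
		byName[c.Name] = c
	}
	for i, name := range zoneClusters {
		c, ok := byName[name]
		if !ok {
			return fmt.Errorf("zone %d: VMCluster %q not found", i, name)
		}
		rws[i].URL = remoteWriteURL(c)
	}
	return nil
}

func main() {
	zoneClusters := []string{"zone-a", "zone-b"} // order of the zones in the CR
	clusters := []Cluster{{Name: "zone-a", ObservedGeneration: 5}, {Name: "zone-b", ObservedGeneration: 2}}
	// Sorting by ObservedGeneration moves zone-b to the front...
	sort.Slice(clusters, func(i, j int) bool { return clusters[i].ObservedGeneration < clusters[j].ObservedGeneration })

	rws := make([]RemoteWrite, len(zoneClusters))
	if err := assignURLs(zoneClusters, clusters, rws); err != nil {
		panic(err)
	}
	// ...but each zone still gets its own cluster's URL.
	fmt.Println(rws[0].URL, rws[1].URL) // http://zone-a.example/write http://zone-b.example/write
}
```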
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="test/e2e/vmdistributed_test.go">
<violation number="1" location="test/e2e/vmdistributed_test.go:1375">
P2: The global cleanup flags are set to true but never reset, so later tests will incorrectly expect NotFound on VMAgent/VMAuth deletion and fail. Reset them after this test to avoid cross-test pollution.</violation>
</file>
The `expectedVMAgentToBeRemoved` and `expectedVMAuthToBeRemoved` flags were not being reset after the `VMDistributed` deletion test. This commit adds a `DeferCleanup` function to ensure these flags are reset to `false` after the test, preventing potential interference with subsequent tests.
1 issue found across 1 file (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="test/e2e/vmdistributed_test.go">
<violation number="1" location="test/e2e/vmdistributed_test.go:124">
P3: The new comment in createVMAgent refers to VMAuth, but the code resets expectedVMAgentToBeRemoved. Update the comment to match the VMAgent flag being reset.</violation>
</file>
P3: The new comment in createVMAgent refers to VMAuth, but the code resets expectedVMAgentToBeRemoved. Update the comment to match the VMAgent flag being reset.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/e2e/vmdistributed_test.go, line 124:
<comment>The new comment in createVMAgent refers to VMAuth, but the code resets expectedVMAgentToBeRemoved. Update the comment to match the VMAgent flag being reset.</comment>
<file context>
@@ -121,6 +121,10 @@ func createVMAuth(ctx context.Context, k8sClient client.Client, name, namespace
},
},
}
+ // Reset expectedVMAuthToBeRemoved so that it would not leak into other tests
+ DeferCleanup(func() {
+ expectedVMAuthToBeRemoved = false
</file context>
@cubic-dev-ai review this PR
@AndrewChubatiuk I have started the AI code review. It will take a few minutes to complete.
7 issues found across 74 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="internal/controller/operator/factory/vmdistributed/vmagent.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmagent.go:229">
P2: Sorting logic for new remote write URLs defaults to index 0, causing unstable ordering. When a new URL is added that doesn't exist in the current VMAgent spec, the map lookup returns 0, making new entries sort to the beginning with undefined relative order among themselves. Consider using `ok` idiom to detect missing keys and assign them indices after existing entries.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/util_test.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/util_test.go:31">
P3: Swap assert.Equal argument order so failure messages correctly show expected vs. actual values.</violation>
</file>
<file name="internal/controller/operator/vmdistributed_controller_test.go">
<violation number="1" location="internal/controller/operator/vmdistributed_controller_test.go:46">
P2: Handle non-NotFound errors from the Get call; otherwise test failures (e.g., RBAC/connection errors) are silently ignored and the test can pass with an invalid state.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmdistributed.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmdistributed.go:248">
P1: Logic bug: when there's only one VMCluster, reads are disabled before update (line 164-175) but never re-enabled because this condition requires `len(vmClusters) > 1`. Single-cluster deployments would have reads permanently disabled after updates.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmcluster.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmcluster.go:58">
P2: Silently continuing to poll when VMCluster is deleted may hide real issues. If the cluster is not found during the wait, consider returning an immediate error rather than waiting until timeout, which would provide clearer feedback that the cluster was unexpectedly deleted.</violation>
</file>
<file name="docs/resources/vmdistributed.md">
<violation number="1" location="docs/resources/vmdistributed.md:105">
P2: This example references pre-existing VMCluster objects but doesn’t define them or note that they must already exist, so it isn’t runnable as a copy‑paste demo. Consider adding the VMCluster manifests (or an explicit note) alongside the example.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) [FEEDBACK_USED]</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmcluster_test.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmcluster_test.go:178">
P3: These assertions assume a deterministic order even though the sort is unstable when ObservedGeneration values are equal (both are 0 here), which can make the test flaky. Prefer order-insensitive assertions or explicitly pick the inline cluster before asserting its spec.</violation>
</file>
P3: These assertions assume a deterministic order even though the sort is unstable when ObservedGeneration values are equal (both are 0 here), which can make the test flaky. Prefer order-insensitive assertions or explicitly pick the inline cluster before asserting its spec.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/operator/factory/vmdistributed/vmcluster_test.go, line 178:
<comment>These assertions assume a deterministic order even though the sort is unstable when ObservedGeneration values are equal (both are 0 here), which can make the test flaky. Prefer order-insensitive assertions or explicitly pick the inline cluster before asserting its spec.</comment>
<file context>
@@ -0,0 +1,234 @@
+ got, err := fetchVMClusters(context.Background(), rclient, cr)
+ assert.NoError(t, err)
+ assert.Len(t, got, 2)
+ assert.Equal(t, "ref", got[0].Name)
+ assert.Equal(t, "inline", got[1].Name)
+ assert.Equal(t, inlineSpec.ClusterVersion, got[1].Spec.ClusterVersion)
</file context>
@cubic-dev-ai review this PR
@AndrewChubatiuk I have started the AI code review. It will take a few minutes to complete.
6 issues found across 74 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="test/e2e/vmdistributed_test.go">
<violation number="1" location="test/e2e/vmdistributed_test.go:124">
P2: Misleading comment: says `expectedVMAuthToBeRemoved` but code resets `expectedVMAgentToBeRemoved`. This copy-paste error could confuse maintainers.</violation>
</file>
<file name="config/examples/vmdistributed-with-label-selector.yaml">
<violation number="1" location="config/examples/vmdistributed-with-label-selector.yaml:7">
P2: This example isn’t runnable as-is: it selects an existing VMAgent via labelSelector and references a VMAuth by name, but the file only defines the VMDistributed resource. Please include the referenced VMAgent/VMAuth manifests (or inline them in the example) so users can apply the example without having to guess missing resources.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) [FEEDBACK_USED]</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmdistributed.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmdistributed.go:241">
P2: Test-specific logic (`E2E_TEST` env check) embedded in production code. This silently ignores errors during E2E tests, which could mask real issues. Consider instead making the VMAgent metrics check configurable via the CR spec or using a skip annotation, rather than an environment variable.</violation>
</file>
<file name="internal/controller/operator/factory/vmdistributed/vmcluster.go">
<violation number="1" location="internal/controller/operator/factory/vmdistributed/vmcluster.go:77">
P2: Non-stable sort may cause non-deterministic VMCluster ordering. When multiple VMClusters have the same `ObservedGeneration`, their relative order will be undefined across reconciliations. Consider using `sort.SliceStable` or adding a secondary sort key (e.g., by name) to ensure consistent ordering for rolling updates.</violation>
</file>
<file name="docs/resources/vmdistributed.md">
<violation number="1" location="docs/resources/vmdistributed.md:52">
P2: The “inline VMCluster specifications” example isn’t runnable because vmagent/vmauth are only named (no inline spec or accompanying manifests). Add inline specs or define the referenced VMAgent/VMAuth resources so users can apply the example as-is.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) [FEEDBACK_USED]</violation>
<violation number="2" location="docs/resources/vmdistributed.md:60">
P2: The VMCluster example uses a non-existent `spec` subfield under `vmstorage/vmselect/vminsert`, so it won’t validate. Remove the extra `spec` nesting to match the actual VMCluster schema.</violation>
</file>
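Both the non-stable-sort finding and the earlier flaky-test finding go away if the comparator defines a total order. A sketch with a secondary key follows; `Cluster` and `sortClusters` are hypothetical, and ascending `ObservedGeneration` order is an assumption. `sort.SliceStable` on its own would only be deterministic if the input order already were, whereas a `Name` tiebreaker (unique within a namespace) makes the result independent of input order.

```go
package main

import (
	"fmt"
	"sort"
)

// Cluster is a hypothetical stand-in carrying only the fields used for ordering.
type Cluster struct {
	Name               string
	ObservedGeneration int64
}

// sortClusters orders by ObservedGeneration (ascending, an assumption) and
// breaks ties by Name, so equal generations no longer yield an arbitrary order.
func sortClusters(cs []Cluster) {
	sort.Slice(cs, func(i, j int) bool {
		if cs[i].ObservedGeneration != cs[j].ObservedGeneration {
			return cs[i].ObservedGeneration < cs[j].ObservedGeneration
		}
		return cs[i].Name < cs[j].Name
	})
}

func main() {
	cs := []Cluster{{Name: "zone-b"}, {Name: "zone-a"}, {Name: "zone-c", ObservedGeneration: 1}}
	sortClusters(cs)
	fmt.Println(cs[0].Name, cs[1].Name, cs[2].Name) // zone-a zone-b zone-c
}
```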
P2: Misleading comment: says expectedVMAuthToBeRemoved but code resets expectedVMAgentToBeRemoved. This copy-paste error could confuse maintainers.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At test/e2e/vmdistributed_test.go, line 124:
<comment>Misleading comment: says `expectedVMAuthToBeRemoved` but code resets `expectedVMAgentToBeRemoved`. This copy-paste error could confuse maintainers.</comment>
<file context>
@@ -0,0 +1,1391 @@
+ },
+ },
+ }
+ // Reset expectedVMAuthToBeRemoved so that it would not leak into other tests
+ DeferCleanup(func() {
+ expectedVMAuthToBeRemoved = false
</file context>
P2: Test-specific logic (E2E_TEST env check) embedded in production code. This silently ignores errors during E2E tests, which could mask real issues. Consider instead making the VMAgent metrics check configurable via the CR spec or using a skip annotation, rather than an environment variable.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At internal/controller/operator/factory/vmdistributed/vmdistributed.go, line 241:
<comment>Test-specific logic (`E2E_TEST` env check) embedded in production code. This silently ignores errors during E2E tests, which could mask real issues. Consider instead making the VMAgent metrics check configurable via the CR spec or using a skip annotation, rather than an environment variable.</comment>
<file context>
@@ -0,0 +1,280 @@
+ for _, vmAgentObj := range vmAgentObjs {
+ if err := waitForVMClusterVMAgentMetrics(ctx, httpClient, vmAgentObj, vmAgentFlushDeadlineDeadline, defaultVMAgentCheckInterval, rclient); err != nil {
+ // Ignore this error when running e2e tests - vmagent pods will be unreachable from outside of the cluster
+ if os.Getenv("E2E_TEST") == "true" {
+ continue
+ }
</file context>
P2: The “inline VMCluster specifications” example isn’t runnable because vmagent/vmauth are only named (no inline spec or accompanying manifests). Add inline specs or define the referenced VMAgent/VMAuth resources so users can apply the example as-is.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.)
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At docs/resources/vmdistributed.md, line 52:
<comment>The “inline VMCluster specifications” example isn’t runnable because vmagent/vmauth are only named (no inline spec or accompanying manifests). Add inline specs or define the referenced VMAgent/VMAuth resources so users can apply the example as-is.
(Based on your team's feedback about ensuring documentation examples are complete and runnable.) </comment>
<file context>
@@ -0,0 +1,195 @@
+metadata:
+ name: my-distributed-cluster
+spec:
+ vmagent:
+ name: my-distributed-vmagent
+ vmauth:
</file context>
Add a new CR, `VMDistributed`, so that multiple VMClusters can be upgraded in an orchestrated fashion: before a VMCluster is upgraded, reads through VMAuth are disabled for it, and the VMAgent (if available) must have no pending bytes left to send.

Fixes #1515
This CR can refer to VMClusters in one of two ways:
- the `ref` property, pointing to an existing VMCluster, with changes applied using `spec`;
- the `name` and `spec` properties, describing a VMCluster to be created.

Either way, settings in VMDistributed are applied to the target VMClusters, overriding their existing settings if necessary.
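The two ways of targeting a VMCluster can be sketched as a small resolver. Everything here is hypothetical: `ZoneCluster`, `Target` and `resolve` are illustrative names, the spec is modelled as a flat map instead of the real VMCluster spec struct, and treating `ref` and `name` as mutually exclusive is an assumption. With `ref`, the existing settings are kept and the VMDistributed settings override them; with `name`, the operator would create the cluster from `spec`.

```go
package main

import (
	"errors"
	"fmt"
)

// ZoneCluster is a hypothetical, simplified zone entry.
type ZoneCluster struct {
	Ref  string            // name of an existing VMCluster
	Name string            // name of a VMCluster the operator should create
	Spec map[string]string // settings to apply (flat stand-in for the real spec)
}

type Target struct {
	Name   string
	Create bool
	Spec   map[string]string
}

// resolve decides which VMCluster a zone targets and which spec it should have.
func resolve(z ZoneCluster, existing map[string]map[string]string) (Target, error) {
	switch {
	case z.Ref != "" && z.Name != "":
		// Assumption: the two ways are mutually exclusive.
		return Target{}, errors.New("ref and name must not both be set")
	case z.Ref != "":
		base, ok := existing[z.Ref]
		if !ok {
			return Target{}, fmt.Errorf("referenced VMCluster %q not found", z.Ref)
		}
		merged := make(map[string]string, len(base)+len(z.Spec))
		for k, v := range base {
			merged[k] = v
		}
		for k, v := range z.Spec {
			merged[k] = v // VMDistributed settings override existing ones
		}
		return Target{Name: z.Ref, Spec: merged}, nil
	case z.Name != "":
		return Target{Name: z.Name, Create: true, Spec: z.Spec}, nil
	default:
		return Target{}, errors.New("either ref or name must be set")
	}
}

func main() {
	existing := map[string]map[string]string{
		"vmcluster-a": {"clusterVersion": "v1.99.0", "retentionPeriod": "30d"},
	}
	t, err := resolve(ZoneCluster{Ref: "vmcluster-a", Spec: map[string]string{"clusterVersion": "v1.100.0"}}, existing)
	if err != nil {
		panic(err)
	}
	fmt.Println(t.Name, t.Create, t.Spec["clusterVersion"], t.Spec["retentionPeriod"])
	// vmcluster-a false v1.100.0 30d
}
```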
Current implementation scope:
See #1515 (comment) for the limitations agreed for the v1alpha1 version:
TODO:
Keeping the original commits for review, as it's useful to see how the feature was developed.