CSPL-4577 Codex Agentic DX & Harness Foundation by vivekr-splunk · Pull Request #1738 · splunk/splunk-operator

vivekr-splunk · 2026-02-27T08:19:14Z

Summary

Establish a production-grade agentic DX foundation for SOK: AGENTS docs, skills, dev harness scripts, test generator improvements, upgrade helpers, and OSS governance + review hygiene.

Key Changes

AGENTS docs for core repo areas
8 Codex skills with scope + DoD + output contract
Shared scripts/dev harness entrypoints
Testcase generator + agent docs
Upgrade verification helpers in testenv
PR template, CODEOWNERS, and governance docs

Testing and Verification

Not run (workflow changes only).

Suggested local gates:

scripts/dev/pr_check.sh
scripts/dev/unit.sh

Related Issues

Epic: CSPL-4577
Stories: CSPL-4578, CSPL-4579, CSPL-4580, CSPL-4581, CSPL-4582, CSPL-4583

github-actions · 2026-02-27T08:19:26Z

CLA Assistant Lite bot: All contributors have signed the COC ✍️ ✅

Copilot

Pull request overview

This pull request establishes a production-grade agentic development experience (DX) foundation for the Splunk Operator for Kubernetes (SOK). It adds documentation, helper scripts, test generation tools, and governance files to support both human and agent-assisted development workflows.

Changes:

Added AGENTS.md documentation files across key repository areas (test/, pkg/splunk/enterprise/, internal/controller/, api/)
Created 8 Codex skills with detailed workflow documentation in .agents/skills/
Added shared development scripts under scripts/dev/ for common workflows (unit tests, linting, PR checks, etc.)
Enhanced test infrastructure with operator upgrade verification helpers in test/testenv/
Created Python test case generator (scripts/generate_testcase.py) with YAML/JSON spec support
Added verification scripts for CRD, RBAC, and bundle outputs
Established governance framework (CODEOWNERS, GOVERNANCE.md, CODE_OF_CONDUCT.md, SECURITY.md, SUPPORT.md, MAINTAINERS.md)
Added comprehensive agent-focused documentation under docs/agent/ (CRD_MAP.md, RECONCILE_FLOW.md, TEST_MATRIX.md, etc.)
Created PR and issue templates
Added GitHub Actions workflow for PR checks
Updated Makefile with verification targets
Removed unused code (telemetry functions, IngestorReady function) and simplified app verification logic

Reviewed changes

Copilot reviewed 65 out of 65 changed files in this pull request and generated 5 comments.


File	Description
test/testenv/verificationutils.go	Added upgrade verification helpers (VerifyOperatorImage, VerifyPodImageContains, etc.); removed telemetry and IngestorReady functions; cleaned up unused imports
test/testenv/testcaseenv.go	Added UpdateOperatorImage method and GetOperatorName getter for upgrade testing
test/AGENTS.md	Documentation for test directory structure and conventions
templates/pull_request.md	PR template with standardized sections
scripts/verify_repo.sh	Main verification orchestration script with multiple verification modes
scripts/verify_crd.sh	CRD/RBAC generation verification script
scripts/verify_bundle.sh	Bundle and Helm CRD verification script
scripts/generate_testcase.py	Python tool to generate integration and KUTTL test scaffolds from YAML/JSON specs
scripts/dev/*.sh	Developer harness scripts for common workflows (unit tests, linting, PR checks, kind smoke tests, etc.)
scripts/debug_reconcile.sh	Debug data collection script for reconciliation issues
.agents/skills/*/SKILL.md	8 skill documentation files defining workflows for various development tasks
docs/agent/*.md	Agent-focused documentation (CRD mapping, reconcile flow, test patterns, release flow, operations)
pkg/splunk/enterprise/AGENTS.md	Documentation for core operator logic directory
internal/controller/AGENTS.md	Documentation for reconciler directory
api/AGENTS.md	Documentation for CRD types directory
CODEOWNERS	Code ownership assignments (with TODO for team name)
GOVERNANCE.md	Project governance model
CODE_OF_CONDUCT.md	Contributor Covenant code of conduct
SECURITY.md	Security vulnerability reporting policy
SUPPORT.md	Support channels documentation
MAINTAINERS.md	List of current maintainers
README.md	Enhanced with agent harness section
Makefile	Added verify, verify-crd, verify-bundle, and verify-repo targets
.github/workflows/pr-check.yml	GitHub Actions workflow for PR verification
.github/pull_request_template.md	Enhanced PR template with suggested local gates
docs/CONTRIBUTING.md	Updated with agentic development workflow section and cleaned maintainers list

Comments suppressed due to low confidence (1)

test/testenv/verificationutils.go:216

The function IngestorReady has been removed but it is still being called in multiple test files. The function is used in:

test/index_and_ingestion_separation/index_and_ingestion_separation_test.go (lines 114, 226, 302)

This removal will cause compilation failures. Either restore the function or update all call sites to use an alternative verification function.


Copilot · 2026-02-27T08:28:35Z

test/testenv/verificationutils.go

+func VerifyOperatorImage(ctx context.Context, testenvInstance *TestCaseEnv, expectedImage string) {
+	_ = ctx // reserved for future use


The context parameter ctx is declared but immediately discarded with _ = ctx. If the context is reserved for future use, consider removing it from the current implementation and adding it back when needed. Alternatively, if it's kept for API consistency, add a comment explaining this decision.

Copilot · 2026-02-27T08:28:36Z

scripts/generate_testcase.py

+        install_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n  - command: {install_cmd}\n    namespaced: true\n\"\"\"\n+        ready_assert = \"\"\"---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: splunk-operator-controller-manager\nstatus:\n  readyReplicas: 1\n  availableReplicas: 1\n\"\"\"\n+        upgrade_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n  - command: {upgrade_cmd}\n    namespaced: true\n\"\"\"\n+        image_check_cmd = (\n+            f\"kubectl -n ${{{namespace_env}}} get deploy splunk-operator-controller-manager \"\n+            f\"-o jsonpath='{{{{.spec.template.spec.containers[?(@.name==\\\\\\\"manager\\\\\\\")].image}}}}' \"\n+            f\"| grep -q \\\"${{{operator_image_new_env}}}\\\"\"\n+        )\n+        image_assert_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n  - command: {image_check_cmd}\n    namespaced: true\n\"\"\"\n+\n+        write_text(test_dir / \"00-install.yaml\", install_step, force)\n+        write_text(test_dir / \"01-assert-operator-ready.yaml\", ready_assert, force)\n+        write_text(test_dir / \"02-upgrade.yaml\", upgrade_step, force)\n+        write_text(test_dir / \"03-assert-operator-image.yaml\", image_assert_step, force)\n+
+    for index, cr in enumerate(crs):
+        api_version = cr.get("apiVersion", "")
+        kind = cr.get("kind", "")
+        cr_name = cr.get("name", "")
+        cr_path = Path(cr.get("path", "")).expanduser()
+        if not api_version or not kind or not cr_name:
+            print("[ERROR] crs entries must include apiVersion, kind, and name")
+            sys.exit(1)
+        if not cr_path.is_absolute():
+            cr_path = (repo_root / cr_path).resolve()
+        if not cr_path.exists():
+            print(f"[ERROR] CR manifest not found: {cr_path}")
+            sys.exit(1)
+        deploy_index = index if not upgrade_enabled else index + 4
+        deploy_name = f"{deploy_index:02d}-deploy-{slugify(kind)}.yaml"
+        deploy_target = test_dir / deploy_name
+        if deploy_target.exists() and not force:


Lines 178-195 contain malformed Python string syntax. The strings appear to use escaped quotes and newlines (f\"\"\"...\n\"\"\") instead of proper triple-quoted strings (f"""..."""). This will cause a Python syntax error. Each string variable assignment (install_step, ready_assert, upgrade_step, image_check_cmd, image_assert_step) needs to use proper Python triple-quoted string syntax.

Copilot · 2026-02-27T08:28:36Z

test/testenv/verificationutils.go

+// VerifyOperatorImage verifies the operator pod is running the expected image
+func VerifyOperatorImage(ctx context.Context, testenvInstance *TestCaseEnv, expectedImage string) {
+	_ = ctx // reserved for future use
+	var ns string
+	if testenvInstance.clusterWideOperator != "true" {
+		ns = testenvInstance.GetName()
+	} else {
+		ns = "splunk-operator"
 	}
-	logf.Log.Info("GetTelemetryLastSubmissionTime: retrieved status", "status", statusVal)
+	timeout := time.Duration(SpecifiedTestTimeout) * time.Second
+	gomega.Eventually(func() bool {
+		operatorPod := GetOperatorPodName(testenvInstance)
+		if operatorPod == "" {
+			logf.Log.Info("Operator pod not found yet", "namespace", ns)
+			return false
+		}
+		return podImageContains(ns, operatorPod, expectedImage)
+	}, timeout, PollInterval).Should(gomega.Equal(true))
+}

-	var status telemetryStatus
-	if err := json.Unmarshal([]byte(statusVal), &status); err != nil {
-		logf.Log.Error(err, "GetTelemetryLastSubmissionTime: failed to unmarshal status", "statusVal", statusVal)
-		return ""
-	}
-	return status.LastTransmission
+// VerifyPodImageContains verifies the pod is running a container image that contains expectedImage
+func VerifyPodImageContains(ns string, podName string, expectedImage string) {
+	timeout := time.Duration(SpecifiedTestTimeout) * time.Second
+	gomega.Eventually(func() bool {
+		return podImageContains(ns, podName, expectedImage)
+	}, timeout, PollInterval).Should(gomega.Equal(true))
 }

-// VerifyTelemetry checks that the telemetry ConfigMap has a non-empty lastTransmission field in its status key.
-func VerifyTelemetry(ctx context.Context, deployment *Deployment, prevVal string) {
-	logf.Log.Info("VerifyTelemetry: start")
+// VerifySplunkPodImagesContain verifies all Splunk pods (excluding operator) are running expected image
+func VerifySplunkPodImagesContain(ns string, expectedImage string) {
+	timeout := time.Duration(SpecifiedTestTimeout) * time.Second
 	gomega.Eventually(func() bool {
-		currentVal := GetTelemetryLastSubmissionTime(ctx, deployment)
-		if currentVal != "" && currentVal != prevVal {
-			logf.Log.Info("VerifyTelemetry: success", "previous", prevVal, "current", currentVal)
-			return true
+		pods := DumpGetPods(ns)
+		checked := 0
+		for _, pod := range pods {
+			if !isSplunkWorkloadPod(pod) {
+				continue
+			}
+			checked++
+			if !podImageContains(ns, pod, expectedImage) {
+				return false
+			}
 		}
-		return false
-	}, deployment.GetTimeout(), PollInterval).Should(gomega.Equal(true))
+		if checked == 0 {
+			logf.Log.Info("No Splunk pods found yet", "namespace", ns)
+			return false
+		}
+		return true
+	}, timeout, PollInterval).Should(gomega.Equal(true))
 }

-// TriggerTelemetrySubmission updates or adds the 'test_submission' key in the telemetry ConfigMap with a JSON value containing a random number.
-func TriggerTelemetrySubmission(ctx context.Context, deployment *Deployment) {
-	const (
-		configMapName = "splunk-operator-manager-telemetry"
-		testKey       = "test_submission"
-	)
-
-	// Generate a random number
-	rand.Seed(time.Now().UnixNano())
-	randomNumber := rand.Intn(1000)
-
-	// Create the JSON value
-	jsonValue, err := json.Marshal(map[string]int{"value": randomNumber})
-	if err != nil {
-		logf.Log.Error(err, "Failed to marshal JSON value")
-		return
+// podImageContains checks if any container image or imageID on the pod contains expectedImage
+func podImageContains(ns string, podName string, expectedImage string) bool {
+	if podName == "" {
+		logf.Log.Info("Pod name is empty; cannot verify image", "namespace", ns)
+		return false
 	}
-
-	// Update the ConfigMap
-	cm := &corev1.ConfigMap{}
-	err = deployment.testenv.GetKubeClient().Get(ctx, client.ObjectKey{Name: configMapName, Namespace: "splunk-operator"}, cm)
+	output, err := exec.Command("kubectl", "get", "pods", "-n", ns, podName, "-o", "json").Output()
 	if err != nil {
-		logf.Log.Error(err, "Failed to get ConfigMap")
-		return
+		cmd := fmt.Sprintf("kubectl get pods -n %s %s -o json", ns, podName)
+		logf.Log.Error(err, "Failed to execute command", "command", cmd)
+		return false
 	}
-
-	// Update the test_submission key
-	cm.Data[testKey] = string(jsonValue)
-	err = deployment.testenv.GetKubeClient().Update(ctx, cm)
+	restResponse := PodDetailsStruct{}
+	err = json.Unmarshal([]byte(output), &restResponse)
 	if err != nil {
-		logf.Log.Error(err, "Failed to update ConfigMap")
-		return
+		logf.Log.Error(err, "Failed to parse pod JSON")
+		return false
+	}
+	found := false
+	images := []string{}
+	for _, status := range restResponse.Status.ContainerStatuses {
+		if status.Image != "" {
+			images = append(images, status.Image)
+		}
+		if status.ImageID != "" {
+			images = append(images, status.ImageID)
+		}
+		if strings.Contains(status.Image, expectedImage) || strings.Contains(status.ImageID, expectedImage) {
+			found = true
+		}
 	}
+	logf.Log.Info("Pod image check", "pod", podName, "expected", expectedImage, "found", found, "images", images)
+	return found
+}
+
+func isOperatorPod(podName string) bool {
+	return strings.HasPrefix(podName, "splunk-op") || strings.HasPrefix(podName, "splunk-operator")
+}

-	logf.Log.Info("Successfully updated telemetry ConfigMap", "key", testKey, "value", jsonValue)
+func isSplunkWorkloadPod(podName string) bool {
+	if podName == "" {
+		return false
+	}
+	if isOperatorPod(podName) {
+		return false
+	}
+	return strings.HasPrefix(podName, "splunk-")
 }


The functions GetTelemetryLastSubmissionTime, VerifyTelemetry, and TriggerTelemetrySubmission have been removed but they are still being called in multiple test files:

test/custom_resource_crud/custom_resource_crud_c3_test.go (lines 72, 86, 87)

test/custom_resource_crud/custom_resource_crud_m4_test.go (lines 68, 86, 87)

test/custom_resource_crud/custom_resource_crud_s1_test.go (lines 68, 76, 77)

This removal will cause compilation failures. Either restore these functions or update all call sites to remove the telemetry verification steps.

Copilot · 2026-02-27T08:28:36Z

test/testenv/testcaseenv.go

+	containerIndex := 0
+	for i, container := range operator.Spec.Template.Spec.Containers {
+		if container.Name == "manager" {
+			containerIndex = i
+			break
+		}
+	}
+	operator.Spec.Template.Spec.Containers[containerIndex].Image = image


If the loop does not find a container named "manager", containerIndex keeps its initial value of 0, so the image update is applied to whichever container happens to be first. If the pod template has no containers at all, indexing Containers[0] panics. Consider validating that the "manager" container was found before proceeding with the update.

Copilot · 2026-02-27T08:28:36Z

CODEOWNERS

+# TODO: Replace this placeholder team with real owners.
+* @splunk/splunk-operator-for-kubernetes


The CODEOWNERS file contains a TODO comment indicating that the placeholder team needs to be replaced with actual owners. This should be addressed before merging to ensure proper code ownership is established.

vivekr-splunk added 6 commits February 27, 2026 08:11

CSPL-4578 Add AGENTS docs

517f0e8

CSPL-4579 Add core Codex skills

fe753f5

CSPL-4580 Add harness dev scripts

1f991c8

CSPL-4582 Add upgrade verification helpers

5c35827

CSPL-4581 Enhance harness and testcase generator

55d9c7b

CSPL-4583 Add PR template, ownership, and governance docs

49bbd49

vivekr-splunk requested a review from Copilot February 27, 2026 08:19

Copilot started reviewing on behalf of vivekr-splunk February 27, 2026 08:20

vivekr-splunk force-pushed the cspl-4577 branch from 00d517a to 49bbd49 February 27, 2026 08:20

Copilot AI reviewed Feb 27, 2026


vivekr-splunk added 2 commits February 27, 2026 14:12

CSPL-4581 Fix CI failures in testcase harness

64f0843

CSPL-4579 Remove biased terms from skill doc

425d160

vivekr-splunk wants to merge 8 commits into develop from cspl-4577
