CSPL-4577 Codex Agentic DX & Harness Foundation#1738
CSPL-4577 Codex Agentic DX & Harness Foundation#1738vivekr-splunk wants to merge 8 commits intodevelopfrom
Conversation
|
CLA Assistant Lite bot CLA Assistant Lite bot All contributors have signed the COC ✍️ ✅ |
00d517a to
49bbd49
Compare
There was a problem hiding this comment.
Pull request overview
This is a comprehensive pull request that establishes a production-grade agentic development experience (DX) foundation for the Splunk Operator for Kubernetes (SOK). The PR adds extensive documentation, helper scripts, test generation tools, and governance files to support both human and agent-assisted development workflows.
Changes:
- Added AGENTS.md documentation files across key repository areas (test/, pkg/splunk/enterprise/, internal/controller/, api/)
- Created 8 Codex skills with detailed workflow documentation in
.agents/skills/ - Added shared development scripts under
scripts/dev/for common workflows (unit tests, linting, PR checks, etc.) - Enhanced test infrastructure with operator upgrade verification helpers in
test/testenv/ - Created Python test case generator (
scripts/generate_testcase.py) with YAML/JSON spec support - Added verification scripts for CRD, RBAC, and bundle outputs
- Established governance framework (CODEOWNERS, GOVERNANCE.md, CODE_OF_CONDUCT.md, SECURITY.md, SUPPORT.md, MAINTAINERS.md)
- Added comprehensive agent-focused documentation under
docs/agent/(CRD_MAP.md, RECONCILE_FLOW.md, TEST_MATRIX.md, etc.) - Created PR and issue templates
- Added GitHub Actions workflow for PR checks
- Updated Makefile with verification targets
- Removed unused code (telemetry functions, IngestorReady function) and simplified app verification logic
Reviewed changes
Copilot reviewed 65 out of 65 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| test/testenv/verificationutils.go | Added upgrade verification helpers (VerifyOperatorImage, VerifyPodImageContains, etc.); removed telemetry and IngestorReady functions; cleaned up unused imports |
| test/testenv/testcaseenv.go | Added UpdateOperatorImage method and GetOperatorName getter for upgrade testing |
| test/AGENTS.md | Documentation for test directory structure and conventions |
| templates/pull_request.md | PR template with standardized sections |
| scripts/verify_repo.sh | Main verification orchestration script with multiple verification modes |
| scripts/verify_crd.sh | CRD/RBAC generation verification script |
| scripts/verify_bundle.sh | Bundle and Helm CRD verification script |
| scripts/generate_testcase.py | Python tool to generate integration and KUTTL test scaffolds from YAML/JSON specs |
| scripts/dev/*.sh | Developer harness scripts for common workflows (unit tests, linting, PR checks, kind smoke tests, etc.) |
| scripts/debug_reconcile.sh | Debug data collection script for reconciliation issues |
| .agents/skills/*/SKILL.md | 8 skill documentation files defining workflows for various development tasks |
| docs/agent/*.md | Agent-focused documentation (CRD mapping, reconcile flow, test patterns, release flow, operations) |
| pkg/splunk/enterprise/AGENTS.md | Documentation for core operator logic directory |
| internal/controller/AGENTS.md | Documentation for reconciler directory |
| api/AGENTS.md | Documentation for CRD types directory |
| CODEOWNERS | Code ownership assignments (with TODO for team name) |
| GOVERNANCE.md | Project governance model |
| CODE_OF_CONDUCT.md | Contributor Covenant code of conduct |
| SECURITY.md | Security vulnerability reporting policy |
| SUPPORT.md | Support channels documentation |
| MAINTAINERS.md | List of current maintainers |
| README.md | Enhanced with agent harness section |
| Makefile | Added verify, verify-crd, verify-bundle, and verify-repo targets |
| .github/workflows/pr-check.yml | GitHub Actions workflow for PR verification |
| .github/pull_request_template.md | Enhanced PR template with suggested local gates |
| docs/CONTRIBUTING.md | Updated with agentic development workflow section and cleaned maintainers list |
Comments suppressed due to low confidence (1)
test/testenv/verificationutils.go:216
- The function
IngestorReadyhas been removed but it is still being called in multiple test files. The function is used in:
- test/index_and_ingestion_separation/index_and_ingestion_separation_test.go (lines 114, 226, 302)
This removal will cause compilation failures. Either restore the function or update all call sites to use an alternative verification function.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func VerifyOperatorImage(ctx context.Context, testenvInstance *TestCaseEnv, expectedImage string) { | ||
| _ = ctx // reserved for future use |
There was a problem hiding this comment.
The context parameter ctx is declared but immediately discarded with _ = ctx. If the context is reserved for future use, consider removing it from the current implementation and adding it back when needed. Alternatively, if it's kept for API consistency, add a comment explaining this decision.
| install_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n - command: {install_cmd}\n namespaced: true\n\"\"\"\n+ ready_assert = \"\"\"---\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: splunk-operator-controller-manager\nstatus:\n readyReplicas: 1\n availableReplicas: 1\n\"\"\"\n+ upgrade_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n - command: {upgrade_cmd}\n namespaced: true\n\"\"\"\n+ image_check_cmd = (\n+ f\"kubectl -n ${{{namespace_env}}} get deploy splunk-operator-controller-manager \"\n+ f\"-o jsonpath='{{{{.spec.template.spec.containers[?(@.name==\\\\\\\"manager\\\\\\\")].image}}}}' \"\n+ f\"| grep -q \\\"${{{operator_image_new_env}}}\\\"\"\n+ )\n+ image_assert_step = f\"\"\"---\napiVersion: kuttl.dev/v1beta1\nkind: TestStep\ncommands:\n - command: {image_check_cmd}\n namespaced: true\n\"\"\"\n+\n+ write_text(test_dir / \"00-install.yaml\", install_step, force)\n+ write_text(test_dir / \"01-assert-operator-ready.yaml\", ready_assert, force)\n+ write_text(test_dir / \"02-upgrade.yaml\", upgrade_step, force)\n+ write_text(test_dir / \"03-assert-operator-image.yaml\", image_assert_step, force)\n+ | ||
| for index, cr in enumerate(crs): | ||
| api_version = cr.get("apiVersion", "") | ||
| kind = cr.get("kind", "") | ||
| cr_name = cr.get("name", "") | ||
| cr_path = Path(cr.get("path", "")).expanduser() | ||
| if not api_version or not kind or not cr_name: | ||
| print("[ERROR] crs entries must include apiVersion, kind, and name") | ||
| sys.exit(1) | ||
| if not cr_path.is_absolute(): | ||
| cr_path = (repo_root / cr_path).resolve() | ||
| if not cr_path.exists(): | ||
| print(f"[ERROR] CR manifest not found: {cr_path}") | ||
| sys.exit(1) | ||
| deploy_index = index if not upgrade_enabled else index + 4 | ||
| deploy_name = f"{deploy_index:02d}-deploy-{slugify(kind)}.yaml" | ||
| deploy_target = test_dir / deploy_name | ||
| if deploy_target.exists() and not force: |
There was a problem hiding this comment.
Lines 178-195 contain malformed Python string syntax. The strings appear to use escaped quotes and newlines (f\"\"\"...\n\"\"\") instead of proper triple-quoted strings (f"""..."""). This will cause a Python syntax error. Each string variable assignment (install_step, ready_assert, upgrade_step, image_check_cmd, image_assert_step) needs to use proper Python triple-quoted string syntax.
| // VerifyOperatorImage verifies the operator pod is running the expected image | ||
| func VerifyOperatorImage(ctx context.Context, testenvInstance *TestCaseEnv, expectedImage string) { | ||
| _ = ctx // reserved for future use | ||
| var ns string | ||
| if testenvInstance.clusterWideOperator != "true" { | ||
| ns = testenvInstance.GetName() | ||
| } else { | ||
| ns = "splunk-operator" | ||
| } | ||
| logf.Log.Info("GetTelemetryLastSubmissionTime: retrieved status", "status", statusVal) | ||
| timeout := time.Duration(SpecifiedTestTimeout) * time.Second | ||
| gomega.Eventually(func() bool { | ||
| operatorPod := GetOperatorPodName(testenvInstance) | ||
| if operatorPod == "" { | ||
| logf.Log.Info("Operator pod not found yet", "namespace", ns) | ||
| return false | ||
| } | ||
| return podImageContains(ns, operatorPod, expectedImage) | ||
| }, timeout, PollInterval).Should(gomega.Equal(true)) | ||
| } | ||
|
|
||
| var status telemetryStatus | ||
| if err := json.Unmarshal([]byte(statusVal), &status); err != nil { | ||
| logf.Log.Error(err, "GetTelemetryLastSubmissionTime: failed to unmarshal status", "statusVal", statusVal) | ||
| return "" | ||
| } | ||
| return status.LastTransmission | ||
| // VerifyPodImageContains verifies the pod is running a container image that contains expectedImage | ||
| func VerifyPodImageContains(ns string, podName string, expectedImage string) { | ||
| timeout := time.Duration(SpecifiedTestTimeout) * time.Second | ||
| gomega.Eventually(func() bool { | ||
| return podImageContains(ns, podName, expectedImage) | ||
| }, timeout, PollInterval).Should(gomega.Equal(true)) | ||
| } | ||
|
|
||
| // VerifyTelemetry checks that the telemetry ConfigMap has a non-empty lastTransmission field in its status key. | ||
| func VerifyTelemetry(ctx context.Context, deployment *Deployment, prevVal string) { | ||
| logf.Log.Info("VerifyTelemetry: start") | ||
| // VerifySplunkPodImagesContain verifies all Splunk pods (excluding operator) are running expected image | ||
| func VerifySplunkPodImagesContain(ns string, expectedImage string) { | ||
| timeout := time.Duration(SpecifiedTestTimeout) * time.Second | ||
| gomega.Eventually(func() bool { | ||
| currentVal := GetTelemetryLastSubmissionTime(ctx, deployment) | ||
| if currentVal != "" && currentVal != prevVal { | ||
| logf.Log.Info("VerifyTelemetry: success", "previous", prevVal, "current", currentVal) | ||
| return true | ||
| pods := DumpGetPods(ns) | ||
| checked := 0 | ||
| for _, pod := range pods { | ||
| if !isSplunkWorkloadPod(pod) { | ||
| continue | ||
| } | ||
| checked++ | ||
| if !podImageContains(ns, pod, expectedImage) { | ||
| return false | ||
| } | ||
| } | ||
| return false | ||
| }, deployment.GetTimeout(), PollInterval).Should(gomega.Equal(true)) | ||
| if checked == 0 { | ||
| logf.Log.Info("No Splunk pods found yet", "namespace", ns) | ||
| return false | ||
| } | ||
| return true | ||
| }, timeout, PollInterval).Should(gomega.Equal(true)) | ||
| } | ||
|
|
||
| // TriggerTelemetrySubmission updates or adds the 'test_submission' key in the telemetry ConfigMap with a JSON value containing a random number. | ||
| func TriggerTelemetrySubmission(ctx context.Context, deployment *Deployment) { | ||
| const ( | ||
| configMapName = "splunk-operator-manager-telemetry" | ||
| testKey = "test_submission" | ||
| ) | ||
|
|
||
| // Generate a random number | ||
| rand.Seed(time.Now().UnixNano()) | ||
| randomNumber := rand.Intn(1000) | ||
|
|
||
| // Create the JSON value | ||
| jsonValue, err := json.Marshal(map[string]int{"value": randomNumber}) | ||
| if err != nil { | ||
| logf.Log.Error(err, "Failed to marshal JSON value") | ||
| return | ||
| // podImageContains checks if any container image or imageID on the pod contains expectedImage | ||
| func podImageContains(ns string, podName string, expectedImage string) bool { | ||
| if podName == "" { | ||
| logf.Log.Info("Pod name is empty; cannot verify image", "namespace", ns) | ||
| return false | ||
| } | ||
|
|
||
| // Update the ConfigMap | ||
| cm := &corev1.ConfigMap{} | ||
| err = deployment.testenv.GetKubeClient().Get(ctx, client.ObjectKey{Name: configMapName, Namespace: "splunk-operator"}, cm) | ||
| output, err := exec.Command("kubectl", "get", "pods", "-n", ns, podName, "-o", "json").Output() | ||
| if err != nil { | ||
| logf.Log.Error(err, "Failed to get ConfigMap") | ||
| return | ||
| cmd := fmt.Sprintf("kubectl get pods -n %s %s -o json", ns, podName) | ||
| logf.Log.Error(err, "Failed to execute command", "command", cmd) | ||
| return false | ||
| } | ||
|
|
||
| // Update the test_submission key | ||
| cm.Data[testKey] = string(jsonValue) | ||
| err = deployment.testenv.GetKubeClient().Update(ctx, cm) | ||
| restResponse := PodDetailsStruct{} | ||
| err = json.Unmarshal([]byte(output), &restResponse) | ||
| if err != nil { | ||
| logf.Log.Error(err, "Failed to update ConfigMap") | ||
| return | ||
| logf.Log.Error(err, "Failed to parse pod JSON") | ||
| return false | ||
| } | ||
| found := false | ||
| images := []string{} | ||
| for _, status := range restResponse.Status.ContainerStatuses { | ||
| if status.Image != "" { | ||
| images = append(images, status.Image) | ||
| } | ||
| if status.ImageID != "" { | ||
| images = append(images, status.ImageID) | ||
| } | ||
| if strings.Contains(status.Image, expectedImage) || strings.Contains(status.ImageID, expectedImage) { | ||
| found = true | ||
| } | ||
| } | ||
| logf.Log.Info("Pod image check", "pod", podName, "expected", expectedImage, "found", found, "images", images) | ||
| return found | ||
| } | ||
|
|
||
| func isOperatorPod(podName string) bool { | ||
| return strings.HasPrefix(podName, "splunk-op") || strings.HasPrefix(podName, "splunk-operator") | ||
| } | ||
|
|
||
| logf.Log.Info("Successfully updated telemetry ConfigMap", "key", testKey, "value", jsonValue) | ||
| func isSplunkWorkloadPod(podName string) bool { | ||
| if podName == "" { | ||
| return false | ||
| } | ||
| if isOperatorPod(podName) { | ||
| return false | ||
| } | ||
| return strings.HasPrefix(podName, "splunk-") | ||
| } |
There was a problem hiding this comment.
The functions GetTelemetryLastSubmissionTime, VerifyTelemetry, and TriggerTelemetrySubmission have been removed but they are still being called in multiple test files:
- test/custom_resource_crud/custom_resource_crud_c3_test.go (lines 72, 86, 87)
- test/custom_resource_crud/custom_resource_crud_m4_test.go (lines 68, 86, 87)
- test/custom_resource_crud/custom_resource_crud_s1_test.go (lines 68, 76, 77)
This removal will cause compilation failures. Either restore these functions or update all call sites to remove the telemetry verification steps.
| containerIndex := 0 | ||
| for i, container := range operator.Spec.Template.Spec.Containers { | ||
| if container.Name == "manager" { | ||
| containerIndex = i | ||
| break | ||
| } | ||
| } | ||
| operator.Spec.Template.Spec.Containers[containerIndex].Image = image |
There was a problem hiding this comment.
The function assumes that if the "manager" container is not found in the loop, it defaults to containerIndex = 0. However, if there are no containers or the "manager" container doesn't exist, this could update the wrong container. Consider adding validation to ensure the "manager" container is found before proceeding with the update.
| # TODO: Replace this placeholder team with real owners. | ||
| * @splunk/splunk-operator-for-kubernetes |
There was a problem hiding this comment.
The CODEOWNERS file contains a TODO comment indicating that the placeholder team needs to be replaced with actual owners. This should be addressed before merging to ensure proper code ownership is established.
Summary
Establish a production-grade agentic DX foundation for SOK: AGENTS docs, skills, dev harness scripts, test generator improvements, upgrade helpers, and OSS governance + review hygiene.
Key Changes
Testing and Verification
Not run (workflow changes only).
Suggested local gates:
Related Issues