kagent-dev · jsonmp-k8 · Mar 18, 2026
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -0,0 +1,81 @@
+# Copilot Instructions
+
+Tier 1 PR reviewer for the kagent monorepo. Maintainer time is limited; your vigilance helps ensure project quality.
+
+## Review style
+
+- Explain the "why" behind recommendations.
+- Skip details caught by CI linters (golangci-lint, ruff, eslint).
+- Verify: clear docstrings, type hints, descriptive names, proper decomposition, sufficient comments.
+- Verify: new logic has tests proportional to change size; PR description explains the "why".
+- Identify design flaws, redundancy, and security issues.
+- Estimate LLM-generation likelihood with confidence level as an HTML comment in "PR Overview".
+- Comment "All Copilot criteria are met." when all criteria are met.
+
+## Review strategy
+
+For each batch: load review guide -> read diffs -> run code-tree queries -> post comments -> compress or drop context -> unload guide.
+
+### Small-batch file review
+
+Review files in batches of 3-5, in this order:
+
+1. **CRD Types** -- `go/api/v1alpha2/` changes (type safety, backward compat, JSON tags)
+2. **Controllers & Translators** -- `go/core/internal/controller/` changes (reconciliation, error handling)
+3. **Go ADK & HTTP** -- `go/adk/`, `go/core/internal/httpserver/` changes
+4. **Python ADK** -- `python/packages/kagent-adk/` changes (type alignment, async patterns)
+5. **UI Components** -- `ui/src/` changes (TypeScript strict mode, React patterns)
+6. **Helm & CI** -- `helm/`, `.github/workflows/` changes (security, RBAC)
+7. **Tests** -- test files reviewed against the code they test
+
+### Context pruning between batches
+
+After each batch, summarize key observations (new symbols, behavior changes, test gaps). Drop file contents; keep only the summary for cross-referencing in later batches.
+
+## Code-tree impact analysis
+
+Before posting a final review, run these queries on the changed files:
+
+```bash
+# 1. Regenerate graph (incremental, quiet)
+python3 tools/code-tree/code_tree.py --repo-root . --incremental -q
+
+# 2. For each changed source file, check blast radius
+python3 tools/code-tree/query_graph.py --impact <changed-file> --depth 3
+
+# 3. Check which tests are affected
+python3 tools/code-tree/query_graph.py --test-impact <changed-file>
+
+# 4. For hub file changes, increase depth
+python3 tools/code-tree/query_graph.py --impact <hub-file> --depth 5
+```
+
+Flag untested impact paths in the review.
+
+## Security scan
+
+For PRs touching Helm templates, RBAC, Dockerfiles, or credentials, load [security-review.md](review-guides/security-review.md) and apply its checklist.
+
+## Review guide files
+
+Load the relevant guide per batch:
+
+| Guide | When to load |
+|-------|-------------|
+| [architecture-context.md](review-guides/architecture-context.md) | Multi-subsystem PRs, controller changes, CRD hierarchy changes |
+| [impact-analysis.md](review-guides/impact-analysis.md) | Large PRs, CRD changes, hub file changes |
+| [language-checklists.md](review-guides/language-checklists.md) | Any code change (pick relevant language section) |
+| [security-review.md](review-guides/security-review.md) | Helm, RBAC, security contexts, credentials, Dockerfiles |
+| [test-quality.md](review-guides/test-quality.md) | PRs adding/modifying tests, or PRs missing tests |
+
+## Quick checklist
+
+- [ ] Code reuse: no duplicated logic; shared helpers extracted
+- [ ] Tests proportional to change size
+- [ ] CRD changes: types + manifests + translator + ADK types (Go + Python) + tests
+- [ ] Hub file changes: impact analysis run, extra test coverage
+- [ ] Security: no hardcoded credentials, proper RBAC, non-root containers
+- [ ] Generated files: not hand-edited, regeneration commands run
+- [ ] Cross-language alignment: Go ↔ Python types match
+- [ ] Conventional commit message format
+- [ ] DCO sign-off present on all commits
diff --git a/.github/review-guides/architecture-context.md b/.github/review-guides/architecture-context.md
@@ -0,0 +1,86 @@
+# Architecture Context for Reviews
+
+Load when reviewing PRs that touch multiple subsystems, modify controllers, or change CRD types.
+
+**See also:** [impact-analysis.md](impact-analysis.md), [language-checklists.md](language-checklists.md)
+
+---
+
+## High-level architecture
+
+```
+UI (Next.js) -> Controller HTTP Server (Go) -> A2A proxy -> Agent Pod (Python/Go ADK)
+  -> MCP Tool Servers -> LLM Provider -> back to UI via SSE streaming
+```
+
+## Key subsystem boundaries
+
+| Subsystem | Language | Root path |
+|-----------|----------|-----------|
+| CRD Types & API | Go | `go/api/` |
+| Controllers | Go | `go/core/internal/controller/` |
+| HTTP Server | Go | `go/core/internal/httpserver/` |
+| Database Layer | Go | `go/core/internal/database/` |
+| A2A Protocol | Go | `go/core/internal/a2a/` |
+| MCP Integration | Go | `go/core/internal/mcp/` |
+| CLI | Go | `go/core/cli/` |
+| Go ADK | Go | `go/adk/` |
+| Python ADK | Python | `python/packages/kagent-adk/` |
+| Web UI | TypeScript | `ui/src/` |
+| Helm Charts | YAML | `helm/` |
+
+## Critical dependency directions
+
+Flag violations of these dependency rules:
+
+```
+# Allowed direction (arrow = "may depend on"). Reverse is forbidden.
+go/core/  -> go/api/       (core may use api types, NOT the reverse)
+go/adk/   -> go/api/       (adk may use api types, NOT the reverse)
+go/core/internal/controller/  -> go/core/internal/database/
+
+# Forbidden imports
+go/api/   must NOT import go/core/ or go/adk/
+ui/       must NOT import go/ or python/ directly
+```
+
+Full dependency map: see [code-tree.md](../../docs/agents/code-tree.md#key-module-dependencies).
+
+## Controller patterns
+
+- **Shared reconciler**: All controllers share `kagentReconciler` — changes affect all CRD types
+- **Translator pattern**: CRD spec → K8s resources via translators. PRs must update translator when adding CRD fields
+- **Database-level concurrency**: Atomic upserts, no application-level locks. Do NOT introduce mutexes
+- **Idempotent reconciliation**: Each loop iteration must be safe to retry
+- **Network I/O outside transactions**: Long-running operations must not hold database locks
+
+## CRD type alignment
+
+When Go CRD types mirror Python ADK types (e.g., `AgentSpec` → `types.py`):
+
+- Add cross-reference comments in both languages
+- Go types are the source of truth
+- Flag changes to one side without corresponding changes to the other
+- Both serialize to JSON via `config.json` — field names must match
+
+## Backward compatibility
+
+For CRD/API changes:
+
+- New fields must have safe defaults (zero-value must not break existing agents)
+- What happens when an old controller reads a new CRD? Migration path must be explicit
+- `v1alpha2` allows breaking changes, but prefer backward-compatible additions
+- Database schema changes require migration logic in `go/core/internal/database/`
+
+## Cardinality changes
+
+When a value changes from single to list (or vice versa):
+
+1. Use `--callers` and `--rdeps` to find all consumers
+2. Check for single-value assumptions in translators, HTTP handlers, and UI
+3. Verify database model handles the change
+4. Verify Helm templates handle the change
+
+## Type hierarchies
+
+Changes to base types affect all consumers. Key hierarchies in [code-tree.md](../../docs/agents/code-tree.md#key-type-hierarchies).
diff --git a/.github/review-guides/impact-analysis.md b/.github/review-guides/impact-analysis.md
@@ -0,0 +1,62 @@
+# Impact Analysis for Reviews
+
+Load when assessing blast radius, CRD changes, or reviewing large PRs.
+
+**See also:** [architecture-context.md](architecture-context.md) (backward compat, cardinality), [test-quality.md](test-quality.md) (coverage adequacy)
+
+---
+
+## PR size and scope
+
+- Flag PRs > 500 additions crossing 3+ subsystems as "needs decomposition discussion"
+- XXL PRs (>1000 additions): each component needs dedicated tests, not just E2E
+- Suggest splitting when a single PR includes: CRD types + controller + ADK types + E2E + Helm + UI
+
+## Hub files
+
+Changes to these files need extra scrutiny. Full table in [code-tree.md](../../docs/agents/code-tree.md#hub-files-most-connected).
+
+| File | Role |
+|------|------|
+| `go/api/v1alpha2/agent_types.go` | Agent CRD — changes cascade to controllers, translators, ADK, UI |
+| `go/api/adk/types.go` | ADK config — changes require Python mirror update |
+| `python/packages/kagent-adk/src/kagent/adk/_agent_executor.py` | Core executor — all agent requests flow through here |
+| `ui/src/lib/messageHandlers.ts` | A2A parsing — all UI message rendering depends on this |
+
+## Cross-boundary changes
+
+Flag without clear justification:
+
+- CRD type changes that don't update the translator
+- Go ADK type changes without Python mirror updates
+- Controller changes that modify HTTP server behavior
+- Helm template changes without corresponding Go/Python changes
+- UI changes that assume new API fields without backend support
+
+## Cross-file consistency
+
+When the same pattern is applied to many files (new CRD field across types, new config option):
+
+- Spot-check at least 3 files for consistency
+- Verify the change propagates through: CRD types → translator → ADK types (Go + Python) → UI
+
+## CRD change checklist
+
+1. Field added to type struct in `go/api/v1alpha2/`
+2. `make -C go manifests generate` run (CRD YAML + DeepCopy updated)
+3. Translator updated in `go/core/internal/controller/translator/`
+4. ADK types updated in `go/api/adk/types.go`
+5. Python types mirrored in `kagent-adk/src/kagent/adk/types.py`
+6. Helm CRD templates updated if schema changed
+7. Unit tests for translator handling
+8. E2E test for end-to-end flow
+9. Generated CRD manifests committed (not hand-edited)
+
+## Generated files
+
+Never hand-edit. Key generated files:
+
+- `go/api/config/crd/` — CRD manifests (generated by controller-gen)
+- `go/api/v1alpha2/zz_generated.deepcopy.go` — DeepCopy methods
+
+Reject PRs modifying generated files without running generators.
diff --git a/.github/review-guides/language-checklists.md b/.github/review-guides/language-checklists.md
@@ -0,0 +1,64 @@
+# Language-Specific Review Checklists
+
+Load when reviewing code changes. Pick the relevant language section.
+
+---
+
+## Go (controllers, API, ADK)
+
+### Error handling and control flow
+- Every `err :=` must be checked or returned. Swallowed errors cause silent failures.
+- Wrap errors: `fmt.Errorf("context: %w", err)`
+- `return` inside loops: verify it should be `continue`, not premature exit
+- Loops assigning to a variable: verify intermediate values aren't discarded
+- No variable shadowing of function arguments
+- No unnecessary `else` after `return`
+
+### Concurrency
+- No nested goroutines (`go func() { go func() { ... } }()`)
+- Reuse K8s clients from struct fields, not `kubernetes.NewForConfig()` per call
+- Fire-and-forget goroutines: require `context.WithTimeout` + error logging
+- Database operations: atomic upserts, no application-level mutexes
+
+### Code quality
+- `golangci-lint run` passes
+- Context propagation (`context.Context` as first parameter)
+- Resource cleanup (`defer` close for readers/connections)
+- Table-driven tests for new functions
+- Descriptive variable names (`fingerPrint` not `fp`, `cacheKey` not `ck`)
+- camelCase for locals, PascalCase for exports
+- No deprecated fields in new code
+
+### CRD-specific
+- New fields have JSON tags matching the Go field name (camelCase)
+- `+optional` marker for optional fields
+- DeepCopy generated (`make -C go generate`)
+- CRD manifests regenerated (`make -C go manifests`)
+
+## Python (ADK, skills, integrations)
+
+- Type hints on all function signatures
+- No bare `Any` types without justification
+- Ruff formatting compliance
+- `async/await` used consistently (no mixing sync and async)
+- Cross-language types: add `# Must stay aligned with Go type in ...` comments
+- No bare `Exception` catches — use specific exception types
+- No mutable default arguments
+- Tests use `pytest` with `@pytest.mark.asyncio` for async tests
+
+## TypeScript (UI)
+
+- Strict mode compliance (no `any` type)
+- No inline styles — use TailwindCSS classes
+- No direct DOM manipulation — use React patterns
+- Radix UI primitives for accessibility
+- React Hook Form + Zod for form validation
+- Jest tests for logic-heavy components
+- ESLint + Next.js lint passing
+
+## YAML (Helm charts, CI)
+
+- Helm templates use proper `.Values` references
+- No hardcoded image tags — use chart values
+- CI workflows: no secrets in logs, pinned action versions
+- CRD templates match generated manifests in `go/api/config/crd/`
diff --git a/.github/review-guides/security-review.md b/.github/review-guides/security-review.md
@@ -0,0 +1,55 @@
+# Security Review Guide
+
+Load when reviewing PRs touching Helm templates, RBAC, security contexts, Dockerfiles, or credentials.
+
+---
+
+## Principles
+
+- **Enforcement > defaults**: Security contexts must be enforced, not just defaulted
+- **Never configurable**: `allowPrivilegeEscalation`, `privileged` must be hardcoded `false` — no env var overrides
+- **System containers**: Controller, ADK runtime, UI must ALWAYS run non-root
+- **Defense in depth**: Backend must enforce policy independently of UI. UI is convenience; controller is enforcement
+- **Threat model required**: PRs adding security-configurable fields should answer: "What could a malicious agent definition do?"
+
+## Container SecurityContext checklist
+
+**NEVER user-configurable:** `privileged`, `add_capabilities`, `allowPrivilegeEscalation`
+**ALWAYS hardcoded:** `drop: ["ALL"]`, `seccompProfile: RuntimeDefault`
+**MAY be user-configurable:** `runAsUser` (warn on UID 0), `runAsGroup`, `readOnlyRootFilesystem`
+
+- Container-level `securityContext` must not conflict with pod-level `PodSecurityContext`
+- Admin-set contexts take precedence over user-specified values
+- Tests must not normalize violations
+
+## RBAC manifest review
+
+- `resourceNames` restrictions on `update`/`patch`/`delete` where possible
+- `create` can't be scoped by `resourceNames` — document why needed
+- Helm chart RBAC templates use `.Values` for namespace
+- New ClusterRoles follow `kagent-*` naming convention
+- ServiceAccount per agent pod (not shared)
+
+## Credentials and secrets
+
+- No hardcoded credentials in any file
+- API keys stored in Kubernetes Secrets, referenced via `apiKeySecret` in ModelConfig CRD
+- `ValueRef` type used for secret references (namespace + name + key)
+- MCP server auth headers referenced via `headersFrom` (SecretKeySelector)
+- No credentials in container environment variables — mount as files or use Secret references
+
+## Network security
+
+- Agent pods communicate with controller via ClusterIP services
+- MCP tool server connections should use TLS when available
+- A2A protocol endpoints should validate request origin
+- `allowedNamespaces` on RemoteMCPServer controls cross-namespace access
+
+## General
+
+- No secrets in CI workflow logs
+- Pinned action versions in GitHub workflows (SHA, not tag)
+- Docker images use non-root USER
+- Helm values: sensitive defaults are empty, not example values
+- Input validation on agent names and configurations
+- ConfigMap/Secret cleanup on resource deletion (no orphans)