feat: self-improving agentic workflow — scenarios 85-87#5
Agent creates new MCP tools as workflow pipelines, uses them in subsequent iterations. Real Ollama + Gemma 4.
Real Ollama + Gemma 4, Docker Compose, Gherkin features, e2e tests. Agent adds FTS5 search, pagination, rate limiting, logging.
Agent audits, plans, and iteratively improves the application like an agile team. 5 iterations, git tracking, API self-testing.
- Register 85-self-improving-api in scenarios.json
- Remove backward dependsOn: [router] from db module in base-app.yaml
- Add agent healthcheck to docker-compose.yaml (port 8081)
- Publish agent port 8081 in docker-compose.yaml
- Replace hardcoded admin secret with ${WORKFLOW_ADMIN_SECRET:-...} env var
- Fix Makefile clean target (remove incorrect rm -f /data/... paths)
- Fix gofmt violation in command_safety_test.go
- Replace containsString reimplementation with strings.Contains
- Replace indexOfString reimplementation with strings.Index
- Fix e2e test: remove polling of unpublished agent /status endpoint, use docker compose logs/ps to detect agent completion instead
- Add k8s/namespace.yaml, k8s/pvc.yaml (app-data, ollama-data, agent-repo)
- Fix k8s/deployment.yaml: add agent readiness probe, use Secret for admin-secret, add service.yaml with both app+agent ports
…, fix compose working dir in e2e
Corrects the admin secret env var name throughout scenario 85 config, docker-compose, k8s deployment, and guardrails tests.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add k8s/secret.yaml: defines self-improving-api-secrets Secret so both containers don't fail with CreateContainerConfigError on apply
- Remove dependsOn: [router] from db module in k8s/configmap.yaml (same fix already applied to config/base-app.yaml)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l updates
Replace template | default nil params with proven COALESCE(NULLIF(?, ''), col) SQL pattern so partial updates preserve existing field values without nil injection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dead assignment (cmd reassigned but never executed) replaced by cmd2. Cleans up the intermediate SUM subquery that was superseded by the user-table count check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
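The COALESCE(NULLIF(?, ''), col) pattern mentioned in the commits above can be illustrated with a small statement builder. This is only a sketch under assumptions: the helper name `buildPartialUpdate`, the `tasks` table, the column names, and the `id` key are hypothetical and are not taken from the scenario configs.

```go
package main

import (
	"fmt"
	"strings"
)

// buildPartialUpdate (hypothetical helper) returns an UPDATE statement in
// which every listed column keeps its current value when the bound
// parameter is an empty string. Table and column names are concatenated
// into the SQL text, so they must come from trusted code, never user input.
func buildPartialUpdate(table string, cols []string) string {
	sets := make([]string, 0, len(cols))
	for _, c := range cols {
		// NULLIF(?, '') turns an empty parameter into NULL;
		// COALESCE then falls back to the existing column value.
		sets = append(sets, fmt.Sprintf("%s = COALESCE(NULLIF(?, ''), %s)", c, c))
	}
	return fmt.Sprintf("UPDATE %s SET %s WHERE id = ?", table, strings.Join(sets, ", "))
}

func main() {
	// Assumed table and columns, for illustration only.
	fmt.Println(buildPartialUpdate("tasks", []string{"title", "status"}))
}
```

One consequence of this pattern is that a client cannot deliberately set a field to the empty string, because an empty parameter is always read as "leave unchanged".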
Pull request overview
Adds three new “self-improving agentic workflow” scenarios (85–87) with Docker-based runtime setups, guardrails-focused configs, Gherkin features, and Go tests to validate end-to-end autonomous iteration, MCP tool creation/usage, and self-improving deployment flows.
Changes:
- Scenario 85: Introduces a self-improving task CRUD API scenario with guardrails, self-improve steps, Docker Compose, and k8s manifests.
- Scenario 86: Adds a self-extending MCP scenario where an agent creates and then uses new MCP tools (task_analytics, task_forecast), plus E2E validation and seeded SQLite data.
- Scenario 87: Adds an autonomous agile agent scenario with an audit→plan→validate→deploy→verify→commit loop, iteration tracking, and E2E validation.
Reviewed changes
Copilot reviewed 51 out of 51 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| scenarios/87-autonomous-agile-agent/tests/iteration_tracking_test.go | Adds config-structure and guardrails validation tests for scenario 87. |
| scenarios/87-autonomous-agile-agent/tests/e2e_test.go | Adds Docker Compose E2E test for scenario 87 iteration loop and outcomes. |
| scenarios/87-autonomous-agile-agent/scenario.yaml | Declares scenario 87 metadata, components, and validation intent. |
| scenarios/87-autonomous-agile-agent/README.md | Documents scenario 87 purpose, architecture, and how to run it. |
| scenarios/87-autonomous-agile-agent/Makefile | Provides convenience targets to run scenario 87 (up/test/e2e/logs). |
| scenarios/87-autonomous-agile-agent/features/guardrails_autonomous.feature | Gherkin coverage for autonomous guardrails behavior. |
| scenarios/87-autonomous-agile-agent/features/git_history.feature | Gherkin coverage for meaningful git progression across iterations. |
| scenarios/87-autonomous-agile-agent/features/autonomous_iteration.feature | Gherkin coverage for iteration loop behavior and stopping conditions. |
| scenarios/87-autonomous-agile-agent/features/api_interaction.feature | Gherkin coverage for agent API verification behavior. |
| scenarios/87-autonomous-agile-agent/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 87. |
| scenarios/87-autonomous-agile-agent/config/base-app.yaml | Base task CRUD config used as the starting point for scenario 87. |
| scenarios/87-autonomous-agile-agent/config/agent-config.yaml | Agent provider/guardrails plus the autonomous improvement pipeline. |
| scenarios/86-self-extending-mcp/tests/mcp_tool_usage_test.go | Adds structural tests for scenario 86 base app + compose expectations. |
| scenarios/86-self-extending-mcp/tests/mcp_tool_creation_test.go | Adds tests asserting tool-creation pipeline and guardrails config. |
| scenarios/86-self-extending-mcp/tests/iteration_test.go | Adds wfctl validation tests and basic iteration structure checks. |
| scenarios/86-self-extending-mcp/tests/e2e_test.go | Adds E2E test to validate MCP tools appear and can be invoked. |
| scenarios/86-self-extending-mcp/scenario.yaml | Declares scenario 86 metadata, components, and validation intent. |
| scenarios/86-self-extending-mcp/README.md | Documents scenario 86 purpose, architecture, seed data, and how to run it. |
| scenarios/86-self-extending-mcp/Makefile | Provides convenience targets to run scenario 86 (up/test/e2e/logs). |
| scenarios/86-self-extending-mcp/features/use_new_tool.feature | Gherkin coverage for calling newly created MCP tools. |
| scenarios/86-self-extending-mcp/features/iterate_tooling.feature | Gherkin coverage for iterating from analytics tool to forecast tool. |
| scenarios/86-self-extending-mcp/features/guardrails_mcp_creation.feature | Gherkin coverage for guardrails during MCP tool creation. |
| scenarios/86-self-extending-mcp/features/create_mcp_tool.feature | Gherkin coverage for the initial MCP tool creation lifecycle. |
| scenarios/86-self-extending-mcp/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 86 (with seed SQL). |
| scenarios/86-self-extending-mcp/config/seed-data.sql | Adds seed dataset for analytics/forecasting behaviors. |
| scenarios/86-self-extending-mcp/config/base-app.yaml | Base task CRUD config for scenario 86 with seed-compatible schema usage. |
| scenarios/86-self-extending-mcp/config/agent-config.yaml | Agent provider/guardrails and tool-creation pipeline for scenario 86. |
| scenarios/85-self-improving-api/tests/guardrails_test.go | Adds guardrails configuration assertions for scenario 85. |
| scenarios/85-self-improving-api/tests/e2e_test.go | Adds scenario 85 E2E test and partial post-iteration validation checks. |
| scenarios/85-self-improving-api/tests/deploy_strategy_test.go | Adds tests for hot_reload deploy strategy and step ordering. |
| scenarios/85-self-improving-api/tests/config_validation_test.go | Adds wfctl validation + structural checks for scenario 85 configs. |
| scenarios/85-self-improving-api/tests/command_safety_test.go | Adds command-policy allowlist and bypass-pattern documentation tests. |
| scenarios/85-self-improving-api/scripts/pull-model.sh | Script to wait for Ollama and pull the Gemma 4 model. |
| scenarios/85-self-improving-api/scenario.yaml | Declares scenario 85 metadata, components, and validation intent. |
| scenarios/85-self-improving-api/README.md | Documents scenario 85 purpose, architecture, guardrails, and test targets. |
| scenarios/85-self-improving-api/Makefile | Provides convenience targets for running scenario 85 (up/test/clean). |
| scenarios/85-self-improving-api/k8s/secret.yaml | Adds k8s secret manifest for scenario 85 deployment. |
| scenarios/85-self-improving-api/k8s/pvc.yaml | Adds PVCs for app data, Ollama model data, and agent repo storage. |
| scenarios/85-self-improving-api/k8s/ollama-deployment.yaml | Adds Ollama deployment/service manifests for scenario 85. |
| scenarios/85-self-improving-api/k8s/namespace.yaml | Adds namespace manifest for scenario 85. |
| scenarios/85-self-improving-api/k8s/deployment.yaml | Adds combined app+agent deployment and service for scenario 85. |
| scenarios/85-self-improving-api/k8s/configmap.yaml | Adds ConfigMap containing base app config for scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_iteration.feature | Gherkin coverage for iteration progress and convergence in scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_guardrails.feature | Gherkin coverage for guardrails enforcement in scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_deploy.feature | Gherkin coverage for deployment strategy and git tracking. |
| scenarios/85-self-improving-api/features/self_improve_custom_code.feature | Gherkin coverage for custom Yaegi module workflow. |
| scenarios/85-self-improving-api/features/self_improve_config.feature | Gherkin coverage for config modification/validation workflow. |
| scenarios/85-self-improving-api/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 85. |
| scenarios/85-self-improving-api/config/base-app.yaml | Base task CRUD config used as the starting point for scenario 85. |
| scenarios/85-self-improving-api/config/agent-config.yaml | Agent provider/guardrails and self-improvement pipeline config. |
| scenarios.json | Registers scenarios 85–87 in the scenario index/manifest. |
    func waitForHealth(t *testing.T, url string, timeout time.Duration) {
        t.Helper()
        deadline := time.Now().Add(timeout)
        for time.Now().Before(deadline) {
            resp, err := http.Get(url) //nolint:noctx
            if err == nil && resp.StatusCode == http.StatusOK {
                resp.Body.Close()
                return
            }
            time.Sleep(pollInterval)
waitForHealth only closes the response body on the success path. When the endpoint returns a non-200, the body is left open, which can leak connections and cause the poll loop to hang/flap under load. Ensure the response body is closed for all non-nil responses (including non-200) and consider adding a request timeout so a single hung GET can’t exceed the polling interval.
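A minimal sketch of the fix described in this comment, under assumptions: it returns an error instead of taking `*testing.T`, takes the poll interval as a parameter, and uses a 5-second per-request timeout chosen only for illustration. None of these details come from the PR.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// waitForHealth polls url until it returns 200 OK or timeout elapses.
// A nil client is replaced by one with a per-request timeout (assumed 5s),
// so a single hung GET cannot stall the loop indefinitely.
func waitForHealth(client *http.Client, url string, timeout, interval time.Duration) error {
	if client == nil {
		client = &http.Client{Timeout: 5 * time.Second}
	}
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := client.Get(url)
		if err == nil {
			status := resp.StatusCode
			// Drain and close on every path, including non-200 responses,
			// so the underlying connection is released.
			_, _ = io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
			if status == http.StatusOK {
				return nil
			}
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("%s not healthy after %s", url, timeout)
}

func main() {
	// Port 1 is assumed to be closed here, so this exercises the timeout path.
	err := waitForHealth(nil, "http://127.0.0.1:1/healthz", 300*time.Millisecond, 50*time.Millisecond)
	fmt.Println("healthy:", err == nil)
}
```

In a test, the caller would wrap this as `if err := waitForHealth(...); err != nil { t.Fatal(err) }`.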
    func countIterCommits(gitLog string) int {
        n := 0
        for _, line := range strings.Split(strings.TrimSpace(gitLog), "\n") {
            if line != "" && !strings.Contains(line, "initial") {
                n++
            }
        }
        return n
countIterCommits treats every non-empty log line that doesn’t contain the literal substring "initial" as an iteration commit. This can overcount (e.g., if the initial commit message doesn’t include that exact word/casing, or if there are other non-iteration commits). Prefer matching the iteration commit prefix produced by this scenario (e.g., "feat(iter-" or another deterministic marker) so the E2E assertions measure the intended behavior.
    t.Log("Step 1: verifying base app health")
    waitForURL(t, appBaseURL+"/healthz", e2eTimeout)

    t.Log("Step 2: verifying base app CRUD responds")
    verifyBaseCRUD(t)

    t.Log("Step 3: waiting for agent to create MCP tools")
    waitForMCPTool(t, "task_analytics", e2eTimeout)
    waitForMCPTool(t, "task_forecast", e2eTimeout)
The E2E test waits for MCP tools to appear, but it never triggers the only pipeline that would create them (mcp_tool_creation_loop is HTTP-triggered at /create-tools). As written, this will likely time out unless some implicit startup behavior exists. Trigger the pipeline explicitly (e.g., POST agentBaseURL+"/create-tools") before waiting for tool registration, or update the scenario so tool creation runs automatically.
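A hedged sketch of an explicit trigger step. The helper names `triggerPipeline` and `triggerAccepted`, the empty JSON body, the 10-second timeout, and the set of accepted status codes (200, 202, 204) are assumptions; the real trigger may expect a different payload or respond differently.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// triggerAccepted reports whether a status code means the trigger was
// accepted. The accepted set is an assumption, not taken from the agent.
func triggerAccepted(code int) bool {
	return code == http.StatusOK || code == http.StatusAccepted || code == http.StatusNoContent
}

// triggerPipeline POSTs to baseURL+path (for example "/create-tools") and
// returns an error unless the agent accepts the request.
func triggerPipeline(client *http.Client, baseURL, path string) error {
	if client == nil {
		client = &http.Client{Timeout: 10 * time.Second}
	}
	// An empty JSON object is sent as a placeholder body (assumption).
	resp, err := client.Post(baseURL+path, "application/json", strings.NewReader("{}"))
	if err != nil {
		return fmt.Errorf("POST %s: %w", path, err)
	}
	defer resp.Body.Close()
	if !triggerAccepted(resp.StatusCode) {
		// Include a bounded slice of the body to make failures diagnosable.
		body, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		return fmt.Errorf("POST %s: unexpected status %d: %s",
			path, resp.StatusCode, strings.TrimSpace(string(body)))
	}
	return nil
}

func main() {
	fmt.Println(triggerAccepted(http.StatusAccepted), triggerAccepted(http.StatusNotFound))
}
```

In the E2E test this would run before the `waitForMCPTool` calls, for example `triggerPipeline(nil, agentBaseURL, "/create-tools")`, failing the test on a non-nil error.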
    // Wait for agent to finish by watching its container exit or checking logs
    // for a completion marker (up to 20 minutes).
    t.Log("Waiting for self-improvement agent to complete...")
    waitForAgentCompletion(t, dir, 20*time.Minute)

    // Verify improved app has expected new capabilities
    t.Run("improved_app_has_search", func(t *testing.T) {
        resp, err := http.Get(appURL + "/tasks/search?q=test")
        if err != nil || resp.StatusCode == http.StatusNotFound {
            t.Skip("search endpoint not yet implemented by agent")
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            t.Errorf("GET /tasks/search: expected 200, got %d", resp.StatusCode)
This E2E flow never triggers the self_improvement_loop pipeline (configured as an HTTP trigger at /improve in agent-config.yaml) and instead waits for the agent container to exit or log a completion marker. If the workflow process is long-running (typical for an HTTP server), the container may never exit and the test won’t exercise the improvement loop deterministically. Consider explicitly POSTing to /improve and then asserting on concrete outcomes (git commits, endpoints, etc.) like scenario 87 does.
Suggested change:

    // Explicitly trigger the self-improvement loop via its HTTP entrypoint and
    // wait for a concrete externally observable outcome.
    t.Log("Triggering self-improvement agent via /improve...")
    req, err := http.NewRequest(http.MethodPost, appURL+"/improve", nil)
    if err != nil {
        t.Fatalf("create POST /improve request: %v", err)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        t.Fatalf("POST /improve: %v", err)
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusAccepted && resp.StatusCode != http.StatusNoContent {
        body, _ := io.ReadAll(resp.Body)
        t.Fatalf("POST /improve: expected 200, 202, or 204, got %d: %s", resp.StatusCode, strings.TrimSpace(string(body)))
    }
    t.Log("Waiting for improved search endpoint to become available...")
    deadline := time.Now().Add(20 * time.Minute)
    for {
        if time.Now().After(deadline) {
            t.Fatal("self-improvement did not expose /tasks/search within 20 minutes after POST /improve")
        }
        searchResp, err := http.Get(appURL + "/tasks/search?q=test")
        if err == nil {
            func() {
                defer searchResp.Body.Close()
                if searchResp.StatusCode == http.StatusOK {
                    return
                }
            }()
            if searchResp.StatusCode == http.StatusOK {
                break
            }
        }
        time.Sleep(5 * time.Second)
    }
    // Verify improved app has expected new capabilities
    t.Run("improved_app_has_search", func(t *testing.T) {
        resp, err := http.Get(appURL + "/tasks/search?q=test")
        if err != nil {
            t.Fatalf("GET /tasks/search: %v", err)
        }
        defer resp.Body.Close()
        if resp.StatusCode != http.StatusOK {
            body, _ := io.ReadAll(resp.Body)
            t.Fatalf("GET /tasks/search: expected 200, got %d: %s", resp.StatusCode, strings.TrimSpace(string(body)))
      - ./config:/data/config
    environment:
      - WORKFLOW_ADMIN_SECRET=scenario-87-admin-secret
    command: ["-config", "/data/config/base-app.yaml", "-data-dir", "/data"]
    depends_on:
WORKFLOW_ADMIN_SECRET is hard-coded in docker-compose.yaml. Even for draft scenarios, committing secrets/credentials is risky and makes it harder to override in CI. Prefer reading from the environment (e.g., WORKFLOW_ADMIN_SECRET=${WORKFLOW_ADMIN_SECRET:-...}) or an .env file, similar to how scenario 85 handles WFCTL_ADMIN_SECRET.
      - app-data:/data
      - ./config:/data/config
    environment:
      - WORKFLOW_ADMIN_SECRET=scenario-86-admin-secret
      - SEED_SQL=/data/config/seed-data.sql
    command: ["-config", "/data/config/base-app.yaml", "-data-dir", "/data"]
    depends_on:
      ollama:
WORKFLOW_ADMIN_SECRET is hard-coded in docker-compose.yaml. Even for draft scenarios, committing secrets/credentials is risky and makes it harder to override in CI. Prefer reading from the environment (e.g., WORKFLOW_ADMIN_SECRET=${WORKFLOW_ADMIN_SECRET:-...}) or an .env file, and keep the compose file secret-free.
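If the Go tests also need the admin secret, they could resolve it with the same rule Compose applies to `${VAR:-default}`: use the default when the variable is unset or empty. The sketch below is an assumption about how a test helper might do that; `resolveWithDefault` is a hypothetical name, and the fallback value mirrors the scenario 86 default only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// resolveWithDefault mirrors the Compose `${VAR:-default}` rule: the default
// applies when the variable is unset or set to the empty string. The lookup
// function is injected so the rule can be exercised without touching the
// real process environment.
func resolveWithDefault(lookup func(string) (string, bool), key, def string) string {
	if v, ok := lookup(key); ok && v != "" {
		return v
	}
	return def
}

func main() {
	// The fallback value is illustrative, not a recommendation to commit it.
	secret := resolveWithDefault(os.LookupEnv, "WORKFLOW_ADMIN_SECRET", "scenario-86-admin-secret")
	fmt.Println("secret configured:", secret != "")
}
```

Resolving the value in one place keeps the tests and the Compose file in agreement when CI overrides the variable.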
- scenario 87 e2e: close response body for all non-nil responses in waitForHealth (fixes leak on non-200)
- scenario 87 e2e: match iteration commits on deterministic "feat(iter-" prefix instead of fragile "!initial" heuristic
- scenario 86 e2e: POST to /create-tools before polling for MCP tool registration
- scenario 85 e2e: POST to /improve explicitly before polling for agent completion
- scenario 86 docker-compose: use ${WORKFLOW_ADMIN_SECRET:-scenario-86-admin-secret} env substitution
- scenario 87 docker-compose: use ${WORKFLOW_ADMIN_SECRET:-scenario-87-admin-secret} env substitution
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
task_analytics, task_forecast), then uses them in subsequent iterations.
Each scenario includes: Gherkin features, Go e2e tests, Docker Compose (Ollama + Gemma 4), config validation, guardrails testing, Makefile, k8s manifests.
Design
See: workflow docs/plans/2026-04-13-self-improving-agentic-workflow-design.md
Dependencies
🤖 Generated with Claude Code