feat: self-improving agentic workflow — scenarios 85-87#5

Merged
intel352 merged 15 commits into main from feat/self-improving-scenarios
Apr 13, 2026

@intel352
Contributor

Summary

  • Scenario 85 (self-improving API): Agent adds FTS5 search, pagination, rate limiting, and structured logging to a basic task CRUD API. Includes custom Yaegi module. Tests all 3 deploy strategies.
  • Scenario 86 (self-extending MCP): Agent creates new MCP tools as workflow pipelines (task_analytics, task_forecast), then uses them in subsequent iterations.
  • Scenario 87 (autonomous agile agent): Agent has full autonomy to audit, plan, and iteratively improve the application like an agile team. 5 iterations with git tracking and API self-testing.

Each scenario includes: Gherkin features, Go e2e tests, Docker Compose (Ollama + Gemma 4), config validation, guardrails testing, Makefile, k8s manifests.

Design

See: workflow docs/plans/2026-04-13-self-improving-agentic-workflow-design.md

Dependencies

🤖 Generated with Claude Code

intel352 and others added 14 commits April 13, 2026 04:47
Agent creates new MCP tools as workflow pipelines, uses them in
subsequent iterations. Real Ollama + Gemma 4.
Real Ollama + Gemma 4, Docker Compose, Gherkin features, e2e tests.
Agent adds FTS5 search, pagination, rate limiting, logging.
Agent audits, plans, and iteratively improves the application like
an agile team. 5 iterations, git tracking, API self-testing.
- Register 85-self-improving-api in scenarios.json
- Remove backward dependsOn: [router] from db module in base-app.yaml
- Add agent healthcheck to docker-compose.yaml (port 8081)
- Publish agent port 8081 in docker-compose.yaml
- Replace hardcoded admin secret with ${WORKFLOW_ADMIN_SECRET:-...} env var
- Fix Makefile clean target (remove incorrect rm -f /data/... paths)
- Fix gofmt violation in command_safety_test.go
- Replace containsString reimplementation with strings.Contains
- Replace indexOfString reimplementation with strings.Index
- Fix e2e test: remove polling of unpublished agent /status endpoint,
  use docker compose logs/ps to detect agent completion instead
- Add k8s/namespace.yaml, k8s/pvc.yaml (app-data, ollama-data, agent-repo)
- Fix k8s/deployment.yaml: add agent readiness probe, use Secret for
  admin-secret, add service.yaml with both app+agent ports
Corrects the admin secret env var name throughout scenario 85 config,
docker-compose, k8s deployment, and guardrails tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add k8s/secret.yaml: defines self-improving-api-secrets Secret so both
  containers don't fail with CreateContainerConfigError on apply
- Remove dependsOn: [router] from db module in k8s/configmap.yaml
  (same fix already applied to config/base-app.yaml)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…l updates

Replace template | default nil params with proven COALESCE(NULLIF(?, ''), col)
SQL pattern so partial updates preserve existing field values without nil injection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dead assignment (cmd reassigned but never executed) replaced by cmd2.
Cleans up the intermediate SUM subquery that was superseded by the
user-table count check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 13, 2026 09:20

Copilot AI left a comment

Pull request overview

Adds three new “self-improving agentic workflow” scenarios (85–87) with Docker-based runtime setups, guardrails-focused configs, Gherkin features, and Go tests to validate end-to-end autonomous iteration, MCP tool creation/usage, and self-improving deployment flows.

Changes:

  • Scenario 85: Introduces a self-improving task CRUD API scenario with guardrails, self-improve steps, Docker Compose, and k8s manifests.
  • Scenario 86: Adds a self-extending MCP scenario where an agent creates and then uses new MCP tools (task_analytics, task_forecast), plus E2E validation and seeded SQLite data.
  • Scenario 87: Adds an autonomous agile agent scenario with an audit→plan→validate→deploy→verify→commit loop, iteration tracking, and E2E validation.

Reviewed changes

Copilot reviewed 51 out of 51 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| scenarios/87-autonomous-agile-agent/tests/iteration_tracking_test.go | Adds config-structure and guardrails validation tests for scenario 87. |
| scenarios/87-autonomous-agile-agent/tests/e2e_test.go | Adds Docker Compose E2E test for scenario 87 iteration loop and outcomes. |
| scenarios/87-autonomous-agile-agent/scenario.yaml | Declares scenario 87 metadata, components, and validation intent. |
| scenarios/87-autonomous-agile-agent/README.md | Documents scenario 87 purpose, architecture, and how to run it. |
| scenarios/87-autonomous-agile-agent/Makefile | Provides convenience targets to run scenario 87 (up/test/e2e/logs). |
| scenarios/87-autonomous-agile-agent/features/guardrails_autonomous.feature | Gherkin coverage for autonomous guardrails behavior. |
| scenarios/87-autonomous-agile-agent/features/git_history.feature | Gherkin coverage for meaningful git progression across iterations. |
| scenarios/87-autonomous-agile-agent/features/autonomous_iteration.feature | Gherkin coverage for iteration loop behavior and stopping conditions. |
| scenarios/87-autonomous-agile-agent/features/api_interaction.feature | Gherkin coverage for agent API verification behavior. |
| scenarios/87-autonomous-agile-agent/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 87. |
| scenarios/87-autonomous-agile-agent/config/base-app.yaml | Base task CRUD config used as the starting point for scenario 87. |
| scenarios/87-autonomous-agile-agent/config/agent-config.yaml | Agent provider/guardrails plus the autonomous improvement pipeline. |
| scenarios/86-self-extending-mcp/tests/mcp_tool_usage_test.go | Adds structural tests for scenario 86 base app + compose expectations. |
| scenarios/86-self-extending-mcp/tests/mcp_tool_creation_test.go | Adds tests asserting tool-creation pipeline and guardrails config. |
| scenarios/86-self-extending-mcp/tests/iteration_test.go | Adds wfctl validation tests and basic iteration structure checks. |
| scenarios/86-self-extending-mcp/tests/e2e_test.go | Adds E2E test to validate MCP tools appear and can be invoked. |
| scenarios/86-self-extending-mcp/scenario.yaml | Declares scenario 86 metadata, components, and validation intent. |
| scenarios/86-self-extending-mcp/README.md | Documents scenario 86 purpose, architecture, seed data, and how to run it. |
| scenarios/86-self-extending-mcp/Makefile | Provides convenience targets to run scenario 86 (up/test/e2e/logs). |
| scenarios/86-self-extending-mcp/features/use_new_tool.feature | Gherkin coverage for calling newly created MCP tools. |
| scenarios/86-self-extending-mcp/features/iterate_tooling.feature | Gherkin coverage for iterating from analytics tool to forecast tool. |
| scenarios/86-self-extending-mcp/features/guardrails_mcp_creation.feature | Gherkin coverage for guardrails during MCP tool creation. |
| scenarios/86-self-extending-mcp/features/create_mcp_tool.feature | Gherkin coverage for the initial MCP tool creation lifecycle. |
| scenarios/86-self-extending-mcp/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 86 (with seed SQL). |
| scenarios/86-self-extending-mcp/config/seed-data.sql | Adds seed dataset for analytics/forecasting behaviors. |
| scenarios/86-self-extending-mcp/config/base-app.yaml | Base task CRUD config for scenario 86 with seed-compatible schema usage. |
| scenarios/86-self-extending-mcp/config/agent-config.yaml | Agent provider/guardrails and tool-creation pipeline for scenario 86. |
| scenarios/85-self-improving-api/tests/guardrails_test.go | Adds guardrails configuration assertions for scenario 85. |
| scenarios/85-self-improving-api/tests/e2e_test.go | Adds scenario 85 E2E test and partial post-iteration validation checks. |
| scenarios/85-self-improving-api/tests/deploy_strategy_test.go | Adds tests for hot_reload deploy strategy and step ordering. |
| scenarios/85-self-improving-api/tests/config_validation_test.go | Adds wfctl validation + structural checks for scenario 85 configs. |
| scenarios/85-self-improving-api/tests/command_safety_test.go | Adds command-policy allowlist and bypass-pattern documentation tests. |
| scenarios/85-self-improving-api/scripts/pull-model.sh | Script to wait for Ollama and pull the Gemma 4 model. |
| scenarios/85-self-improving-api/scenario.yaml | Declares scenario 85 metadata, components, and validation intent. |
| scenarios/85-self-improving-api/README.md | Documents scenario 85 purpose, architecture, guardrails, and test targets. |
| scenarios/85-self-improving-api/Makefile | Provides convenience targets for running scenario 85 (up/test/clean). |
| scenarios/85-self-improving-api/k8s/secret.yaml | Adds k8s secret manifest for scenario 85 deployment. |
| scenarios/85-self-improving-api/k8s/pvc.yaml | Adds PVCs for app data, Ollama model data, and agent repo storage. |
| scenarios/85-self-improving-api/k8s/ollama-deployment.yaml | Adds Ollama deployment/service manifests for scenario 85. |
| scenarios/85-self-improving-api/k8s/namespace.yaml | Adds namespace manifest for scenario 85. |
| scenarios/85-self-improving-api/k8s/deployment.yaml | Adds combined app+agent deployment and service for scenario 85. |
| scenarios/85-self-improving-api/k8s/configmap.yaml | Adds ConfigMap containing base app config for scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_iteration.feature | Gherkin coverage for iteration progress and convergence in scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_guardrails.feature | Gherkin coverage for guardrails enforcement in scenario 85. |
| scenarios/85-self-improving-api/features/self_improve_deploy.feature | Gherkin coverage for deployment strategy and git tracking. |
| scenarios/85-self-improving-api/features/self_improve_custom_code.feature | Gherkin coverage for custom Yaegi module workflow. |
| scenarios/85-self-improving-api/features/self_improve_config.feature | Gherkin coverage for config modification/validation workflow. |
| scenarios/85-self-improving-api/docker-compose.yaml | Defines Ollama/app/agent Compose stack for scenario 85. |
| scenarios/85-self-improving-api/config/base-app.yaml | Base task CRUD config used as the starting point for scenario 85. |
| scenarios/85-self-improving-api/config/agent-config.yaml | Agent provider/guardrails and self-improvement pipeline config. |
| scenarios.json | Registers scenarios 85–87 in the scenario index/manifest. |

Comment on lines +61 to +70
func waitForHealth(t *testing.T, url string, timeout time.Duration) {
	t.Helper()
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		resp, err := http.Get(url) //nolint:noctx
		if err == nil && resp.StatusCode == http.StatusOK {
			resp.Body.Close()
			return
		}
		time.Sleep(pollInterval)

Copilot AI Apr 13, 2026

waitForHealth only closes the response body on the success path. When the endpoint returns a non-200, the body is left open, which can leak connections and cause the poll loop to hang/flap under load. Ensure the response body is closed for all non-nil responses (including non-200) and consider adding a request timeout so a single hung GET can’t exceed the polling interval.

Copilot uses AI. Check for mistakes.
Contributor Author

Addressed in 7a45100

Comment on lines +129 to +136
func countIterCommits(gitLog string) int {
	n := 0
	for _, line := range strings.Split(strings.TrimSpace(gitLog), "\n") {
		if line != "" && !strings.Contains(line, "initial") {
			n++
		}
	}
	return n

Copilot AI Apr 13, 2026

countIterCommits treats every non-empty log line that doesn’t contain the literal substring "initial" as an iteration commit. This can overcount (e.g., if the initial commit message doesn’t include that exact word/casing, or if there are other non-iteration commits). Prefer matching the iteration commit prefix produced by this scenario (e.g., "feat(iter-" or another deterministic marker) so the E2E assertions measure the intended behavior.

Contributor Author

Addressed in 7a45100

Comment on lines +32 to +41
t.Log("Step 1: verifying base app health")
waitForURL(t, appBaseURL+"/healthz", e2eTimeout)

t.Log("Step 2: verifying base app CRUD responds")
verifyBaseCRUD(t)

t.Log("Step 3: waiting for agent to create MCP tools")
waitForMCPTool(t, "task_analytics", e2eTimeout)
waitForMCPTool(t, "task_forecast", e2eTimeout)


Copilot AI Apr 13, 2026

The E2E test waits for MCP tools to appear, but it never triggers the only pipeline that would create them (mcp_tool_creation_loop is HTTP-triggered at /create-tools). As written, this will likely time out unless some implicit startup behavior exists. Trigger the pipeline explicitly (e.g., POST agentBaseURL+"/create-tools") before waiting for tool registration, or update the scenario so tool creation runs automatically.

Contributor Author

Addressed in 7a45100

Comment on lines +80 to +93
// Wait for agent to finish by watching its container exit or checking logs
// for a completion marker (up to 20 minutes).
t.Log("Waiting for self-improvement agent to complete...")
waitForAgentCompletion(t, dir, 20*time.Minute)

// Verify improved app has expected new capabilities
t.Run("improved_app_has_search", func(t *testing.T) {
resp, err := http.Get(appURL + "/tasks/search?q=test")
if err != nil || resp.StatusCode == http.StatusNotFound {
t.Skip("search endpoint not yet implemented by agent")
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("GET /tasks/search: expected 200, got %d", resp.StatusCode)

Copilot AI Apr 13, 2026

This E2E flow never triggers the self_improvement_loop pipeline (configured as an HTTP trigger at /improve in agent-config.yaml) and instead waits for the agent container to exit or log a completion marker. If the workflow process is long-running (typical for an HTTP server), the container may never exit and the test won’t exercise the improvement loop deterministically. Consider explicitly POSTing to /improve and then asserting on concrete outcomes (git commits, endpoints, etc.) like scenario 87 does.

Suggested change
// Wait for agent to finish by watching its container exit or checking logs
// for a completion marker (up to 20 minutes).
t.Log("Waiting for self-improvement agent to complete...")
waitForAgentCompletion(t, dir, 20*time.Minute)
// Verify improved app has expected new capabilities
t.Run("improved_app_has_search", func(t *testing.T) {
resp, err := http.Get(appURL + "/tasks/search?q=test")
if err != nil || resp.StatusCode == http.StatusNotFound {
t.Skip("search endpoint not yet implemented by agent")
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("GET /tasks/search: expected 200, got %d", resp.StatusCode)
// Explicitly trigger the self-improvement loop via its HTTP entrypoint and
// wait for a concrete externally observable outcome.
t.Log("Triggering self-improvement agent via /improve...")
req, err := http.NewRequest(http.MethodPost, appURL+"/improve", nil)
if err != nil {
t.Fatalf("create POST /improve request: %v", err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
t.Fatalf("POST /improve: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK && resp.StatusCode != http.StatusAccepted && resp.StatusCode != http.StatusNoContent {
body, _ := io.ReadAll(resp.Body)
t.Fatalf("POST /improve: expected 200, 202, or 204, got %d: %s", resp.StatusCode, strings.TrimSpace(string(body)))
}
t.Log("Waiting for improved search endpoint to become available...")
deadline := time.Now().Add(20 * time.Minute)
for {
if time.Now().After(deadline) {
t.Fatal("self-improvement did not expose /tasks/search within 20 minutes after POST /improve")
}
searchResp, err := http.Get(appURL + "/tasks/search?q=test")
if err == nil {
func() {
defer searchResp.Body.Close()
if searchResp.StatusCode == http.StatusOK {
return
}
}()
if searchResp.StatusCode == http.StatusOK {
break
}
}
time.Sleep(5 * time.Second)
}
// Verify improved app has expected new capabilities
t.Run("improved_app_has_search", func(t *testing.T) {
resp, err := http.Get(appURL + "/tasks/search?q=test")
if err != nil {
t.Fatalf("GET /tasks/search: %v", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
body, _ := io.ReadAll(resp.Body)
t.Fatalf("GET /tasks/search: expected 200, got %d: %s", resp.StatusCode, strings.TrimSpace(string(body)))

Contributor Author

Addressed in 7a45100

Comment on lines +20 to +24
- ./config:/data/config
environment:
- WORKFLOW_ADMIN_SECRET=scenario-87-admin-secret
command: ["-config", "/data/config/base-app.yaml", "-data-dir", "/data"]
depends_on:

Copilot AI Apr 13, 2026

WORKFLOW_ADMIN_SECRET is hard-coded in docker-compose.yaml. Even for draft scenarios, committing secrets/credentials is risky and makes it harder to override in CI. Prefer reading from the environment (e.g., WORKFLOW_ADMIN_SECRET=${WORKFLOW_ADMIN_SECRET:-...}) or an .env file, similar to how scenario 85 handles WFCTL_ADMIN_SECRET.

Contributor Author

Addressed in 7a45100

Comment on lines +24 to +31
- app-data:/data
- ./config:/data/config
environment:
- WORKFLOW_ADMIN_SECRET=scenario-86-admin-secret
- SEED_SQL=/data/config/seed-data.sql
command: ["-config", "/data/config/base-app.yaml", "-data-dir", "/data"]
depends_on:
ollama:

Copilot AI Apr 13, 2026

WORKFLOW_ADMIN_SECRET is hard-coded in docker-compose.yaml. Even for draft scenarios, committing secrets/credentials is risky and makes it harder to override in CI. Prefer reading from the environment (e.g., WORKFLOW_ADMIN_SECRET=${WORKFLOW_ADMIN_SECRET:-...}) or an .env file, and keep the compose file secret-free.

Contributor Author

Addressed in 7a45100

- scenario 87 e2e: close response body for all non-nil responses in waitForHealth (fixes leak on non-200)
- scenario 87 e2e: match iteration commits on deterministic "feat(iter-" prefix instead of fragile "!initial" heuristic
- scenario 86 e2e: POST to /create-tools before polling for MCP tool registration
- scenario 85 e2e: POST to /improve explicitly before polling for agent completion
- scenario 86 docker-compose: use \${WORKFLOW_ADMIN_SECRET:-scenario-86-admin-secret} env substitution
- scenario 87 docker-compose: use \${WORKFLOW_ADMIN_SECRET:-scenario-87-admin-secret} env substitution

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@intel352 intel352 enabled auto-merge (squash) April 13, 2026 13:33
@intel352 intel352 disabled auto-merge April 13, 2026 17:05
@intel352 intel352 merged commit c166eea into main Apr 13, 2026
7 checks passed
@intel352 intel352 deleted the feat/self-improving-scenarios branch April 13, 2026 17:05

2 participants