Skip to content

feat: self-improving agentic workflow — agent plugin#6

Merged
intel352 merged 13 commits into
masterfrom
self-improve/test-agent
Apr 13, 2026
Merged

feat: self-improving agentic workflow — agent plugin#6
intel352 merged 13 commits into
masterfrom
self-improve/test-agent

Conversation

@intel352
Copy link
Copy Markdown
Contributor

Summary

  • Add blackboard for structured artifact exchange between agents in multi-phase review pipelines (SQLite-backed, SSE broadcast)
  • Add command safety engine with shell AST analysis (mvdan.cc/sh) — detects destructive commands, pipe-to-shell, script execution, encoded commands, here-doc injection, process substitution
  • Add hierarchical guardrails module (agent.guardrails) with scoped rules (agent > team > model > provider), glob tool access patterns, immutable config sections, command policy
  • Wire guardrails into executor tool execution loop
  • Add self-improvement steps: step.self_improve_validate, step.self_improve_diff, step.self_improve_deploy (hot_reload / git_pr / canary)
  • Add step.lsp_diagnose for in-process LSP diagnostics
  • Add multi-agent review pipeline pattern with InputFromBlackboard injection (system_prompt_append / user_message modes)

Design

See: workflow docs/plans/2026-04-13-self-improving-agentic-workflow-design.md

Dependencies

🤖 Generated with Claude Code

intel352 and others added 12 commits April 13, 2026 02:42
Uses mvdan.cc/sh to parse commands and detect: destructive operations,
pipe-to-shell, script execution, encoded commands, chained dangerous.
Supports allowlist and blocklist modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ith step types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parseArtifactTime: try sub-second format first, fall back to seconds-only
- BlackboardPostStep: generate UUID before Post() so art.ID is non-empty in step output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Scoped rules (agent > team > model > provider), glob-based tool
access, immutable config sections, command safety integration.
GuardrailsModule implements executor.TrustEvaluator for direct
wiring into the executor tool execution path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…on detection

Closes spec gaps identified in review:
- Here-doc injection: detect shell commands with heredoc redirects (bash << EOF)
- Process substitution: detect shell commands using <(cmd) as input (bash <(curl ...))
- Variable expansion: detect ANSI-C $'\x72\x6d' hex/octal obfuscation
- TestAnalyzer_BlocklistMode, TestAnalyzer_HereDocInjection,
  TestAnalyzer_ProcessSubstitution, TestAnalyzer_VariableExpansionTricks added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
findGuardrailsModule() searches service registry for any GuardrailsModule.
Before each tool call: Evaluate() checks tool access, EvaluateCommand()
checks command safety for shell tools. Blocked calls return error result
without reaching the tool registry.

Tests verify: registry lookup, disallowed tool blocking, dangerous command
blocking via the same code path used in the agent loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…deploy strategies

- step.self_improve_validate: YAML parse + immutability constraint check + graceful MCP skip
- step.self_improve_diff: text diff with blackboard posting support
- step.self_improve_deploy: hot_reload / git_pr / canary strategies with mandatory pre-deploy validation gate
- step.lsp_diagnose: LSPProvider interface wrapper as pipeline step
- review_pipeline.go: ReviewPipelineConfig, InputFromBlackboard, InjectBlackboardInput()
- step.agent_execute: input_from_blackboard config injects blackboard artifacts into agent context
- step.agent_execute: guardrails executor integration (findGuardrailsModule already present)
- plugin.go: registered all 4 new step factories

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ails into root executor step

- Remove BlockedPatterns check from checkDestructive() — patterns already
  checked pre-parse, so running them again per-CallExpr caused duplicate risk
  entries (e.g. 2 destructive risks for 'rm -rf /')
- Change isAllowed() from HasPrefix to exact match so 'golang-migrate' is
  not permitted when only 'go' is in the allowlist
- Add 6 tests covering: NoDuplicateRisks, AllowlistNoHasPrefixBypass,
  HereDocInjection, ProcessSubstitution, VariableExpansionTricks, BlocklistMode
- Wire TrustEvaluator into root step_agent_execute.go via interface-based
  service registry lookup (avoids circular import with orchestrator package)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…mand analyzer

mkfs appeared in both DefaultPolicy BlockedPatterns (pre-parse raw check) and
alwaysDestructive map (AST walk per-CallExpr), producing duplicate Risk entries.
Fix: remove mkfs from alwaysDestructive since BlockedPatterns already covers it.
fdisk and wipefs remain in alwaysDestructive (not in BlockedPatterns).

Issue #2 (isAllowed HasPrefix bypass) was already using exact match — no change needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p misleading comment

- Replace string(action) == "deny" with action == executor.ActionDeny for
  consistency with the rest of the executor package
- Remove unused guardrailsAction struct and String() method
- Remove misleading "Also handle prefix/*" comment that implied unfinished work;
  the HasSuffix branch already handles this case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test assertion

- postToBlackboard now uses countDiffLines() for lines_added instead of
  len(diff), which was counting all diff lines (added + removed)
- Add content assertion to TestSelfImproveDiffStep_PostToBlackboard
  verifying lines_added is 1, not total diff line count
- Replace empty if-body in TestSelfImproveValidateStep_MCPUnavailable
  with real assertion checking for lsp-unavailable warning
- InjectAs rename (InjectKey→InjectAs) + system_prompt_append/user_message modes
- RequireApproval field added to ReviewPipelineConfig
- LSP diagnostics integrated into SelfImproveValidateStep

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a “self-improving” multi-agent orchestration workflow with shared blackboard artifacts and enforceable guardrails, plus new validation/diff/deploy/LSP diagnostic steps to support iterative config evolution.

Changes:

  • Introduces a SQLite-backed Blackboard (with optional SSE broadcast) and steps to post/read artifacts, plus blackboard-to-agent context injection.
  • Adds Guardrails + shell command static analyzer (mvdan.cc/sh) and wires guardrails into agent tool execution and executor trust evaluation.
  • Adds self-improvement steps: validate (YAML + immutability + optional LSP/MCP), diff generation, and deployment strategies (hot_reload / git_pr / canary), plus an LSP diagnose step.

Reviewed changes

Copilot reviewed 24 out of 26 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
step_agent_execute.go Passes a TrustEngine (first executor.TrustEvaluator in registry) into the executor config.
safety/command_analyzer.go Implements shell command safety analysis (AST-based + pattern checks) with configurable policy.
safety/command_analyzer_test.go Adds unit tests covering destructive commands, pipe-to-shell, heredocs, proc-subst, allowlist mode, etc.
orchestrator/step_self_improve_validate.go Adds validation step (YAML parse, immutability checks, optional LSP + placeholder MCP validation).
orchestrator/step_self_improve_validate_test.go Tests validation behavior for valid/invalid YAML, immutability violations, missing inputs, warnings.
orchestrator/step_self_improve_diff.go Adds diff step to compute a simple line-based diff and optionally post to Blackboard.
orchestrator/step_self_improve_diff_test.go Tests diff generation, force mode, missing inputs, and blackboard posting.
orchestrator/step_self_improve_deploy.go Adds deploy step with hot_reload / git_pr / canary strategies and a pre-deploy validation gate.
orchestrator/step_self_improve_deploy_test.go Tests deploy step behavior including validation gate and failure handling.
orchestrator/step_lsp_diagnose.go Adds LSP diagnose step + LSPProvider lookup helper.
orchestrator/step_lsp_diagnose_test.go Tests LSP diagnose step behavior with/without provider and content.
orchestrator/step_blackboard.go Adds steps to post/read Blackboard artifacts (including latest_only mode).
orchestrator/step_blackboard_test.go Tests blackboard post/read steps and error paths when blackboard missing.
orchestrator/step_agent_execute.go Injects blackboard context into prompts/messages; enforces guardrails before tool execution.
orchestrator/review_pipeline.go Adds InputFromBlackboard config + helper to read artifacts and inject into agent context/prompt.
orchestrator/review_pipeline_test.go Tests blackboard input injection modes and parsing config helper.
orchestrator/ratchetplugin_test.go Updates plugin factory/hook expectations to include guardrails + new steps + blackboard hook.
orchestrator/plugin.go Registers new module/steps and wires a ratchet.blackboard hook to init Blackboard from DB + SSE.
orchestrator/guardrails.go Adds agent.guardrails module implementing executor.TrustEvaluator + tool policy + command safety.
orchestrator/guardrails_test.go Tests tool allow/block behavior, scope matching, immutability matching, and trust-evaluator methods.
orchestrator/blackboard.go Implements Blackboard persistence, subscription fanout, and optional SSE broadcasting.
orchestrator/blackboard_test.go Tests Blackboard CRUD + latest retrieval + subscriptions.
executor/guardrails_integration_test.go Integration-style tests asserting GuardrailsModule works as executor.TrustEvaluator.
go.mod Adds mvdan.cc/sh/v3 dependency and bumps golang.org/x/sys.
go.sum Records new module sums (mvdan sh, qt, x/sys/x/term updates).
.gitignore Ignores .worktrees/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +123 to +127
for _, sec := range guardrails.immutableSections {
proposedVal := extractNestedPath(proposed, sec.Path)
currentVal := extractNestedPath(current, sec.Path)
if fmt.Sprintf("%v", proposedVal) != fmt.Sprintf("%v", currentVal) {
violations = append(violations, ImmutabilityViolation{
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Immutability diffing compares values via fmt.Sprintf("%v"), which is unreliable for maps/slices (map iteration order is random) and can produce false positives/negatives. Consider using reflect.DeepEqual on normalized YAML-decoded structures, or marshal both values to canonical JSON/YAML before comparison.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +318 to +324
}
if v, ok := d["require_diff_review"].(bool); ok {
defaults.RequireDiffReview = v
}
if v, ok := d["max_iterations_per_cycle"].(int); ok {
defaults.MaxIterationsPerCycle = v
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Config parsing here only accepts max_iterations_per_cycle when it is exactly an int. Workflow/YAML configs often decode numbers as int64/float64, so this setting may silently not apply. Consider using a helper that handles numeric types (e.g., extractInt-style conversion) or a type switch covering int/int64/float64.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +358 to +364
}
if v, ok := cfg["enable_static_analysis"].(bool); ok {
p.EnableStaticAnalysis = v
}
if v, ok := cfg["max_command_length"].(int); ok {
p.MaxCommandLength = v
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseCommandPolicy only reads max_command_length when it is exactly an int; if the config loader yields float64/int64, the limit will be ignored. Use numeric coercion (type switch or shared extractInt helper) so the policy behaves as configured.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +154 to +164
// --- executor.TrustEvaluator implementation ---

// Evaluate implements executor.TrustEvaluator.
// Checks whether a tool call is allowed by the default guardrails rules.
func (g *GuardrailsModule) Evaluate(_ context.Context, toolName string, _ map[string]any) executor.Action {
ok, _ := g.CheckTool(toolName)
if ok {
return executor.ActionAllow
}
return executor.ActionDeny
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guardrails scoped rules (agent/team/model/provider) are defined, but Evaluate() always calls CheckTool() with an empty ScopeContext, so scopes never affect actual enforcement when used as executor.TrustEvaluator. Either derive ScopeContext from the provided context (agent/model/provider identifiers) and call CheckToolScoped, or clarify/remove the scoped rule feature to avoid a false sense of enforcement.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment thread orchestrator/guardrails.go Outdated
Comment on lines +231 to +246
// matchPattern matches a tool name against a glob-style pattern.
// Supports * wildcard (single segment) and ** (multi-segment via path separator :).
func matchPattern(pattern, value string) bool {
if pattern == "*" || pattern == "**" {
return true
}
if pattern == value {
return true
}
// Suffix wildcard: "mcp:wfctl:validate_*" matches "mcp:wfctl:validate_config"
if strings.HasSuffix(pattern, "*") {
prefix := strings.TrimSuffix(pattern, "*")
return strings.HasPrefix(value, prefix)
}
return false
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchPattern's comment claims support for "**" and segment-aware globbing, but the implementation only supports exact match and a trailing "*" prefix match. Either implement the documented glob semantics (including mid-string wildcards / **), or update the comment to match actual behavior to prevent misconfigured allow/block patterns.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +131 to +140
artifactType := s.artifactType
if artifactType == "" {
artifactType = extractString(pc.Current, "artifact_type", "")
}

if s.latestOnly {
art, err := bb.ReadLatest(ctx, phase)
if err != nil {
return nil, fmt.Errorf("blackboard_read step %q: %w", s.name, err)
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When latest_only is true, artifact_type is ignored (ReadLatest only filters by phase). This makes configs that specify both artifact_type and latest_only behave unexpectedly and can return the wrong artifact if multiple types exist in a phase. Consider adding a ReadLatestByType(phase,type) query, or document that artifact_type is ignored in latest_only mode.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +44 to +52
func InjectBlackboardInput(ctx context.Context, app modular.Application, cfg InputFromBlackboard, pc *module.PipelineContext) (string, error) {
if cfg.Phase == "" {
return "", nil
}

var bb *Blackboard
if svc, ok := app.SvcRegistry()["ratchet-blackboard"]; ok {
bb, _ = svc.(*Blackboard)
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InjectBlackboardInput assumes app is non-nil and calls app.SvcRegistry() unconditionally; if a step is constructed/executed without an application context, this will panic. Add an early if app == nil { return "", nil } guard (similar to other steps) to keep the helper safe to call.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +60 to +70
art, err := bb.ReadLatest(ctx, cfg.Phase)
if err != nil {
return "", fmt.Errorf("input_from_blackboard: read latest phase %q: %w", cfg.Phase, err)
}
if art == nil {
return "", nil
}
if promptMode {
return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil
}
pc.Current["blackboard_input"] = artifactToMap(*art)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In LatestOnly mode this always reads the latest artifact for the phase, ignoring cfg.ArtifactType. If multiple artifact types are posted to the same phase, this can inject unrelated data. Consider filtering by both phase+type for LatestOnly, or make ArtifactType required when LatestOnly is true and enforce it in the query.

Suggested change
art, err := bb.ReadLatest(ctx, cfg.Phase)
if err != nil {
return "", fmt.Errorf("input_from_blackboard: read latest phase %q: %w", cfg.Phase, err)
}
if art == nil {
return "", nil
}
if promptMode {
return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil
}
pc.Current["blackboard_input"] = artifactToMap(*art)
if cfg.ArtifactType == "" {
return "", fmt.Errorf("input_from_blackboard: artifact_type is required when latest_only is true for phase %q", cfg.Phase)
}
artifacts, err := bb.Read(ctx, cfg.Phase, cfg.ArtifactType)
if err != nil {
return "", fmt.Errorf("input_from_blackboard: read latest phase %q and type %q: %w", cfg.Phase, cfg.ArtifactType, err)
}
if len(artifacts) == 0 {
return "", nil
}
art := artifacts[len(artifacts)-1]
if promptMode {
return fmt.Sprintf("[Blackboard artifact — phase: %s, type: %s]\n%v", art.Phase, art.Type, art.Content), nil
}
pc.Current["blackboard_input"] = artifactToMap(art)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment thread orchestrator/plugin.go
Comment on lines +764 to +770
bb := NewBlackboard(db, hub)
if err := bb.Migrate(context.Background()); err != nil {
app.Logger().Warn("blackboard migrate failed", "error", err)
}

_ = app.RegisterService("ratchet-blackboard", bb)
return nil
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If bb.Migrate fails, the hook still registers the Blackboard service. That can lead to runtime failures later (Post/Read queries against a missing table) while startup continues. Consider returning the migration error (or skipping registration) so the system fails fast or at least doesn't register a non-functional blackboard instance.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

Comment on lines +273 to +278
// runCommand executes a shell command and returns stdout or an error.
func runCommand(name string, args ...string) (string, error) {
cmd := exec.Command(name, args...)
out, err := cmd.Output()
return string(out), err
}
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runCommand uses exec.Command(...).Output() with no context/timeout and discards stderr. For git/gh/docker this can hang indefinitely and error messages will be hard to diagnose. Consider using exec.CommandContext with a bounded timeout (or the step ctx) and CombinedOutput so failures include stderr.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 04e8de0

- Use reflect.DeepEqual for immutability diffing (not fmt.Sprintf)
- Add int64/float64 type switches for max_iterations_per_cycle and max_command_length YAML decode
- Clarify Evaluate() comment: uses default rules only, not scope-aware
- Update matchPattern comment to describe exact + trailing-* prefix match behavior
- Document that artifact_type is ignored when latest_only=true in BlackboardReadStep
- Add app == nil guard in InjectBlackboardInput before SvcRegistry() call
- When latest_only=true and artifact_type is set, filter by both phase+type
- Skip blackboard registration if Migrate fails (log error, return early)
- Replace runCommand with exec.CommandContext (2m timeout) + CombinedOutput to capture stderr

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@intel352 intel352 enabled auto-merge (squash) April 13, 2026 13:33
@intel352 intel352 disabled auto-merge April 13, 2026 17:04
@intel352 intel352 merged commit 0ea591d into master Apr 13, 2026
4 checks passed
@intel352 intel352 deleted the self-improve/test-agent branch April 13, 2026 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants