1. Integration tests for Stages 4-6:
- Stage 4: validateAndDeploy path (move proposal → validate → deploy).
Exercises the EXACT code path where the _skillAction scoping bug lived.
Would have caught that critical bug before any manual review.
- Stage 5: Activation tracking (recordActivation, getSkillStats,
listActiveSkills, getSkillMaturity). Verifies the health pipeline
records and reports correctly.
- Stage 6: Evolution trigger path (checkMilestone, executeEvolve,
isShortCircuitCandidate). Verifies the evolution modules connect
to deployed skills without crashing.
Full lifecycle: trace → analyze → propose → deploy → activate → evolve.
2. PHILOSOPHY.md: Mission statement on human-in-the-loop design.
Why nothing auto-deploys. Why validation before deployment.
Why milestone-based evolution. Why format types, not channel names.
Why provider agnosticism. Why research grounding.
Written as a standalone document, not a competitor comparison.
3. Multi-agent shared skills: ACEFORGE_SHARED_SKILLS=true deploys
approved skills to ~/.openclaw/skills/ (visible to ALL agents on
the same machine) in addition to the per-workspace copy. Follows
OpenClaw's native skill precedence: workspace > shared > bundled.
4. Composition execution bridge: proposeCompositionSkills() converts
co-activation detections into actual workflow skill proposals via
generateWorkflowSkillWithLLm. Wired into the agent_end Phase 2
cycle. Closes the gap between "these skills activate together"
and "here's a workflow that combines them." (Sketched below.)
AceForge generates skills. It proposes them. It validates them against 23 attack patterns. It scores them on structural quality and trace coverage. It even has an LLM judge evaluate borderline cases.
But it never deploys a skill without your explicit approval.
This is not a limitation — it is the core design constraint. Every other decision in AceForge follows from this one.
## Why Human-in-the-Loop
The [ClawHavoc campaign](https://www.antiy.net/p/clawhavoc-analysis-of-large-scale-poisoning-campaign-targeting-the-openclaw-skill-market-for-ai-agents/) distributed 1,184 malicious skills through ClawHub. Security researchers found that 20% of skills on the registry contained malicious payloads — reverse shells, credential exfiltration, prompt injection. These skills passed basic checks. They looked legitimate. They had reasonable names and descriptions.
An auto-deploying system would have installed them.
AceForge's position: **the person running the agent is the final authority on what that agent learns.** Skills generated from trace data are proposals, not mandates. The `/forge preview` command exists so you can read what a skill teaches in plain language before deciding. The `/forge quality` command exists so you can see the structural score. The unified diff in `/forge evolve` exists so you can see exactly what changed, line by line.
Auto-deployment optimizes for speed. Human approval optimizes for trust. We chose trust.
## Why Validation Before Deployment
Every skill passes through a security validator before it can be deployed:
- **Credential scanning** — API keys, tokens, passwords in skill text
- **Path traversal** — attempts to read `~/.ssh`, `/etc/shadow`, or escape the workspace
- **Shell history access** — attempts to read `.bash_history` or `.zsh_history`
- **SOUL.md injection** — attempts to override the agent's identity
- **23 adversarial mutations** — the test suite generates known-bad skills and verifies they're caught
This validation runs on every skill — LLM-generated, manually proposed, or upgraded. If a skill fails security validation, it's blocked and the user is told exactly why. No silent failures, no "warnings" that get ignored.
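As a rough illustration of how checks like these can be expressed (the rule set and function below are illustrative assumptions, not AceForge's actual validator):

```typescript
// Illustrative only: these regexes mirror the categories listed above,
// not the real validator's rules.
const SECURITY_PATTERNS: Array<{ name: string; pattern: RegExp }> = [
  { name: "credential leak", pattern: /(api[_-]?key|token|password)\s*[:=]\s*\S+/i },
  { name: "path traversal", pattern: /~\/\.ssh|\/etc\/shadow|\.\.\// },
  { name: "shell history access", pattern: /\.(bash|zsh)_history/ },
  { name: "SOUL.md override", pattern: /SOUL\.md/i },
];

function validateSkillText(skillText: string): { ok: boolean; reasons: string[] } {
  const reasons = SECURITY_PATTERNS
    .filter(({ pattern }) => pattern.test(skillText))
    .map(({ name }) => `blocked: matched ${name} pattern`);
  // Fail closed: any match blocks deployment, and every reason is reported.
  return { ok: reasons.length === 0, reasons };
}
```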
## Why Milestone-Based Evolution, Not Continuous Mutation
AceForge distills trace data at activation milestones (500, 2,000, 5,000 uses) rather than continuously mutating skills after every use. This follows [K2-Agent's SRLR loop](https://arxiv.org/abs/2603.00676) and [SAGE's Sequential Rollout](https://arxiv.org/abs/2512.17102).
The reasoning: continuous mutation creates unstable skills that change faster than you can evaluate them. Milestone-based distillation gives skills time to accumulate operational wisdom before triggering a revision cycle. When a skill reaches 500 activations, it has enough data for statistically meaningful divergence detection. The revision at that point is informed, not reactive.
And even then — the revision is a proposal. It goes through the same human approval gate as every other skill.
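A minimal sketch of the milestone trigger, assuming a per-skill activation counter (the thresholds are the ones above; `checkMilestone`'s real signature in AceForge may differ):

```typescript
// Thresholds from the text above; the function shape is an assumption.
const MILESTONES = [500, 2_000, 5_000];

function checkMilestone(previousCount: number, currentCount: number): number | null {
  // Fire only when a threshold is crossed by this activation, so a
  // revision cycle triggers once per milestone, not on every use.
  for (const m of MILESTONES) {
    if (previousCount < m && currentCount >= m) return m;
  }
  return null;
}
```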
## Why Format Types, Not Channel Names
The notification formatting layer operates on format types (`html`, `markdown`, `mrkdwn`, `plain`), not channel names (`telegram`, `slack`, `discord`). Channel names appear exactly once, in a lookup table called `FORMAT_MAP`.
This isn't academic purity. It prevents a real bug: Slack's `*` means bold, Discord's `*` means italic. If you hardcode channel names into formatting functions, adding a new channel means touching every function. With format types, adding a channel is one line in a table.
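A sketch of what that looks like: the format types and the `FORMAT_MAP` name come from the text above, but the channel mappings here are illustrative assumptions.

```typescript
type FormatType = "html" | "markdown" | "mrkdwn" | "plain";

// The one place channel names appear. Entries are illustrative.
const FORMAT_MAP: Record<string, FormatType> = {
  telegram: "html",
  slack: "mrkdwn",     // Slack's mrkdwn: a single * is bold
  discord: "markdown", // Discord's markdown: ** is bold, * is italic
  sms: "plain",        // hypothetical channel: adding it is this one line
};

// Formatting functions take a format type, never a channel name.
function bold(text: string, format: FormatType): string {
  switch (format) {
    case "html": return `<b>${text}</b>`;
    case "markdown": return `**${text}**`;
    case "mrkdwn": return `*${text}*`;
    case "plain": return text;
  }
}
```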
## Why Provider-Agnostic LLM Pipeline
AceForge's LLM pipeline supports both OpenAI-compatible (`/chat/completions`) and Anthropic-native (`/v1/messages`) API formats. The format is auto-detected from the provider name or the `api` field in openclaw.json.
This matters because vendor lock-in in LLM tooling is a trap. Models improve and change pricing monthly. The generator that works best today might not be the right choice in three months. AceForge should never be the reason you can't switch.
13 providers have correct default URLs built in. Adding a new one is one line in `PROVIDER_DEFAULTS`.
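A sketch of the detection logic: the two API shapes come from the text, while the provider entries and the `detectFormat` helper are illustrative assumptions, not the full table of 13.

```typescript
type ApiFormat = "openai-chat" | "anthropic-messages";

const PROVIDER_DEFAULTS: Record<string, { baseUrl: string; format: ApiFormat }> = {
  openai:    { baseUrl: "https://api.openai.com/v1", format: "openai-chat" },
  anthropic: { baseUrl: "https://api.anthropic.com", format: "anthropic-messages" },
  // Adding a provider is one line here.
};

function detectFormat(provider: string, apiField?: string): ApiFormat {
  // An explicit `api` field in openclaw.json wins; otherwise fall back to
  // the provider table, defaulting to the OpenAI-compatible shape.
  if (apiField === "anthropic") return "anthropic-messages";
  if (apiField === "openai") return "openai-chat";
  return PROVIDER_DEFAULTS[provider]?.format ?? "openai-chat";
}
```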
## Why Research Grounding
Every major design decision in AceForge cites a specific paper and explains how the paper's finding informed the implementation. This is not decoration — it's engineering discipline.
When [SkillsBench](https://arxiv.org/abs/2602.12670) found that 56% of agent skills are never invoked because their descriptions don't match how users phrase requests, that directly informed AceForge's trigger phrase check in the reviewer prompt and the description optimizer module.
When [Single-Agent scaling](https://arxiv.org/abs/2601.04748) found that more skills don't always help and selection quality degrades at scale, that directly informed the escalating crystallization threshold (3→5 at 20+ skills).
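As a sketch, that escalation is a one-line policy (the function name and shape are assumptions):

```typescript
// Require more supporting evidence before crystallizing a new skill
// once the deployed skill count is high (3 → 5 at 20+ skills).
function crystallizationThreshold(deployedSkillCount: number): number {
  return deployedSkillCount >= 20 ? 5 : 3;
}
```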
Research without implementation is theory. Implementation without research is guessing.