Commit 0eae8f4

Ace Autonomous authored and committed
feat: v0.9.1 — integration lifecycle test, shared skills, composition execution
1. Integration test Stages 4-6:
   - Stage 4: validateAndDeploy path (move proposal → validate → deploy)
     Exercises the EXACT code path where _skillAction scoping bug lived.
     Would have caught that critical bug before any manual review.
   - Stage 5: Activation tracking (recordActivation, getSkillStats,
     listActiveSkills, getSkillMaturity). Verifies the health pipeline
     records and reports correctly.
   - Stage 6: Evolution trigger path (checkMilestone, executeEvolve,
     isShortCircuitCandidate). Verifies the evolution modules connect to
     deployed skills without crashing.
   Full lifecycle: trace → analyze → propose → deploy → activate → evolve.

2. PHILOSOPHY.md: Mission statement on human-in-the-loop design.
   Why nothing auto-deploys. Why validation before deployment.
   Why milestone-based evolution. Why format types not channel names.
   Why provider agnosticism. Why research grounding.
   Written as a standalone document, not a competitor comparison.

3. Multi-agent shared skills: ACEFORGE_SHARED_SKILLS=true deploys
   approved skills to ~/.openclaw/skills/ (visible to ALL agents on
   the same machine) in addition to the per-workspace copy.
   Follows OpenClaw's native skill precedence: workspace > shared > bundled.

4. Composition execution bridge: proposeCompositionSkills() converts
   co-activation detections into actual workflow skill proposals via
   generateWorkflowSkillWithLLm. Wired into the agent_end Phase 2 cycle.
   Closes the gap between "these skills activate together" and
   "here's a workflow that combines them."
1 parent c2b1801 commit 0eae8f4

6 files changed

Lines changed: 331 additions & 6 deletions

.env.example

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@
 # ── Notification Digest (optional — batch notifications) ─────────────
 # ACEFORGE_NOTIFY_DIGEST=true # Queue notifications, flush as single message per cycle
 
+# ── Multi-Agent Shared Skills (optional) ─────────────────────────────
+# When true, approved skills are ALSO copied to ~/.openclaw/skills/
+# (visible to ALL agents on the same machine). Per-agent copy in
+# <workspace>/skills/ is always written regardless.
+# ACEFORGE_SHARED_SKILLS=true
+
 # ── OpenViking (optional — context-enriched challenges) ──────────────
 # ACEFORGE_VIKING_URL=http://127.0.0.1:1933
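
The opt-in flag and the precedence order named in the commit message (workspace > shared > bundled) can be sketched as two small pure functions. This is an illustration under stated assumptions: `SkillSource`, `PRECEDENCE`, `resolveSkillSource`, and `sharedSkillsEnabled` are hypothetical names, not AceForge or OpenClaw APIs, and the actual resolution is done by OpenClaw itself.

```typescript
// Hypothetical sketch: these names are illustrative, not AceForge or OpenClaw APIs.
type SkillSource = "workspace" | "shared" | "bundled";

// Precedence order as stated in the commit message: workspace > shared > bundled.
const PRECEDENCE: readonly SkillSource[] = ["workspace", "shared", "bundled"];

// Returns the source that wins for a skill, or undefined if no source has it.
function resolveSkillSource(available: ReadonlySet<SkillSource>): SkillSource | undefined {
  return PRECEDENCE.find((source) => available.has(source));
}

// Mirrors the check in the diff: only the exact string "true" enables sharing.
function sharedSkillsEnabled(env: Record<string, string | undefined>): boolean {
  return env.ACEFORGE_SHARED_SKILLS === "true";
}
```

Because the comparison is a strict string match, values such as `TRUE` or `1` leave sharing disabled, which matches the documented default of `false`.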

PHILOSOPHY.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# Design Philosophy
+
+## Nothing Auto-Deploys
+
+AceForge generates skills. It proposes them. It validates them against 23 attack patterns. It scores them on structural quality and trace coverage. It even has an LLM judge evaluate borderline cases.
+
+But it never deploys a skill without your explicit approval.
+
+This is not a limitation — it is the core design constraint. Every other decision in AceForge follows from this one.
+
+## Why Human-in-the-Loop
+
+The [ClawHavoc campaign](https://www.antiy.net/p/clawhavoc-analysis-of-large-scale-poisoning-campaign-targeting-the-openclaw-skill-market-for-ai-agents/) distributed 1,184 malicious skills through ClawHub. Security researchers found that 20% of skills on the registry contained malicious payloads — reverse shells, credential exfiltration, prompt injection. These skills passed basic checks. They looked legitimate. They had reasonable names and descriptions.
+
+An auto-deploying system would have installed them.
+
+AceForge's position: **the person running the agent is the final authority on what that agent learns.** Skills generated from trace data are proposals, not mandates. The `/forge preview` command exists so you can read what a skill teaches in plain language before deciding. The `/forge quality` command exists so you can see the structural score. The unified diff in `/forge evolve` exists so you can see exactly what changed, line by line.
+
+Auto-deployment optimizes for speed. Human approval optimizes for trust. We chose trust.
+
+## Why Validation Before Deployment
+
+Every skill passes through a security validator before it can be deployed:
+
+- **Credential scanning** — API keys, tokens, passwords in skill text
+- **Path traversal** — attempts to read `~/.ssh`, `/etc/shadow`, or escape the workspace
+- **Git credential URLs** — `https://token@github.com` patterns
+- **Shell history access** — attempts to read `.bash_history` or `.zsh_history`
+- **SOUL.md injection** — attempts to override the agent's identity
+- **23 adversarial mutations** — the test suite generates known-bad skills and verifies they're caught
+
+This validation runs on every skill — LLM-generated, manually proposed, or upgraded. If a skill fails security validation, it's blocked and the user is told exactly why. No silent failures, no "warnings" that get ignored.
+
+## Why Milestone-Based Evolution, Not Continuous Mutation
+
+AceForge distills trace data at activation milestones (500, 2,000, 5,000 uses) rather than continuously mutating skills after every use. This follows [K2-Agent's SRLR loop](https://arxiv.org/abs/2603.00676) and [SAGE's Sequential Rollout](https://arxiv.org/abs/2512.17102).
+
+The reasoning: continuous mutation creates unstable skills that change faster than you can evaluate them. Milestone-based distillation gives skills time to accumulate operational wisdom before triggering a revision cycle. When a skill reaches 500 activations, it has enough data for statistically meaningful divergence detection. The revision at that point is informed, not reactive.
+
+And even then — the revision is a proposal. It goes through the same human approval gate as every other skill.
+
+## Why Format Types, Not Channel Names
+
+The notification formatting layer operates on format types (`html`, `markdown`, `mrkdwn`, `plain`), not channel names (`telegram`, `slack`, `discord`). Channel names appear exactly once, in a lookup table called `FORMAT_MAP`.
+
+This isn't academic purity. It prevents a real bug: Slack's `*` means bold, Discord's `*` means italic. If you hardcode channel names into formatting functions, adding a new channel means touching every function. With format types, adding a channel is one line in a table.
+
+## Why Provider-Agnostic LLM Pipeline
+
+AceForge's LLM pipeline supports both OpenAI-compatible (`/chat/completions`) and Anthropic-native (`/v1/messages`) API formats. Format auto-detected from the provider name or openclaw.json `api` field.
+
+This matters because vendor lock-in in LLM tooling is a trap. Models improve and change pricing monthly. The generator that works best today might not be the right choice in three months. AceForge should never be the reason you can't switch.
+
+13 providers have correct default URLs built in. Adding a new one is one line in `PROVIDER_DEFAULTS`.
+
+## Why Research Grounding
+
+Every major design decision in AceForge cites a specific paper and explains how the paper's finding informed the implementation. This is not decoration — it's engineering discipline.
+
+When [SkillsBench](https://arxiv.org/abs/2602.12670) found that 56% of agent skills are never invoked because their descriptions don't match how users phrase requests, that directly informed AceForge's trigger phrase check in the reviewer prompt and the description optimizer module.
+
+When [Single-Agent scaling](https://arxiv.org/abs/2601.04748) found that more skills don't always help and selection quality degrades at scale, that directly informed the escalating crystallization threshold (3→5 at 20+ skills).
+
+Research without implementation is theory. Implementation without research is guessing.
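
The "format types, not channel names" idea from PHILOSOPHY.md can be sketched as follows. This is a minimal illustration, not the code in `notify-format.js`: the channel-to-format pairings, the `boldAs` and `formatFor` helpers, and the plain-text fallback are all assumptions made for the example. Only the four format type names and the `FORMAT_MAP` name come from the document.

```typescript
// Illustrative sketch only: the real FORMAT_MAP and bold() in AceForge may differ.
type FormatType = "html" | "markdown" | "mrkdwn" | "plain";

// The only place channel names appear. The pairings below are assumptions.
const FORMAT_MAP = new Map<string, FormatType>([
  ["telegram", "html"],
  ["slack", "mrkdwn"],
  ["discord", "markdown"],
]);

// Formatting switches on the format type, never on the channel name.
function boldAs(format: FormatType, text: string): string {
  switch (format) {
    case "html":
      return `<b>${text}</b>`;
    case "markdown":
      return `**${text}**`; // in Discord-style markdown a single * is italic
    case "mrkdwn":
      return `*${text}*`; // in Slack mrkdwn a single * is bold
    case "plain":
      return text;
  }
}

// Unknown channels fall back to plain text (a hypothetical default).
function formatFor(channel: string): FormatType {
  return FORMAT_MAP.get(channel) ?? "plain";
}
```

Under this layout, supporting a new channel that reuses an existing format is a single new entry in `FORMAT_MAP`, and `boldAs` stays untouched.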

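The two numeric rules in PHILOSOPHY.md, activation milestones at 500, 2,000 and 5,000 uses and a crystallization threshold that rises from 3 to 5 once 20 or more skills exist, can be written as a short sketch. The function names `crossedMilestone` and `crystallizationThreshold` are hypothetical (the commit message mentions a `checkMilestone` function but does not show it), and reading "20+ skills" as "20 or more active skills" is an assumption.

```typescript
// Milestone values taken from PHILOSOPHY.md; everything else here is illustrative.
const MILESTONES: readonly number[] = [500, 2000, 5000];

// Returns the highest milestone crossed when the activation count moves
// from `before` to `after`, or null if no milestone was crossed.
function crossedMilestone(before: number, after: number): number | null {
  let result: number | null = null;
  for (const milestone of MILESTONES) {
    if (before < milestone && after >= milestone) {
      result = milestone;
    }
  }
  return result;
}

// Assumed reading of "3→5 at 20+ skills": require 3 occurrences of a pattern
// normally, and 5 once there are 20 or more skills.
function crystallizationThreshold(skillCount: number): number {
  return skillCount >= 20 ? 5 : 3;
}
```

Comparing the count before and after an activation, rather than testing for equality with a milestone, means a revision cycle is triggered once per milestone even if counts are updated in batches.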
README.md

Lines changed: 2 additions & 0 deletions
@@ -24,6 +24,7 @@
 
 ## Table of Contents
 
+- [Design Philosophy](PHILOSOPHY.md)
 - [Why AceForge Exists](#why-aceforge-exists)
 - [How It Works](#how-it-works)
 - [Observation & Pattern Detection](#observation--pattern-detection)
@@ -465,6 +466,7 @@ AceForge is fully compatible with [OpenViking](https://github.com/volcengine/Ope
 | `ACEFORGE_SLACK_WEBHOOK_URL` | — | Slack incoming webhook |
 | `ACEFORGE_VIKING_URL` | `http://127.0.0.1:1933` | OpenViking URL (optional) |
 | `ACEFORGE_DRY_RUN` | `false` | Observation-only mode — log proposals without writing to disk |
+| `ACEFORGE_SHARED_SKILLS` | `false` | Deploy approved skills to `~/.openclaw/skills/` (shared across all agents) |
 
 </details>

index.ts

Lines changed: 25 additions & 1 deletion
@@ -50,7 +50,7 @@ import { NATIVE_TOOLS } from "./src/pattern/constants.js";
 // ─── Phase 2 imports ────────────────────────────────────────────────────
 import { buildCapabilityTree, formatCapabilityTree, getPriorityDomains } from "./src/intelligence/capability-tree.js";
 import { mergePatterns, formatCrossSessionReport, getCrossSessionCandidates } from "./src/intelligence/cross-session.js";
-import { formatCompositionReport } from "./src/intelligence/composition.js";
+import { formatCompositionReport, proposeCompositionSkills } from "./src/intelligence/composition.js";
 import { summarizeBehaviorGaps, formatBehaviorGapReport, updateTreeWithBehaviorGaps } from "./src/intelligence/proactive-gaps.js";
 import { formatOptimizationReport } from "./src/intelligence/description-optimizer.js";
 import { handleCorrectionForSkill } from "./src/intelligence/auto-adjust.js";
@@ -148,6 +148,21 @@ async function validateAndDeploy(skillName: string): Promise<{ ok: boolean; mess
     recordRevision(skillName, deployedMd, "deploy", "Deployed via /forge approve");
   } catch { /* non-critical */ }
 
+  // Multi-agent: optionally deploy to shared skills directory too
+  if (process.env.ACEFORGE_SHARED_SKILLS === "true") {
+    const sharedDir = path.join(HOME, ".openclaw", "skills", skillName);
+    try {
+      fs.mkdirSync(sharedDir, { recursive: true });
+      const skillFiles = fs.readdirSync(path.join(SKILLS_DIR, skillName));
+      for (const file of skillFiles) {
+        fs.copyFileSync(path.join(SKILLS_DIR, skillName, file), path.join(sharedDir, file));
+      }
+      console.log(`[aceforge] shared skill deployed: ${skillName} → ~/.openclaw/skills/`);
+    } catch (err) {
+      console.warn(`[aceforge] shared skill deploy failed: ${(err as Error).message}`);
+    }
+  }
+
   notify(_skillAction("✅", "Skill deployed", skillName));
   return { ok: true, message: `Skill '${skillName}' deployed. Active now.` };
 }
@@ -326,6 +341,15 @@ function buildPlugin() {
         log.error(`[aceforge] capability tree error: ${(err as Error).message}`);
       }
 
+      // Phase 2C: Composition execution — propose workflow skills from co-activations
+      try {
+        proposeCompositionSkills().catch(err =>
+          log.error(`[aceforge] composition proposal error: ${(err as Error).message}`)
+        );
+      } catch (err) {
+        log.error(`[aceforge] composition error: ${(err as Error).message}`);
+      }
+
       // Phase 2D: Proactive behavior gap detection
       try {
         const behaviorGaps = summarizeBehaviorGaps();
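
The shared-skill deploy in the `validateAndDeploy` hunk copies each directory entry with `fs.copyFileSync`, which is enough for flat skill folders. If a skill directory ever contained a subfolder (an assumption; the commit does not say whether it can), `copyFileSync` would fail on that entry and the surrounding `catch` would log a warning. A recursive variant might look like the sketch below. `copySkillTree` is a hypothetical helper, not part of the commit, and `fs.cpSync` needs Node.js 16.7 or later.

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical helper, not part of the commit: copies a whole skill directory,
// including nested folders, into the target directory.
function copySkillTree(sourceDir: string, targetDir: string): void {
  fs.mkdirSync(targetDir, { recursive: true });
  // Recursively copies the contents of sourceDir into targetDir.
  fs.cpSync(sourceDir, targetDir, { recursive: true });
}

// Builds the shared location used in the diff: ~/.openclaw/skills/<skillName>.
function sharedSkillDir(skillName: string): string {
  return path.join(os.homedir(), ".openclaw", "skills", skillName);
}
```

A call such as `copySkillTree(path.join(SKILLS_DIR, skillName), sharedSkillDir(skillName))` would stand in for the `readdirSync` loop, with `SKILLS_DIR` being the constant already used in `index.ts`.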

src/intelligence/composition.ts

Lines changed: 70 additions & 0 deletions
@@ -13,6 +13,12 @@ import * as os from "os";
 import * as fsSync from "fs";
 import * as path from "path";
 import { listActiveSkills } from "../skill/lifecycle.js";
+import { generateWorkflowSkillWithLLm } from "../skill/llm-generator.js";
+import { writeProposal } from "../skill/generator.js";
+import { validateSkillMd } from "../skill/validator.js";
+import { notify } from "../notify.js";
+import { bold, mono } from "../notify-format.js";
+import { appendJsonl } from "../pattern/store.js";
 
 const HOME = os.homedir() || process.env.HOME || "";
 const FORGE_DIR = path.join(HOME, ".openclaw", "workspace", ".forge");
@@ -163,6 +169,70 @@ export function getCompositionCandidates(): CompositionCandidate[] {
   return candidates;
 }
 
+// ─── Composition Execution — Bridge to Workflow Generation ──────────────
+// Converts co-activation candidates into actual workflow skill proposals.
+// This bridges the detector (detectCoActivations) with the generator
+// (generateWorkflowSkillWithLLm) to close the composition loop.
+
+export async function proposeCompositionSkills(): Promise<number> {
+  const candidates = getCompositionCandidates();
+  if (candidates.length === 0) return 0;
+
+  let proposed = 0;
+
+  for (const candidate of candidates.slice(0, 3)) { // max 3 per cycle
+    const proposalName = candidate.name;
+
+    // Skip if already proposed or deployed
+    const proposalDir = path.join(FORGE_DIR, "proposals", proposalName);
+    if (fsSync.existsSync(proposalDir)) continue;
+    const skillDir = path.join(HOME, ".openclaw", "workspace", "skills", proposalName);
+    if (fsSync.existsSync(skillDir)) continue;
+
+    // Build a ChainCandidate-compatible structure from co-activation data
+    const chainCandidate = {
+      toolSequence: candidate.skills.map(s =>
+        s.replace(/-(guard|skill|v\d+|rev\d+|upgrade|operations|workflow).*$/, "")
+      ),
+      occurrences: candidate.sessionsObserved,
+      successRate: candidate.coActivationRate,
+      distinctSessions: candidate.sessionsObserved,
+      sampleTraces: [] as Array<{ tool: string; args_summary?: string; result_summary?: string; success: boolean; error?: string }[]>,
+    };
+
+    try {
+      const result = await generateWorkflowSkillWithLLm(chainCandidate);
+      if (!result || result.verdict === "REJECT") continue;
+
+      const validation = validateSkillMd(result.skillMd, proposalName);
+      if (validation.errors.some((e: string) => e.startsWith("BLOCKED:"))) continue;
+
+      writeProposal(proposalName, result.skillMd);
+      appendJsonl("candidates.jsonl", {
+        ts: new Date().toISOString(),
+        tool: candidate.skills.join("+"),
+        type: "composition",
+        occurrences: candidate.sessionsObserved,
+        coActivationRate: candidate.coActivationRate,
+      });
+
+      notify(
+        `📋 ${bold("Composition Skill Proposal")}\n\n` +
+        `${bold(proposalName)}\n` +
+        `${candidate.skills.join(" + ")} activate together ${Math.round(candidate.coActivationRate * 100)}% of sessions (${candidate.sessionsObserved} observed)\n\n` +
+        `${mono("/forge preview " + proposalName)}\n${mono("/forge approve " + proposalName)}`
+      ).catch(() => {});
+
+      proposed++;
+      console.log(`[aceforge] composition proposal: ${proposalName}`);
+    } catch (err) {
+      console.error(`[aceforge] composition generation error: ${(err as Error).message}`);
+    }
+  }
+
+  return proposed;
+}
+
 // ─── Format for Display ─────────────────────────────────────────────────
 
 export function formatCompositionReport(): string {
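
The `toolSequence` mapping in `proposeCompositionSkills` strips known skill-name suffixes with a regular expression so that skill names resemble tool names. The worked example below applies the same pattern, copied from the diff, to a few made-up skill names. `toToolName` and the sample names are hypothetical and only serve to show what the pattern does.

```typescript
// Pattern copied verbatim from the composition.ts hunk.
const SUFFIX_PATTERN = /-(guard|skill|v\d+|rev\d+|upgrade|operations|workflow).*$/;

// Hypothetical wrapper around the replace call used in the diff.
function toToolName(skillName: string): string {
  return skillName.replace(SUFFIX_PATTERN, "");
}

// Made-up skill names for illustration.
const sampleSkillNames = ["exec-guard", "git-operations", "docker-skill-v2", "web-search"];
for (const skillName of sampleSkillNames) {
  console.log(`${skillName} -> ${toToolName(skillName)}`);
}
// prints:
// exec-guard -> exec
// git-operations -> git
// docker-skill-v2 -> docker
// web-search -> web-search
```

The pattern removes everything from the first matching suffix to the end of the name, and it has no boundary after the keyword, so a name such as `db-skillet-import` would also be reduced to `db`.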

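The proposal loop drops any generated skill whose validation errors contain an entry starting with `BLOCKED:`. The real `validateSkillMd` is not shown in this commit, so the sketch below is only a guess at the general shape: the rule list, labels, and regular expressions are assumptions loosely based on the categories listed in PHILOSOPHY.md, and `scanSkillText` and `hasBlockingError` are hypothetical names.

```typescript
// Hypothetical rule set for illustration; AceForge's real validator is not shown here.
interface Rule {
  label: string;
  pattern: RegExp;
}

const RULES: readonly Rule[] = [
  { label: "git credential URL", pattern: /https:\/\/[^\s\/@]+@github\.com/i },
  { label: "shell history access", pattern: /\.(bash|zsh)_history/ },
  { label: "sensitive path", pattern: /~\/\.ssh|\/etc\/shadow/ },
  { label: "workspace escape", pattern: /\.\.\// },
];

// Returns one "BLOCKED: ..." error per rule that matches the skill text.
function scanSkillText(skillMd: string): { errors: string[] } {
  const errors = RULES
    .filter((rule) => rule.pattern.test(skillMd))
    .map((rule) => `BLOCKED: ${rule.label}`);
  return { errors };
}

// Same gate as in the diff: any error with the BLOCKED: prefix stops the proposal.
function hasBlockingError(errors: readonly string[]): boolean {
  return errors.some((e) => e.startsWith("BLOCKED:"));
}
```

Keeping the block decision as a prefix check on error strings lets a validator also return non-blocking messages without changing the caller.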