5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,10 @@
# Changelog

## 0.0.10-beta.12 — 2026-05-10

### Features
- Pi: enforce `Stop` policies (`require-commit-before-stop`, `require-push-before-stop`, etc.) on the next user turn via `before_agent_start` injection. Pi's `AgentEndEvent` has no Result type — by the time it fires, Pi's agent loop has already exited, so a deny return cannot keep Pi running the way Claude's exit-2-from-Stop can. Empirically observed: a user on Pi with `require-commit-before-stop` enabled saw the deny `reason` ("You have uncommitted changes …") propagate to Pi but Pi exited anyway. Fix: the `pi-extension/index.ts` shim captures any deny `reason` from `agent_end` into a per-`sessionId` in-memory map, then on the next `before_agent_start` (Pi v0.73.x — fires after the user submits a prompt, before the agent loop runs) returns `{systemPrompt: <event.systemPrompt> + "\n\n" + reason}` so the LLM sees a `MANDATORY ACTION REQUIRED` directive at the top of its next turn. The map is one-shot per drain and is cleared on every `session_shutdown` reason (including `quit`), so a stale gate cannot leak into a fresh session started in the same Pi process. `policy-evaluator.ts` now emits the `MANDATORY ACTION REQUIRED from failproofai (policy: …)` wrapper inside `reason` for `cli === "pi" && eventType === "Stop"` (both the deny and instruct paths), mirroring the Cursor/Gemini/Copilot/OpenCode Stop branches; non-Stop Pi events keep the existing flat `{permission, reason}` shape. Bounded by Pi process lifetime — same bound Claude has on exit-2-from-Stop (kill the agent and the gate is missed). New shim unit tests cover capture/drain/one-shot/`session_shutdown`-clear/no-pending-noop/missing-systemPrompt; new policy-evaluator unit tests pin the Pi-Stop deny + instruct payload shapes and verify non-Stop Pi events keep the legacy shape; a new e2e test pins the binary's stdout shape (`{permission:"deny", reason:/MANDATORY ACTION REQUIRED.*require-commit-before-stop.*uncommitted changes/}`) for `agent_end` in a dirty repo (#341).
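A minimal sketch of the capture/drain handoff described in the entry above. All names (`pendingStopBlocks`, `captureStopDeny`, `drainStopDeny`, `clearStopDeny`, `PolicyReply`) are hypothetical and chosen for illustration; the actual `pi-extension/index.ts` may be structured differently. The sketch only assumes what the entry states: a per-`sessionId` in-memory map, a one-shot drain, and a clear on every `session_shutdown`.

```typescript
// Hypothetical names throughout; this is not the shim's actual source.
type PolicyReply = { permission?: "allow" | "deny"; reason?: string };

interface BeforeAgentStartEvent {
  systemPrompt?: string;
}

// One pending deny reason per session id, held only in process memory.
const pendingStopBlocks = new Map<string, string>();

/** agent_end: Pi ignores the return value, so only remember the deny reason. */
function captureStopDeny(sessionId: string | undefined, reply: PolicyReply): void {
  if (!sessionId) return;
  if (reply.permission === "deny" && reply.reason) {
    pendingStopBlocks.set(sessionId, reply.reason);
  }
}

/** before_agent_start: drain the pending reason (one-shot) into the system prompt. */
function drainStopDeny(
  sessionId: string | undefined,
  event: BeforeAgentStartEvent,
): { systemPrompt: string } | undefined {
  if (!sessionId) return undefined;
  const reason = pendingStopBlocks.get(sessionId);
  if (reason === undefined) return undefined;
  pendingStopBlocks.delete(sessionId); // one-shot: a second drain finds nothing
  return { systemPrompt: `${event.systemPrompt ?? ""}\n\n${reason}` };
}

/** session_shutdown: drop any pending reason, whatever the shutdown reason is. */
function clearStopDeny(sessionId: string | undefined): void {
  if (sessionId) pendingStopBlocks.delete(sessionId);
}
```

Because the map lives in process memory, the gate is lost if the Pi process is killed before the next prompt, which is the lifetime bound the entry mentions.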

## 0.0.10-beta.11 — 2026-05-10

### Fixes
21 changes: 11 additions & 10 deletions CLAUDE.md
@@ -218,18 +218,19 @@ pi-extension. Same self-reference caveat applies — do **not** install the
standard `npx` form from inside this repo.

**Pi limitations vs. Claude semantics** (verified against pi-coding-agent
-v0.72.1 d.ts; the `pi-extension/` shim subscribes to 7 events but Pi's API
+v0.73.1 d.ts; the `pi-extension/` shim subscribes to 8 events but Pi's API
caps what each handler can do):

-| Pi event | → Claude event | Veto / mutate? | Notes |
-|--------------------|------------------|----------------|-------|
-| `tool_call` | PreToolUse | ✅ block | Full deny support via `{block, reason}`. |
-| `user_bash` | PreToolUse | ✅ block | Full deny support. |
-| `input` | UserPromptSubmit | ✅ block | Full deny support. |
-| `session_start` | SessionStart | observation | No return-value effect on Pi. |
-| `tool_result` | PostToolUse | observation | `ToolResultEventResult` exposes `{content, details, isError}` for mutation but no `block`. PostToolUse is observation/sanitize anyway, matching Claude semantics. |
-| `agent_end` | Stop | observation | Pi's agent loop has already exited; we cannot keep Pi running the way Claude's exit-2-from-Stop can. `require-*-before-stop` policies still RUN — their findings land in the activity store + stderr — but the stop is not vetoed. |
-| `session_shutdown` | SessionEnd | observation | Symmetry only. |
+| Pi event | → Claude event | Veto / mutate? | Notes |
+|----------------------|------------------|-----------------|-------|
+| `tool_call` | PreToolUse | ✅ block | Full deny support via `{block, reason}`. |
+| `user_bash` | PreToolUse | ✅ block | Full deny support. |
+| `input` | UserPromptSubmit | ✅ block | Full deny support. |
+| `session_start` | SessionStart | observation | No return-value effect on Pi. |
+| `tool_result` | PostToolUse | observation | `ToolResultEventResult` exposes `{content, details, isError}` for mutation but no `block`. PostToolUse is observation/sanitize anyway, matching Claude semantics. |
+| `agent_end` | Stop | shifted (next turn) | Pi's `AgentEndEvent` has no Result type — we cannot retry the same loop the way Claude's exit-2-from-Stop can. The shim captures any deny `reason` and stashes it keyed by sessionId for the next `before_agent_start` handler to drain. The 5 `require-*-before-stop` builtins thus enforce by gating the NEXT user turn's system prompt. Bounded by Pi process lifetime — same bound Claude has on exit-2-from-Stop. |
+| `before_agent_start` | (Pi-only handoff) | systemPrompt | Drains any pending Stop deny captured at `agent_end`, returning `{systemPrompt: <event.systemPrompt> + "\n\n" + reason}` so the LLM sees the MANDATORY ACTION directive before its next turn. Multiple extensions chain. No injection when no block is pending. |
+| `session_shutdown` | SessionEnd | observation | Symmetry only. Also clears any pending stop-block for the session id (every reason, not just `new`/`resume`/`fork`). |
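The `agent_end` and `before_agent_start` rows depend on the deny `reason` already carrying the MANDATORY ACTION wrapper when it leaves the binary. The sketch below illustrates that branching under stated assumptions: `buildPiDeny` and `DenyInput` are hypothetical names, and the exact wording after each prefix is inferred from strings quoted elsewhere in this PR (the shim test fixture and the e2e test comment), not taken from `policy-evaluator.ts` itself.

```typescript
// Hypothetical helper; not the real policy-evaluator.ts implementation.
interface DenyInput {
  cli: string;          // e.g. "pi"
  eventType: string;    // Claude-style event name, e.g. "Stop" or "PreToolUse"
  policy: string;       // e.g. "require-commit-before-stop"
  message: string;      // the policy's own explanation
  displayTool?: string; // tool name used in the generic wording
}

interface PiDeny {
  permission: "deny";
  reason: string;
}

/** Build the flat Pi payload; only Pi + Stop gets the MANDATORY ACTION wrapper. */
function buildPiDeny(input: DenyInput): PiDeny {
  const isPiStop = input.cli === "pi" && input.eventType === "Stop";
  const reason = isPiStop
    ? `MANDATORY ACTION REQUIRED from failproofai (policy: ${input.policy}): ${input.message}`
    : `Blocked ${input.displayTool ?? "tool"} by failproofai because: ${input.message}`;
  return { permission: "deny", reason };
}
```

In both branches the payload stays flat (`{permission, reason}`), so the shim can store `reason` verbatim and append it to the next system prompt without reformatting.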

**Instruct (`additionalContext`) on Pi `tool_call`** — Pi's
`ToolCallEventResult` shape is `{block?, reason?}` only; there's no
33 changes: 33 additions & 0 deletions __tests__/e2e/hooks/pi-integration.e2e.test.ts
@@ -208,6 +208,39 @@ describe("E2E: Pi integration — hook protocol (handler-only)", () => {
    }
  });

  it("agent_end with require-commit-before-stop in a dirty repo emits MANDATORY ACTION reason", () => {
    const env = createPiEnv();
    try {
      // Make env.cwd a git repo with an uncommitted file so
      // require-commit-before-stop returns deny. This is the bridge to the
      // shim-side handoff: the binary's stdout MUST be a Pi-flat
      // `{permission:"deny", reason}` payload whose reason carries the
      // "MANDATORY ACTION REQUIRED" wrapper. The shim captures that
      // reason on agent_end and re-injects it via before_agent_start (the
      // in-process map handoff is covered by the shim unit tests).
      execSync("git init -q && git config user.email t@t && git config user.name t", { cwd: env.cwd });
      writeFileSync(resolve(env.cwd, "dirty.txt"), "uncommitted\n");
      writeConfig(env.cwd, ["require-commit-before-stop"]);
      const result = runHook(
        "agent_end",
        PiPayloads.agentEnd(env.cwd),
        { homeDir: env.home, cli: "pi" },
      );
      // Stop on Pi uses the MANDATORY ACTION wrapping (not the generic
      // `Blocked <displayTool> by failproofai because: …` wording that
      // assertPiDeny matches), so we inline the deny shape checks here.
      expect(result.exitCode).toBe(0);
      expect(result.parsed?.permission).toBe("deny");
      expect(result.parsed?.hookSpecificOutput).toBeUndefined();
      const reason = String(result.parsed?.reason ?? "");
      expect(reason).toMatch(/MANDATORY ACTION REQUIRED/);
      expect(reason).toMatch(/require-commit-before-stop/);
      expect(reason).toMatch(/uncommitted changes/i);
    } finally {
      env.cleanup();
    }
  });

  it("agent-settings guard: Bash read of .pi/settings.json is denied", () => {
    const env = createPiEnv();
    try {
134 changes: 133 additions & 1 deletion __tests__/hooks/pi-extension-shim.test.ts
@@ -24,10 +24,23 @@ interface PiExtensionApi {

const captured: CapturedCall[] = [];

/** Per-event stdout reply queue for the spawnSync mock. Tests that need
 * to simulate a binary deny set `mockSpawnReplyByEvent[<eventName>]`
 * to a JSON string before invoking the matching handler. Any event not
 * in the map gets the default empty stdout. */
const mockSpawnReplyByEvent: Record<string, string | undefined> = {};

function eventNameFromArgs(args: string[]): string | undefined {
  const i = args.indexOf("--hook");
  return i >= 0 ? args[i + 1] : undefined;
}

vi.mock("node:child_process", () => ({
  spawnSync: (_cmd: string, args: string[], opts: { input?: string }) => {
    captured.push({ args: args ?? [], payload: JSON.parse(opts?.input ?? "{}") });
-    return { pid: 0, output: [], status: 0, signal: null, stderr: "", stdout: "" };
+    const evt = eventNameFromArgs(args ?? []);
+    const stdout = (evt && mockSpawnReplyByEvent[evt]) ?? "";
+    return { pid: 0, output: [], status: 0, signal: null, stderr: "", stdout };
  },
}));

@@ -43,6 +56,7 @@ describe("pi-extension shim — sessionId resolution via on-disk discovery", ()

  beforeEach(async () => {
    captured.length = 0;
    for (const k of Object.keys(mockSpawnReplyByEvent)) delete mockSpawnReplyByEvent[k];
    handlers = {};
    piRoot = mkdtempSync(join(tmpdir(), "pi-shim-test-"));
    originalEnv = process.env.PI_SESSIONS_DIR;
@@ -240,4 +254,122 @@ describe("pi-extension shim — sessionId resolution via on-disk discovery", ()
  });
});

/**
 * Pi cannot veto `agent_end` directly (Pi's AgentEndEvent has no Result type).
 * The shim captures any deny reason and re-injects it as a `systemPrompt`
 * suffix on the next `before_agent_start`. These tests cover that handoff.
 */
describe("pi-extension shim — agent_end → before_agent_start stop-block handoff", () => {
  let handlers: Record<string, (event: unknown) => unknown> = {};
  let piRoot: string;
  let originalEnv: string | undefined;
  const SID = "ffffffff-ffff-ffff-ffff-ffffffffffff";

  beforeEach(async () => {
    captured.length = 0;
    for (const k of Object.keys(mockSpawnReplyByEvent)) delete mockSpawnReplyByEvent[k];
    handlers = {};
    piRoot = mkdtempSync(join(tmpdir(), "pi-shim-handoff-"));
    originalEnv = process.env.PI_SESSIONS_DIR;
    process.env.PI_SESSIONS_DIR = piRoot;
    // Seed a transcript so resolveSessionId returns a stable id.
    const dir = join(piRoot, piEncodeCwd("/proj"));
    mkdirSync(dir, { recursive: true });
    writeFileSync(join(dir, `2026-05-09T00-00-00-000Z_${SID}.jsonl`), "{}\n");
    vi.resetModules();
    const mod = await import("../../pi-extension/index");
    mod.default({ on: (name, fn) => { handlers[name] = fn; } });
  });

  afterEach(() => {
    if (originalEnv === undefined) delete process.env.PI_SESSIONS_DIR;
    else process.env.PI_SESSIONS_DIR = originalEnv;
    rmSync(piRoot, { recursive: true, force: true });
  });

  it("agent_end deny is captured and drained on next before_agent_start as a systemPrompt suffix", () => {
    mockSpawnReplyByEvent["agent_end"] = JSON.stringify({
      permission: "deny",
      reason: "MANDATORY ACTION REQUIRED from failproofai (policy: require-commit-before-stop): commit now.",
    });
    handlers.agent_end({ type: "agent_end", cwd: "/proj" });
    // No reply value from agent_end (Pi cannot veto stop).
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      prompt: "next prompt",
      systemPrompt: "BASE",
      cwd: "/proj",
    }) as { systemPrompt?: string } | undefined;
    expect(result?.systemPrompt).toBe(
      "BASE\n\nMANDATORY ACTION REQUIRED from failproofai (policy: require-commit-before-stop): commit now.",
    );
  });

  it("before_agent_start with no pending block returns undefined", () => {
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      prompt: "p",
      systemPrompt: "BASE",
      cwd: "/proj",
    });
    expect(result).toBeUndefined();
  });

  it("the stop-block is one-shot: a second before_agent_start in the same session does not re-fire", () => {
    mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "X" });
    handlers.agent_end({ type: "agent_end", cwd: "/proj" });
    const first = handlers.before_agent_start({ type: "before_agent_start", systemPrompt: "B", cwd: "/proj" }) as { systemPrompt?: string };
    expect(first?.systemPrompt).toBe("B\n\nX");
    const second = handlers.before_agent_start({ type: "before_agent_start", systemPrompt: "B", cwd: "/proj" });
    expect(second).toBeUndefined();
  });

  it("session_shutdown clears the pending stop-block (quit reason too, not just new/resume/fork)", () => {
    mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "X" });
    handlers.agent_end({ type: "agent_end", cwd: "/proj" });
    handlers.session_shutdown({ type: "session_shutdown", reason: "quit", cwd: "/proj" });
    // Even though `quit` retains the cached sessionId, the pending block must
    // be dropped so a future before_agent_start (e.g. in the next session
    // started in this process) doesn't inherit a stale gate.
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      systemPrompt: "B",
      cwd: "/proj",
    });
    expect(result).toBeUndefined();
  });

  it("agent_end with allow stdout (empty reason) does NOT set a pending block", () => {
    // Default mock returns empty stdout → callPolicy returns {block:false}.
    handlers.agent_end({ type: "agent_end", cwd: "/proj" });
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      systemPrompt: "B",
      cwd: "/proj",
    });
    expect(result).toBeUndefined();
  });

  it("before_agent_start without a resolvable sessionId is a no-op", () => {
    // Use a cwd that has no on-disk transcript — sessionId discovery returns
    // undefined and the handler must early-return without throwing.
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      systemPrompt: "B",
      cwd: "/no-such-cwd",
    });
    expect(result).toBeUndefined();
  });

  it("before_agent_start with no systemPrompt in the event still injects (uses empty base)", () => {
    mockSpawnReplyByEvent["agent_end"] = JSON.stringify({ permission: "deny", reason: "Y" });
    handlers.agent_end({ type: "agent_end", cwd: "/proj" });
    const result = handlers.before_agent_start({
      type: "before_agent_start",
      cwd: "/proj",
    }) as { systemPrompt?: string };
    expect(result?.systemPrompt).toBe("\n\nY");
  });
});

import { afterEach } from "vitest";