GitHub - heggria/pi-taskflow: A declarative, verifiable graph of task nodes for the Pi coding agent — not a workflow you script, but a DAG you declare: statically verified before it runs, with dynamic fan-out, gates, isolated subagent context, and resumable runs. Zero deps.

pi-taskflow — a declarative, verifiable graph of task nodes for Pi subagents: stateful, resumable, context-isolated


A declarative, verifiable graph of tasks for Pi subagents.
Not a workflow you script — a DAG you declare. Fan out · gate · resume · save as a command — intermediate results stay out of your context.

pi install npm:pi-taskflow

A workflow flows. A taskflow is a graph. Other orchestrators let the model script the work — imperative code that flows step by step, with the graph hidden inside control flow. pi-taskflow does the opposite: you declare the work as a graph of discrete, named task nodes connected by dependsOn edges — and the runtime verifies that graph before it spends a single token.

You already know the built-in subagent tool's task / tasks / chain. pi-taskflow speaks the same shorthand — so your existing delegations instantly become tracked, resumable, and saveable as a one-word /tf:<name> command. When you outgrow the shorthand, the full DSL gives you a real DAG: dynamic fan-out over dozens of items, conditional routing, quality gates, human approvals, retries, and a hard spend ceiling.

And the whole time, only the final phase reaches your conversation. Every intermediate transcript stays in the runtime, never in your context window.

Why "taskflow" and not "workflow"?

The name is the thesis. In engineering, a task is a discrete, declared unit of work — the node of a task graph (the same task a build system, scheduler, or compiler wires into a DAG). Work, by contrast, is fluid and unbounded — the continuous, imperative act of doing.

That distinction is exactly the design split in the Pi ecosystem:

(Diagram: work as a fluid, imperative script whose graph hides in control flow and can't be verified before it runs, versus a taskflow as a declarative graph of discrete task nodes, statically verified before any token is spent.)

A workflow (the dynamic, code-mode kind) is the model writing an imperative script that flows: await agent(...), an if, a for, another await. Expressive — it's Turing-complete — but the graph only exists as the code runs. You can't see it, diff it, or prove it terminates before you pay for it.
A taskflow moves the plan out of code and into a declarative graph of task nodes. Because the graph is data, the runtime can do what an imperative script structurally cannot: statically verify it (no cycles, no dead ends, no budget overflow, no dangling refs) before a single subagent spawns, render it (the live progress is the DAG), resume it phase-by-phase, and save it as a one-word command.
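To make "statically verify" concrete, here is a minimal sketch of three of the structural checks named above (duplicate ids, dangling `dependsOn` refs, cycles) using Kahn's algorithm. It is an illustration under assumptions, not pi-taskflow's validator: `PhaseDef` and `verifyGraph` are hypothetical names, only `id` and `dependsOn` are modeled, and the budget check is left out.

```typescript
// Hypothetical, minimal phase shape: only the fields the graph checks need.
interface PhaseDef {
  id: string;
  dependsOn?: string[];
}

// Returns human-readable problems; an empty array means the graph passed.
function verifyGraph(phases: PhaseDef[]): string[] {
  const errors: string[] = [];
  const ids = new Set<string>();

  // 1. Duplicate ids.
  for (const p of phases) {
    if (ids.has(p.id)) errors.push(`duplicate id: ${p.id}`);
    ids.add(p.id);
  }

  // 2. Dangling dependsOn references.
  for (const p of phases) {
    for (const dep of p.dependsOn ?? []) {
      if (!ids.has(dep)) errors.push(`${p.id} depends on unknown phase: ${dep}`);
    }
  }

  // 3. Cycles (Kahn's algorithm): repeatedly remove phases whose deps are all removed.
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const id of Array.from(ids)) {
    indegree.set(id, 0);
    dependents.set(id, []);
  }
  for (const p of phases) {
    for (const dep of p.dependsOn ?? []) {
      if (!ids.has(dep)) continue; // already reported as dangling
      indegree.set(p.id, indegree.get(p.id)! + 1);
      dependents.get(dep)!.push(p.id);
    }
  }
  const queue = Array.from(ids).filter((id) => indegree.get(id) === 0);
  let removed = 0;
  while (queue.length > 0) {
    const id = queue.shift()!;
    removed++;
    for (const next of dependents.get(id)!) {
      const left = indegree.get(next)! - 1;
      indegree.set(next, left);
      if (left === 0) queue.push(next);
    }
  }
  if (removed < ids.size) {
    // Anything still holding an in-degree is on a cycle or downstream of one.
    const stuck = Array.from(ids).filter((id) => indegree.get(id)! > 0);
    errors.push(`cycle involving: ${stuck.join(", ")}`);
  }
  return errors;
}
```

For example, `verifyGraph([{ id: "a", dependsOn: ["b"] }, { id: "b", dependsOn: ["a"] }])` reports a cycle involving both phases without any subagent being spawned.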

The trade we make on purpose: we give up the raw expressivity of arbitrary code to gain something an imperative script can't have — a graph that is verifiable, observable, replayable, and safe to generate with an LLM. When a job needs twelve steps with branching fan-out and a review gate, you want a graph you can check — not a script you hope runs right.

Why this exists

Here's the wall you hit with raw subagents: you describe a multi-step plan in prose, the model re-derives it every single run, the intermediate transcripts flood your context, and the moment one model call fails you start over from zero. There's no reuse, no recovery, no structure — and no way to check the plan before it burns tokens.

pi-taskflow moves the plan out of the prompt and into a declarative graph of task nodes. The runtime owns the DAG, the loops, the retries, and the intermediate state. You declare a pipeline once and run it a hundred times — by name. Because the plan is data, not prose and not code, it can be validated, visualized, and replayed.

(Diagram: with raw subagents, every transcript floods your context; with pi-taskflow, transcripts stay in the runtime and only the final result returns.)

Twelve steps, branching fan-out, a review gate, a spend cap — that's a graph, and you want to see and check it, not re-prompt it every run.

	subagent (built-in)	pi-taskflow
Who drives	the model, turn by turn	the runtime, from a definition
Topology	chain / flat parallel	DAG with layered concurrency + routing
Intermediate results	in your context window	in the runtime — not your context
Scale	a handful of tasks	dynamic `map` fan-out over dozens of items
Reusable	re-described every time	saved as `/tf:<name>`
Resumable	✗	✓ cross-session — cached phases auto-skip
Quality gates	✗	`gate` phases that halt on `VERDICT: BLOCK`
Conditional routing	✗	`when` guards + `join: any` OR-joins
Fault tolerance	✗	per-phase `retry` + auto-retry on transient errors
Human-in-the-loop	✗	`approval` phases (approve / reject / edit)
Cost control	✗	run-wide `budget` (USD / token caps)
Composition	✗	`flow` phases run saved sub-flows
Live progress	opaque while running	live DAG render with timing + cost
Ergonomics	inline JSON each time	shorthand (`task`/`tasks`/`chain`) or DSL

It doesn't replace the subagent tool. It gives your subagents a graph, a memory, and a name.

Declarative graph vs. imperative script

The closest thing to pi-taskflow in spirit is the dynamic / code-mode workflow — where the model writes a JavaScript orchestration script. It's powerful and genuinely expressive. But it sits at the opposite end of one fundamental axis: expressivity vs. verifiability.

	dynamic `workflow` (code-mode)	`pi-taskflow` (declarative graph)
The plan is	imperative JS the model writes & runs	declarative JSON data the runtime executes
The graph	implicit — hidden in `if`/`for`/`await` control flow	explicit — `phases[]` + `dependsOn` edges, a first-class object
Verify before running	✗ Turing-complete; can't prove it terminates	✓ static checks: no cycles, dead-ends, budget overflow, dangling refs
See it	✗ the graph only exists as the code runs	✓ the live progress render is the DAG
Resume	coarse (call-cache dedup)	✓ phase-by-phase input-hash resume, cross-session
Safe to LLM-generate	risky: it's executable code	✓ it's just data (no `eval`); a runtime-generated sub-flow is structurally validated (cycles / dangling refs / duplicate ids) before it runs
Expressivity ceiling	higher — arbitrary control flow	bounded by the DSL, but `map`/`when`/`loop`/`gate` — plus runtime-generated sub-flows (`flow {def}`) for plan-then-execute and iterative replanning — cover most jobs

We chose the verifiable side on purpose. The expressivity you give up is real; what you get back — a plan you can check, watch, replay, and safely let a model author — is what turns one-off prompting into durable orchestration.

Compared to other Pi extensions

The Pi ecosystem now has 20+ delegation, workflow, and orchestration extensions — each great at what it's for. Here's an honest map of where pi-taskflow sits (verified against each package's latest npm release, June 2026). For the full breakdown — every package, strengths and weaknesses — see PI-ECOSYSTEM.md. For the broader, non-Pi landscape (LangGraph, Temporal, CrewAI, Mastra…) see COMPETITORS.md.

Extension	Model	Custom DSL	DAG	Dynamic fan-out	Cross-session resume	Quality gate	Human approval	Save as command	Zero deps
pi-taskflow	declarative multi-phase taskflows	✓	✓	✓ `map`	✓ phase-hash	✓	✓	✓ `/tf:<name>`	✓
`@pi-agents/orchid`	opinionated 9-phase pipeline + Ralph loop	fixed	✓	✓	✓	✓	✓	✓	✕ (2)
`pi-crew`	role teams + git worktrees + async	partial	✓	✓	✓	✓	✓	–	✕ (7)
`ultimate-pi`	governed plan→execute→review harness	YAML contracts	✓ (plan-time)	✕	✓	✓ (3-tier)	✓	✓	✕ (16)
`@zhushanwen/pi-workflow`	JS scripts (`agent`/`parallel`/`pipeline`)	yes (JS)	✕ (linear)	✓	✓	✕	✕	✓ (call cache)	✓
`@fiale-plus/pi-rogue-orchestration`	timer loop + goal resolution	✕	✕	✕	✓	✓ (goal-check)	✕	✕	✓
`pi-subagents`	single / parallel / chain delegation	✕	✕	static	–	✕	clarify	named workflows	✕ (3)
`@gotgenes/pi-subagents`	Claude-Code-style subagents + worktrees	✕	✕	✕	✓ (by id)	✕	per-agent	✕	✕ (1)
`pi-pipeline`	fixed SPEC→PLAN→TASKS→VERIFY	✕	fixed	✕	session planning	✓	clarify	✕	✕ (2)
`pi-agent-flow`	one-shot parallel specialist `fork`	yes	✕	✕	–	✕	✕	–	✕ (2)

(Representative slice of the 20+ — see PI-ECOSYSTEM.md for all of them, plus @0xkobold/pi-orchestration, @melihmucuk/pi-crew, @mediadatafusion/pi-workflow-suite, gentle-pi, @dreki-gg/pi-subagent, and more.)

How to choose:

@pi-agents/orchid is the most feature-complete orchestrator in the ecosystem (DAG + worktrees + Ralph loop + agent mailbox) — but its DSL is a fixed 9-phase pipeline, it carries runtime deps + jiti, and it's beta. Reach for pi-taskflow when you want to define your own graph (not adopt an opinionated one) with zero dependencies and a one-command install.
pi-crew / ultimate-pi go heavier — worktree isolation, durable async teams, multi-tier governance. If you want lightweight, declarative, and zero-dependency, that's this project.
@zhushanwen/pi-workflow is the closest in spirit and also zero-dep, but it's the imperative side of the split above: you author workflows as JavaScript scripts the model writes and runs. pi-taskflow's declarative JSON DAG is the verifiable side — statically checkable, visualizable, safe to LLM-generate, and resumable at phase granularity rather than call-cache dedup.
@fiale-plus/pi-rogue-orchestration is built around a timer loop with goal resolution. If your job is "keep going until the goal is met," it's worth a look; pi-taskflow covers loop-until-done with its `loop` phase, but as one node inside a structured, branching pipeline.
pi-subagents / @gotgenes/pi-subagents are the mature picks for ad-hoc "use reviewer on this diff" delegation and background jobs. pi-taskflow is for when those delegations need to become a repeatable, resumable pipeline.
pi-pipeline / pi-agent-flow ship opinionated, fixed flows. pi-taskflow ships an empty canvas: you (or the model) declare the graph that fits the job.

The honest one-liner: pi-taskflow is the only Pi extension that gives you a declarative, verifiable, resumable DAG of task nodes, saved as a one-word command, with zero runtime dependencies and context isolation by design. Where code-mode workflows let the model script the work, pi-taskflow lets it declare a graph the runtime can structurally verify before running. The known gap it's closing next: worktree isolation (see STRATEGY.md).

30-second start

1. Install — one command:

pi install npm:pi-taskflow

Optional: run /tf init once to map the 18 built-in agents' model roles (fast, strong, thinker, …) to your own models — an interactive picker. Skip it and agents just use Pi's default model. See Model roles.

2. Run — just ask the model in a Pi session:

Run a chain: first explore the auth flow, then summarize the findings.

The model calls the taskflow tool automatically. You get live progress, per-step timing, token cost, and a saved run record — same effort as the built-in tool, now tracked and resumable.

3. Save — say "save it" and you have /tf:<name> forever.

That's it. You can be running your first taskflow before your coffee cools, without writing a single phase definition.

The shorthand (same shape as the built-in tool)

// Single — one agent, one job
{ "task": "Summarize the architecture of src/", "agent": "explorer" }

// Parallel — fire several at once, outputs merge
{ "tasks": [
  { "task": "Audit auth in src/api",             "agent": "analyst" },
  { "task": "Audit input validation in src/api", "agent": "analyst" }
] }

// Chain — sequential; each step sees the previous output
{ "chain": [
  { "task": "List the public API of src/lib", "agent": "scout" },
  { "task": "Write docs for:\n{previous.output}", "agent": "writer" }
] }

agent is optional (defaults to the first discovered agent). Add a name to label the run and unlock saving it as a command.
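One way to picture how the shorthand relates to the DSL is as a desugaring step. The sketch below is an assumption, not the package's code: the generated ids (`single`, `parallel`, `step-N`) and the `desugar` function are hypothetical, and it only shows the shape a `chain` could take, with each step depending on the one before it.

```typescript
// Hypothetical shapes for the three shorthand forms and the phases they could become.
interface Step { task: string; agent?: string }
type Shorthand =
  | { task: string; agent?: string }
  | { tasks: Step[] }
  | { chain: Step[] };

interface Phase {
  id: string;
  type: "agent" | "parallel";
  task?: string;
  agent?: string;
  branches?: Step[];
  dependsOn?: string[];
}

function desugar(s: Shorthand): Phase[] {
  if ("chain" in s) {
    // Sequential: step N depends on step N-1, so {previous.output} has one upstream.
    return s.chain.map((step, i): Phase => ({
      id: `step-${i + 1}`,
      type: "agent",
      task: step.task,
      agent: step.agent,
      dependsOn: i === 0 ? [] : [`step-${i}`],
    }));
  }
  if ("tasks" in s) {
    // Parallel: every task becomes a branch of one concurrent phase.
    return [{ id: "parallel", type: "parallel", branches: s.tasks }];
  }
  // Single: one agent phase.
  return [{ id: "single", type: "agent", task: s.task, agent: s.agent }];
}
```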

Watch it run

This is not a mockup. This is stdout from a real run — the self-improve flow that writes and verifies its own test suites, caught mid-flight by a quality gate:

⊗ taskflow self-improve  6/7 · blocked · $0.095
    ✓ discover            agent   deepseek-v4-flash  10t ↑38k ↓6.7k $0.011
  ┌ ✓ write-runner-tests  agent   claude-sonnet-4-6  10t ↑13 ↓6.6k $0.020
  ├ ✓ write-store-tests   agent   claude-sonnet-4-6  10t ↑11 ↓10k $0.018
  ├ ✓ write-agents-tests  agent   claude-sonnet-4-6  10t ↑28 ↓13k $0.030
  └ ✓ fix-stability       agent   claude-sonnet-4-6  10t ↑13 ↓3.9k $0.012
    ✓ verify              gate    BLOCK 3 type errors in test files  deepseek-v4-flash
    ⊘ report              reduce  skipped · Gate blocked  ↳ fix-stability

The layout is the DAG. No dashboard, no logs to grep — you read the progress bar and you understand the whole pipeline:

Header — ⊗ = blocked (a gate halted it); 6/7 phases processed; aggregate cost $0.095.
Status icons — ✓ done · ◐ running · ✗ failed · ⊘ skipped · ○ pending.
Rail ┌ ├ └ — phases in the same DAG layer, running concurrently. The four write-*/fix-stability tasks fan out from discover. A blank gutter = a single-phase layer.
↳ — a long, layer-skipping dependency. report depends on the adjacent verify and on fix-stability two layers back, so only that skip edge is annotated.
Gate — verify emitted VERDICT: BLOCK, so the runtime skipped report and ended the run as blocked, surfacing the reason inline.
Detail — per phase: model, token counts (↑in ↓out), cost, timing. Fan-out phases also show sub-task progress (3/15 2✗ 8▸).
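The rail and the ↳ marker can be derived from the graph alone. A minimal sketch, assuming longest-path layering (a phase sits one layer below its deepest dependency) and treating any dependency that spans more than one layer as a skip edge; the `dependsOn` lists for the run above are reconstructed as an assumption (in particular, `verify` depending on all four writers), and `layers`/`skipEdges` are hypothetical names.

```typescript
interface DagNode { id: string; dependsOn?: string[] }

// Longest-path layering; assumes the graph was already verified acyclic.
function layers(nodes: DagNode[]): Map<string, number> {
  const byId = new Map(nodes.map((n) => [n.id, n] as [string, DagNode]));
  const memo = new Map<string, number>();
  const layerOf = (id: string): number => {
    const cached = memo.get(id);
    if (cached !== undefined) return cached;
    const deps = byId.get(id)?.dependsOn ?? [];
    const layer = deps.length === 0 ? 0 : 1 + Math.max(...deps.map((d) => layerOf(d)));
    memo.set(id, layer);
    return layer;
  };
  for (const n of nodes) layerOf(n.id);
  return memo;
}

// A dependency is annotated (↳) when it jumps more than one layer back.
function skipEdges(nodes: DagNode[]): Array<[string, string]> {
  const layer = layers(nodes);
  const out: Array<[string, string]> = [];
  for (const n of nodes) {
    for (const dep of n.dependsOn ?? []) {
      if (layer.get(n.id)! - layer.get(dep)! > 1) out.push([n.id, dep]);
    }
  }
  return out;
}
```

On the self-improve run, this places `discover` alone in layer 0, the four fan-out phases together in layer 1, and flags only `report → fix-stability` as a skip edge.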

Go declarative

The shorthand is your onramp. The DSL is where pi-taskflow earns its keep — dynamic fan-out, structured routing, and quality gates.

Fan out and reduce

{
  "name": "summarize-files",
  "description": "Discover files, summarize each, produce one report",
  "args": { "dir": { "default": "." } },
  "concurrency": 8,
  "phases": [
    { "id": "discover", "type": "agent", "agent": "scout",
      "task": "List source files under {args.dir} (non-recursive).\nOutput ONLY a JSON array [{\"file\":\"\"}]. No prose.",
      "output": "json" },
    { "id": "summarize", "type": "map",
      "over": "{steps.discover.json}", "as": "item", "agent": "scout",
      "task": "Read {item.file} and give a one-sentence summary.",
      "dependsOn": ["discover"] },
    { "id": "report", "type": "reduce", "from": ["summarize"], "agent": "writer",
      "task": "Combine into a short overview:\n{steps.summarize.output}",
      "dependsOn": ["summarize"], "final": true }
  ]
}

discover lists every file and emits a JSON array.
summarize is a map — it fans out one subagent per file, throttled to 8 concurrent, with {item.file} bound to each path.
report is a reduce — it merges every summary into one clean overview.

The intermediate summaries never enter your context. The runtime owns them; you get the report. Save it once → /tf:summarize-files dir=src forever.
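The "throttled to 8 concurrent" behavior of a `map` phase is a classic bounded worker pool. This is a generic sketch of that technique, not pi-taskflow's scheduler; `mapWithLimit` is a hypothetical name and the worker stands in for spawning one subagent per item.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight,
// keeping results in input order.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;
  // Each lane claims the next unclaimed index; JS is single-threaded, so next++ is safe.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

A lane-based pool keeps exactly `limit` workers busy until the list drains, which is why a slow item never blocks unrelated items from starting.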

Route, gate, retry, approve, and cap the spend

{
  "name": "triage-and-fix",
  "budget": { "maxUSD": 1.5 },
  "phases": [
    { "id": "triage", "type": "agent", "agent": "analyst", "output": "json",
      "task": "Classify the bug. Output ONLY {\"severity\":\"high\"} or {\"severity\":\"low\"}." },
    { "id": "deep",  "when": "{steps.triage.json.severity} == high", "dependsOn": ["triage"],
      "agent": "executor-code", "task": "Root-cause and patch it.",
      "retry": { "max": 2, "backoffMs": 500 } },
    { "id": "quick", "when": "{steps.triage.json.severity} == low",  "dependsOn": ["triage"],
      "agent": "executor-fast", "task": "Apply the quick fix." },
    { "id": "approve", "type": "approval", "join": "any", "dependsOn": ["deep", "quick"],
      "task": "Review the fix before it ships." },
    { "id": "ship", "type": "agent", "dependsOn": ["approve"],
      "task": "Open a PR with the change.", "final": true }
  ]
}

when routes to deep or quick from the triage JSON — the other branch is skipped.
join: "any" lets approve fire the moment whichever branch ran completes (an OR-join).
retry re-runs a flaky patch with backoff; budget halts the whole run once spend exceeds maxUSD ($1.50).
approval pauses for a human (approve / reject / edit) before the final ship.
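The difference between the default AND-join and `join: "any"` can be sketched as a readiness rule over dependency states. This is an assumption about the semantics, inferred from the routing example (with `"all"`, a skipped branch would make `approve` unreachable): `readiness` and `DepStatus` are hypothetical names.

```typescript
type DepStatus = "pending" | "running" | "done" | "skipped" | "failed";
type Decision = "wait" | "run" | "skip";

// Hypothetical readiness rule for a phase, given its dependencies' statuses.
function readiness(deps: DepStatus[], join: "all" | "any" = "all"): Decision {
  if (deps.length === 0) return "run"; // root phases start immediately
  const settled = (s: DepStatus) => s === "done" || s === "skipped" || s === "failed";
  if (join === "any") {
    if (deps.some((s) => s === "done")) return "run"; // OR-join: one completed dep is enough
    return deps.every(settled) ? "skip" : "wait"; // nothing left that could complete
  }
  // AND-join: a skipped or failed dep means it can never be fully satisfied.
  if (deps.some((s) => s === "skipped" || s === "failed")) return "skip";
  return deps.every((s) => s === "done") ? "run" : "wait";
}
```

In the triage flow, `when` skips `quick`, `deep` completes, and `readiness(["done", "skipped"], "any")` lets `approve` run; the same statuses under `"all"` would skip it.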

No scripting. No eval. Just data the runtime executes — safe enough to run LLM-generated definitions directly.

Phase types

type	what it does	required fields
`agent`	one subagent runs a single task	`task`
`parallel`	run `branches[]` concurrently	`branches` (array of `{task, agent?}`)
`map`	fan out over an array — one subagent per item, `{item}` bound	`over`, `task`
`gate`	quality/review step that can halt the flow	`task`
`reduce`	aggregate `from[]` phase outputs into one	`from`, `task`
`approval`	human-in-the-loop pause — approve / reject / edit	—
`flow`	run a sub-flow as one phase: a saved flow (`use`) or a runtime-generated one (`def`)	`use` or `def`
`loop`	iterate a task until done: re-run a body until a condition, convergence, or a cap	`task`, `until`
`tournament`	N variants compete, a judge picks the best (or aggregates)	`task` or `branches`

Common phase fields

Every phase needs a unique id; its type defaults to agent when omitted. On top of the per-type fields:

Field	Meaning
`agent`	Agent to run (defaults to the first discovered agent)
`dependsOn`	Phase ids this phase waits for — builds the DAG
`join`	`"all"` (default) waits for every dep; `"any"` is an OR-join
`when`	Conditional guard — skip unless the expression is truthy
`retry`	`{ max, backoffMs?, factor? }` — retry a failing subagent
`output`	`"text"` (default) or `"json"` (exposes `{steps.ID.json}`)
`model` / `thinking` / `tools`	Per-phase overrides for the subagent
`cwd`	Working directory for the subagent
`concurrency`	Fan-out cap for `map` / `parallel` (overrides the flow default)
`final`	Marks the result-bearing phase (else the last phase wins)
`optional`	A failure here does not abort the run
`use` / `with`	(`flow`) saved sub-flow name + its args
`def`	(`flow`) inline sub-flow generated at runtime — usually `"{steps.plan.json}"` (mutually exclusive with `use`)
`cache`	`{ scope, ttl?, fingerprint? }` — cross-run memoization (see below)

Flow-level keys: name, description, args, concurrency (default 8), agentScope, and budget: { maxUSD?, maxTokens? }.

Control flow & reliability

when — skip a phase unless an expression is truthy. Supports {refs}, == != < > <= >=, && || !, parentheses, and quoted strings/numbers. Pair with join: "any" on the merge phase for real if/else routing. Parse errors fail open.
join: "any" — an OR-join: the phase runs as soon as one dependency completes (default "all" waits for all).
retry — { "max": 2, "backoffMs": 500, "factor": 2 } retries a failing subagent with fixed or exponential backoff; usage is summed and the attempt count shows as ↻N in the TUI. Transient provider errors (rate-limit / 5xx / timeout) auto-retry even without an explicit policy; hard errors don't.
approval — pause for a human (Approve / Reject / Edit). Reject halts the flow; Edit injects the typed note as the phase output for downstream steps. Non-interactive runs auto-reject (safety: approval gates are never bypassed).
flow — { "type": "flow", "use": "deep-research", "with": { "topic": "{item}" } } runs a saved flow as a phase (recursion is detected and rejected). Or generate the sub-flow at runtime: { "type": "flow", "def": "{steps.plan.json}" } resolves an upstream phase's JSON output into a sub-flow, validates it (cycles / dangling refs / duplicate ids), then runs it, so the number and shape of the generated phases are decided at runtime, not authored in advance. A malformed plan fails open (the phase is skipped with a defError, and the run continues). This is how a planner decides at runtime what work to spawn: the declarative answer to a code-mode for loop, with each generated plan checked before it spends a token. Pair it with loop for data-dependent iterative replanning (round N's plan depends on round N-1's result). See examples/dynamic-plan-execute.json and examples/iterative-replan.json.
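The retry policy `{ max, backoffMs?, factor? }` maps naturally onto a delay schedule of `backoffMs * factor^(n-1)`. A sketch under assumptions: `factor` defaulting to 1 (fixed backoff), the transient-error regex, and `DEFAULT_TRANSIENT_RETRIES` are all hypothetical, since the README does not specify how transient errors are classified or how many automatic retries they get.

```typescript
interface RetryPolicy { max: number; backoffMs?: number; factor?: number }

// Delay before retry n (1-based) is backoffMs * factor^(n-1).
// factor = 1 (assumed default) gives fixed backoff; factor > 1 gives exponential.
function retryDelays(p: RetryPolicy): number[] {
  const base = p.backoffMs ?? 0;
  const factor = p.factor ?? 1;
  return Array.from({ length: p.max }, (_, n) => base * Math.pow(factor, n));
}

// Heuristic transient check (rate limit / 429 / 5xx / timeout); an assumption about error text.
function isTransient(message: string): boolean {
  return /rate.?limit|\b429\b|\b5\d\d\b|time[d ]*out/i.test(message);
}

const DEFAULT_TRANSIENT_RETRIES = 2; // hypothetical value

// Should failure number `attempt` (1-based) be retried?
function shouldRetry(attempt: number, message: string, policy?: RetryPolicy): boolean {
  if (policy) return attempt <= policy.max; // explicit policy: retry any failure up to max
  return isTransient(message) && attempt <= DEFAULT_TRANSIENT_RETRIES; // hard errors don't retry
}
```

With `{ "max": 2, "backoffMs": 500, "factor": 2 }`, the schedule is 500 ms then 1000 ms, and a phase that succeeds on its third attempt would show ↻3.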

Loop-until-done (`loop`)

Some work is inherently iterative — refine a draft until a reviewer is satisfied, retry-and-improve until tests pass, converge on an answer. A loop phase re-runs one task body until a stop condition holds:

{
  "id": "refine",
  "type": "loop",
  "task": "Improve this draft (iteration {loop.iteration}). Previous attempt:\n{loop.lastOutput}\n\nReturn JSON {\"draft\":\"…\",\"done\":true|false}.",
  "until": "{steps.refine.json.done} == true",   // the iteration's own output is exposed here
  "output": "json",
  "maxIterations": 6,        // default 10, hard cap 100 — the loop ALWAYS terminates
  "convergence": true        // default: stop early if an iteration's output is identical to the last
}

Body locals — the task can read {loop.iteration} (1-based), {loop.lastOutput} (the prior iteration's output), and {loop.maxIterations} to build on its own previous work; all three are also available to the until condition.
until — evaluated after each iteration with the iteration's output exposed as {steps.<thisId>.output} / .json. Same operators as when. The loop stops the moment it's truthy.
Always terminates. Four independent stops: until truthy, convergence (a fixed point — output identical to the previous iteration), maxIterations (hard-capped at 100), or a failing iteration (the phase fails with the partial output preserved). A malformed until stops the loop rather than spinning forever (fail-safe) and surfaces a warning on the phase.
The TUI shows ↻N with the stop reason (done / converged / max / failed); usage is summed across iterations. Like gate/approval, loop is excluded from cross-run cache (each run must iterate fresh).
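The four stop conditions can be sketched as one bounded loop. This is a simplified model, not the runtime: the body is a synchronous stand-in for a subagent call (throwing simulates a failed iteration), `until` is a predicate rather than a parsed expression, and checking `until` before convergence is an assumed ordering. `runLoop` and `LoopBody` are hypothetical names.

```typescript
type StopReason = "done" | "converged" | "max" | "failed";
interface LoopResult { output: string; iterations: number; reason: StopReason }

// Stand-in for the subagent call: receives {loop.iteration} and {loop.lastOutput}.
type LoopBody = (iteration: number, lastOutput: string) => string;

function runLoop(
  body: LoopBody,
  until: (output: string) => boolean,
  maxIterations = 10,
  convergence = true,
): LoopResult {
  const cap = Math.min(Math.max(1, maxIterations), 100); // hard cap: always terminates
  let last = "";
  for (let i = 1; i <= cap; i++) {
    let out: string;
    try {
      out = body(i, last);
    } catch {
      return { output: last, iterations: i, reason: "failed" }; // keep the partial output
    }
    if (until(out)) return { output: out, iterations: i, reason: "done" };
    if (convergence && i > 1 && out === last) {
      return { output: out, iterations: i, reason: "converged" }; // fixed point reached
    }
    last = out;
  }
  return { output: last, iterations: cap, reason: "max" };
}
```

Because the iteration count is clamped before the loop starts, even `maxIterations: 500` stops after 100 iterations.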

Tournament (`tournament`)

For open-ended work, the best result often comes from generating several candidates and picking the strongest — best-of-N with a judge, in one declarative phase:

{
  "id": "headline",
  "type": "tournament",
  "task": "Write a punchy headline for this launch post.",
  "variants": 4,                    // spawn 4 competitors of the SAME task (default 3, max 20)
  "judge": "Pick the headline with the strongest hook and clearest promise.",
  "judgeAgent": "reviewer",          // optional; defaults to the phase agent
  "mode": "best"                     // "best" (default) | "aggregate"
}

Competitors — either variants: N copies of one task (diversity comes from model nondeterminism), or distinct branches: [{task, agent?}, …] when you want to pit different approaches against each other.
Judge — after the fan-out, one judge agent sees every variant (numbered) plus your judge rubric and picks a winner via a WINNER: <n> line or {"winner": n}. An unreadable verdict fails open to variant 1; a failed judge falls back too — the work is never lost.
mode — best returns the winning variant verbatim; aggregate returns the judge's synthesized answer combining the strongest parts.
Short-circuits: if only one competitor survives, it wins with no judge call; if all fail, the phase fails. The TUI shows ⚑ N→#k; usage sums variants + judge. Like gate, it's excluded from cross-run cache.
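The judge's verdict handling and the short-circuits can be sketched as follows. Assumptions: `pickWinner`/`runTournament` are hypothetical names, failed competitors are modeled as `null`, and "variant 1" in the fallback is taken to mean the first surviving competitor.

```typescript
// Parse "WINNER: <n>" or {"winner": n} (1-based). Unreadable or out-of-range → 1 (fail open).
function pickWinner(verdict: string, count: number): number {
  let n: number | undefined;
  const line = /WINNER:\s*#?(\d+)/i.exec(verdict);
  if (line) {
    n = Number(line[1]);
  } else {
    const json = /\{[^{}]*"winner"[^{}]*\}/.exec(verdict);
    if (json) {
      try {
        const v = JSON.parse(json[0]).winner;
        if (typeof v === "number") n = v;
      } catch {
        // not valid JSON: fall through to the default
      }
    }
  }
  return n !== undefined && Number.isInteger(n) && n >= 1 && n <= count ? n : 1;
}

// outputs[i] === null marks a failed competitor. Returns the 0-based index of the
// winner in `outputs`, or -1 when every competitor failed (the phase fails).
function runTournament(outputs: Array<string | null>, judge: (numbered: string) => string): number {
  const survivors = outputs.map((o, i) => ({ o, i })).filter((s) => s.o !== null);
  if (survivors.length === 0) return -1;
  if (survivors.length === 1) return survivors[0].i; // lone survivor wins, no judge call
  const numbered = survivors.map((s, k) => `#${k + 1}\n${s.o}`).join("\n\n");
  let verdict = "";
  try {
    verdict = judge(numbered);
  } catch {
    // a failed judge falls back to the first survivor via pickWinner's default
  }
  return survivors[pickWinner(verdict, survivors.length) - 1].i;
}
```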
budget — a run-wide {maxUSD, maxTokens} ceiling; once exceeded, pending phases skip and in-flight fan-out stops spawning, ending the run as blocked.
idle watchdog — a subagent that goes silent for 5 minutes is treated as wedged and killed (SIGTERM → SIGKILL), so one hung child can never freeze the whole flow.

Cross-run memoization (`cache`)

Every phase is already content-addressed: within a single run's resume, a phase whose resolved inputs are unchanged is skipped. cache extends that reuse across independent runs — if any prior run computed a phase with an identical input hash, its result is reused for $0.00.

{
  "id": "analyze-auth",
  "task": "Summarize how the auth module works.",
  "context": ["src/auth/**/*.ts"],
  "cache": {
    "scope": "cross-run",                 // "run-only" (default) | "cross-run" | "off"
    "ttl": "6h",                          // optional max age before a hit is treated as a miss
    "fingerprint": ["git:HEAD", "glob:src/auth/**/*.ts"]  // fold world-state into the key
  }
}

scope — "run-only" (default) is exactly the historical behavior (within-run resume only). "cross-run" opts the phase into the persistent store. "off" disables reuse entirely (even within a run), for debugging.
Freshness is the whole game. The cache key already includes the prompt, the over items, and any context files (pre-read into the task). fingerprint folds implicit inputs into the key so "the world changed" becomes a cache miss: git:HEAD, glob:<pat> (size+mtime), glob!:<pat> (content hash), file:<path>, env:<NAME>. ttl (30m/6h/7d) is a time backstop.
Honest limit: a subagent that reads a file it didn't declare in context/fingerprint can still serve a stale cross-run hit. That's why the default is run-only and why gate/approval phases are forbidden from cross-run (they must produce a fresh result each run). Opt in only for phases whose output is a function of declared inputs.
Cache lives in .pi/taskflows/cache/ (gitignored). Clear it with action: "cache-clear". Full rationale: docs/rfc-cross-run-memoization.md.
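The freshness rules boil down to two tests: does the key still match, and is the entry younger than `ttl`? A minimal sketch, assuming the key is a SHA-256 over a canonical JSON encoding of the prompt, the `over` items, the pre-read context files, and already-resolved fingerprint values (the probes behind `git:HEAD`, `glob:`, and `env:` are not shown). `cacheKey`, `parseTtl`, and `lookup` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

interface KeyInput {
  prompt: string;
  items: unknown[];
  context: Record<string, string>; // path → file contents, pre-read into the task
  fingerprint: string[]; // resolved world-state values, e.g. the current HEAD sha
}

interface CacheEntry { key: string; storedAt: number; output: string }

// Canonical JSON (sorted context paths) so equal inputs always hash equally.
function cacheKey(input: KeyInput): string {
  const context = Object.keys(input.context).sort().map((p) => [p, input.context[p]]);
  const canonical = JSON.stringify([input.prompt, input.items, context, input.fingerprint]);
  return createHash("sha256").update(canonical).digest("hex");
}

// "30m" / "6h" / "7d" → milliseconds; undefined means no age limit.
function parseTtl(ttl?: string): number | undefined {
  if (ttl === undefined) return undefined;
  const m = /^(\d+)([mhd])$/.exec(ttl.trim());
  if (!m) throw new Error(`bad ttl: ${ttl}`);
  const unitMs: Record<string, number> = { m: 60000, h: 3600000, d: 86400000 };
  return Number(m[1]) * unitMs[m[2]];
}

// A hit requires an identical key and, if a ttl is set, an entry young enough.
function lookup(entry: CacheEntry | undefined, key: string, ttl?: string, now = Date.now()): string | undefined {
  if (!entry || entry.key !== key) return undefined; // inputs or world state changed
  const maxAge = parseTtl(ttl);
  if (maxAge !== undefined && now - entry.storedAt > maxAge) return undefined; // too old
  return entry.output;
}
```

Folding a changed HEAD sha into `fingerprint` changes the key, so the old entry simply stops matching; nothing has to be invalidated explicitly.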

Gate phases (quality control)

A gate runs an agent to review upstream output and can block the rest of the flow. End the gate task by asking for a verdict the runtime can read:

a final line VERDICT: PASS or VERDICT: BLOCK (also accepts OK, FAIL, STOP, REJECT, HALT — last occurrence wins), or
JSON like {"continue": false, "reason": "missing auth checks"} / {"verdict": "block", "reason": "..."}.

On BLOCK, downstream phases skip and the run ends as blocked with the reason surfaced. Ambiguous output fails open (treated as PASS) — a gate never halts your flow by accident.

Review the audit below. If any endpoint is missing auth, end with
"VERDICT: BLOCK" and a one-line reason; otherwise end with "VERDICT: PASS".

{steps.audit.output}
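The verdict rules above (last `VERDICT:` line wins, JSON as an alternative, ambiguity fails open) can be sketched as a small parser. Assumptions: OK is treated as a PASS synonym and FAIL/STOP/REJECT/HALT as BLOCK synonyms, the reason is taken from the rest of the verdict line, and `parseGate` is a hypothetical name.

```typescript
interface GateVerdict { block: boolean; reason?: string }

const BLOCK_WORDS = new Set(["BLOCK", "FAIL", "STOP", "REJECT", "HALT"]);
const PASS_WORDS = new Set(["PASS", "OK"]);

function parseGate(output: string): GateVerdict {
  // 1. "VERDICT: <word>" lines; the last recognized one wins.
  let last: GateVerdict | undefined;
  const re = /VERDICT:\s*([A-Za-z]+)(.*)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(output)) !== null) {
    const word = m[1].toUpperCase();
    if (BLOCK_WORDS.has(word)) last = { block: true, reason: m[2].trim() || undefined };
    else if (PASS_WORDS.has(word)) last = { block: false };
  }
  if (last) return last;

  // 2. JSON verdicts: {"continue": false, ...} or {"verdict": "block", ...}.
  const json = /\{[\s\S]*\}/.exec(output);
  if (json) {
    try {
      const v = JSON.parse(json[0]);
      if (v.continue === false || String(v.verdict).toLowerCase() === "block") {
        return { block: true, reason: v.reason };
      }
    } catch {
      // not JSON: fall through
    }
  }

  // 3. Ambiguous output fails open (treated as PASS).
  return { block: false };
}
```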

Interpolation & expressions

placeholder	resolves to
`{args.X}`	invocation argument
`{steps.ID.output}`	a prior phase's text output
`{steps.ID.json}`	prior output parsed as JSON (or `{steps.ID.json.field}`)
`{item}` / `{item.field}`	current item inside a `map` phase
`{previous.output}`	the immediately-upstream phase output
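The placeholder table can be read as "walk a dotted path through a scope object." A minimal sketch under assumptions: the `Scope` shape, inlining objects as JSON, and leaving unknown references untouched are all hypothetical choices, not documented behavior. The regex requires a letter or underscore after `{`, so literal JSON such as `[{"file":""}]` in a task is left alone.

```typescript
interface Scope {
  args: Record<string, unknown>;
  steps: Record<string, { output: string; json?: unknown }>;
  item?: unknown;
  previous?: { output: string };
}

// Walk a path like ["steps", "triage", "json", "severity"] through nested objects.
function lookupPath(root: unknown, path: string[]): unknown {
  let cur: unknown = root;
  for (const key of path) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

// Replace {a.b.c} placeholders; strings are inlined as-is, other values as JSON.
function interpolate(template: string, scope: Scope): string {
  return template.replace(/\{([A-Za-z_][\w-]*(?:\.[\w-]+)*)\}/g, (whole: string, ref: string) => {
    const value = lookupPath(scope, ref.split("."));
    if (value === undefined) return whole; // unknown ref: leave it visible
    return typeof value === "string" ? value : JSON.stringify(value);
  });
}
```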

Condition grammar (for when): == != < > <= >=, && || !, parentheses, quoted strings/numbers, and any {...} reference — e.g. "when": "{steps.triage.json.route} == deep && {args.force} != true".
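A recursive-descent evaluator for this grammar fits in a few dozen lines. This is a sketch, not the runtime's parser. Assumptions: bare words (`deep`, `true`) are compared as strings, `==`/`!=` compare string forms, ordering operators compare numerically, and `""`, `"false"`, `"0"`, `0`, `false`, and unresolved references are falsy. It throws on a parse error; the caller would then apply the documented fail-open behavior.

```typescript
type Val = string | number | boolean | undefined;

const CMP = new Set(["==", "!=", "<", ">", "<=", ">="]);

function evaluate(expr: string, resolve: (ref: string) => Val): boolean {
  // Tokenize: {refs}, quoted strings, numbers, operators, parens, bare words.
  const src = expr.trim();
  const re = /\s*(\{[^}]*\}|"[^"]*"|'[^']*'|-?\d+(?:\.\d+)?|==|!=|<=|>=|&&|\|\||[<>!()]|[^\s()<>=!&|]+)/y;
  const tokens: string[] = [];
  while (re.lastIndex < src.length) {
    const m = re.exec(src);
    if (!m) throw new Error(`bad token in: ${expr}`);
    tokens.push(m[1]);
  }

  let pos = 0;
  const peek = () => tokens[pos];
  const truthy = (v: Val) =>
    v !== undefined && v !== false && v !== 0 && v !== "" && v !== "false" && v !== "0";
  const atom = (t: string): Val => {
    if (t.startsWith("{")) return resolve(t.slice(1, -1));
    if (t.startsWith('"') || t.startsWith("'")) return t.slice(1, -1);
    if (/^-?\d+(\.\d+)?$/.test(t)) return Number(t);
    return t; // bare word such as high or true
  };

  // or := and ('||' and)* ; and := not ('&&' not)* ; not := '!' not | cmp
  function orExpr(): Val {
    let v = andExpr();
    while (peek() === "||") { pos++; const r = andExpr(); v = truthy(v) || truthy(r); }
    return v;
  }
  function andExpr(): Val {
    let v = notExpr();
    while (peek() === "&&") { pos++; const r = notExpr(); v = truthy(v) && truthy(r); }
    return v;
  }
  function notExpr(): Val {
    if (peek() === "!") { pos++; return !truthy(notExpr()); }
    return cmpExpr();
  }
  // cmp := primary (op primary)?
  function cmpExpr(): Val {
    const left = primary();
    const op = peek();
    if (op === undefined || !CMP.has(op)) return left;
    pos++;
    const right = primary();
    if (op === "==") return String(left) === String(right);
    if (op === "!=") return String(left) !== String(right);
    const a = Number(left);
    const b = Number(right);
    if (Number.isNaN(a) || Number.isNaN(b)) return false;
    return op === "<" ? a < b : op === ">" ? a > b : op === "<=" ? a <= b : a >= b;
  }
  // primary := '(' or ')' | atom
  function primary(): Val {
    const t = tokens[pos++];
    if (t === undefined) throw new Error("unexpected end of expression");
    if (t !== "(") return atom(t);
    const v = orExpr();
    if (tokens[pos++] !== ")") throw new Error("missing )");
    return v;
  }

  const result = orExpr();
  if (pos !== tokens.length) throw new Error("trailing tokens");
  return truthy(result);
}
```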

Referencing {steps.X} that isn't declared in dependsOn is a hard validation error — the runtime catches the most common pipeline bug before a single agent runs.
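The "undeclared reference" check is a scan for `{steps.X` references whose `X` is not in `dependsOn`. A minimal sketch: `undeclaredStepRefs` is a hypothetical name, and it only scans the `task` string, while a real validator would also need to scan fields such as `over`, `when`, and `def`.

```typescript
interface PhaseRefs { id: string; task?: string; dependsOn?: string[] }

// Every {steps.X...} reference in the task must name a phase listed in dependsOn.
function undeclaredStepRefs(phase: PhaseRefs): string[] {
  const declared = new Set(phase.dependsOn ?? []);
  const missing = new Set<string>();
  const text = phase.task ?? "";
  const re = /\{steps\.([\w-]+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    if (!declared.has(m[1])) missing.add(m[1]);
  }
  return Array.from(missing);
}
```

Dropping `"dependsOn": ["summarize"]` from the `report` phase in the fan-out example would make this return `["summarize"]`.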

Commands

Saved flows become CLI shortcuts. All commands run in the Pi session:

Command	What it does
`/tf list`	List all saved flows
`/tf run <name> [args]`	Run a saved flow (e.g. `/tf run summarize-files dir=src`)
`/tf show <name>`	Print a flow's definition
`/tf runs`	Browse recent run history (interactive TUI)
`/tf resume <runId>`	Continue a paused/failed run — cached phases skip automatically
`/tf init`	Interactively map model roles to your enabled models (writes `~/.pi/agent/settings.json`)
`/tf:<name> [args]`	Shortcut — runs the flow in one tap

Tool actions (used by the model): run (inline definition or saved name), save, resume, list, init, cache-clear.

Resume across sessions

A taskflow run isn't tied to your session. Every completed phase is written to disk, so a run that fails (or that you stop) can be continued later with /tf resume <runId> — cached phases skip automatically and only the remaining work spends tokens.

(Diagram: a run fails midway in session 1; in session 2, /tf resume skips the cached phases and re-runs only the failed phase and what follows.)

Resume is keyed on each phase's input hash: if an upstream output changed, dependent phases re-run; if nothing changed, they're reused. Other Pi extensions in the comparison table also resume across sessions; what sets pi-taskflow apart is that resume is keyed per phase on input hashes.
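Input-hash resume can be sketched as: hash each phase's resolved task together with its dependencies' output hashes, then reuse a stored record only when that hash matches. Assumptions: `inputHash`/`planResume` are hypothetical, phases arrive in topological order, and the plan is conservative (a re-run phase's output is unknown in advance, so its dependents are planned to re-run too; a real runtime could reuse them if the new output turns out identical).

```typescript
import { createHash } from "node:crypto";

interface ResumePhase { id: string; task: string; dependsOn?: string[] }
interface PhaseRecord { inputHash: string; outputHash: string }

function inputHash(task: string, upstreamOutputHashes: string[]): string {
  return createHash("sha256").update(JSON.stringify([task, upstreamOutputHashes])).digest("hex");
}

// `phases` must be in topological order; `stored` holds records from the earlier run.
function planResume(phases: ResumePhase[], stored: Map<string, PhaseRecord>): { reuse: string[]; rerun: string[] } {
  const outHash = new Map<string, string>();
  const reuse: string[] = [];
  const rerun: string[] = [];
  for (const p of phases) {
    const upstream = (p.dependsOn ?? []).map((d) => outHash.get(d) ?? "missing");
    const h = inputHash(p.task, upstream);
    const prev = stored.get(p.id);
    if (prev && prev.inputHash === h) {
      reuse.push(p.id); // cached: skipped, costs no tokens
      outHash.set(p.id, prev.outputHash);
    } else {
      rerun.push(p.id);
      outHash.set(p.id, `pending:${h}`); // unknown until it runs, so dependents re-run too
    }
  }
  return { reuse, rerun };
}
```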

Storage

.pi/taskflows/<name>.json          # project-scoped definitions (commit to share)
~/.pi/agent/taskflows/<name>.json  # user-scoped definitions
.pi/taskflows/runs/<flowName>/<runId>.json  # run state for resume (gitignore this)

Commit .pi/taskflows/ and your whole team shares the pipelines — no config sync, no onboarding doc. Run state is written atomically and guarded by a zero-dependency file lock, so concurrent runs never corrupt the index.
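"Written atomically and guarded by a zero-dependency file lock" is commonly done with two Node primitives: write-to-temp-then-rename, and an exclusive-create (`"wx"`) lock file. The sketch below shows that general technique; it is not pi-taskflow's implementation, `writeAtomic`/`withLock` are hypothetical names, it relies on `rename` being atomic within one POSIX filesystem, and it omits stale-lock recovery and waiting for a held lock.

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Readers see either the old file or the new one, never a half-written file.
function writeAtomic(path: string, data: string): void {
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync(tmp, data);
  renameSync(tmp, path);
}

// "wx" fails with EEXIST if the lock file already exists (another writer holds it).
function withLock<T>(lockPath: string, fn: () => T): T {
  const fd = openSync(lockPath, "wx");
  try {
    return fn();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath);
  }
}

// Usage: take the lock, then atomically replace a run index in a scratch directory.
function demo(): { runs: number } {
  const dir = mkdtempSync(join(tmpdir(), "tf-"));
  const index = join(dir, "index.json");
  withLock(join(dir, "index.lock"), () => writeAtomic(index, JSON.stringify({ runs: 1 })));
  return JSON.parse(readFileSync(index, "utf8"));
}
```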

Agent discovery scope (via agentScope in the flow definition):

value	discovers agents from
`"user"` (default)	`~/.pi/agent/agents/*.md`
`"project"`	`.pi/agents/*.md` (walks up the tree)
`"both"`	user + project; project wins on name collision

Agents

Taskflow ships 18 built-in agents — each a .md file with a tuned system prompt, thinking level, and tool set. You can reference them by name in any phase or shorthand, right after install. No setup required.

Built-in agent roster

Agent	Purpose	Thinking	Model role
`executor`	Implement planned code changes	high	`{{fast}}`
`executor-fast`	Trivial fixes (≤2 files, ≤50 lines)	off	`{{fast}}`
`executor-code`	Complex multi-file implementation	high	`{{strong}}`
`executor-ui`	Frontend / styling / visual changes	high	`{{vision}}`
`scout`	Fast codebase recon & file mapping	off	`{{fast}}`
`planner`	Implementation plan creation	high	`{{strong}}`
`analyst`	Requirements analysis, ambiguity detection	high	`{{thinker}}`
`critic`	Inline self-doubt during reasoning	xhigh	`{{thinker}}`
`reviewer`	General code / architecture review	high	`{{strong}}`
`risk-reviewer`	Backend / infra / DB / API risk	high	`{{reasoner}}`
`security-reviewer`	Security vulns, auth/crypto	xhigh	`{{reasoner}}`
`plan-arbiter`	Plan quality gate (complex tasks)	high	`{{arbiter}}`
`final-arbiter`	Tiebreaker when critics disagree	xhigh	`{{arbiter}}`
`test-engineer`	Design & implement tests	high	`{{fast}}`
`doc-writer`	Documentation authoring	off	`{{fast}}`
`recover`	Session recovery after compaction	low	`{{fast}}`
`verifier`	Run tests, validate outcomes	off	`{{fast}}`
`visual-explorer`	Figma design metadata analysis	high	`{{vision}}`

Agents are layered: built-in → user (~/.pi/agent/agents/) → project (.pi/agents/). A user or project agent with the same name overrides the built-in — so you can customize any agent without touching the package.
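The layering and the `agentScope` table combine into a last-writer-wins merge by name. A sketch under an assumption the README implies but does not state outright: built-ins are always present, and `agentScope` only decides which of the user and project layers are added on top. `resolveAgents` and `AgentDef` are hypothetical names.

```typescript
interface AgentDef { name: string; source: "builtin" | "user" | "project"; model?: string }

// Later layers override earlier ones by name: built-in → user → project.
function resolveAgents(
  builtin: AgentDef[],
  user: AgentDef[],
  project: AgentDef[],
  scope: "user" | "project" | "both" = "user",
): Map<string, AgentDef> {
  const layers = [builtin];
  if (scope === "user" || scope === "both") layers.push(user);
  if (scope === "project" || scope === "both") layers.push(project);
  const byName = new Map<string, AgentDef>();
  for (const layer of layers) {
    for (const agent of layer) byName.set(agent.name, agent);
  }
  return byName;
}
```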

Model roles

Each built-in agent's model field uses a role placeholder (e.g. {{fast}}) instead of a hardcoded provider string. This decouples intent from implementation — you map roles to models once, and every agent adapts.

Role	Intent	Typical model
`{{fast}}`	Cheap & quick — high-volume, low-stakes	DeepSeek V4 Flash
`{{strong}}`	Balanced — planning, review, moderate complexity	MiMo v2.5 Pro
`{{thinker}}`	Deep analysis — requirements, critique	DeepSeek V4 Pro
`{{arbiter}}`	Final judgment — tiebreak, plan quality gates	Qwen 3.7 Max
`{{vision}}`	Multimodal — UI work, design reading	MiniMax M3
`{{reasoner}}`	Cautious reasoning — security, risk	GLM 5.1

Without configuration, agents fall back to Pi's default model. To map roles to real models, run the interactive setup:

/tf init

/tf init starts with an action menu. First-time users get a 2-option shortcut ("Use recommended defaults" / "Configure each role"). Returning users see the full 5-option menu:

? What do you want to do with model roles?
  ❯ Use recommended defaults
    Configure each role
    Edit one role
    Show current roles
    Cancel

The picker shows model display names with capability flags and current/recommended markers:

? Model for 'vision' — Multimodal (executor-ui, visual-explorer)
  Current: openrouter/anthropic/claude-sonnet-4-6
  Recommended: minimax/MiniMax-M3
  ───────────────
  ❯ MiniMax M3 (minimax/MiniMax-M3) · image ✓ · reasoning ✓ · (recommended)
    Claude Sonnet 4.6 (openrouter/anthropic/...) · image ✓ · reasoning ✓ · (current)
    GPT-5 (openrouter/openai/gpt-5) · image ✓
    DeepSeek V4 Flash (openrouter/deepseek/deepseek-v4-flash)
    ───────────────
    Custom (type your own)
    Keep current
    Back to action menu

Before saving, a preview screen shows the diff of your changes:

? Review changes:
  fast       openrouter/deepseek/deepseek-v4-flash   (unchanged)
  strong     openrouter/xiaomi/mimo-v2.5-pro         (unchanged)
  thinker    openrouter/qwen/qwen3.7-max             (changed ← was: openrouter/deepseek/deepseek-v4-pro)
  arbiter    openrouter/qwen/qwen3.7-max             (unchanged)
  vision     minimax/MiniMax-M3                      (unchanged)
  reasoner   z-ai/glm-5.1                            (unchanged)
  ───────────────
  ❯ Save these changes
    Edit a role
    Cancel
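The "Review changes" labels boil down to a per-role comparison. Here is a hypothetical sketch (`previewLines` is an illustrative name, and the `(new)` label for a role with no previous value is an assumption):

```typescript
type ModelRoles = Record<string, string>;

// Label each pending role as unchanged, changed (with the old value), or new.
function previewLines(current: ModelRoles, pending: ModelRoles): string[] {
  const roles = Object.keys(pending);
  const width = Math.max(...roles.map((r) => r.length)) + 2; // align the model column
  return roles.map((role) => {
    const next = pending[role];
    const prev = current[role];
    let status: string;
    if (prev === undefined) status = "(new)";
    else if (prev === next) status = "(unchanged)";
    else status = `(changed ← was: ${prev})`;
    return `${role.padEnd(width)}${next}  ${status}`;
  });
}

const current: ModelRoles = {
  fast: "openrouter/deepseek/deepseek-v4-flash",
  thinker: "openrouter/deepseek/deepseek-v4-pro",
};
const pending: ModelRoles = { ...current, thinker: "openrouter/qwen/qwen3.7-max" };
for (const line of previewLines(current, pending)) console.log(line);
```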

Your choices are written to ~/.pi/agent/settings.json:

{
  "modelRoles": {
    "fast":     "openrouter/deepseek/deepseek-v4-flash",
    "strong":   "openrouter/xiaomi/mimo-v2.5-pro",
    "thinker":  "openrouter/deepseek/deepseek-v4-pro",
    "arbiter":  "openrouter/qwen/qwen3.7-max",
    "vision":   "minimax/MiniMax-M3",
    "reasoner": "z-ai/glm-5.1"
  }
}

Edit the values manually any time, or just re-run /tf init.
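A minimal sketch of a merge-write built only on Node built-ins. It assumes that other top-level settings and any existing `modelRoles` entries should survive, and that "atomic" means write-to-temp-then-rename. `writeModelRoles` and the `theme` key in the usage are hypothetical:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function writeModelRoles(settingsPath: string, roles: Record<string, string>): void {
  let settings: Record<string, unknown> = {};
  if (fs.existsSync(settingsPath)) {
    settings = JSON.parse(fs.readFileSync(settingsPath, "utf8"));
  }
  const existing = settings.modelRoles;
  const previous =
    existing !== null && typeof existing === "object" && !Array.isArray(existing)
      ? (existing as Record<string, string>)
      : {};
  settings.modelRoles = { ...previous, ...roles }; // new values win, other roles are kept
  fs.mkdirSync(path.dirname(settingsPath), { recursive: true });
  // Write a temp file next to the target, then rename it over settings.json,
  // so readers never observe a half-written file.
  const tmp = `${settingsPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(settings, null, 2) + "\n");
  fs.renameSync(tmp, settingsPath);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), "tf-settings-"));
const file = path.join(dir, "settings.json");
fs.writeFileSync(file, JSON.stringify({ theme: "dark", modelRoles: { legacy: "x/y" } }));
writeModelRoles(file, { fast: "openrouter/deepseek/deepseek-v4-flash" });
console.log(fs.readFileSync(file, "utf8"));
```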

To customize a specific agent's model or thinking without changing modelRoles, create an agent file at ~/.pi/agent/agents/<name>.md with the desired overrides in the YAML frontmatter.

Tool path (`action="init"`)

The model can also configure roles via the taskflow tool:

Mode	Behavior
`mode: "show"` (default)	Read-only report of current `modelRoles`. Never overwrites.
`mode: "apply-defaults"` + `force: true`	Writes `RECOMMENDED_DEFAULTS` to `settings.json`, preserving stale keys.
`mode: "interactive"`	Launches the full action menu + picker flow (requires a UI session).
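A hypothetical sketch of how the three modes could be dispatched. The `RECOMMENDED_DEFAULTS` values are illustrative and copied from the `settings.json` example above. Refusing `apply-defaults` without `force: true` is an assumption based on the table, and `runInit` and its result shape are not the tool's real API:

```typescript
type ModelRoles = Record<string, string>;
type InitMode = "show" | "apply-defaults" | "interactive";

const RECOMMENDED_DEFAULTS: ModelRoles = {
  fast: "openrouter/deepseek/deepseek-v4-flash",
  strong: "openrouter/xiaomi/mimo-v2.5-pro",
  thinker: "openrouter/deepseek/deepseek-v4-pro",
  arbiter: "openrouter/qwen/qwen3.7-max",
  vision: "minimax/MiniMax-M3",
  reasoner: "z-ai/glm-5.1",
};

interface InitResult {
  roles: ModelRoles;
  wrote: boolean;
  message: string;
}

function runInit(current: ModelRoles, mode: InitMode = "show", force = false, hasUI = false): InitResult {
  switch (mode) {
    case "show":
      return { roles: current, wrote: false, message: "read-only report" };
    case "apply-defaults":
      if (!force) return { roles: current, wrote: false, message: "pass force: true to overwrite" };
      // Defaults overwrite the known roles; keys outside the defaults (stale keys) are kept.
      return { roles: { ...current, ...RECOMMENDED_DEFAULTS }, wrote: true, message: "defaults applied" };
    case "interactive":
      if (!hasUI) return { roles: current, wrote: false, message: "interactive mode needs a UI session" };
      return { roles: current, wrote: false, message: "launching action menu" };
    default:
      throw new Error(`unknown mode: ${String(mode)}`);
  }
}
```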

Custom agents

Drop a .md file into ~/.pi/agent/agents/ (user-level) or .pi/agents/ (project-level, commit it) to add your own:

---
name: my-linter
description: Run ESLint and report violations
tools: read, bash
model: "{{fast}}"
thinking: off
---

You are a linting agent. Run `npx eslint --format json` on the
provided files. Report violations grouped by file. No fixes.

Then reference it in any phase: { "agent": "my-linter", "task": "Lint src/" }.
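To illustrate what an agent file carries, here is a dependency-free sketch that splits off the `---` frontmatter and reads flat `key: value` lines. `parseAgentFile` is hypothetical, and the real loader may accept more of YAML than this does:

```typescript
interface AgentFile {
  meta: Record<string, string>;
  prompt: string;
}

function parseAgentFile(text: string): AgentFile {
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  if (lines[0].trim() !== "---") return { meta: {}, prompt: text.trim() };
  let end = -1;
  for (let i = 1; i < lines.length; i++) {
    if (lines[i].trim() === "---") {
      end = i;
      break;
    }
  }
  if (end === -1) throw new Error("unterminated frontmatter");
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skips blank lines
    const key = line.slice(0, colon).trim();
    let value = line.slice(colon + 1).trim();
    // Strip matching surrounding quotes, e.g. model: "{{fast}}"
    if (value.length >= 2 && (value[0] === '"' || value[0] === "'") && value.endsWith(value[0])) {
      value = value.slice(1, -1);
    }
    meta[key] = value;
  }
  return { meta, prompt: lines.slice(end + 1).join("\n").trim() };
}

const agent = parseAgentFile(
  ["---", "name: my-linter", "tools: read, bash", 'model: "{{fast}}"', "thinking: off", "---", "", "You are a linting agent."].join("\n"),
);
const tools = agent.meta.tools.split(",").map((t) => t.trim()); // ["read", "bash"]
```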

Examples

Ready-to-read definitions in examples/:

File	Demonstrates
`summarize-files.json`	discover → `map` fan-out → `reduce`
`conditional-research.json`	`when` routing + `join: any` + `gate` + `budget`
`guarded-refactor.json`	`approval` (human-in-the-loop) + `retry` + `gate`

Copy one into .pi/taskflows/<name>.json (or ~/.pi/agent/taskflows/) and it registers as /tf:<name> — or just point the model at it.
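A hypothetical sketch of the file-name-to-command mapping: every `*.json` file in a taskflows directory becomes `/tf:<name>`. `flowCommands` is an illustrative name, and precedence between `.pi/taskflows/` and `~/.pi/agent/taskflows/` is deliberately not modeled:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function flowCommands(dir: string): Map<string, string> {
  const commands = new Map<string, string>();
  if (!fs.existsSync(dir)) return commands;
  for (const entry of fs.readdirSync(dir).sort()) {
    if (path.extname(entry) !== ".json") continue; // only JSON flow definitions register
    const name = path.basename(entry, ".json");
    commands.set(`/tf:${name}`, path.join(dir, entry));
  }
  return commands;
}

const flowsDir = fs.mkdtempSync(path.join(os.tmpdir(), "tf-flows-"));
fs.writeFileSync(path.join(flowsDir, "summarize-files.json"), "{}");
fs.writeFileSync(path.join(flowsDir, "notes.txt"), "not a flow");
console.log(Array.from(flowCommands(flowsDir).keys())); // [ '/tf:summarize-files' ]
```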

What's inside

0 runtime dependencies · 601 tests · 9 phase types · cross-session resume · cross-run memoization · ~7.7k LOC runtime

Zero runtime dependencies. No dependencies field — the runtime is built entirely on Node built-ins (fs / path / os / child_process / crypto). The file lock is fs.openSync("wx"), not a third-party library.
601 tests across 25 test files covering concurrency, atomic file locking (8-process race regressions), path-traversal hardening, cross-session resume, cross-run cache freshness (flow/thinking/tools key isolation, fingerprint invalidation, TTL/LRU eviction), gate verdicts, budget caps, retry/backoff, approval flows, loop termination, tournament judging, sub-flow composition, callback isolation, the idle watchdog, model-role init config, and a regression test for parseModelFromLabel on model names that contain parentheses.
Hardened by design. Path-traversal defense (lexical + realpath), runId validation, HTML/error sanitization, atomic writes, stale-lock stealing via rename, and an idle watchdog that kills wedged subagents.
Dogfooded. Every new feature has to survive the project's own self-improve taskflow before it ships.
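The `"wx"` flag is what makes a dependency-free lock possible: opening fails with `EEXIST` if the file already exists, so exactly one process can create it. A minimal sketch follows. `tryLock` and `unlock` are hypothetical names, and the stale-lock stealing via rename mentioned above is omitted:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function tryLock(lockPath: string): boolean {
  try {
    const fd = fs.openSync(lockPath, "wx"); // exclusive create: fails if the file exists
    fs.writeSync(fd, String(process.pid)); // record the owner to help diagnose stale locks
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return false; // someone else holds it
    throw err;
  }
}

function unlock(lockPath: string): void {
  fs.rmSync(lockPath, { force: true });
}

const lockFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "tf-lock-")), "run.lock");
const first = tryLock(lockFile); // true
const second = tryLock(lockFile); // false: already held
unlock(lockFile);
const third = tryLock(lockFile); // true again after release
unlock(lockFile);
```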

🍽️ We eat our own dog food

Every feature in pi-taskflow ships through pi-taskflow.

Our self-improve flow is a 10-phase DAG — it audits the codebase, patches defects, verifies correctness, gates on quality, and surfaces the report — all declaratively. It's saved as /tf:self-improve and run before every release. No other agent orchestrator in the Pi ecosystem builds itself with itself.

Campaign	Scale	Phases	Outcome
v0.0.8 dogfood	Full codebase audit → triage → fix → verify	10 phases, 234 tests	13 fixes, all pass
v0.0.6 self-audit	inventory → map audit → gate → approval → map fix → reduce	9 phases	11 critical defects fixed
Cross-run cache dogfood	Real runtime + on-disk store	Dedicated test harness	Cache correctness under adversarial fingerprints
Adversarial cross-review	Multi-agent adversarial review	`tournament` + `gate`	P0 cache-key fix shipped
Init redesign review	Necessity audit → parallel checks → verdict	7 phases	Full redesign plan validated
Round 2 adversarial audit	Phase-by-phase DAG execution — 12 findings across runner/runtime/interpolate/verify	14 phases	10 fixes applied, 0 regressions
Round 3 adversarial audit	Integration layer + cross-module — 10 findings across index/agents/cache/render/runs-view	9 phases	10 fixes applied, 0 regressions

Meta: we used pi-taskflow's map fan-out, gate verdicts, approval human-in-the-loop, tournament best-of-N, loop until-done, and cross-run cache — to build pi-taskflow.

Status & limits

v0.0.20: loop-until-done (loop phase: iterate until a condition holds, the output converges, or a cap is reached), tournament (best-of-N with a judge), cross-run memoization (content-addressed cache with git/file/glob/env fingerprints and TTL), interactive /tf init with role-aware model pickers, diff preview, and atomic merge-write, and 18 configurable built-in agents mapped onto 6 model roles. A full control-flow and reliability layer (when guards, join: any, retry/backoff, approval, flow composition, budget caps, idle watchdog) sits on top of the DSL + DAG runtime (agent/parallel/map/gate/reduce). Also included: inline and saved flows, cross-session resume, live progress, and isolated context. A run executes as one streaming tool call.
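The three loop exits (condition, convergence, cap) can be sketched as follows. It is synchronous for brevity, whereas a real runner would await each subagent. `runLoop` and its parameters are hypothetical, not the loop phase's actual fields, and "convergence" is assumed here to mean two identical outputs in a row:

```typescript
type StopReason = "condition" | "converged" | "cap";

interface LoopResult {
  output: string | undefined;
  iterations: number;
  reason: StopReason;
}

function runLoop(
  step: (previous: string | undefined, iteration: number) => string,
  until: (output: string) => boolean,
  maxIterations: number,
): LoopResult {
  let previous: string | undefined;
  for (let i = 1; i <= maxIterations; i++) {
    const output = step(previous, i);
    if (until(output)) return { output, iterations: i, reason: "condition" };
    if (output === previous) return { output, iterations: i, reason: "converged" }; // no progress
    previous = output;
  }
  return { output: previous, iterations: maxIterations, reason: "cap" }; // hard ceiling
}

const drafts = ["draft", "draft + tests", "draft + tests", "done"];
const converged = runLoop((_prev, i) => drafts[i - 1], (out) => out === "done", 10);
console.log(converged.reason, converged.iterations); // converged 3
```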

Known boundaries (tracked, bounded — no surprises mid-flow):

Detached background execution (new). Add detach: true to action: "run" to spawn the flow in a detached child process. The tool returns immediately with the runId; the flow continues running even if the host session exits. Status is polled via the store (/tf runs or action: "resume"). Approval phases auto-reject in detached mode.
No output: "file". Outputs are text/JSON only — write files via an agent's write tool call.
map requires a JSON array. Its over field must resolve to a {steps.ID.json} array. To fan out over a plain-text list, first convert it in a single agent phase with output: "json".
The DAG must be acyclic. Cycles are rejected at validation.
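A minimal sketch of that static check, using Kahn's topological sort over `dependsOn` edges. It rejects duplicate ids, unknown dependencies, and cycles, and returns a valid execution order. `validateDag` and the `Phase` shape are hypothetical, not pi-taskflow's validator:

```typescript
interface Phase {
  id: string;
  dependsOn?: string[];
}

function validateDag(phases: Phase[]): string[] {
  const ids = new Set(phases.map((p) => p.id));
  if (ids.size !== phases.length) throw new Error("duplicate phase id");
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const p of phases) {
    indegree.set(p.id, 0);
    dependents.set(p.id, []);
  }
  for (const p of phases) {
    for (const dep of p.dependsOn ?? []) {
      if (!ids.has(dep)) throw new Error(`phase "${p.id}" depends on unknown phase "${dep}"`);
      indegree.set(p.id, indegree.get(p.id)! + 1);
      dependents.get(dep)!.push(p.id);
    }
  }
  // Repeatedly emit phases with no unmet dependencies.
  const ready = phases.filter((p) => indegree.get(p.id) === 0).map((p) => p.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of dependents.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // Any phase never emitted sits on (or behind) a cycle.
  if (order.length !== phases.length) {
    const stuck = phases.filter((p) => !order.includes(p.id)).map((p) => p.id);
    throw new Error(`cycle detected among phases: ${stuck.join(", ")}`);
  }
  return order;
}

const order = validateDag([
  { id: "discover" },
  { id: "summarize", dependsOn: ["discover"] },
  { id: "report", dependsOn: ["summarize"] },
]);
console.log(order); // [ 'discover', 'summarize', 'report' ]
```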

Development

npm install
npm run typecheck
npm test            # unit tests — no network, no process spawning
npm run test:e2e    # real end-to-end (spawns live subagents; needs model access)

Runtime lives in extensions/, tests in test/, and runnable examples in examples/.

Contributing

Contributions welcome — this is a young, fast-moving project. Open an issue or PR on GitHub. Good first contributions: new example flows, phase-type ideas, and TUI polish.

License

MIT
