Stepwise Backends

For: Developers integrating stepwise execution, operators choosing backends.

Prerequisites: FLOW_STUDIO.md | Related: CLAUDE.md (Technology Stack section)

Stepwise in 60 Seconds

# 1. Run a stepwise demo
make demo-run-claude-stepwise

# 2. View it in Flow Studio
make flow-studio
# Open http://localhost:5000/?run=stepwise-claude

# 3. Inspect a transcript
cat swarm/examples/stepwise-claude/signal/llm/normalize-signal-normalizer-claude.jsonl

What happened? Each flow step executed as a separate LLM call, producing:

Transcript: Conversation log (llm/*.jsonl)
Receipt: Execution metadata (receipts/*.json)
Events: Timeline in events.jsonl

Continue reading for configuration, architecture, and extension details.

What is Stepwise Execution?

In batch execution (standard mode), a backend runs an entire flow in one LLM call. The LLM receives all step instructions upfront and executes them sequentially within a single session. This is fast but offers limited visibility into step-level behavior.

In stepwise execution, the orchestrator:

Loads the flow definition from flow_registry
Iterates through each step in order
Makes a separate LLM call per step
Passes context from previous steps to subsequent steps
Persists events and artifacts after each step

This approach trades throughput for observability and control.
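The loop above can be sketched in a few lines. This is a minimal illustration, not the orchestrator's actual code: `run_stepwise`, `call_step`, and the event dictionaries are hypothetical stand-ins, and stopping at the first failing step is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: (step_id, history so far) -> output text.
StepCall = Callable[[str, List[Dict]], str]


def run_stepwise(steps: List[str], call_step: StepCall) -> Tuple[List[Dict], List[Dict]]:
    """Run each step as its own call and hand earlier results to later steps."""
    history: List[Dict] = []  # results of completed steps, oldest first
    events: List[Dict] = []   # stand-in for what would be appended to events.jsonl

    for step_id in steps:
        events.append({"kind": "step_start", "step_id": step_id})
        try:
            # One call per step; the step sees a copy of everything before it.
            output = call_step(step_id, list(history))
        except Exception as exc:
            events.append({"kind": "step_error", "step_id": step_id, "error": str(exc)})
            break  # assumption: stop at the first failing step
        history.append({"step_id": step_id, "status": "succeeded", "output": output})
        events.append({"kind": "step_end", "step_id": step_id})

    return history, events
```

Because every step is a separate call, a failure is attributed to exactly one `step_id`, which is the error-isolation property the stepwise mode is built around.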

Benefits of Stepwise Execution

Benefit	Description
Per-step observability	Each step emits separate `step_start` and `step_end` events
Context handoff	Previous step outputs are included in subsequent step prompts
Better error isolation	When a step fails, you know exactly which one and why
Teaching mode	Supports pausing at step boundaries for demos and debugging
Engine flexibility	Same orchestrator works with different LLM backends

Available Stepwise Backends

Backend ID	Engine	Description
`gemini-step-orchestrator`	`GeminiStepEngine`	Stepwise execution via Gemini CLI
`claude-step-orchestrator`	`ClaudeStepEngine`	Stepwise execution via Claude Agent SDK

Both backends use the GeminiStepOrchestrator class (despite its name) with different underlying StepEngine implementations. The orchestrator handles flow traversal while the engine handles LLM communication.

Architecture

Component Overview

Backend (GeminiStepwiseBackend / ClaudeStepwiseBackend)
    |
    v
Orchestrator (GeminiStepOrchestrator)
    |
    +-- flow_registry: Loads flow definitions and steps
    |
    +-- engine: StepEngine implementation
    |       |
    |       v
    |   GeminiStepEngine or ClaudeStepEngine
    |       |
    |       v
    |   LLM (Gemini CLI or Claude Agent SDK)
    |
    +-- storage: Persists events, summaries, transcripts

StepEngine Abstraction

The StepEngine interface (swarm/runtime/engines.py) defines how individual steps are executed. All engines implement:

class StepEngine(ABC):
    @property
    @abstractmethod
    def engine_id(self) -> str:
        """Unique identifier (e.g., 'gemini-step', 'claude-step')."""
        ...

    @abstractmethod
    def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[RunEvent]]:
        """Execute a step and return result + events."""
        ...

StepContext Dataclass

Contains all information needed to execute a step:

@dataclass
class StepContext:
    repo_root: Path          # Repository root path
    run_id: str              # Run identifier
    flow_key: str            # Flow being executed (signal, plan, build, etc.)
    step_id: str             # Step identifier within the flow
    step_index: int          # 1-based step index
    total_steps: int         # Total steps in the flow
    spec: RunSpec            # Run specification
    flow_title: str          # Human-readable flow title
    step_role: str           # Description of what this step does
    step_agents: Tuple[str, ...]  # Agent keys assigned to this step
    history: List[Dict]      # Previous step results for context
    extra: Dict[str, Any]    # Additional context-specific data

    @property
    def run_base(self) -> Path:
        """Get RUN_BASE path for this step's artifacts."""
        return self.repo_root / "swarm" / "runs" / self.run_id / self.flow_key

StepResult Dataclass

Returned by engines after step execution:

@dataclass
class StepResult:
    step_id: str                      # Step identifier
    status: str                       # "succeeded" | "failed" | "skipped"
    output: str                       # Summary text describing what happened
    error: Optional[str] = None       # Error message if failed
    duration_ms: int = 0              # Execution duration in milliseconds
    artifacts: Optional[Dict] = None  # Artifact paths/metadata produced
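To make the interface concrete, here is a small self-contained sketch of an engine that satisfies it. The dataclasses are trimmed stand-ins for the real ones, `EchoStepEngine` and the `stub_response` event kind are hypothetical, and plain dictionaries stand in for `RunEvent`.

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass
class StepContext:  # trimmed stand-in: only the fields this sketch reads
    step_id: str
    step_role: str
    history: List[Dict]


@dataclass
class StepResult:  # trimmed stand-in
    step_id: str
    status: str
    output: str
    error: Optional[str] = None
    duration_ms: int = 0


class StepEngine(ABC):
    @property
    @abstractmethod
    def engine_id(self) -> str: ...

    @abstractmethod
    def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[Dict]]: ...


class EchoStepEngine(StepEngine):
    """Hypothetical engine that returns a synthetic response without calling an LLM."""

    @property
    def engine_id(self) -> str:
        return "echo-step"

    def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[Dict]]:
        started = time.perf_counter()
        output = f"[stub] {ctx.step_role} ({len(ctx.history)} prior steps)"
        duration_ms = int((time.perf_counter() - started) * 1000)
        result = StepResult(ctx.step_id, "succeeded", output, duration_ms=duration_ms)
        # "stub_response" is an illustrative event kind, not one the harness defines.
        events = [{"kind": "stub_response", "step_id": ctx.step_id, "engine": self.engine_id}]
        return result, events
```

The orchestrator only needs `engine_id` and `run_step`, so an engine like this can be swapped in without touching flow traversal.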

Orchestrator

The GeminiStepOrchestrator (swarm/runtime/orchestrator.py) coordinates stepwise execution:

Run creation: Generates run ID, creates directories, writes initial metadata
Flow loading: Gets flow definition from flow_registry
Step iteration: Loops through steps, building context for each
Engine invocation: Calls engine.run_step(ctx) for each step
Event persistence: Writes engine events to events.jsonl
Status updates: Updates run summary as execution progresses

Key method:

def run_stepwise_flow(
    self,
    flow_key: str,
    spec: RunSpec,
    start_step: Optional[str] = None,
    end_step: Optional[str] = None,
) -> RunId:
    """Execute a flow step-by-step, one LLM call per step."""

Transcript and Receipt Format

Transcripts

Stepwise backends write detailed transcripts of LLM interactions.

Location: RUN_BASE/<flow>/llm/<step_id>-<agent>-<engine>.jsonl

Example path: swarm/runs/run-20251209-143022-abc123/signal/llm/S1-context-loader-claude.jsonl

Format: JSONL (one JSON object per line)

{"timestamp": "2025-01-15T10:00:00Z", "role": "system", "content": "Executing step S1 with agent context-loader"}
{"timestamp": "2025-01-15T10:00:01Z", "role": "user", "content": "Step role: Load relevant context from the codebase"}
{"timestamp": "2025-01-15T10:00:05Z", "role": "assistant", "content": "I have loaded the following files..."}
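A transcript in this format can be checked with the standard library alone. The sketch below assumes every line carries the `timestamp`, `role`, and `content` keys shown above; `parse_transcript` and `role_counts` are hypothetical helper names.

```python
import json
from collections import Counter
from typing import Dict, List

EXPECTED_KEYS = {"timestamp", "role", "content"}


def parse_transcript(text: str) -> List[Dict]:
    """Parse JSONL transcript text, skipping blank lines and checking each message."""
    messages: List[Dict] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        message = json.loads(line)
        missing = EXPECTED_KEYS - message.keys()
        if missing:
            raise ValueError(f"line {lineno}: missing {sorted(missing)}")
        messages.append(message)
    return messages


def role_counts(messages: List[Dict]) -> Counter:
    """Count messages per role (system, user, assistant)."""
    return Counter(message["role"] for message in messages)
```

To inspect a file on disk, read it first, for example `parse_transcript(Path(transcript_path).read_text(encoding="utf-8"))`.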

Receipts

Step receipts capture execution metadata for auditing and debugging.

Location: RUN_BASE/<flow>/receipts/<step_id>-<agent>.json

Example path: swarm/runs/run-20251209-143022-abc123/signal/receipts/S1-context-loader.json

Format: JSON with execution metadata:

{
  "engine": "claude-step",
  "mode": "sdk",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "step_id": "S1",
  "flow_key": "signal",
  "run_id": "run-20251209-143022-abc123",
  "agent_key": "context-loader",
  "started_at": "2025-01-15T10:00:00Z",
  "completed_at": "2025-01-15T10:00:05Z",
  "duration_ms": 5000,
  "status": "succeeded",
  "tokens": {"prompt": 1200, "completion": 800, "total": 2000},
  "transcript_path": "llm/S1-context-loader-claude.jsonl"
}

The mode and provider fields indicate how the step was executed:

mode: Engine mode used (stub, sdk, or real)
provider: Provider profile (anthropic, anthropic_compat)
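A receipt can be sanity-checked before it is trusted for auditing. The following is a sketch under stated assumptions: the required-field list mirrors the receipt contract described in the Contracts section of this document, while `check_receipt` and the token and timestamp checks are illustrative additions, not part of the harness.

```python
from datetime import datetime
from typing import Dict, List

REQUIRED_FIELDS = (
    "engine", "mode", "provider", "step_id",
    "flow_key", "run_id", "status", "duration_ms",
)


def _parse_ts(value: str) -> datetime:
    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_receipt(receipt: Dict) -> List[str]:
    """Return a list of problems found in a receipt; an empty list means it looks sane."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in receipt]

    tokens = receipt.get("tokens")
    if tokens is not None:
        if tokens.get("prompt", 0) + tokens.get("completion", 0) != tokens.get("total", 0):
            problems.append("token counts do not add up")

    if "started_at" in receipt and "completed_at" in receipt:
        if _parse_ts(receipt["completed_at"]) < _parse_ts(receipt["started_at"]):
            problems.append("completed_at is earlier than started_at")

    return problems
```

Run against the example receipt above, this returns an empty list: all required fields are present and 1200 + 800 equals the recorded total of 2000.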

Engines & Providers

The stepwise system supports multiple LLM providers behind a unified interface.

Provider Profiles

Provider	Base URL	Description
`anthropic`	api.anthropic.com	Direct Anthropic Claude API
`anthropic_compat`	Configurable	Anthropic-compatible APIs (GLM, etc.)

Engine Modes

Each engine can run in different modes:

Engine	Mode	Description
`claude-step`	`stub`	Synthetic responses for testing (default)
`claude-step`	`sdk`	Real Claude Agent SDK execution
`claude-step`	`cli`	Real Claude CLI execution (`claude --output-format stream-json`)
`gemini-step`	`stub`	Synthetic responses for testing
`gemini-step`	`cli`	Real Gemini CLI execution

Which Backend Should I Use?

If you have a Claude Code seat (no API key):

Use claude-step-orchestrator with mode: cli (NEW!)
Your Claude Code CLI handles authentication automatically
Full stepwise execution with per-step transcripts
Works with Claude Code or GLM Coding Plan via your CLI settings

If you have an Anthropic API key:

Use claude-step-orchestrator with mode: sdk
Set ANTHROPIC_API_KEY environment variable
Full stepwise execution with per-step transcripts

If you have a GLM Coding Plan (Z.AI):

Option 1 (CLI): Use claude-step-orchestrator with mode: cli
- Your Claude Code CLI should already be configured for GLM
- No environment variables needed in the swarm
Option 2 (SDK): Use claude-step-orchestrator with mode: sdk
- Set ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
- Set ANTHROPIC_API_KEY to your GLM key

If you want cheap testing:

Use gemini-step-orchestrator with Gemini CLI
Good for iteration before using Claude

If you just want to explore the harness:

Leave everything at defaults (mode: stub)
Run make demo-run-stepwise to see synthetic runs

Quick Reference: Three Personas

Persona	Description	Command	API Key Required?
CI / Demo	Testing flows, exploring harness	`make stepwise-sdlc-stub`	No
Agent SDK	Local dev with Claude subscription (Max/Team/Enterprise)	TypeScript/Python Agent SDK	No (uses Claude login)
API User	Server-side / multi-user integration	`make stepwise-sdlc-claude-sdk`	Yes (`ANTHROPIC_API_KEY`)

Understanding Claude Surfaces

The Agent SDK (TypeScript/Python) is "headless Claude Code"—it reuses your Claude subscription when you're logged into Claude Code on your machine. No separate API account needed for local dev. Use the HTTP API (ANTHROPIC_API_KEY) for server-side, CI, or multi-tenant deployments.

The CLI mode (make stepwise-sdlc-claude-cli) is a lower-level surface that bridges to Claude Code CLI. It's useful for debugging and for providers without an Agent SDK (Gemini CLI, etc.).

Try it now: make agent-sdk-ts-demo or make agent-sdk-py-demo. See examples/agent-sdk-ts/ and examples/agent-sdk-py/ for working examples.

For a step-by-step walkthrough, see swarm/runbooks/stepwise-fastpath.md.

Configuration

Mode Switches

Both stepwise engines support stub mode for development and CI testing. In stub mode, engines return synthetic responses without calling real LLMs.

Variable	Values	Description
`SWARM_GEMINI_STUB`	`0`, `1`	Force Gemini stub mode (default: `1`)
`SWARM_GEMINI_CLI`	path	Override Gemini CLI executable path
`SWARM_CLAUDE_STEP_ENGINE_MODE`	`stub`, `sdk`, `cli`	Claude engine mode
`SWARM_CLAUDE_CLI`	path	Override Claude CLI executable path

Default behavior:

SWARM_GEMINI_STUB=1: Stub mode is enabled by default
Set SWARM_GEMINI_STUB=0 to use real Gemini CLI (requires CLI installed)
Set SWARM_CLAUDE_STEP_ENGINE_MODE=sdk to use real Claude Agent SDK
Set SWARM_CLAUDE_STEP_ENGINE_MODE=cli to use Claude CLI (NEW!)

Mode selection order for Claude:

Environment variable: SWARM_CLAUDE_STEP_ENGINE_MODE
Config file: swarm/config/runtime.yaml
Default: stub
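The precedence above can be expressed as a small resolver. This is a sketch, not the engine's actual lookup code: `resolve_claude_mode` is a hypothetical name, and the config argument is assumed to be the already-parsed contents of swarm/config/runtime.yaml with the `engines.claude.mode` layout used in the configuration examples.

```python
from typing import Mapping, Optional

VALID_MODES = ("stub", "sdk", "cli")


def resolve_claude_mode(env: Mapping[str, str], config: Optional[Mapping] = None) -> str:
    """Pick the Claude engine mode: environment first, then config, then 'stub'."""
    mode = env.get("SWARM_CLAUDE_STEP_ENGINE_MODE")
    if not mode and config:
        # Assumed layout: {"engines": {"claude": {"mode": "..."}}}
        mode = config.get("engines", {}).get("claude", {}).get("mode")
    mode = (mode or "stub").lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown Claude engine mode: {mode!r}")
    return mode
```

In practice the first argument would be `os.environ`. Passing the environment in explicitly keeps the function easy to test without touching process state.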

Configuration Examples

Using Anthropic API

# swarm/config/runtime.yaml
engines:
  claude:
    mode: "sdk"
    provider: "anthropic"

export ANTHROPIC_API_KEY=sk-ant-...
make demo-run-claude-stepwise

Using GLM via Z.AI (SDK)

# swarm/config/runtime.yaml
engines:
  claude:
    mode: "sdk"
    provider: "anthropic_compat"
    env:
      ANTHROPIC_BASE_URL: "https://api.z.ai/api/anthropic"

export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
export ANTHROPIC_API_KEY=<your-glm-key>
make demo-run-claude-stepwise

Using Claude CLI (Claude Code / GLM Coding Plan)

For users with Claude Code or GLM Coding Plan configured in their CLI:

# swarm/config/runtime.yaml
engines:
  claude:
    mode: "cli"
    provider: "anthropic"  # or "anthropic_compat" for GLM

# No environment variables needed - uses your CLI config!
make demo-run-claude-stepwise

# Or via the CLI tool with mode flag:
uv run swarm/tools/demo_stepwise_run.py \
  --backend claude-step-orchestrator \
  --mode cli \
  --flows build

This mode uses claude --output-format stream-json to execute steps, leveraging your existing Claude Code CLI authentication.

Using Gemini CLI

# swarm/config/runtime.yaml
engines:
  gemini:
    mode: "real"

export SWARM_GEMINI_STUB=0
make demo-run-gemini-stepwise

Stub Mode Rationale

Stub mode is the default because:

CI/CD safety: Tests run without LLM credentials or API calls
Fast iteration: Developers can test orchestration logic without LLM latency
Cost control: No LLM costs during development or testing
Deterministic testing: Stub responses are predictable

The stub mode still writes transcript and receipt files with placeholder content, allowing end-to-end testing of the orchestrator, storage, and Flow Studio UI.

Using Stepwise Backends

In Flow Studio

Open Flow Studio: make flow-studio
In the left sidebar, locate the Backend dropdown (above the flow list)
Select Gemini CLI (stepwise) or Claude Agent SDK (stepwise)
Start a run using the selected backend
View step-by-step progress in the Events Timeline

The Run Detail modal shows:

step_start events when each step begins
step_end or step_error events when steps complete
Engine-specific events from the LLM (tool_start, tool_end, etc.)

Via Python API

from swarm.runtime.backends import get_backend
from swarm.runtime.types import RunSpec

# Get a stepwise backend
backend = get_backend("gemini-step-orchestrator")
# or
backend = get_backend("claude-step-orchestrator")

# Create a run specification
spec = RunSpec(
    flow_keys=["signal", "plan"],
    backend="gemini-step-orchestrator",
    initiator="api",
    params={"title": "My Stepwise Run"},
)

# Start the run (returns immediately, runs in background)
run_id = backend.start(spec)

# Check status
summary = backend.get_summary(run_id)
print(f"Status: {summary.status}")

# Get events
events = backend.get_events(run_id)
for event in events:
    print(f"{event.kind}: {event.step_id or 'run-level'}")

Via Orchestrator Directly

For finer control, use the orchestrator directly:

from pathlib import Path
from swarm.runtime.orchestrator import get_orchestrator
from swarm.runtime.engines import ClaudeStepEngine
from swarm.runtime.types import RunSpec

# Create orchestrator with specific engine
repo_root = Path("/path/to/repo")
engine = ClaudeStepEngine(repo_root)
orchestrator = get_orchestrator(engine=engine, repo_root=repo_root)

# Run a single flow stepwise
spec = RunSpec(flow_keys=["build"], backend="claude-step-orchestrator")
run_id = orchestrator.run_stepwise_flow("build", spec)

# Optional: Run partial flow (steps 2-5 only)
run_id = orchestrator.run_stepwise_flow(
    "build",
    spec,
    start_step="S2",
    end_step="S5",
)
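One plausible reading of `start_step` and `end_step` is an inclusive slice over the flow's ordered step IDs, which matches the "steps 2-5" example above. The helper below is a hypothetical sketch of that reading, not the orchestrator's implementation.

```python
from typing import List, Optional


def select_steps(
    step_ids: List[str],
    start_step: Optional[str] = None,
    end_step: Optional[str] = None,
) -> List[str]:
    """Return the inclusive sub-range of steps between start_step and end_step."""
    # list.index raises ValueError for an unknown step ID, which surfaces typos early.
    start = step_ids.index(start_step) if start_step is not None else 0
    end = step_ids.index(end_step) if end_step is not None else len(step_ids) - 1
    if start > end:
        raise ValueError(f"start_step {start_step!r} comes after end_step {end_step!r}")
    return step_ids[start:end + 1]
```

Leaving both bounds as `None` selects the whole flow, so the partial run is a strict special case of the full run.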

Development

Running Tests

# Run Gemini stepwise backend tests
uv run pytest tests/test_gemini_stepwise_backend.py -v

# Run Claude stepwise backend tests
uv run pytest tests/test_claude_stepwise_backend.py -v

# Run routing microloop tests
uv run pytest tests/test_build_stepwise_routing.py -v

# Run all stepwise-related tests
uv run pytest tests/ -k "stepwise" -v

Running Demos

Basic demos (signal + plan flows):

make demo-run-gemini-stepwise  # Gemini stepwise backend (stub mode)
make demo-run-claude-stepwise  # Claude stepwise backend (stub mode)
make demo-run-stepwise         # Run both

Full SDLC demos (signal + plan + build flows):

make stepwise-sdlc-gemini      # Gemini stepwise (stub mode)
make stepwise-sdlc-claude-cli  # Claude CLI stepwise
make stepwise-sdlc-claude-sdk  # Claude Agent SDK stepwise (requires API key)
make stepwise-sdlc-stub        # Both backends in stub mode

View help:

make stepwise-help

Golden Examples

Pre-generated stepwise runs are available in swarm/examples/:

Example	Backend	Flows	Description
`stepwise-gemini/`	Gemini	signal, plan	Lightweight mode (events only)
`stepwise-claude/`	Claude	signal, plan	Rich mode (transcripts + receipts)
`stepwise-build-gemini/`	Gemini	signal, plan, build	Build flow (lightweight)
`stepwise-build-claude/`	Claude	signal, plan, build	Build flow (rich mode)
`stepwise-gate-claude/`	Claude	signal, plan, build, gate	Through Gate verification
`stepwise-deploy-claude/`	Claude	signal, plan, build, gate, deploy	Through Deploy
`stepwise-sdlc-claude/`	Claude	all 6 flows	Complete SDLC (44 steps, recommended)

Each example includes:

spec.json - Run specification
meta.json - Run metadata
events.jsonl - Event log
README.md - Documentation

Claude examples also include:

<flow>/llm/*.jsonl - Per-step transcripts
<flow>/receipts/*.json - Per-step execution receipts

Testing in Stub Mode

Tests use the isolated_runs_env fixture, which:

Creates temporary swarm/runs/ and swarm/examples/ directories
Monkeypatches storage module to use temporary paths
Resets RunService singleton before/after each test

Example test fixture:

@pytest.fixture
def isolated_runs_env(tmp_path, monkeypatch):
    runs_dir = tmp_path / "swarm" / "runs"
    examples_dir = tmp_path / "swarm" / "examples"
    runs_dir.mkdir(parents=True)
    examples_dir.mkdir(parents=True)

    monkeypatch.setattr(storage, "RUNS_DIR", runs_dir)
    monkeypatch.setattr(storage, "EXAMPLES_DIR", examples_dir)

    RunService.reset()
    yield {"runs_dir": runs_dir, "examples_dir": examples_dir}
    RunService.reset()

Adding a New StepEngine

To add a new LLM backend (e.g., for OpenAI):

Create engine class in swarm/runtime/engines.py:

class OpenAIStepEngine(StepEngine):
    @property
    def engine_id(self) -> str:
        return "openai-step"

    def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[RunEvent]]:
        # Build prompt from ctx
        # Call OpenAI API
        # Write transcript and receipt
        # Return StepResult and events
        ...

Create backend class in swarm/runtime/backends.py:

class OpenAIStepwiseBackend(RunBackend):
    def _get_orchestrator(self) -> GeminiStepOrchestrator:
        from .engines import OpenAIStepEngine
        from .orchestrator import get_orchestrator
        return get_orchestrator(
            engine=OpenAIStepEngine(self._repo_root),
            repo_root=self._repo_root,
        )
    # ... implement required methods

Register in backend registry:

_BACKEND_REGISTRY: dict[BackendId, type[RunBackend]] = {
    # ... existing backends
    "openai-step-orchestrator": OpenAIStepwiseBackend,
}

Add BackendId type:

# In swarm/runtime/types.py
BackendId = Literal[
    # ... existing backends
    "openai-step-orchestrator",
]

Write tests following patterns in tests/test_gemini_stepwise_backend.py

Troubleshooting

Common Issues

Issue	Cause	Solution
No transcript files	Engine did not write them	Check engine logs, verify `run_base` path exists
Stub mode active unexpectedly	Env var override or CLI not found	Check `SWARM_GEMINI_STUB`, verify CLI in PATH
Step timeout	Long-running step	Check step complexity, increase timeout if needed
"Unknown flow" error	Flow not in registry	Verify `swarm/config/flows/<flow>.yaml` exists
"Flow has no steps"	Empty steps list	Check flow YAML has steps defined
Events missing	Storage write failed	Check disk space, verify write permissions

Debugging Steps

Check stub mode status:

from swarm.runtime.backends import GeminiStepwiseBackend
backend = GeminiStepwiseBackend()
orch = backend._get_orchestrator()
print(f"Stub mode: {orch._engine.stub_mode}")

Inspect events:

events = backend.get_events(run_id)
for e in events:
    print(f"{e.ts} {e.kind} {e.step_id}: {e.payload}")

Check transcript content:

cat swarm/runs/<run_id>/<flow>/llm/*.jsonl

Check receipt metadata:

cat swarm/runs/<run_id>/<flow>/receipts/*.json | jq .

Events Reference

Stepwise runs emit these event kinds:

Core Execution Events

Event	Level	Description
`run_created`	Run	Initial run creation (has `stepwise: true`)
`run_started`	Run	Execution began
`step_start`	Step	Step execution began
`tool_start`	Tool	Tool invocation started (engine-specific)
`tool_end`	Tool	Tool invocation completed
`step_end`	Step	Step completed successfully
`step_error`	Step	Step failed with error
`run_completed`	Run	All steps finished
`run_stopping`	Run	Orderly shutdown initiated
`run_stopped`	Run	Run stopped before completion
`run_pausing`	Run	Pause requested
`run_paused`	Run	Run paused at step boundary
`run_resumed`	Run	Execution resumed

Routing Events

Event	Level	Description
`routing_decision`	Step	Navigator made a routing decision
`routing_offroad`	Step	Navigator deviated from golden path
`flow_injected`	Flow	A new flow was injected mid-run
`node_injected`	Step	A new node was added to current flow
`graph_extended`	Run	Navigator proposed spec changes

Stack Events

Event	Level	Description
`stack_push`	Flow	A flow was paused and new flow injected
`stack_pop`	Flow	An injected flow completed, resuming parent
`stack_overflow_prevented`	Flow	Max stack depth would be exceeded

Fact Extraction Events

Event	Level	Description
`facts_updated`	Step	Fact markers extracted from handoff
`assumption_recorded`	Step	An assumption was documented
`decision_recorded`	Step	A significant decision was documented

Event payload examples:

// step_start
{
  "role": "Load context from codebase",
  "agents": ["context-loader"],
  "step_index": 1,
  "engine": "claude-step"
}

// step_end
{
  "status": "succeeded",
  "duration_ms": 5000,
  "engine": "claude-step"
}

// routing_offroad
{
  "golden_path_step": "code-critic",
  "actual_step": "security-scanner",
  "route_type": "DETOUR",
  "rationale": "Detected potential SQL injection pattern",
  "return_address": "code-critic",
  "confidence": 0.85,
  "evaluated_conditions": ["has_db_queries == true"],
  "tie_breaker_used": false
}

// stack_push
{
  "paused_flow": "build",
  "paused_step": "code-implementer",
  "injected_flow": "rebase",
  "current_depth": 2
}
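Step-level events are enough to reconstruct where a run stands. The sketch below assumes each event is available as a dictionary with `kind`, `step_id`, and `payload` keys (mirroring the attributes printed in the debugging snippet earlier); `step_outcomes` is a hypothetical helper.

```python
from typing import Dict, Iterable


def step_outcomes(events: Iterable[Dict]) -> Dict[str, str]:
    """Fold an ordered event stream into the latest known state of each step."""
    outcomes: Dict[str, str] = {}
    for event in events:
        step_id = event.get("step_id")
        if step_id is None:
            continue  # run-level events carry no step_id
        kind = event["kind"]
        if kind == "step_start":
            outcomes[step_id] = "running"
        elif kind == "step_end":
            # Fall back to "succeeded" if the payload omits an explicit status.
            outcomes[step_id] = (event.get("payload") or {}).get("status", "succeeded")
        elif kind == "step_error":
            outcomes[step_id] = "failed"
    return outcomes
```

A step that still maps to `"running"` after the stream ends is the one that was in flight when the run stopped, which is a quick way to locate a hang or timeout.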

Contracts: Proof, Not Promise

The stepwise harness is backed by test-enforced contracts. Here's where each contract is proven:

Contract	Where it's enforced	What it proves
Teaching notes appear in prompts	`tests/test_step_prompt_teaching_notes.py`	Inputs/outputs/emphasizes/constraints from flow YAML appear in LLM prompts
Routing decisions follow receipts	`tests/test_build_stepwise_routing.py`	Orchestrator routes based on `status` field in receipts; microloops exit on VERIFIED
Receipts have required fields	`tests/test_step_engine_contract.py`	Every receipt includes engine, mode, provider, step_id, flow_key, run_id, status, duration_ms
Transcripts have valid format	`tests/test_step_engine_contract.py`	JSONL format with role, content, timestamp per message
Examples reflect real flows	`swarm/examples/stepwise-build-claude/`	Golden receipts/transcripts from actual stepwise runs

Key Contract Dataclasses

Teaching Notes (swarm/config/flow_registry.py):

@dataclass
class TeachingNotes:
    inputs: Tuple[str, ...]      # What the step reads
    outputs: Tuple[str, ...]     # What the step writes
    emphasizes: Tuple[str, ...]  # Key focus areas
    constraints: Tuple[str, ...] # What the step cannot do

Step Routing (swarm/config/flow_registry.py):

@dataclass
class StepRouting:
    kind: str              # "linear" | "microloop" | "branch"
    loop_target: str       # Step to loop back to
    loop_condition_field: str  # Receipt field to check
    loop_success_values: Tuple[str, ...]  # Values that exit the loop
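How these fields could drive a routing decision is sketched below. Treat it as an assumption-laden illustration: `next_step`, the `max_iterations` guard, and the `finalize` step name are hypothetical, the field defaults are added so linear steps can omit them, and `branch` routing is treated like `linear` because its rules are not described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StepRouting:  # same fields as above, with defaults added for this sketch
    kind: str
    loop_target: str = ""
    loop_condition_field: str = ""
    loop_success_values: Tuple[str, ...] = ()


def next_step(
    step_ids: List[str],
    current: str,
    routing: StepRouting,
    receipt: Dict,
    iterations: int,
    max_iterations: int = 3,  # hypothetical guard against endless loops
) -> Optional[str]:
    """Return the next step ID, or None when the flow is finished."""
    index = step_ids.index(current)
    following = step_ids[index + 1] if index + 1 < len(step_ids) else None
    if routing.kind != "microloop":
        return following
    value = receipt.get(routing.loop_condition_field)
    if value in routing.loop_success_values or iterations >= max_iterations:
        return following  # exit the loop and continue down the flow
    return routing.loop_target  # loop back for another pass
```

With `loop_condition_field="status"` and `loop_success_values=("VERIFIED",)`, a critic step whose receipt says `UNVERIFIED` sends control back to `loop_target`, and a `VERIFIED` receipt lets the flow move on.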

Step Result (swarm/runtime/engines.py):

@dataclass
class StepResult:
    step_id: str
    status: str  # "succeeded" | "failed" | "skipped"
    output: str
    duration_ms: int

Verifying Contracts Locally

# Verify teaching notes appear in prompts
uv run pytest tests/test_step_prompt_teaching_notes.py -v

# Verify routing decisions
uv run pytest tests/test_build_stepwise_routing.py -v

# Verify receipt/transcript contracts
uv run pytest tests/test_step_engine_contract.py -v

Known Limitations

This section documents deliberate scope boundaries for the stepwise harness implementation.

Current Scope

Feature	Status	Notes
Flows 1-4 (Signal → Gate)	✅ Complete	Fully stepwise with teaching_notes, microloops, golden examples
Flows 5-6 (Deploy → Wisdom)	✅ Complete	Linear routing, teaching_notes, golden examples
Stub mode	✅ Complete	Zero-cost testing for CI and demos
CLI mode	✅ Complete	Claude Code or GLM Coding Plan
SDK mode	✅ Complete	Real Anthropic API calls

Intentional Limitations

No multi-engine per step: Engine selection is at the flow level, not step level. You cannot run step 1 with Claude and step 2 with Gemini in the same flow. Workaround: Run different flows with different engines.
No automatic resumption: If a run fails mid-flow, you must restart from the beginning. There's no checkpoint/resume mechanism yet. Workaround: Use stub mode to test flow structure, then run real mode when ready.
SDK tests are minimal: Most test coverage uses stub mode. The SDK smoke test (test_step_engine_sdk_smoke.py) only runs when ANTHROPIC_API_KEY is set. Workaround: Run uv run pytest tests/test_step_engine_sdk_smoke.py -v manually to verify SDK integration.
Microloops only in Build: Routing with microloops (loop-back on UNVERIFIED) is only tested for Build flow (test/code loops). Deploy and Wisdom use linear routing. Workaround: Extend _route() in orchestrator.py if you need microloops elsewhere.
No streaming to UI: Transcripts are written after step completion, not streamed during execution. Flow Studio shows completed steps, not in-progress output.

Extension Points

To extend beyond these limitations:

Per-step engine: Add engine_strategy config to runtime.yaml and implement get_engine_for_step(flow_key, step_id) in engines.py
Resumption: Add checkpoint serialization in storage.py and resume logic in orchestrator._execute_stepwise()
Streaming: Use streaming_callback in engine methods to emit events during execution

Future Work

The following enhancements are not implemented and are out of scope for v2.3.0. They are documented here for future reference.

Feature	Description	Entry Point
Per-step engine strategy	Mix Gemini + Claude in one flow (e.g., step 1 with Claude, step 2 with Gemini)	Add `engine_strategy` to `swarm/config/runtime.yaml`, implement `get_engine_for_step(flow_key, step_id)` in `swarm/runtime/engines.py`
Run resumption / checkpoints	Persist orchestrator step index + routing state for mid-flow recovery	Add checkpoint serialization to `swarm/runtime/storage.py`, add `start_from_step` / `resume_run` entry points in orchestrator
Streaming into Flow Studio	Real-time updates during step execution instead of post-completion	Expose streaming event channel from engines, extend Flow Studio to subscribe and render in-progress steps
