For: Developers integrating stepwise execution, operators choosing backends.
Prerequisites: FLOW_STUDIO.md | Related: CLAUDE.md (Technology Stack section)
```bash
# 1. Run a stepwise demo
make demo-run-claude-stepwise

# 2. View it in Flow Studio
make flow-studio
# Open http://localhost:5000/?run=stepwise-claude

# 3. Inspect a transcript
cat swarm/examples/stepwise-claude/signal/llm/normalize-signal-normalizer-claude.jsonl
```

What happened? Each flow step executed as a separate LLM call, producing:
- Transcript: Conversation log (`llm/*.jsonl`)
- Receipt: Execution metadata (`receipts/*.json`)
- Events: Timeline in `events.jsonl`
Continue reading for configuration, architecture, and extension details.
In batch execution (standard mode), a backend runs an entire flow in one LLM call. The LLM receives all step instructions upfront and executes them sequentially within a single session. This is fast but offers limited visibility into step-level behavior.
In stepwise execution, the orchestrator:
- Loads the flow definition from `flow_registry`
- Iterates through each step in order
- Makes a separate LLM call per step
- Passes context from previous steps to subsequent steps
- Persists events and artifacts after each step
This approach trades throughput and token efficiency for observability and control.
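The stepwise loop described above can be sketched in a few lines (an illustration only, with a hypothetical `call_llm` helper standing in for the real engine; this is not the orchestrator's actual code):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class MiniStep:
    step_id: str
    role: str

def run_stepwise(steps: List[MiniStep], call_llm: Callable[[str], str]) -> List[Dict]:
    """One LLM call per step; each prompt carries prior step outputs."""
    history: List[Dict] = []
    for index, step in enumerate(steps, start=1):
        # Context handoff: fold previous step outputs into this step's prompt
        context = "\n".join(f"{h['step_id']}: {h['output']}" for h in history)
        prompt = f"Step {index}/{len(steps)} ({step.role})\nPrior results:\n{context}"
        output = call_llm(prompt)  # a separate LLM call per step
        history.append({"step_id": step.step_id, "output": output})
        # A real orchestrator would persist events and artifacts here
    return history

# Stub "LLM" so the loop is runnable without credentials
results = run_stepwise(
    [MiniStep("S1", "load context"), MiniStep("S2", "normalize signal")],
    call_llm=lambda prompt: f"ok ({prompt.count('S1:')} prior S1 results seen)",
)
```

The second step's prompt includes the first step's output, which is the context-handoff behavior the benefits table below describes.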
| Benefit | Description |
|---|---|
| Per-step observability | Each step emits separate step_start and step_end events |
| Context handoff | Previous step outputs are included in subsequent step prompts |
| Better error isolation | When a step fails, you know exactly which one and why |
| Teaching mode | Supports pausing at step boundaries for demos and debugging |
| Engine flexibility | Same orchestrator works with different LLM backends |
| Backend ID | Engine | Description |
|---|---|---|
| `gemini-step-orchestrator` | `GeminiStepEngine` | Stepwise execution via Gemini CLI |
| `claude-step-orchestrator` | `ClaudeStepEngine` | Stepwise execution via Claude Agent SDK |
Both backends use the `GeminiStepOrchestrator` class (despite its name) with different underlying `StepEngine` implementations. The orchestrator handles flow traversal while the engine handles LLM communication.
```
Backend (GeminiStepwiseBackend / ClaudeStepwiseBackend)
  |
  v
Orchestrator (GeminiStepOrchestrator)
  |
  +-- flow_registry: Loads flow definitions and steps
  |
  +-- engine: StepEngine implementation
  |     |
  |     v
  |   GeminiStepEngine or ClaudeStepEngine
  |     |
  |     v
  |   LLM (Gemini CLI or Claude Agent SDK)
  |
  +-- storage: Persists events, summaries, transcripts
```
The `StepEngine` interface (`swarm/runtime/engines.py`) defines how individual steps are executed. All engines implement:
```python
class StepEngine(ABC):
    @property
    @abstractmethod
    def engine_id(self) -> str:
        """Unique identifier (e.g., 'gemini-step', 'claude-step')."""
        ...

    @abstractmethod
    def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[RunEvent]]:
        """Execute a step and return result + events."""
        ...
```

`StepContext` contains all information needed to execute a step:
```python
@dataclass
class StepContext:
    repo_root: Path              # Repository root path
    run_id: str                  # Run identifier
    flow_key: str                # Flow being executed (signal, plan, build, etc.)
    step_id: str                 # Step identifier within the flow
    step_index: int              # 1-based step index
    total_steps: int             # Total steps in the flow
    spec: RunSpec                # Run specification
    flow_title: str              # Human-readable flow title
    step_role: str               # Description of what this step does
    step_agents: Tuple[str, ...] # Agent keys assigned to this step
    history: List[Dict]          # Previous step results for context
    extra: Dict[str, Any]        # Additional context-specific data

    @property
    def run_base(self) -> Path:
        """Get RUN_BASE path for this step's artifacts."""
        return self.repo_root / "swarm" / "runs" / self.run_id / self.flow_key
```

`StepResult` is returned by engines after step execution:
```python
@dataclass
class StepResult:
    step_id: str                     # Step identifier
    status: str                      # "succeeded" | "failed" | "skipped"
    output: str                      # Summary text describing what happened
    error: Optional[str] = None      # Error message if failed
    duration_ms: int = 0             # Execution duration in milliseconds
    artifacts: Optional[Dict] = None # Artifact paths/metadata produced
```

The `GeminiStepOrchestrator` (`swarm/runtime/orchestrator.py`) coordinates
stepwise execution:
- Run creation: Generates run ID, creates directories, writes initial metadata
- Flow loading: Gets flow definition from `flow_registry`
- Step iteration: Loops through steps, building context for each
- Engine invocation: Calls `engine.run_step(ctx)` for each step
- Event persistence: Writes engine events to `events.jsonl`
- Status updates: Updates run summary as execution progresses
Key method:
```python
def run_stepwise_flow(
    self,
    flow_key: str,
    spec: RunSpec,
    start_step: Optional[str] = None,
    end_step: Optional[str] = None,
) -> RunId:
    """Execute a flow step-by-step, one LLM call per step."""
```

Stepwise backends write detailed transcripts of LLM interactions.
Location: `RUN_BASE/<flow>/llm/<step_id>-<agent>-<engine>.jsonl`

Example path: `swarm/runs/run-20251209-143022-abc123/signal/llm/S1-context-loader-claude.jsonl`

Format: JSONL (one JSON object per line):

```jsonl
{"timestamp": "2025-01-15T10:00:00Z", "role": "system", "content": "Executing step S1 with agent context-loader"}
{"timestamp": "2025-01-15T10:00:01Z", "role": "user", "content": "Step role: Load relevant context from the codebase"}
{"timestamp": "2025-01-15T10:00:05Z", "role": "assistant", "content": "I have loaded the following files..."}
```

Step receipts capture execution metadata for auditing and debugging.
Location: `RUN_BASE/<flow>/receipts/<step_id>-<agent>.json`

Example path: `swarm/runs/run-20251209-143022-abc123/signal/receipts/S1-context-loader.json`

Format: JSON with execution metadata:

```json
{
  "engine": "claude-step",
  "mode": "sdk",
  "provider": "anthropic",
  "model": "claude-sonnet-4-20250514",
  "step_id": "S1",
  "flow_key": "signal",
  "run_id": "run-20251209-143022-abc123",
  "agent_key": "context-loader",
  "started_at": "2025-01-15T10:00:00Z",
  "completed_at": "2025-01-15T10:00:05Z",
  "duration_ms": 5000,
  "status": "succeeded",
  "tokens": {"prompt": 1200, "completion": 800, "total": 2000},
  "transcript_path": "llm/S1-context-loader-claude.jsonl"
}
```

The `mode` and `provider` fields indicate how the step was executed:

- `mode`: Engine mode used (`stub`, `sdk`, or `real`)
- `provider`: Provider profile (`anthropic`, `anthropic_compat`)
The stepwise system supports multiple LLM providers behind a unified interface.
| Provider | Base URL | Description |
|---|---|---|
| `anthropic` | `api.anthropic.com` | Direct Anthropic Claude API |
| `anthropic_compat` | Configurable | Anthropic-compatible APIs (GLM, etc.) |
Each engine can run in different modes:
| Engine | Mode | Description |
|---|---|---|
| `claude-step` | `stub` | Synthetic responses for testing (default) |
| `claude-step` | `sdk` | Real Claude Agent SDK execution |
| `claude-step` | `cli` | Real Claude CLI execution (`claude --output-format stream-json`) |
| `gemini-step` | `stub` | Synthetic responses for testing |
| `gemini-step` | `cli` | Real Gemini CLI execution |
If you have a Claude Code seat (no API key):
- Use `claude-step-orchestrator` with `mode: cli` (NEW!)
- Your Claude Code CLI handles authentication automatically
- Full stepwise execution with per-step transcripts
- Works with Claude Code or GLM Coding Plan via your CLI settings
If you have an Anthropic API key:
- Use `claude-step-orchestrator` with `mode: sdk`
- Set the `ANTHROPIC_API_KEY` environment variable
- Full stepwise execution with per-step transcripts
If you have a GLM Coding Plan (Z.AI):
- Option 1 (CLI): Use `claude-step-orchestrator` with `mode: cli`
  - Your Claude Code CLI should already be configured for GLM
  - No environment variables needed in the swarm
- Option 2 (SDK): Use `claude-step-orchestrator` with `mode: sdk`
  - Set `ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic`
  - Set `ANTHROPIC_API_KEY` to your GLM key
If you want cheap testing:
- Use `gemini-step-orchestrator` with Gemini CLI
- Good for iteration before using Claude
If you just want to explore the harness:
- Leave everything at defaults (`mode: stub`)
- Run `make demo-run-stepwise` to see synthetic runs
| Persona | Description | Command | API Key Required? |
|---|---|---|---|
| CI / Demo | Testing flows, exploring harness | `make stepwise-sdlc-stub` | No |
| Agent SDK | Local dev with Claude subscription (Max/Team/Enterprise) | TypeScript/Python Agent SDK | No (uses Claude login) |
| API User | Server-side / multi-user integration | `make stepwise-sdlc-claude-sdk` | Yes (`ANTHROPIC_API_KEY`) |
Understanding Claude Surfaces

The Agent SDK (TypeScript/Python) is "headless Claude Code": it reuses your Claude subscription when you're logged into Claude Code on your machine. No separate API account is needed for local dev. Use the HTTP API (`ANTHROPIC_API_KEY`) for server-side, CI, or multi-tenant deployments.

The CLI mode (`make stepwise-sdlc-claude-cli`) is a lower-level surface that bridges to the Claude Code CLI. It's useful for debugging and for providers without an Agent SDK (Gemini CLI, etc.).

Try it now: `make agent-sdk-ts-demo` or `make agent-sdk-py-demo`. See `examples/agent-sdk-ts/` and `examples/agent-sdk-py/` for working examples.
For a step-by-step walkthrough, see `swarm/runbooks/stepwise-fastpath.md`.
Both stepwise engines support stub mode for development and CI testing. In stub mode, engines return synthetic responses without calling real LLMs.
| Variable | Values | Description |
|---|---|---|
| `SWARM_GEMINI_STUB` | `0`, `1` | Force Gemini stub mode (default: `1`) |
| `SWARM_GEMINI_CLI` | path | Override Gemini CLI executable path |
| `SWARM_CLAUDE_STEP_ENGINE_MODE` | `stub`, `sdk`, `cli` | Claude engine mode |
| `SWARM_CLAUDE_CLI` | path | Override Claude CLI executable path |
Default behavior:
- `SWARM_GEMINI_STUB=1`: Stub mode is enabled by default
- Set `SWARM_GEMINI_STUB=0` to use the real Gemini CLI (requires CLI installed)
- Set `SWARM_CLAUDE_STEP_ENGINE_MODE=sdk` to use the real Claude Agent SDK
- Set `SWARM_CLAUDE_STEP_ENGINE_MODE=cli` to use the Claude CLI (NEW!)
Mode selection order for Claude:
- Environment variable: `SWARM_CLAUDE_STEP_ENGINE_MODE`
- Config file: `swarm/config/runtime.yaml`
- Default: `stub`
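That precedence can be sketched as a small resolver (illustrative only; the real lookup lives in the engine code, and the config dict shape here is assumed from the runtime.yaml examples in this doc):

```python
import os
from typing import Optional

def resolve_claude_mode(config: Optional[dict] = None) -> str:
    """Resolve the Claude engine mode: env var > runtime.yaml > default 'stub'."""
    env_mode = os.environ.get("SWARM_CLAUDE_STEP_ENGINE_MODE")
    if env_mode in ("stub", "sdk", "cli"):
        return env_mode  # environment variable wins
    # Assumed config shape: {"engines": {"claude": {"mode": ...}}}
    config_mode = ((config or {}).get("engines", {}).get("claude", {}) or {}).get("mode")
    if config_mode in ("stub", "sdk", "cli"):
        return config_mode  # config file is consulted next
    return "stub"  # safe default

os.environ.pop("SWARM_CLAUDE_STEP_ENGINE_MODE", None)
assert resolve_claude_mode() == "stub"
assert resolve_claude_mode({"engines": {"claude": {"mode": "sdk"}}}) == "sdk"
os.environ["SWARM_CLAUDE_STEP_ENGINE_MODE"] = "cli"
assert resolve_claude_mode({"engines": {"claude": {"mode": "sdk"}}}) == "cli"
```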
```yaml
# swarm/config/runtime.yaml
engines:
  claude:
    mode: "sdk"
    provider: "anthropic"
```

```bash
export ANTHROPIC_API_KEY=sk-ant-...
make demo-run-claude-stepwise
```

```yaml
# swarm/config/runtime.yaml
engines:
  claude:
    mode: "sdk"
    provider: "anthropic_compat"
    env:
      ANTHROPIC_BASE_URL: "https://api.z.ai/api/anthropic"
```

```bash
export ANTHROPIC_BASE_URL=https://api.z.ai/api/anthropic
export ANTHROPIC_API_KEY=<your-glm-key>
make demo-run-claude-stepwise
```

For users with Claude Code or GLM Coding Plan configured in their CLI:
```yaml
# swarm/config/runtime.yaml
engines:
  claude:
    mode: "cli"
    provider: "anthropic"  # or "anthropic_compat" for GLM
```

```bash
# No environment variables needed - uses your CLI config!
make demo-run-claude-stepwise

# Or via the CLI tool with mode flag:
uv run swarm/tools/demo_stepwise_run.py \
  --backend claude-step-orchestrator \
  --mode cli \
  --flows build
```

This mode uses `claude --output-format stream-json` to execute steps, leveraging your existing Claude Code CLI authentication.
```yaml
# swarm/config/runtime.yaml
engines:
  gemini:
    mode: "real"
```

```bash
export SWARM_GEMINI_STUB=0
make demo-run-gemini-stepwise
```

Stub mode is the default because:
- CI/CD safety: Tests run without LLM credentials or API calls
- Fast iteration: Developers can test orchestration logic without LLM latency
- Cost control: No LLM costs during development or testing
- Deterministic testing: Stub responses are predictable
The stub mode still writes transcript and receipt files with placeholder content, allowing end-to-end testing of the orchestrator, storage, and Flow Studio UI.
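As one example of that end-to-end testing, an `events.jsonl` from a stub run can be sanity-checked for step pairing (a sketch; the `kind` and `step_id` field names follow the event payload examples in this doc, and the helper itself is hypothetical):

```python
import json
from typing import List

def check_step_pairing(event_lines: List[str]) -> bool:
    """Return True when every step_start has a matching step_end/step_error."""
    open_steps = set()
    for line in event_lines:
        event = json.loads(line)
        kind, step_id = event.get("kind"), event.get("step_id")
        if kind == "step_start":
            open_steps.add(step_id)
        elif kind in ("step_end", "step_error"):
            open_steps.discard(step_id)
    return not open_steps  # no step left open

# Synthetic lines shaped like a stub run's events.jsonl
lines = [
    '{"kind": "run_started"}',
    '{"kind": "step_start", "step_id": "S1"}',
    '{"kind": "step_end", "step_id": "S1"}',
    '{"kind": "run_completed"}',
]
assert check_step_pairing(lines)
assert not check_step_pairing(['{"kind": "step_start", "step_id": "S1"}'])
```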
- Open Flow Studio: `make flow-studio`
- In the left sidebar, locate the Backend dropdown (above the flow list)
- Select `Gemini CLI (stepwise)` or `Claude Agent SDK (stepwise)`
- Start a run using the selected backend
- View step-by-step progress in the Events Timeline
The Run Detail modal shows:
- `step_start` events when each step begins
- `step_end` or `step_error` events when steps complete
- Engine-specific events from the LLM (`tool_start`, `tool_end`, etc.)
```python
from swarm.runtime.backends import get_backend
from swarm.runtime.types import RunSpec

# Get a stepwise backend
backend = get_backend("gemini-step-orchestrator")
# or
backend = get_backend("claude-step-orchestrator")

# Create a run specification
spec = RunSpec(
    flow_keys=["signal", "plan"],
    backend="gemini-step-orchestrator",
    initiator="api",
    params={"title": "My Stepwise Run"},
)

# Start the run (returns immediately, runs in background)
run_id = backend.start(spec)

# Check status
summary = backend.get_summary(run_id)
print(f"Status: {summary.status}")

# Get events
events = backend.get_events(run_id)
for event in events:
    print(f"{event.kind}: {event.step_id or 'run-level'}")
```

For finer control, use the orchestrator directly:
```python
from pathlib import Path
from swarm.runtime.orchestrator import get_orchestrator
from swarm.runtime.engines import ClaudeStepEngine
from swarm.runtime.types import RunSpec

# Create orchestrator with specific engine
repo_root = Path("/path/to/repo")
engine = ClaudeStepEngine(repo_root)
orchestrator = get_orchestrator(engine=engine, repo_root=repo_root)

# Run a single flow stepwise
spec = RunSpec(flow_keys=["build"], backend="claude-step-orchestrator")
run_id = orchestrator.run_stepwise_flow("build", spec)

# Optional: Run partial flow (steps 2-5 only)
run_id = orchestrator.run_stepwise_flow(
    "build",
    spec,
    start_step="S2",
    end_step="S5",
)
```

```bash
# Run Gemini stepwise backend tests
uv run pytest tests/test_gemini_stepwise_backend.py -v

# Run Claude stepwise backend tests
uv run pytest tests/test_claude_stepwise_backend.py -v

# Run routing microloop tests
uv run pytest tests/test_build_stepwise_routing.py -v

# Run all stepwise-related tests
uv run pytest tests/ -k "stepwise" -v
```

Basic demos (signal + plan flows):

```bash
make demo-run-gemini-stepwise   # Gemini stepwise backend (stub mode)
make demo-run-claude-stepwise   # Claude stepwise backend (stub mode)
make demo-run-stepwise          # Run both
```

Full SDLC demos (signal + plan + build flows):

```bash
make stepwise-sdlc-gemini       # Gemini stepwise (stub mode)
make stepwise-sdlc-claude-cli   # Claude CLI stepwise
make stepwise-sdlc-claude-sdk   # Claude Agent SDK stepwise (requires API key)
make stepwise-sdlc-stub         # Both backends in stub mode
```

View help:

```bash
make stepwise-help
```

Pre-generated stepwise runs are available in `swarm/examples/`:
| Example | Backend | Flows | Description |
|---|---|---|---|
| `stepwise-gemini/` | Gemini | signal, plan | Lightweight mode (events only) |
| `stepwise-claude/` | Claude | signal, plan | Rich mode (transcripts + receipts) |
| `stepwise-build-gemini/` | Gemini | signal, plan, build | Build flow (lightweight) |
| `stepwise-build-claude/` | Claude | signal, plan, build | Build flow (rich mode) |
| `stepwise-gate-claude/` | Claude | signal, plan, build, gate | Through Gate verification |
| `stepwise-deploy-claude/` | Claude | signal, plan, build, gate, deploy | Through Deploy |
| `stepwise-sdlc-claude/` | Claude | all 7 flows | Complete SDLC (44 steps, recommended) |
Each example includes:
- `spec.json` - Run specification
- `meta.json` - Run metadata
- `events.jsonl` - Event log
- `README.md` - Documentation
Claude examples also include:
- `<flow>/llm/*.jsonl` - Per-step transcripts
- `<flow>/receipts/*.json` - Per-step execution receipts
Tests use the `isolated_runs_env` fixture, which:
- Creates temporary `swarm/runs/` and `swarm/examples/` directories
- Monkeypatches the storage module to use temporary paths
- Resets the `RunService` singleton before/after each test
Example test fixture:
```python
@pytest.fixture
def isolated_runs_env(tmp_path, monkeypatch):
    runs_dir = tmp_path / "swarm" / "runs"
    examples_dir = tmp_path / "swarm" / "examples"
    runs_dir.mkdir(parents=True)
    examples_dir.mkdir(parents=True)
    monkeypatch.setattr(storage, "RUNS_DIR", runs_dir)
    monkeypatch.setattr(storage, "EXAMPLES_DIR", examples_dir)
    RunService.reset()
    yield {"runs_dir": runs_dir, "examples_dir": examples_dir}
    RunService.reset()
```

To add a new LLM backend (e.g., for OpenAI):
- Create an engine class in `swarm/runtime/engines.py`:

  ```python
  class OpenAIStepEngine(StepEngine):
      @property
      def engine_id(self) -> str:
          return "openai-step"

      def run_step(self, ctx: StepContext) -> Tuple[StepResult, Iterable[RunEvent]]:
          # Build prompt from ctx
          # Call OpenAI API
          # Write transcript and receipt
          # Return StepResult and events
          ...
  ```

- Create a backend class in `swarm/runtime/backends.py`:

  ```python
  class OpenAIStepwiseBackend(RunBackend):
      def _get_orchestrator(self) -> GeminiStepOrchestrator:
          from .engines import OpenAIStepEngine
          from .orchestrator import get_orchestrator
          return get_orchestrator(
              engine=OpenAIStepEngine(self._repo_root),
              repo_root=self._repo_root,
          )
      # ... implement required methods
  ```

- Register it in the backend registry:

  ```python
  _BACKEND_REGISTRY: dict[BackendId, type[RunBackend]] = {
      # ... existing backends
      "openai-step-orchestrator": OpenAIStepwiseBackend,
  }
  ```

- Add the `BackendId` type:

  ```python
  # In swarm/runtime/types.py
  BackendId = Literal[
      # ... existing backends
      "openai-step-orchestrator",
  ]
  ```

- Write tests following the patterns in `tests/test_gemini_stepwise_backend.py`
| Issue | Cause | Solution |
|---|---|---|
| No transcript files | Engine did not write them | Check engine logs, verify run_base path exists |
| Stub mode active unexpectedly | Env var override or CLI not found | Check SWARM_GEMINI_STUB, verify CLI in PATH |
| Step timeout | Long-running step | Check step complexity, increase timeout if needed |
| "Unknown flow" error | Flow not in registry | Verify swarm/config/flows/<flow>.yaml exists |
| "Flow has no steps" | Empty steps list | Check flow YAML has steps defined |
| Events missing | Storage write failed | Check disk space, verify write permissions |
- Check stub mode status:

  ```python
  from swarm.runtime.backends import GeminiStepwiseBackend

  backend = GeminiStepwiseBackend()
  orch = backend._get_orchestrator()
  print(f"Stub mode: {orch._engine.stub_mode}")
  ```

- Inspect events:

  ```python
  events = backend.get_events(run_id)
  for e in events:
      print(f"{e.ts} {e.kind} {e.step_id}: {e.payload}")
  ```

- Check transcript content:

  ```bash
  cat swarm/runs/<run_id>/<flow>/llm/*.jsonl
  ```

- Check receipt metadata:

  ```bash
  cat swarm/runs/<run_id>/<flow>/receipts/*.json | jq .
  ```
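When debugging cost or latency, the receipt JSON shown earlier can be aggregated across a flow. A sketch (the `summarize_receipts` helper is hypothetical; field names follow the receipt example in this doc):

```python
from typing import List

def summarize_receipts(receipts: List[dict]) -> dict:
    """Total duration, total tokens, and failed steps across step receipts."""
    total_ms = sum(r.get("duration_ms", 0) for r in receipts)
    total_tokens = sum(r.get("tokens", {}).get("total", 0) for r in receipts)
    failed = [r["step_id"] for r in receipts if r.get("status") != "succeeded"]
    return {"duration_ms": total_ms, "tokens": total_tokens, "failed_steps": failed}

# Receipts shaped like the example above (trimmed to the fields we aggregate)
receipts = [
    {"step_id": "S1", "status": "succeeded", "duration_ms": 5000, "tokens": {"total": 2000}},
    {"step_id": "S2", "status": "failed", "duration_ms": 1200, "tokens": {"total": 300}},
]
summary = summarize_receipts(receipts)
# summary -> {"duration_ms": 6200, "tokens": 2300, "failed_steps": ["S2"]}
```

In practice you would load the dicts with `json.load` from `RUN_BASE/<flow>/receipts/*.json` before aggregating.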
Stepwise runs emit these event kinds:
| Event | Level | Description |
|---|---|---|
| `run_created` | Run | Initial run creation (has `stepwise: true`) |
| `run_started` | Run | Execution began |
| `step_start` | Step | Step execution began |
| `tool_start` | Tool | Tool invocation started (engine-specific) |
| `tool_end` | Tool | Tool invocation completed |
| `step_end` | Step | Step completed successfully |
| `step_error` | Step | Step failed with error |
| `run_completed` | Run | All steps finished |
| `run_stopping` | Run | Orderly shutdown initiated |
| `run_stopped` | Run | Run stopped before completion |
| `run_pausing` | Run | Pause requested |
| `run_paused` | Run | Run paused at step boundary |
| `run_resumed` | Run | Execution resumed |
| Event | Level | Description |
|---|---|---|
| `routing_decision` | Step | Navigator made a routing decision |
| `routing_offroad` | Step | Navigator deviated from golden path |
| `flow_injected` | Flow | A new flow was injected mid-run |
| `node_injected` | Step | A new node was added to the current flow |
| `graph_extended` | Run | Navigator proposed spec changes |
| Event | Level | Description |
|---|---|---|
| `stack_push` | Flow | A flow was paused and a new flow injected |
| `stack_pop` | Flow | An injected flow completed, resuming its parent |
| `stack_overflow_prevented` | Flow | Max stack depth would have been exceeded |
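The stack semantics in this table can be illustrated with a bounded push/pop (the `MAX_DEPTH` value and helper functions are assumptions for illustration, not the orchestrator's actual implementation; only the event names mirror the table):

```python
from typing import List, Tuple

MAX_DEPTH = 3  # assumed limit for illustration

def push_flow(stack: List[str], flow: str) -> Tuple[List[str], str]:
    """Inject a flow; refuse when max depth would be exceeded."""
    if len(stack) + 1 > MAX_DEPTH:
        return stack, "stack_overflow_prevented"
    return stack + [flow], "stack_push"

def pop_flow(stack: List[str]) -> Tuple[List[str], str]:
    """Finish the injected flow and resume its parent."""
    return stack[:-1], "stack_pop"

stack, event = push_flow(["build"], "rebase")
assert (stack, event) == (["build", "rebase"], "stack_push")
stack, event = push_flow(stack, "hotfix")
assert event == "stack_push"
_, event = push_flow(stack, "another")
assert event == "stack_overflow_prevented"
stack, event = pop_flow(stack)
assert (stack, event) == (["build", "rebase"], "stack_pop")
```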
| Event | Level | Description |
|---|---|---|
| `facts_updated` | Step | Fact markers extracted from handoff |
| `assumption_recorded` | Step | An assumption was documented |
| `decision_recorded` | Step | A significant decision was documented |
Event payload examples:
```jsonc
// step_start
{
  "role": "Load context from codebase",
  "agents": ["context-loader"],
  "step_index": 1,
  "engine": "claude-step"
}

// step_end
{
  "status": "succeeded",
  "duration_ms": 5000,
  "engine": "claude-step"
}

// routing_offroad
{
  "golden_path_step": "code-critic",
  "actual_step": "security-scanner",
  "route_type": "DETOUR",
  "rationale": "Detected potential SQL injection pattern",
  "return_address": "code-critic",
  "confidence": 0.85,
  "evaluated_conditions": ["has_db_queries == true"],
  "tie_breaker_used": false
}

// stack_push
{
  "paused_flow": "build",
  "paused_step": "code-implementer",
  "injected_flow": "rebase",
  "current_depth": 2
}
```

The stepwise harness is backed by test-enforced contracts. Here's where each contract is proven:
| Contract | Where it's enforced | What it proves |
|---|---|---|
| Teaching notes appear in prompts | `tests/test_step_prompt_teaching_notes.py` | Inputs/outputs/emphasizes/constraints from flow YAML appear in LLM prompts |
| Routing decisions follow receipts | `tests/test_build_stepwise_routing.py` | Orchestrator routes based on the `status` field in receipts; microloops exit on VERIFIED |
| Receipts have required fields | `tests/test_step_engine_contract.py` | Every receipt includes engine, mode, provider, step_id, flow_key, run_id, status, duration_ms |
| Transcripts have valid format | `tests/test_step_engine_contract.py` | JSONL format with role, content, timestamp per message |
| Examples reflect real flows | `swarm/examples/stepwise-build-claude/` | Golden receipts/transcripts from actual stepwise runs |
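A minimal version of the required-fields contract can be expressed as an assertion helper (a sketch mirroring the table row above, not the actual test file's code):

```python
REQUIRED_RECEIPT_FIELDS = {
    "engine", "mode", "provider", "step_id",
    "flow_key", "run_id", "status", "duration_ms",
}

def missing_receipt_fields(receipt: dict) -> set:
    """Return the required fields absent from a step receipt."""
    return REQUIRED_RECEIPT_FIELDS - receipt.keys()

receipt = {
    "engine": "claude-step", "mode": "sdk", "provider": "anthropic",
    "step_id": "S1", "flow_key": "signal", "run_id": "run-123",
    "status": "succeeded", "duration_ms": 5000,
}
assert missing_receipt_fields(receipt) == set()
assert missing_receipt_fields({"engine": "claude-step"}) == REQUIRED_RECEIPT_FIELDS - {"engine"}
```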
Teaching Notes (`swarm/config/flow_registry.py`):

```python
@dataclass
class TeachingNotes:
    inputs: Tuple[str, ...]       # What the step reads
    outputs: Tuple[str, ...]      # What the step writes
    emphasizes: Tuple[str, ...]   # Key focus areas
    constraints: Tuple[str, ...]  # What the step cannot do
```

Step Routing (`swarm/config/flow_registry.py`):

```python
@dataclass
class StepRouting:
    kind: str                            # "linear" | "microloop" | "branch"
    loop_target: str                     # Step to loop back to
    loop_condition_field: str            # Receipt field to check
    loop_success_values: Tuple[str, ...] # Values that exit the loop
```

Step Result (`swarm/runtime/engines.py`):

```python
@dataclass
class StepResult:
    step_id: str
    status: str  # "succeeded" | "failed" | "skipped"
    output: str
    duration_ms: int
```

```bash
# Verify teaching notes appear in prompts
uv run pytest tests/test_step_prompt_teaching_notes.py -v

# Verify routing decisions
uv run pytest tests/test_build_stepwise_routing.py -v

# Verify receipt/transcript contracts
uv run pytest tests/test_step_engine_contract.py -v
```

This section documents deliberate scope boundaries for the stepwise harness implementation.
| Feature | Status | Notes |
|---|---|---|
| Flows 1-4 (Signal → Gate) | ✅ Complete | Fully stepwise with teaching_notes, microloops, golden examples |
| Flows 5-6 (Deploy → Wisdom) | ✅ Complete | Linear routing, teaching_notes, golden examples |
| Stub mode | ✅ Complete | Zero-cost testing for CI and demos |
| CLI mode | ✅ Complete | Claude Code or GLM Coding Plan |
| SDK mode | ✅ Complete | Real Anthropic API calls |
- No multi-engine per step: Engine selection is at the flow level, not the step level. You cannot run step 1 with Claude and step 2 with Gemini in the same flow. Workaround: Run different flows with different engines.
- No automatic resumption: If a run fails mid-flow, you must restart from the beginning. There's no checkpoint/resume mechanism yet. Workaround: Use stub mode to test flow structure, then run real mode when ready.
- SDK tests are minimal: Most test coverage uses stub mode. The SDK smoke test (`test_step_engine_sdk_smoke.py`) only runs when `ANTHROPIC_API_KEY` is set. Workaround: Run `uv run pytest tests/test_step_engine_sdk_smoke.py -v` manually to verify SDK integration.
- Microloops only in Build: Routing with microloops (loop-back on UNVERIFIED) is only tested for the Build flow (test/code loops). Deploy and Wisdom use linear routing. Workaround: Extend `_route()` in `orchestrator.py` if you need microloops elsewhere.
- No streaming to UI: Transcripts are written after step completion, not streamed during execution. Flow Studio shows completed steps, not in-progress output.
To extend beyond these limitations:
- Per-step engine: Add an `engine_strategy` config to `runtime.yaml` and implement `get_engine_for_step(flow_key, step_id)` in `engines.py`
- Resumption: Add checkpoint serialization in `storage.py` and resume logic in `orchestrator._execute_stepwise()`
- Streaming: Use `streaming_callback` in engine methods to emit events during execution
The following enhancements are not implemented and are out of scope for v2.3.0. They are documented here for future reference.
| Feature | Description | Entry Point |
|---|---|---|
| Per-step engine strategy | Mix Gemini + Claude in one flow (e.g., step 1 with Claude, step 2 with Gemini) | Add engine_strategy to swarm/config/runtime.yaml, implement get_engine_for_step(flow_key, step_id) in swarm/runtime/engines.py |
| Run resumption / checkpoints | Persist orchestrator step index + routing state for mid-flow recovery | Add checkpoint serialization to swarm/runtime/storage.py, add start_from_step / resume_run entry points in orchestrator |
| Streaming into Flow Studio | Real-time updates during step execution instead of post-completion | Expose streaming event channel from engines, extend Flow Studio to subscribe and render in-progress steps |
- FLOW_STUDIO.md -- Flow Studio UI guide, including stepwise section
- CONTEXT_BUDGETS.md -- Token discipline, priority-aware history selection
- GLOSSARY.md -- Terminology definitions
- CLAUDE.md -- Technology stack and runtime backends overview
- LONG_RUNNING_HARNESSES.md -- How stepwise maps to Anthropic's long-running agent patterns