AI agent orchestration framework with MCP-style tool integrations. The Go orchestrator runs a multi-step agent loop against tool servers that speak JSON-RPC 2.0 over stdio. Every tool response is validated against a JSON Schema declared by the tool itself, retries are bounded with exponential backoff and an explicit transient-vs-permanent classifier, and every step writes an OpenTelemetry span with the input arguments, result preview, and attempt count.
This repo is a small lab for four ideas:

- MCP-style tool protocols. Tools are subprocesses, not in-process callables. Each one speaks JSON-RPC 2.0 over newline-delimited JSON on stdin/stdout. See `docs/jsonrpc-wire.md`.
- Structured-output validation as a load-bearing piece. Every tool declares a `resultSchema` in its first `tools/list` response. The orchestrator validates every response against that schema before passing it to the next step. Validation failures are classified as permanent and are not retried.
- OTel-style tracing for agent steps. One root span per run, one span per step, one span per attempt. Result previews, retry counts, and failure classes live on the span attributes. See `docs/tracing.md`.
- Retry classification. Network errors and JSON-RPC `-32603 internal error` are transient (retried at 100 ms / 400 ms / 1600 ms). Schema-validation errors, `-32002 permanent tool error`, and bad-input codes are not retried. See `internal/orchestrator/retry.go`.
This repo and its companion, agentic-runner, sit on different axes of the agent design space.
| Axis | agentic-runner | agentlab (this repo) |
|---|---|---|
| Language split | Pure Python, single process | Go orchestrator + Python tool servers (orchestrator process plus one subprocess per tool) |
| Tool interface | In-process callables, looked up in a registry | Subprocess JSON-RPC 2.0 over stdio (MCP-style) |
| Tool count and surface | One generic registry; tools added as functions | 8 distinct tools, each its own Python server |
| Provider focus | Replan loop: validate output, re-decompose plan | Fixed scripted plan; the focus is the orchestrator side |
| Schema validation | Pydantic at tool boundary | Pydantic at tool boundary plus gojsonschema in Go on every result |
| Tracing | Python logging + step records | OTel-style span tree (root + step + attempt) |
| Retry classifier | Inline per-tool | Centralised in the orchestrator; transient/permanent table |
They are deliberately complementary. agentic-runner is the place to study
provider-driven replanning; agentlab is the place to study the protocol
and orchestrator layer.
Each tool is a separate Python process. The orchestrator spawns it,
discovers its schema via tools/list, and routes calls to it via
tools/call.
| Tool | Purpose |
|---|---|
| `file_read` | Read a UTF-8 text file (with byte cap and truncation flag). |
| `file_write` | Write a UTF-8 text file (overwrites; optional `create_parents`). |
| `http_get` | GET a URL. In CI, served from a fixture map (no real network). |
| `calculate` | Safe arithmetic evaluator using a real recursive-descent parser. |
| `query_db` | SELECT-only queries against a bundled SQLite (cities, countries). |
| `summarize` | Deterministic first-N-sentences summariser. No LLM call. |
| `extract_json` | Pull a JSON object from a string and validate it against a JSON Schema. |
| `finish` | Terminate the loop with a final answer. |
See docs/tool-protocol.md for the skeleton if you want to add another one.
```
+-----------------------------+
|      cmd/agentlab (Go)      |
+--------------+--------------+
               |
+--------------v--------------+
|    internal/orchestrator    |
|  - step loop                |
|  - retry classifier         |
|  - OTel tracer              |
+----+------------+--------+--+
     |            |        |
+----v-----+ +----v----+ +-v-----------+
| provider | | tools   | | trace       |
| (Fake/   | | reg +   | | exporter    |
| Claude   | | schema  | | (mem/OTLP)  |
| stub)    | | cache   | |             |
+----+-----+ +----+----+ +-------------+
     |            |
scripted YAML     |  JSON-RPC 2.0 over stdio (one subprocess per tool)
                  v
+----------+ +-----------+ +----------+ +----------+ ...
| file_read| | file_write| | calculate| | finish   |
| (Python) | | (Python)  | | (Python) | | (Python) |
+----------+ +-----------+ +----------+ +----------+
```
Requirements: Go 1.22+, Python 3.11 or 3.12, Docker (optional).

```sh
make up         # creates a venv, installs deps, builds bin/agentlab
make demo       # runs the 8-tool demo, writes demo-output/{result,trace}.jsonl
make lint       # golangci-lint + ruff + black --check
make typecheck  # go vet + mypy --strict
make test       # go test + pytest + the 8-tool integration test
```

Or via Docker:

```sh
docker compose build
docker compose run --rm agentlab
```

```
$ make demo
registered 8 tools: [calculate extract_json file_read file_write finish http_get query_db summarize]
steps=8 final_answer="Tokyo, capital of Japan, population ~13.96M" done=true
```
The plan in tasks/multi_step_demo.yaml touches every tool exactly once,
in this order:
| Step | Tool | Result (truncated) |
|---|---|---|
| 0 | `file_write` | `{"path": "/tmp/agentlab-demo.txt", "bytes_written": 139}` |
| 1 | `file_read` | `{"content": "Tokyo is the capital of Japan...", "bytes_read": 139, ...}` |
| 2 | `calculate` | `{"value": 13.96, "expression": "13960000 / 1000000"}` |
| 3 | `query_db` | `{"rows": [{"name": "Tokyo", "population": 13960000, "country": "JP"}], ...}` |
| 4 | `http_get` | `{"status": 200, "url": "http://agentlab.local/tokyo.json", "body": "..."}` |
| 5 | `extract_json` | `{"data": {"city": "Tokyo", "population": 13960000, "country": "JP"}, ...}` |
| 6 | `summarize` | `{"summary": "Tokyo is the capital of Japan...", "sentence_count": 2}` |
| 7 | `finish` | `{"answer": "Tokyo, capital of Japan, population ~13.96M", "done": true}` |
Trace numbers from one local run:
| Metric | Value |
|---|---|
| Spans emitted | 17 |
| Step spans | 8 |
| Attempt spans | 8 (no retries; happy-path run) |
| Sum of step latencies | 41.8 ms |
| Wall-clock end-to-end (including subprocess spawn) | sub-second |
Latencies vary by machine. The shape (17 spans, 8 steps, 0 retries on the
happy path) is asserted by the integration test in
tests/integration/demo_test.go.
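The 17 follows from the tree shape: one root span, one span per step, and one span per attempt, so an 8-step run with no retries emits 1 + 8 + 8. The sketch below makes that arithmetic explicit and shows that a retry adds an attempt span without adding a step span. `Span`, `BuildRun`, `Count`, and the span names and attribute keys are hypothetical and are not the repo's `internal/trace` types.

```go
package main

import "fmt"

// Span is a minimal span record; names and attribute keys here are illustrative only.
type Span struct {
	Name     string
	Attrs    map[string]any
	Children []*Span
}

// BuildRun builds root -> step -> attempt spans; attempts[i] is how many tries step i took.
func BuildRun(tools []string, attempts []int) *Span {
	root := &Span{Name: "agent.run"}
	for i, tool := range tools {
		step := &Span{Name: "agent.step", Attrs: map[string]any{
			"step.index": i, "tool.name": tool, "retry.count": attempts[i] - 1,
		}}
		for a := 1; a <= attempts[i]; a++ {
			step.Children = append(step.Children, &Span{Name: "agent.attempt", Attrs: map[string]any{"attempt": a}})
		}
		root.Children = append(root.Children, step)
	}
	return root
}

// Count returns the number of spans in the tree rooted at s.
func Count(s *Span) int {
	n := 1
	for _, c := range s.Children {
		n += Count(c)
	}
	return n
}

func main() {
	tools := []string{"file_write", "file_read", "calculate", "query_db", "http_get", "extract_json", "summarize", "finish"}
	happy := []int{1, 1, 1, 1, 1, 1, 1, 1}
	fmt.Println(Count(BuildRun(tools, happy))) // 17

	// One transient failure on http_get adds one attempt span, not a step span.
	retried := []int{1, 1, 1, 1, 2, 1, 1, 1}
	fmt.Println(Count(BuildRun(tools, retried))) // 18
}
```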
- Not a real MCP implementation. The wire format and the `tools/list`/`tools/call` names borrow from MCP, but resources, prompts, content blocks, capability negotiation, and sampling are out of scope. See `docs/mcp-comparison.md` for the full list of deltas.
- No real LLM in CI. The `FakeProvider` reads a scripted YAML plan; every test and the `make demo` flow use it. A `ClaudeProvider` stub exists in `internal/provider/claude.go` to document the BYOK (bring-your-own-key) swap path, but it is env-gated and never invoked in CI.
- No auth on tool servers. The subprocess trust model assumes that whoever spawned the server gets to talk to it. There is no transport encryption or authentication.
- No streaming tool outputs. Strict request/response.
- No tool composition or sub-agents. A scripted plan is one linear list of `(tool, arguments)` pairs.
- No parallel tool calls. Tool calls are sequential. The companion `agentic-runner` repo studies a different shape (provider-driven replan); parallel tool dispatch is left for a future repo.
```
agentlab/
  cmd/agentlab/       Go CLI entrypoint
  internal/
    orchestrator/     agent loop, retry classifier
    jsonrpc/          JSON-RPC 2.0 client + stdio transport
    tools/            registry, schema cache, validation
    provider/         Provider interface, FakeProvider, ClaudeProvider stub
    trace/            span model and exporters (in-memory + OTLP/JSON)
    config/           YAML loader
    chaos/            in-process fault injection for tests
  agentlab_tools/     the 8 Python tool servers + shared protocol base
  tasks/              the canonical demo plan + top-level config
  tests/              Go integration + Python unit suites
  docs/               wire format, MCP comparison, tool-author guide, tracing
```
MIT, see LICENSE.