AgentLab

AI agent orchestration framework with MCP-style tool integrations. The Go orchestrator runs a multi-step agent loop against tool servers that speak JSON-RPC 2.0 over stdio. Every tool response is validated against a JSON Schema declared by the tool itself, retries are bounded with exponential backoff and an explicit transient-vs-permanent classifier, and every step writes an OpenTelemetry span with the input arguments, result preview, and attempt count.

What this studies

This repo is a small lab for four ideas:

MCP-style tool protocols. Tools are subprocesses, not in-process callables. Each one speaks JSON-RPC 2.0 over newline-delimited JSON on stdin/stdout. See docs/jsonrpc-wire.md.
Structured-output validation as a load-bearing piece. Every tool declares a resultSchema in its first tools/list response. The orchestrator validates every response against that schema before passing it to the next step. Validation failures are classified as permanent and are not retried.
OTel-style tracing for agent steps. One root span per run, one span per step, one span per attempt. Result previews, retry counts, and failure classes live on the span attributes. See docs/tracing.md.
Retry classification. Network errors and JSON-RPC -32603 (internal error) are transient and are retried with backoff delays of 100 ms, 400 ms, and 1600 ms. Schema-validation errors, -32002 (permanent tool error), and bad-input codes are permanent and are not retried. See internal/orchestrator/retry.go.
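
The transient/permanent split can be sketched in a few lines. The repo's classifier is written in Go (internal/orchestrator/retry.go); the Python below only illustrates the table logic and is not a port of that file. ToolFailure, its kind labels, and call_with_retry are hypothetical names. Two behaviours are assumptions: the three delays are read as up to three retries after the first attempt, and any error code not listed as transient is treated as permanent.

```python
import time

# Delays before retries 1, 2 and 3, from the schedule quoted in this README.
BACKOFF_MS = (100, 400, 1600)

# JSON-RPC error codes this README names as transient.
TRANSIENT_CODES = {-32603}  # internal error


class ToolFailure(Exception):
    """Hypothetical error type; kind is "network", "rpc" or "schema"."""

    def __init__(self, kind, code=None):
        super().__init__(f"{kind} failure (code={code})")
        self.kind = kind
        self.code = code


def is_transient(failure):
    """Return True only for failures the README lists as retryable."""
    if failure.kind == "network":
        return True
    if failure.kind == "rpc":
        return failure.code in TRANSIENT_CODES
    # Schema-validation failures and anything unrecognised are permanent
    # (treating unknown kinds as permanent is an assumption of this sketch).
    return False


def call_with_retry(call, sleep=time.sleep):
    """Run call() and retry transient failures; return (result, attempts)."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return call(), attempt
        except ToolFailure as failure:
            retries_used = attempt - 1
            if not is_transient(failure) or retries_used >= len(BACKOFF_MS):
                raise
            sleep(BACKOFF_MS[retries_used] / 1000.0)
```

Passing sleep as a parameter keeps the schedule testable without real waiting: a test can hand in list.append and inspect the recorded delays.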

How this differs from `SAY-5/agentic-runner`

The two repos make different choices along each of the following axes of the agent design space.

Axis	`agentic-runner`	`agentlab` (this repo)
Language split	Pure Python, single process	Go orchestrator + Python tool servers (two processes/tool)
Tool interface	In-process callables, looked up in a registry	Subprocess JSON-RPC 2.0 over stdio (MCP-style)
Tool count and surface	One generic registry; tools added as functions	8 distinct tools, each its own Python server
Provider focus	Replan loop: validate output, re-decompose plan	Fixed scripted plan; the focus is the orchestrator side
Schema validation	Pydantic at tool boundary	Pydantic at tool boundary plus `gojsonschema` in Go on every result
Tracing	Python `logging` + step records	OTel-style span tree (root + step + attempt)
Retry classifier	Inline per-tool	Centralised in the orchestrator; transient/permanent table

They are deliberately complementary. agentic-runner is the place to study provider-driven replanning; agentlab is the place to study the protocol and orchestrator layer.

The 8 tools

Each tool is a separate Python process. The orchestrator spawns it, discovers its schema via tools/list, and routes calls to it via tools/call.

Tool	Purpose
`file_read`	Read a UTF-8 text file (with byte cap and truncation flag).
`file_write`	Write a UTF-8 text file (overwrites; optional `create_parents`).
`http_get`	GET a URL. In CI, served from a fixture map (no real network).
`calculate`	Safe arithmetic evaluator using a real recursive-descent parser.
`query_db`	SELECT-only queries against a bundled SQLite (cities, countries).
`summarize`	Deterministic first-N-sentences summariser. No LLM call.
`extract_json`	Pull a JSON object from a string and validate against a JSON Schema.
`finish`	Terminate the loop with a final answer.

See docs/tool-protocol.md for the skeleton if you want to add another one.
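
For orientation, here is a minimal sketch of what such a tool server might look like. It is not the skeleton from docs/tool-protocol.md. The echo tool, the {"tools": [...]} result shape, and the {"name", "arguments"} parameter shape are assumptions borrowed from MCP conventions; only the method names tools/list and tools/call, the resultSchema field, and the newline-delimited framing come from this README.

```python
import json
import sys

# Hypothetical tool definition. Only the resultSchema field name is taken
# from this README; the tool itself is invented for illustration.
TOOL = {
    "name": "echo",
    "resultSchema": {
        "type": "object",
        "properties": {"text": {"type": "string"}},
        "required": ["text"],
    },
}


def handle(request):
    """Build the JSON-RPC 2.0 response for one decoded request object."""
    method = request.get("method")
    if method == "tools/list":
        body = {"result": {"tools": [TOOL]}}
    elif method == "tools/call":
        # Assumed parameter shape: {"name": ..., "arguments": {...}}.
        arguments = request.get("params", {}).get("arguments", {})
        body = {"result": {"text": str(arguments.get("text", ""))}}
    else:
        # -32601 is the standard JSON-RPC "method not found" code.
        body = {"error": {"code": -32601, "message": "method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), **body}


def serve(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON request per line and write one JSON response per line."""
    for line in stdin:
        if not line.strip():
            continue
        response = handle(json.loads(line))
        stdout.write(json.dumps(response) + "\n")
        stdout.flush()  # the peer reads line by line, so flush each reply
```

Calling serve() with its defaults turns the module into a stdio server. Keeping handle() separate from the I/O loop means the request logic can be tested without spawning a subprocess.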

Architecture

                +-----------------------------+
                |     cmd/agentlab (Go)       |
                +--------------+--------------+
                               |
                +--------------v---------------+
                |  internal/orchestrator       |
                |  - step loop                 |
                |  - retry classifier          |
                |  - OTel tracer               |
                +------+------+--------+-------+
                       |      |        |
              +--------v-+ +--v----+ +-v-----------+
              | provider | | tools | |  trace      |
              | (Fake/   | | reg + | |  exporter   |
              | Claude   | | schema| |  (mem/OTLP) |
              | stub)    | | cache | |             |
              +----+-----+ +---+---+ +-------------+
                   |           |
            scripted YAML      | JSON-RPC 2.0 over stdio (one subprocess per tool)
                               v
        +----------+ +-----------+ +----------+ +----------+ ...
        | file_read| | file_write| | calculate| |  finish  |
        | (Python) | | (Python)  | | (Python) | | (Python) |
        +----------+ +-----------+ +----------+ +----------+

Quickstart

Requirements: Go 1.22+, Python 3.11 or 3.12, Docker (optional).

make up        # creates a venv, installs deps, builds bin/agentlab
make demo      # runs the 8-tool demo, writes demo-output/{result,trace}.jsonl
make lint      # golangci-lint + ruff + black --check
make typecheck # go vet + mypy --strict
make test      # go test + pytest + the 8-tool integration

Or via Docker:

docker compose build
docker compose run --rm agentlab

Demo run output

$ make demo
registered 8 tools: [calculate extract_json file_read file_write finish http_get query_db summarize]
steps=8 final_answer="Tokyo, capital of Japan, population ~13.96M" done=true

The plan in tasks/multi_step_demo.yaml touches every tool exactly once, in this order:

Step	Tool	Result (truncated)
0	`file_write`	`{"path": "/tmp/agentlab-demo.txt", "bytes_written": 139}`
1	`file_read`	`{"content": "Tokyo is the capital of Japan...", "bytes_read": 139, ...}`
2	`calculate`	`{"value": 13.96, "expression": "13960000 / 1000000"}`
3	`query_db`	`{"rows": [{"name": "Tokyo", "population": 13960000, "country": "JP"}], ...}`
4	`http_get`	`{"status": 200, "url": "http://agentlab.local/tokyo.json", "body": "..."}`
5	`extract_json`	`{"data": {"city": "Tokyo", "population": 13960000, "country": "JP"}, ...}`
6	`summarize`	`{"summary": "Tokyo is the capital of Japan...", "sentence_count": 2}`
7	`finish`	`{"answer": "Tokyo, capital of Japan, population ~13.96M", "done": true}`

Trace numbers from one local run:

Metric	Value
Spans emitted	17 (1 root + 8 step + 8 attempt)
Step spans	8
Attempt spans	8 (no retries; happy-path run)
Sum of step latencies	41.8 ms
Wall-clock end-to-end (including subprocess spawn)	sub-second

Latencies vary by machine. The shape (17 spans, 8 steps, 0 retries on the happy path) is asserted by the integration test in tests/integration/demo_test.go.
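
The span count follows directly from the tree shape: one root, one span per step, and one span per attempt. The sketch below models that shape to show where 17 comes from. Span, build_run, and count_spans are hypothetical names and not the types in internal/trace, and the attribute keys are illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    kind: str  # "root", "step" or "attempt"
    attributes: dict = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


def build_run(attempts_per_step):
    """Build a root -> step -> attempt tree from (tool, attempts) pairs."""
    root = Span("run", "root")
    for index, (tool, attempts) in enumerate(attempts_per_step):
        step = Span(f"step-{index}", "step", {"tool": tool, "attempts": attempts})
        for n in range(1, attempts + 1):
            step.children.append(Span(f"attempt-{n}", "attempt", {"attempt": n}))
        root.children.append(step)
    return root


def count_spans(span):
    """Count a span plus all of its descendants."""
    return 1 + sum(count_spans(child) for child in span.children)


# The demo order from the table above, one attempt per step (happy path).
DEMO = [(tool, 1) for tool in (
    "file_write", "file_read", "calculate", "query_db",
    "http_get", "extract_json", "summarize", "finish",
)]
```

With DEMO as input, count_spans(build_run(DEMO)) gives 1 + 8 + 8 = 17. Each retry adds exactly one attempt span, so a run with a single retried step would emit 18.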

What this is not

Not a real MCP implementation. The wire format and tools/list / tools/call names borrow from MCP, but resources, prompts, content blocks, capability negotiation, and sampling are out of scope. See docs/mcp-comparison.md for the full list of deltas.
No real LLM in CI. The FakeProvider reads a scripted YAML plan; every test and the make demo flow use it. A ClaudeProvider stub exists in internal/provider/claude.go to document the BYOK swap path, but it is env-gated and never invoked in CI.
No auth on tool servers. The subprocess trust model assumes that whoever spawned the server gets to talk to it. There is no transport encryption or authentication.
No streaming tool outputs. Strict request/response.
No tool composition or sub-agents. A scripted plan is one linear list of (tool, arguments) pairs.
No parallel tool calls. Calls are dispatched strictly one at a time. The companion agentic-runner repo studies a different shape (provider-driven replan); parallel tool dispatch is left for a future repo.

Project layout

agentlab/
  cmd/agentlab/          Go CLI entrypoint
  internal/
    orchestrator/        agent loop, retry classifier
    jsonrpc/             JSON-RPC 2.0 client + stdio transport
    tools/               registry, schema cache, validation
    provider/            Provider interface, FakeProvider, ClaudeProvider stub
    trace/               span model and exporters (in-memory + OTLP/JSON)
    config/              YAML loader
    chaos/               in-process fault injection for tests
  agentlab_tools/        the 8 Python tool servers + shared protocol base
  tasks/                 the canonical demo plan + top-level config
  tests/                 Go integration + Python unit suites
  docs/                  wire format, MCP comparison, tool-author guide, tracing

License

MIT, see LICENSE.
