Timbal

Timbal 2.0 is in beta. The API is stable — we're finalizing docs and tooling before the full release.

Simple, performant, battle-tested framework for building reliable AI applications.

Installation

pip install timbal

Timbal is modular. The bare install includes the agent/workflow engine, both Anthropic and OpenAI providers, MCP support, and tracing. Install extras only when you need them:

Extra	What it adds	When to use
`timbal[server]`	FastAPI + uvicorn	Serving agents over HTTP
`timbal[documents]`	PyMuPDF + openpyxl + python-docx	Reading PDFs, Excel, Word files
`timbal[evals]`	rich	Running the evals CLI
`timbal[codegen]`	libcst + ruff	Using the code generation tools
`timbal[all]`	Everything above

pip install 'timbal[server]'
pip install 'timbal[documents,evals]'
pip install 'timbal[all]'

From source

git clone https://github.com/timbal-ai/timbal.git
cd timbal
uv sync --dev

Two patterns, one interface

Agent — autonomous reasoning

The LLM decides what to do. You provide tools and a goal.

from timbal import Agent
from timbal.tools import WebSearch
from datetime import datetime

def get_datetime() -> str:
    return datetime.now().isoformat()

agent = Agent(
    name="assistant",
    model="anthropic/claude-sonnet-4-6",
    tools=[get_datetime, WebSearch()],
    max_tokens=1024,
)

result = await agent.collect(prompt="What time is it in Tokyo right now?")
print(result.output)

Workflow — explicit pipelines

You define the steps. The framework handles concurrency and dependency resolution.

import httpx
from timbal import Workflow
from timbal.state import get_run_context
from timbal.tools import Write

async def fetch(url: str) -> str:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        return (await client.get(url)).text

workflow = (
    Workflow(name="scraper")
    .step(fetch)
    .step(
        Write(),
        path="./output.html",
        content=lambda: get_run_context().step_span("fetch").output,
    )
)

await workflow.collect(url="https://timbal.ai")

Independent steps run concurrently. Dependencies are inferred automatically from step_span() references — no manual wiring needed.

Calling runnables

All Agent, Workflow, and Tool instances share the same interface.

# Collect all events, return final OutputEvent
result = await agent.collect(prompt="Hello")
print(result.output)           # final result
print(result.status.code)      # "success" | "error" | "cancelled"
print(result.usage)            # {"anthropic/claude-sonnet-4-6:input_tokens": 42, ...}

# Or stream events
async for event in agent(prompt="Hello"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Models

Any provider, one interface. Model strings follow provider/model-name:

anthropic/claude-sonnet-4-6       openai/gpt-4o
anthropic/claude-opus-4-6         openai/gpt-4o-mini
anthropic/claude-haiku-4-5        openai/o3
google/gemini-2.5-flash           groq/llama-3.3-70b-versatile
google/gemini-2.5-pro-preview     xai/grok-3
cerebras/llama-3.3-70b            sambanova/Meta-Llama-3.3-70B-Instruct

Full list and context window sizes in python/timbal/core/models.py.

Structured output

from pydantic import BaseModel

class Analysis(BaseModel):
    sentiment: str
    confidence: float
    summary: str

agent = Agent(model="openai/gpt-4o-mini", output_model=Analysis)
result = await agent.collect(prompt="Analyse: 'Timbal makes AI easy'")
print(result.output.sentiment)     # "positive"
print(result.output.confidence)    # 0.97

Streaming

from timbal.types.events import DeltaEvent
from timbal.types.events.delta import TextDelta

async for event in agent(prompt="Write a short poem"):
    if isinstance(event, DeltaEvent) and isinstance(event.item, TextDelta):
        print(event.item.text_delta, end="", flush=True)

Memory compaction

Long conversations grow large. Timbal has built-in strategies to keep context under control.

from timbal.core.memory_compaction import (
    compact_tool_results,
    keep_last_n_messages,
    keep_last_n_turns,
    summarize,
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    max_tokens=2048,
    memory_compaction=[
        compact_tool_results(keep_last_n=2),   # compress old tool outputs
        keep_last_n_turns(10),                 # keep last 10 user↔assistant turns
    ],
    memory_compaction_ratio=0.75,              # trigger at 75% context window usage
)

Strategies:

compact_tool_results(keep_last_n, threshold, replacement) — strips old tool results, optionally replacing with a summary string
keep_last_n_messages(n) — hard truncation, structure-aware (no orphaned tool pairs)
keep_last_n_turns(n) — keep last N user+assistant pairs
summarize(threshold, model, keep_last_n, max_summary_tokens) — async LLM-based summarization of old messages

Strategies are applied in order. Multiple strategies can be combined.

Skills

Skills are reusable, self-documenting tool packages. They sit on disk and are loaded into the agent's context only when the LLM explicitly requests them via the auto-injected read_skill tool.

skills/
└── web_research/
    ├── SKILL.md          # frontmatter + docs shown to the LLM
    └── tools/
        ├── search.py
        └── scrape.py

# SKILL.md
---
name: "web_research"
description: "Search the web and scrape pages"
---
Use `search(query)` to find pages, then `scrape(url)` to get the content.

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    skills_path="./skills",
    max_tokens=2048,
)

The agent sees skill names and descriptions at startup. It calls read_skill("web_research") to load the tools and documentation when needed — keeping context clean until the skill is actually required.

MCP servers

Connect agents to any Model Context Protocol server.

from timbal.core import MCPServer

# Local server via stdio
mcp = MCPServer(
    transport="stdio",
    command="npx",
    args=["-y", "@modelcontextprotocol/server-filesystem", "."],
)

# Remote server via HTTP
mcp = MCPServer(
    transport="http",
    url="https://my-mcp-server.com",
    headers={"Authorization": "Bearer token"},
)

agent = Agent(
    model="anthropic/claude-sonnet-4-6",
    tools=[mcp],
    max_tokens=2048,
)

Conditional workflows

workflow = (
    Workflow(name="pipeline")
    .step(validate_input)
    .step(
        process,
        when=lambda: get_run_context().step_span("validate_input").output["valid"],
        data=lambda: get_run_context().step_span("validate_input").output["data"],
    )
    .step(
        notify_failure,
        when=lambda: not get_run_context().step_span("validate_input").output["valid"],
    )
)

Steps with when= are skipped (not failed) when the condition is False. Downstream steps that depend on a skipped step are also skipped automatically.

Observability

Timbal has a layered tracing system. Every run produces a full span trace.

from timbal.state.tracing.providers import JsonlTracingProvider
from timbal.state.tracing.exporters import OTelExporter
from pathlib import Path

provider = JsonlTracingProvider.configured(
    _path=Path("traces.jsonl"),
    _exporters=[
        OTelExporter(
            endpoint="http://localhost:4318",
            service_name="my-agent",
            headers={"x-honeycomb-team": "YOUR_KEY"},
        ),
    ],
)

agent = Agent(model="...", tracing_provider=provider)

OTelExporter is fire-and-forget — it never adds latency to your runs. Compatible with Jaeger, Honeycomb, Datadog, Grafana Tempo, and any OTLP backend. Custom exporters:

from timbal.state.tracing.providers.base import Exporter

class MyExporter(Exporter):
    async def export(self, run_context) -> None:
        spans = list(run_context._trace.values())
        await my_backend.send(spans)

Session chaining

Link runs so an agent can recall what happened in a previous session — even across process restarts.

from timbal.state.tracing.providers import JsonlTracingProvider
from pathlib import Path

provider = JsonlTracingProvider.configured(_path=Path("sessions.jsonl"))
agent = Agent(model="...", tracing_provider=provider)

run1 = await agent.collect(prompt="My name is Alice.")
print(run1.run_id)   # "abc123"

# Next session — agent remembers the previous one
from timbal.state.context import RunContext
ctx = RunContext(parent_id="abc123", tracing_provider=provider)
run2 = await agent.collect(prompt="What's my name?", run_context=ctx)

HTTP serving

Requires pip install 'timbal[server]'. Serve any agent or workflow over HTTP with one command.

python -m timbal.server.http \
  --import_spec path/to/agent.py::my_agent \
  --host 0.0.0.0 \
  --port 4444 \
  --workers 4

Endpoint	Method	Description
`/healthcheck`	GET	Returns 204
`/params_model_schema`	GET	JSON schema of inputs
`/return_model_schema`	GET	JSON schema of output
`/run`	POST	Execute and wait
`/stream`	POST	Stream events as SSE
`/cancel/{run_id}`	POST	Cancel a running execution

curl -X POST http://localhost:4444/run \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello"}'

Evals

Declarative evaluation suite with built-in validators.

# evals.yaml
evals:
  - name: "greets_user"
    params:
      prompt: "Say hello to Alice"
    agent!:
      output!:
        contains_all!: ["Hello", "Alice"]
      duration!:
        lt!: 5

from timbal.evals.runner import run_eval

result = await run_eval(eval_config, agent=my_agent)
print(result.passed)
print(result.validator_results)

Validators: contains, contains_all, contains_any, starts_with, ends_with, pattern, length, min_length, max_length, eq, lt, gt, type, email, json, semantic (LLM-based), language. All support not_ negation.

Testing without API calls

from timbal.core.test_model import TestModel

model = TestModel(responses=["The answer is 42."])
agent = Agent(name="test", model=model, tools=[])

result = await agent.collect(prompt="What is the answer?")
assert result.output.collect_text() == "The answer is 42."
assert result.status.code == "success"

Responses cycle to the last item when exhausted. Pass Message objects to test tool-calling flows. No network calls.

Hooks

Pre/post hooks run around every execution and have access to the full run context.

def log_pre():
    ctx = get_run_context()
    print(f"Starting run {ctx.id}")

def log_post():
    span = get_run_context().current_span()
    print(f"Output: {span.output}")

tool = Tool(handler=my_fn, pre_hook=log_pre, post_hook=log_post)

Hooks are parameterless callables. Both sync and async are supported.

Running tests

uv run pytest

uv run pytest python/tests/core/test_jsonl_tracing_provider.py
uv run pytest python/tests/core/test_otel_exporter.py::TestRetry

Benchmarks

cd benchmarks/langchain
uv pip install langchain-core langsmith langgraph

# Quick mode (default)
uv run pytest bench_*.py -v

# Full mode
TIMBAL_BENCH_MODE=full uv run pytest bench_*.py -v

See benchmarks/README.md for methodology and how to read results.

Repository structure

timbal/
├── python/
│   ├── timbal/
│   │   ├── core/             # Agent, Workflow, Tool, LLM router, Skills, MCP
│   │   ├── state/            # RunContext, tracing providers + exporters
│   │   ├── types/            # Message, File, Events
│   │   ├── collectors/       # Output processing
│   │   ├── evals/            # Evaluation framework
│   │   ├── server/           # HTTP serving
│   │   ├── platform/         # Timbal platform integration
│   │   └── tools/            # Built-in tool library
│   └── tests/core/
├── benchmarks/
│   ├── README.md
│   └── langchain/
├── CLAUDE.md                 # Codebase guide for AI agents
└── pyproject.toml

Why Timbal

Transparent by default. No hidden magic. Under the hood it's async functions, Pydantic validation, and event-driven streaming — nothing you couldn't build yourself, just already built well.

Production-shaped. The core abstractions were refined through real production deployments before the framework was open-sourced. Fast failure, clear error messages, stable interfaces.

One interface for everything. Agents, workflows, and tools all share the same __call__ / .collect() convention and the same event stream. Compose them freely.

Provider-agnostic. Anthropic, OpenAI, Google, Groq, xAI, Cerebras, SambaNova — same code, swap the model string.

Documentation

docs.timbal.ai

Contributing

Pull requests and issues welcome.

License

Apache 2.0 — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 808 Commits
benchmarks		benchmarks
cli		cli
docs		docs
examples		examples
python		python
scripts		scripts
tui		tui
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
pyrightconfig.json		pyrightconfig.json
test.py		test.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Timbal

Installation

From source

Two patterns, one interface

Agent — autonomous reasoning

Workflow — explicit pipelines

Calling runnables

Models

Structured output

Streaming

Memory compaction

Skills

MCP servers

Conditional workflows

Observability

Session chaining

HTTP serving

Evals

Testing without API calls

Hooks

Running tests

Benchmarks

Repository structure

Why Timbal

Documentation

Contributing

License

About

Uh oh!

Releases 31

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Timbal

Installation

From source

Two patterns, one interface

Agent — autonomous reasoning

Workflow — explicit pipelines

Calling runnables

Models

Structured output

Streaming

Memory compaction

Skills

MCP servers

Conditional workflows

Observability

Session chaining

HTTP serving

Evals

Testing without API calls

Hooks

Running tests

Benchmarks

Repository structure

Why Timbal

Documentation

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 31

Contributors

Uh oh!

Languages