TraceFlow Lite

An Agent Reliability & Evaluation Control Plane SDK.
Tracing + replay (SQLite), eval gates (PASS/REVISE/FALLBACK), provider routing/fallback, and cost/latency budgets — so agentic systems are safe to ship and easy to operate.

Portfolio summary: Built an agent reliability control plane SDK with eval gates (PASS/REVISE/FALLBACK), trace + replay (SQLite), provider routing/fallback, and cost/latency budgets — implemented as an internal LangGraph-orchestrated pipeline.

Proof: Live Streamlit demo + video walkthrough + persisted traces you can replay.

TraceFlow Lite sits between your app and LLM providers, enforcing change safety (quality gates + revise loops) and operability (traces, replay, budgets, retries).
Internally, TraceFlow orchestrates a multi-node workflow using LangGraph.

Why TraceFlow Lite?

Building reliable AI agents is hard. You need to handle:

- Cost blowouts: a single runaway query can drain your budget
- Latency spikes: users abandon slow responses
- Quality inconsistency: LLMs hallucinate and go off-topic
- Debugging nightmares: "What prompt caused that output?"
- Provider lock-in: switching from OpenAI to Anthropic shouldn't require rewrites

TraceFlow Lite solves these by providing a control plane that sits between your application and LLM providers.

Features

| Feature | Description |
| --- | --- |
| 🧠 Workflow Orchestration (LangGraph) | Internal multi-step workflow with routing + loops |
| 🛡️ Eval Gates | Automatic cost, latency, and quality checks before responses are finalized |
| 💰 Cost Tracking | Per-request token counting via tiktoken with USD cost calculation |
| 🔁 Retry & Revision | Tenacity-powered retries + intelligent revision loop for quality |
| 📊 Trace Persistence | SQLite storage (WAL mode) for debugging, analytics, and replay |
| 🔌 Pluggable Retriever | Bring your own RAG with flexible callback interface |
| 🏭 Provider Abstraction | Easily swap or add LLM providers without code changes |
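
To make the cost-tracking idea concrete, here is a minimal sketch of turning token counts into a USD figure and comparing it with a budget. It assumes the token counts are already known (the project counts them with tiktoken). The names `PRICES_PER_1K`, `Usage`, `estimate_cost_usd`, and `within_budget` are hypothetical, and the prices are placeholders rather than real provider rates.

```python
from dataclasses import dataclass

# Hypothetical per-1K-token prices in USD; placeholders, not real provider rates.
PRICES_PER_1K = {
    "example-model": {"input": 0.50, "output": 1.50},
}


@dataclass(frozen=True)
class Usage:
    """Token counts for a single request."""
    input_tokens: int
    output_tokens: int


def estimate_cost_usd(model: str, usage: Usage) -> float:
    """Estimate the USD cost of one request from its token counts."""
    price = PRICES_PER_1K[model]
    input_cost = (usage.input_tokens / 1000) * price["input"]
    output_cost = (usage.output_tokens / 1000) * price["output"]
    return input_cost + output_cost


def within_budget(cost_usd: float, max_cost_usd: float) -> bool:
    """True when the request stays at or under the configured budget."""
    return cost_usd <= max_cost_usd


cost = estimate_cost_usd("example-model", Usage(input_tokens=2000, output_tokens=1000))
print(f"estimated cost: ${cost:.2f}, within budget: {within_budget(cost, 1.50)}")
```

A real price table would be keyed by the model identifiers you actually use and kept in sync with provider pricing.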

Architecture

Workflow Overview

1. Client receives a user query and configuration
2. Intake Node extracts and validates the input
3. Planner Node decides if context retrieval is needed
4. Retriever Node (optional) fetches relevant documents via your RAG callback
5. Executor Node calls the LLM provider with retry logic
6. Evaluator Node checks cost/latency constraints and quality
7. Router Node directs traffic based on the eval decision:
   - PASS → Return final answer
   - REVISE → Loop back to executor with refinement instructions
   - FALLBACK → Return graceful fallback message

All interactions are traced to SQLite for debugging and replay.
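
The routing step in the workflow above can be sketched as a small decision function. This is an illustration, not the project's evaluator: `Decision`, `Attempt`, and `decide` are hypothetical names, and treating a budget violation as an immediate FALLBACK is an assumption about how the gate behaves.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PASS = "PASS"
    REVISE = "REVISE"
    FALLBACK = "FALLBACK"


@dataclass
class Attempt:
    """Measurements for one executor call (hypothetical shape)."""
    cost_usd: float
    latency_ms: int
    quality_ok: bool  # outcome of whatever quality check is applied


def decide(
    attempt: Attempt,
    revisions_used: int,
    max_cost_usd: float,
    max_latency_ms: int,
    max_revisions: int,
) -> Decision:
    """Map one attempt to PASS, REVISE, or FALLBACK."""
    # Assumption: revising cannot undo a blown budget, so fall back right away.
    if attempt.cost_usd > max_cost_usd or attempt.latency_ms > max_latency_ms:
        return Decision.FALLBACK
    if attempt.quality_ok:
        return Decision.PASS
    # Quality failed: revise while revision attempts remain, otherwise give up.
    if revisions_used < max_revisions:
        return Decision.REVISE
    return Decision.FALLBACK


limits = dict(max_cost_usd=0.10, max_latency_ms=10000, max_revisions=2)
print(decide(Attempt(0.01, 800, quality_ok=False), revisions_used=1, **limits))
```

Keeping the decision a pure function of the attempt and the limits makes it easy to unit test independently of any LLM call.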

Installation

Using Poetry (recommended)

git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
poetry install

Using pip

git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Development Setup

# Install with dev dependencies
poetry install --with dev

# Run tests
poetry run pytest

# Run linter
poetry run ruff check .

Adding Dependencies

# Add a runtime dependency
poetry add <package>

# Add a dev dependency
poetry add --group dev <package>

# After adding, sync requirements.txt for Streamlit Cloud
# (on Poetry 2.x this needs the poetry-plugin-export plugin):
poetry export --without-hashes -o requirements.txt

Quickstart

Get up and running in 60 seconds:

# 1. Clone and install
git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
poetry install

# 2. Set your API key
export OPENAI_API_KEY="sk-..."

# 3. Launch the UI
make ui

Open http://localhost:8501 — select a model, type a query, and hit Execute. That's it!

Or try it programmatically:

poetry run python -c "
from client import TraceFlowClient
result = TraceFlowClient().run('What is machine learning?')
print(result.answer)
"

Programmatic Usage

from client import TraceFlowClient
from tf_types import RunConfig, Mode

client = TraceFlowClient()

# Basic usage with defaults
result = client.run("What is machine learning?")
print(result.answer)

# With custom configuration
config = RunConfig(
    mode=Mode.GROUNDED_QA,
    model="gpt-3.5-turbo",
    max_tokens=500,
    max_cost_usd=0.10,      # Budget limit
    max_latency_ms=10000,   # Latency limit
    max_revisions=2         # Max revision attempts before fallback
)
result = client.run("Explain neural networks", config)

print(f"Answer: {result.answer}")
print(f"Status: {result.status}")
print(f"Trace ID: {result.trace_id}")

Using a Custom Retriever (RAG)

TraceFlow Lite doesn't lock you into a specific vector database. Provide your own retriever function:

from client import TraceFlowClient
from tf_types import RunConfig, RetrievedChunk

client = TraceFlowClient()

def my_retriever(query: str) -> list[RetrievedChunk]:
    # Your Chroma/Pinecone/Weaviate/custom implementation
    return [
        RetrievedChunk(
            chunk_id="doc_1",
            content="Relevant context here...",
            source="knowledge_base",
            relevance_score=0.95
        )
    ]

config = RunConfig(retriever_fn=my_retriever)
result = client.run("Question needing context", config)
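
For experimenting without a vector database, a retriever can be as simple as keyword overlap over an in-memory dictionary. The sketch below is self-contained, so it defines a stand-in `RetrievedChunk` dataclass with the same four fields used above; in the real project you would import the type from `tf_types` instead. `DOCS` and `keyword_retriever` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Stand-in with the fields shown above; use tf_types.RetrievedChunk in practice."""
    chunk_id: str
    content: str
    source: str
    relevance_score: float


# Hypothetical in-memory corpus.
DOCS = {
    "doc_1": "Machine learning is a subset of AI that learns from data.",
    "doc_2": "SQLite is an embedded relational database.",
}


def keyword_retriever(query: str, top_k: int = 1) -> list[RetrievedChunk]:
    """Rank documents by the fraction of query terms they contain."""
    query_terms = set(query.lower().split())
    scored = []
    for chunk_id, text in DOCS.items():
        doc_terms = set(text.lower().rstrip(".").split())
        overlap = len(query_terms & doc_terms)
        if overlap:  # skip documents that share no terms with the query
            score = overlap / len(query_terms)
            scored.append(RetrievedChunk(chunk_id, text, "in_memory", score))
    scored.sort(key=lambda chunk: chunk.relevance_score, reverse=True)
    return scored[:top_k]


chunks = keyword_retriever("what is machine learning")
print(chunks[0].chunk_id, chunks[0].relevance_score)
```

Because `top_k` has a default, the function can still be called with only the query string, which matches the callback shape shown in the example above.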

Using the Built-in Chroma Helper

from client import TraceFlowClient
from tf_types import RunConfig
from utils.retriever_utils import chroma_retriever
from utils.vector_types import chroma_params

client = TraceFlowClient()

# Setup retriever with your documents
documents = [
    "AI is the simulation of human intelligence by machines.",
    "Machine learning is a subset of AI that learns from data.",
]

params = chroma_params(
    documents=documents,
    collection="my_knowledge_base",
    directory="./chroma_db"
)

retriever = chroma_retriever(local=True, params=params)
retriever.create_vector_store(documents)

# Use in your runs
config = RunConfig(retriever_fn=retriever.retrieve_similar_docs)
result = client.run("What is AI?", config)

Trace Persistence & Replay

Every run is automatically persisted. Debug issues or experiment with different configs:

from client import TraceFlowClient
from tf_types import RunConfig

client = TraceFlowClient()

# List recent traces
traces = client.list_traces(limit=10)
for trace in traces:
    print(f"{trace.trace_id}: {trace.user_input[:50]}...")

# Get a specific trace
trace = client.get_trace("abc123...")
print(trace.final_answer)

# Replay with different configuration
result = client.replay(
    trace_id="abc123...",
    overrides=RunConfig(model="gpt-4o", max_tokens=1000)
)
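
To show the idea behind persistence and replay, here is a minimal sketch using only the standard library. It is not the project's schema or API: the `traces` table layout and the functions `open_store`, `save_trace`, and `replay_inputs` are hypothetical. The sketch stores the original input and config as JSON, then rebuilds the replay inputs by layering overrides on top of the stored config.

```python
import json
import sqlite3
import uuid


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the trace database and make sure the table exists."""
    conn = sqlite3.connect(path)
    # WAL applies to file-backed databases; an in-memory database ignores it.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS traces ("
        "trace_id TEXT PRIMARY KEY, user_input TEXT, config TEXT, final_answer TEXT)"
    )
    return conn


def save_trace(conn: sqlite3.Connection, user_input: str, config: dict, final_answer: str) -> str:
    """Persist one run and return its generated trace ID."""
    trace_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO traces VALUES (?, ?, ?, ?)",
            (trace_id, user_input, json.dumps(config), final_answer),
        )
    return trace_id


def replay_inputs(conn: sqlite3.Connection, trace_id: str, overrides: dict) -> tuple[str, dict]:
    """Return the stored input plus the stored config with overrides applied."""
    row = conn.execute(
        "SELECT user_input, config FROM traces WHERE trace_id = ?", (trace_id,)
    ).fetchone()
    if row is None:
        raise KeyError(trace_id)
    user_input, config_json = row
    return user_input, {**json.loads(config_json), **overrides}


conn = open_store()
tid = save_trace(conn, "What is AI?", {"model": "gpt-3.5-turbo", "max_tokens": 500}, "stored answer")
query, config = replay_inputs(conn, tid, {"model": "gpt-4o"})
print(query, config)
```

Storing the full config alongside the input is what makes a replay reproducible: only the fields named in the overrides change.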

Configuration Reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `Mode` | `GROUNDED_QA` | Workflow mode (GROUNDED_QA, TRIAGE_PLAN, CHANGE_SAFETY) |
| `model` | `str` | `gpt-3.5-turbo` | LLM model identifier |
| `provider` | `str` | `openai` | Provider name |
| `max_tokens` | `int` | `1024` | Max output tokens |
| `temperature` | `float` | `0.2` | Sampling temperature |
| `max_cost_usd` | `float` | `1.50` | Budget limit per run (USD) |
| `max_latency_ms` | `int` | `30000` | Latency limit (milliseconds) |
| `max_revisions` | `int` | `3` | Max revision attempts before fallback |
| `strictness` | `Strictness` | `BALANCED` | Eval gate strictness (LENIENT, BALANCED, STRICT) |
| `retriever_fn` | `Callable` | `None` | Custom retriever callback for RAG |
| `enable_cache` | `bool` | `True` | Enable LLM response caching |

Project Structure

traceflow-lite/
├── client.py                 # Public SDK entrypoint
├── tf_types.py               # Types, enums, dataclasses
├── state.py                  # Pydantic state models
│
├── graph_flow/
│   ├── graph.py              # LangGraph workflow definition
│   └── nodes/
│       └── nodes.py          # Node implementations
│
├── providers/
│   ├── base.py               # BaseProvider ABC
│   ├── openai_provider.py    # OpenAI implementation
│   ├── router.py             # Provider factory
│   ├── cost.py               # Token counting & pricing
│   └── retry.py              # Tenacity retry decorator
│
├── persistence/
│   ├── sqlite.py             # DB connection & schema
│   └── trace_store.py        # CRUD operations
│
├── utils/
│   ├── retriever_utils.py    # Chroma helper
│   └── vector_types.py       # Retriever config types
│
└── tests/
    └── test_client.py        # Integration tests

Environment Variables

Create a .env file in your project root:

OPENAI_API_KEY=sk-...

# Optional for Chroma Cloud:
CHROMA_API_KEY=...
CHROMA_TENANT=...
CHROMA_DATABASE=...

Running Tests

# Run all tests
pytest tests/test_client.py -v

# Run specific test
pytest tests/test_client.py::test_basic_run_without_retriever -v

Roadmap

Contributing

Contributions are welcome! Please read our contributing guidelines and submit PRs.

License

Apache 2.0
