🚀 Live Demo • 📺 Watch Video Demo
An Agent Reliability & Evaluation Control Plane SDK.
Tracing + replay (SQLite), eval gates (PASS/REVISE/FALLBACK), provider routing/fallback, and cost/latency budgets — so agentic systems are safe to ship and easy to operate.
Portfolio summary: Built an agent reliability control plane SDK with eval gates (PASS/REVISE/FALLBACK), trace + replay (SQLite), provider routing/fallback, and cost/latency budgets — implemented as an internal LangGraph-orchestrated pipeline.
Proof: Live Streamlit demo + video walkthrough + persisted traces you can replay.
TraceFlow Lite sits between your app and LLM providers, enforcing change safety (quality gates + revise loops) and operability (traces, replay, budgets, retries).
Internally, TraceFlow orchestrates a multi-node workflow using LangGraph.
Building reliable AI agents is hard. You need to handle:
- Cost blowouts — A single runaway query can drain your budget
- Latency spikes — Users abandon slow responses
- Quality inconsistency — LLMs hallucinate and go off-topic
- Debugging nightmares — "What prompt caused that output?"
- Provider lock-in — Switching from OpenAI to Anthropic shouldn't require rewrites
TraceFlow Lite solves these by providing a control plane that sits between your application and LLM providers.
| Feature | Description |
|---|---|
| 🧠 Workflow Orchestration (LangGraph) | Internal multi-step workflow with routing + loops |
| 🛡️ Eval Gates | Automatic cost, latency, and quality checks before responses are finalized |
| 💰 Cost Tracking | Per-request token counting via tiktoken with USD cost calculation |
| 🔁 Retry & Revision | Tenacity-powered retries + an eval-driven revision loop for quality |
| 📊 Trace Persistence | SQLite storage (WAL mode) for debugging, analytics, and replay |
| 🔌 Pluggable Retriever | Bring your own RAG with flexible callback interface |
| 🏭 Provider Abstraction | Swap LLM providers through configuration; add a new one by implementing `BaseProvider` |
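The Cost Tracking row comes down to simple arithmetic once token counts are known. In TraceFlow the counts come from tiktoken (for example `len(tiktoken.encoding_for_model(model).encode(text))`). The minimal sketch below assumes the counts are already available so it stays dependency-free. `ModelPrice`, `PRICING`, `estimate_cost_usd`, and `within_budget` are hypothetical names, and the per-1K-token prices are placeholders, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD price per 1,000 tokens for one model."""

    input_per_1k: float
    output_per_1k: float


# Hypothetical price table with placeholder numbers; fill in real provider rates.
PRICING: dict[str, ModelPrice] = {
    "example-model": ModelPrice(input_per_1k=0.0005, output_per_1k=0.0015),
}


def estimate_cost_usd(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    try:
        price = PRICING[model]
    except KeyError:
        raise ValueError(f"no pricing configured for model {model!r}") from None
    input_cost = prompt_tokens / 1000 * price.input_per_1k
    output_cost = completion_tokens / 1000 * price.output_per_1k
    return input_cost + output_cost


def within_budget(cost_usd: float, max_cost_usd: float) -> bool:
    """True if a run's accumulated cost is still inside its max_cost_usd budget."""
    return cost_usd <= max_cost_usd
```

For example, 1,200 prompt tokens and 400 completion tokens at these placeholder rates come to about $0.0012. That figure is what a per-run budget check would compare against `max_cost_usd`.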
- Client receives a user query and configuration
- Intake Node extracts and validates the input
- Planner Node decides if context retrieval is needed
- Retriever Node (optional) fetches relevant documents via your RAG callback
- Executor Node calls the LLM provider with retry logic
- Evaluator Node checks cost/latency constraints and quality
- Router Node directs traffic based on the eval decision:
  - PASS → return the final answer
  - REVISE → loop back to the executor with refinement instructions
  - FALLBACK → return a graceful fallback message
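The node sequence above can be sketched in plain Python to show how the REVISE loop and the FALLBACK exit interact. This is not TraceFlow's LangGraph graph: the executor and evaluator are stand-in callables, and `Decision`, `run_workflow`, `FALLBACK_MESSAGE`, `toy_execute`, and `toy_evaluate` are hypothetical names. Only the PASS/REVISE/FALLBACK semantics and the `max_revisions` cap come from this README.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Decision(Enum):
    PASS = "PASS"
    REVISE = "REVISE"
    FALLBACK = "FALLBACK"


# Hypothetical fallback text returned when the gate gives up.
FALLBACK_MESSAGE = "Sorry, I couldn't produce a reliable answer for this request."


def run_workflow(
    query: str,
    execute: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], Tuple[Decision, str]],
    max_revisions: int = 3,
) -> Tuple[str, Decision, int]:
    """Loop executor -> evaluator -> router until PASS or FALLBACK.

    execute(query, feedback) returns a draft; evaluate(draft) returns a
    decision plus feedback that becomes the refinement instructions on REVISE.
    Returns (answer, final decision, revisions used).
    """
    feedback: Optional[str] = None
    revisions = 0
    while True:
        draft = execute(query, feedback)
        decision, feedback = evaluate(draft)
        if decision is Decision.PASS:
            return draft, Decision.PASS, revisions
        if decision is Decision.FALLBACK or revisions >= max_revisions:
            # The evaluator gave up, or the revision budget is exhausted.
            return FALLBACK_MESSAGE, Decision.FALLBACK, revisions
        revisions += 1  # REVISE: go back to the executor with the feedback


# Toy stand-ins: the first draft is lowercase, a revision upper-cases it,
# and the evaluator only accepts fully upper-case drafts.
def toy_execute(query: str, feedback: Optional[str]) -> str:
    return query.upper() if feedback else query


def toy_evaluate(draft: str) -> Tuple[Decision, str]:
    if draft.isupper():
        return Decision.PASS, ""
    return Decision.REVISE, "use capitals"


if __name__ == "__main__":
    print(run_workflow("hello", toy_execute, toy_evaluate, max_revisions=2))
```

In a LangGraph version, each step would be a node and the router would be a conditional edge pointing back to the executor; the loop here collapses that into a single function to make the control flow easy to follow.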
All interactions are traced to SQLite for debugging and replay.
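To make the SQLite persistence concrete, here is a minimal sketch of a trace table opened in WAL mode with Python's built-in `sqlite3` module. The schema and the helpers `open_store`, `save_trace`, `list_traces`, and `load_config` are hypothetical. TraceFlow's actual schema lives in `persistence/sqlite.py` and `persistence/trace_store.py` and may differ.

```python
import json
import sqlite3
import time
import uuid
from typing import Any, Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the trace database and switch it to WAL journaling."""
    conn = sqlite3.connect(path)
    # WAL lets readers (e.g. a dashboard) query while a run is writing.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS traces (
            trace_id     TEXT PRIMARY KEY,
            created_at   REAL NOT NULL,
            user_input   TEXT NOT NULL,
            final_answer TEXT,
            status       TEXT NOT NULL,
            config_json  TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def save_trace(conn: sqlite3.Connection, user_input: str, final_answer: str,
               status: str, config: dict[str, Any]) -> str:
    """Insert one run; config must be a JSON-serializable dict of settings."""
    trace_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO traces VALUES (?, ?, ?, ?, ?, ?)",
            (trace_id, time.time(), user_input, final_answer, status, json.dumps(config)),
        )
    return trace_id


def list_traces(conn: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    """Most recent runs first: (trace_id, user_input, status)."""
    rows = conn.execute(
        "SELECT trace_id, user_input, status FROM traces ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()


def load_config(conn: sqlite3.Connection, trace_id: str) -> Optional[dict[str, Any]]:
    """Stored settings for a run, the starting point for a replay."""
    row = conn.execute(
        "SELECT config_json FROM traces WHERE trace_id = ?", (trace_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

A replay in this model is `load_config(...)` merged with the override fields, then a fresh run. Storing the config alongside the answer is what makes "what prompt caused that output?" answerable later.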
With Poetry:

```bash
git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
poetry install
```

Or with pip:

```bash
git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

For development:

```bash
# Install with dev dependencies
poetry install --with dev
# Run tests
poetry run pytest
# Run linter
poetry run ruff check .
```

Managing dependencies:

```bash
# Add a runtime dependency
poetry add <package>
# Add a dev dependency
poetry add --group dev <package>
# After adding, sync requirements.txt for Streamlit Cloud:
poetry export --without-hashes -o requirements.txt
```

Get up and running in 60 seconds:
```bash
# 1. Clone and install
git clone https://github.com/khalilCodeX/traceflow-lite.git
cd traceflow-lite
poetry install
# 2. Set your API key
export OPENAI_API_KEY="sk-..."
# 3. Launch the UI
make ui
```

Open http://localhost:8501, select a model, type a query, and hit Execute. That's it!
Or try it programmatically:
```bash
poetry run python -c "
from client import TraceFlowClient
result = TraceFlowClient().run('What is machine learning?')
print(result.answer)
"
```

Using the SDK from Python:

```python
from client import TraceFlowClient
from tf_types import RunConfig, Mode

client = TraceFlowClient()

# Basic usage with defaults
result = client.run("What is machine learning?")
print(result.answer)

# With custom configuration
config = RunConfig(
    mode=Mode.GROUNDED_QA,
    model="gpt-3.5-turbo",
    max_tokens=500,
    max_cost_usd=0.10,     # Budget limit per run (USD)
    max_latency_ms=10000,  # Latency limit (ms)
    max_revisions=2,       # Max revision attempts before fallback
)
result = client.run("Explain neural networks", config)
print(f"Answer: {result.answer}")
print(f"Status: {result.status}")
print(f"Trace ID: {result.trace_id}")
```

TraceFlow Lite doesn't lock you into a specific vector database. Provide your own retriever function:
```python
from client import TraceFlowClient
from tf_types import RunConfig, RetrievedChunk

client = TraceFlowClient()

def my_retriever(query: str) -> list[RetrievedChunk]:
    # Your Chroma/Pinecone/Weaviate/custom implementation goes here
    return [
        RetrievedChunk(
            chunk_id="doc_1",
            content="Relevant context here...",
            source="knowledge_base",
            relevance_score=0.95,
        )
    ]

config = RunConfig(retriever_fn=my_retriever)
result = client.run("Question needing context", config)
```

Or use the bundled Chroma helper:

```python
from client import TraceFlowClient
from tf_types import RunConfig
from utils.retriever_utils import chroma_retriever
from utils.vector_types import chroma_params

client = TraceFlowClient()

# Set up the retriever with your documents
documents = [
    "AI is the simulation of human intelligence by machines.",
    "Machine learning is a subset of AI that learns from data.",
]
params = chroma_params(
    documents=documents,
    collection="my_knowledge_base",
    directory="./chroma_db",
)
retriever = chroma_retriever(local=True, params=params)
retriever.create_vector_store(documents)

# Use it in your runs
config = RunConfig(retriever_fn=retriever.retrieve_similar_docs)
result = client.run("What is AI?", config)
```

Every run is automatically persisted. Debug issues or experiment with different configs:
```python
from client import TraceFlowClient
from tf_types import RunConfig

client = TraceFlowClient()

# List recent traces
traces = client.list_traces(limit=10)
for trace in traces:
    print(f"{trace.trace_id}: {trace.user_input[:50]}...")

# Get a specific trace
trace = client.get_trace("abc123...")
print(trace.final_answer)

# Replay with a different configuration
result = client.replay(
    trace_id="abc123...",
    overrides=RunConfig(model="gpt-4o", max_tokens=1000),
)
```

| Field | Type | Default | Description |
|---|---|---|---|
| mode | Mode | GROUNDED_QA | Workflow mode (GROUNDED_QA, TRIAGE_PLAN, CHANGE_SAFETY) |
| model | str | gpt-3.5-turbo | LLM model identifier |
| provider | str | openai | Provider name |
| max_tokens | int | 1024 | Max output tokens |
| temperature | float | 0.2 | Sampling temperature |
| max_cost_usd | float | 1.50 | Budget limit per run (USD) |
| max_latency_ms | int | 30000 | Latency limit (milliseconds) |
| max_revisions | int | 3 | Max revision attempts before fallback |
| strictness | Strictness | BALANCED | Eval gate strictness (LENIENT, BALANCED, STRICT) |
| retriever_fn | Callable | None | Custom retriever callback for RAG |
| enable_cache | bool | True | Enable LLM response caching |
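The budget fields and `strictness` all feed the Evaluator Node's decision. The minimal sketch below shows one plausible way they could combine. It assumes a numeric quality score in [0, 1], hypothetical per-strictness thresholds, and a rule that a budget breach goes straight to FALLBACK, since another revision would only spend more. TraceFlow's actual scoring and thresholds are not documented here. `GateConfig`, `QUALITY_THRESHOLD`, and `eval_gate` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Strictness(Enum):
    LENIENT = "LENIENT"
    BALANCED = "BALANCED"
    STRICT = "STRICT"


class Decision(Enum):
    PASS = "PASS"
    REVISE = "REVISE"
    FALLBACK = "FALLBACK"


# Hypothetical minimum quality score required at each strictness level.
QUALITY_THRESHOLD = {
    Strictness.LENIENT: 0.5,
    Strictness.BALANCED: 0.7,
    Strictness.STRICT: 0.85,
}


@dataclass
class GateConfig:
    """Subset of RunConfig fields the gate reads (defaults copied from the table above)."""

    max_cost_usd: float = 1.50
    max_latency_ms: int = 30000
    max_revisions: int = 3
    strictness: Strictness = Strictness.BALANCED


def eval_gate(cost_usd: float, latency_ms: float, quality: float,
              revisions_used: int, cfg: GateConfig) -> Decision:
    # Hard budgets first: revising would only add cost and latency.
    if cost_usd > cfg.max_cost_usd or latency_ms > cfg.max_latency_ms:
        return Decision.FALLBACK
    if quality >= QUALITY_THRESHOLD[cfg.strictness]:
        return Decision.PASS
    # Quality too low: revise while the revision budget lasts.
    if revisions_used < cfg.max_revisions:
        return Decision.REVISE
    return Decision.FALLBACK
```

Keeping the gate a pure function of measured cost, latency, and quality makes it easy to unit-test, and it lets a replay re-run the same decision under a stricter or looser configuration.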
```text
traceflow-lite/
├── client.py              # Public SDK entrypoint
├── tf_types.py            # Types, enums, dataclasses
├── state.py               # Pydantic state models
│
├── graph_flow/
│   ├── graph.py           # LangGraph workflow definition
│   └── nodes/
│       └── nodes.py       # Node implementations
│
├── providers/
│   ├── base.py            # BaseProvider ABC
│   ├── openai_provider.py # OpenAI implementation
│   ├── router.py          # Provider factory
│   ├── cost.py            # Token counting & pricing
│   └── retry.py           # Tenacity retry decorator
│
├── persistence/
│   ├── sqlite.py          # DB connection & schema
│   └── trace_store.py     # CRUD operations
│
├── utils/
│   ├── retriever_utils.py # Chroma helper
│   └── vector_types.py    # Retriever config types
│
└── tests/
    └── test_client.py     # Integration tests
```
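The `providers/` layout, a `BaseProvider` ABC plus a router that acts as a provider factory, follows a common registry pattern. The sketch below is hypothetical: the `complete` method, the `CompletionResult` fields, the offline `EchoProvider`, and the `register`/`get_provider` helpers illustrate the pattern and are not TraceFlow's actual interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class CompletionResult:
    text: str
    prompt_tokens: int
    completion_tokens: int


class BaseProvider(ABC):
    """Interface every provider implements; workflow nodes only talk to this."""

    @abstractmethod
    def complete(self, prompt: str, model: str, max_tokens: int,
                 temperature: float) -> CompletionResult:
        ...


class EchoProvider(BaseProvider):
    """Offline stand-in provider for tests; 'tokens' are whitespace-split words."""

    def complete(self, prompt: str, model: str, max_tokens: int,
                 temperature: float) -> CompletionResult:
        words = prompt.split()
        reply_words = words[:max_tokens]
        return CompletionResult(
            text=" ".join(reply_words),
            prompt_tokens=len(words),
            completion_tokens=len(reply_words),
        )


_REGISTRY: dict[str, type[BaseProvider]] = {}


def register(name: str, cls: type[BaseProvider]) -> None:
    """Make a provider class available under a config-style name."""
    _REGISTRY[name] = cls


def get_provider(name: str) -> BaseProvider:
    """Factory: map a provider name (like RunConfig.provider) to an instance."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(
            f"unknown provider {name!r}; registered: {sorted(_REGISTRY)}"
        ) from None


register("echo", EchoProvider)
```

Under this pattern, adding Anthropic (listed on the roadmap) would mean one new subclass and one `register` call, with no changes to the workflow nodes.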
Create a .env file in your project root:
```
OPENAI_API_KEY=sk-...
# Optional for Chroma Cloud:
CHROMA_API_KEY=...
CHROMA_TENANT=...
CHROMA_DATABASE=...
```

Run the tests:

```bash
# Run all tests
pytest tests/test_client.py -v
# Run a specific test
pytest tests/test_client.py::test_basic_run_without_retriever -v
```

Roadmap:

- Multi-provider support (OpenAI, Anthropic)
- Response caching implementation
- Streamlit ops dashboard
- Trace persistence with SQLite + WAL mode
- Eval gate pattern with revision loop
- Cost tracking per request
- CI/CD with GitHub Actions
- Architecture Decision Records (ADRs)
- CLI tool (`traceflow run "query"`)
- Budget-aware model fallback
- Advanced evaluators (relevance scoring, citation validation)
- Async execution support
- OpenTelemetry export integration
Contributions are welcome! Please read our contributing guidelines and submit PRs.
Licensed under Apache 2.0.
