---
title: Quickstart
description: Get started with Handit.ai's complete AI observability and optimization platform in under 30 minutes.
sidebarTitle: Quickstart
---
import { Callout } from "nextra/components";
import { Steps } from "nextra/components";
import { Tabs } from "nextra/components";
**What you'll build:** A fully observable, continuously evaluated, and automatically optimizing AI system that improves itself based on real production data.

Handit is the open source engine that auto-improves your AI: it evaluates every agent decision, auto-generates better prompts and datasets, A/B-tests each fix, and lets you control what goes live.
Here's what we'll accomplish in three phases:
### [Phase 1: AI Observability](#phase-1-ai-observability-5-minutes) ⏱️ 5 minutes

Set up comprehensive tracing to see inside your AI agents and understand what they're doing.

### Phase 2: Quality Evaluation ⏱️ 10 minutes

Add automated evaluation to continuously assess performance across multiple quality dimensions.

### Phase 3: Self-Improving AI ⏱️ 15 minutes

Enable automatic optimization that generates better prompts, tests them, and provides proven improvements.
**The Result**: Complete visibility into performance with automated optimization recommendations based on real production data.

Before we start, make sure you have:
- A Handit.ai Account (sign up if needed)
- 15-30 minutes to complete the setup
Let's add comprehensive tracing to see exactly what your AI is doing.
<Tabs items={["Python", "JavaScript"]} defaultIndex="0">
<Tabs.Tab>
```bash
pip install handit_ai
```
</Tabs.Tab>
<Tabs.Tab>
```bash
npm i @handit.ai/handit-ai
```
</Tabs.Tab>
</Tabs>
- Log into your Handit.ai Dashboard
- Go to Settings → Integrations
- Copy your integration token
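The SDK snippets in this guide read the token from the `HANDIT_API_KEY` environment variable. Assuming a Unix shell, a minimal way to set it before starting your agent:

```shell
# Store the integration token where the SDK can find it.
# Replace the placeholder with the token you copied from the dashboard.
export HANDIT_API_KEY="your-integration-token"

# Confirm it is set before starting your agent
echo "$HANDIT_API_KEY"
```

For production deployments, set the variable in your process manager or secrets store rather than in a shell profile.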
Now, let's add tracing to your main agent function using our simplified approach. You only need to instrument the entry point - no need to trace individual child functions.
<Tabs items={["Python", "JavaScript"]} defaultIndex="0">
<Tabs.Tab>
**Simplified Python Approach** - Just add the decorator to your entry point:

```python
# Auto-generated by handit-cli setup
from handit_ai import tracing, configure
import os

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

# Tracing added to your main agent function (entry point)
@tracing(agent="customer-service-agent")
async def process_customer_request(user_message: str):
    # Your existing agent logic (unchanged)
    intent = await classify_intent(user_message)   # Not traced individually
    context = await search_knowledge(intent)       # Not traced individually
    response = await generate_response(context)    # Not traced individually
    return response
```

For FastAPI endpoints, put the `@tracing` decorator below the route decorator:

```python
from handit_ai import tracing, configure
import os
from fastapi import FastAPI

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

app = FastAPI()

@app.post("/process")
@tracing(agent="customer-service-agent")
async def process_customer_request(user_message: str):
    # Your existing agent logic (unchanged)
    intent = await classify_intent(user_message)   # Not traced individually
    context = await search_knowledge(intent)       # Not traced individually
    response = await generate_response(context)    # Not traced individually
    return response
```
</Tabs.Tab>
<Tabs.Tab>
**Simplified JavaScript Approach** - Just wrap your entry point:

```javascript
// Auto-generated by handit-cli setup
import { configure, startTracing, endTracing } from '@handit.ai/handit-ai';

configure({
  HANDIT_API_KEY: process.env.HANDIT_API_KEY
});

// Tracing added to your main agent function (entry point)
export const processCustomerRequest = async (userMessage) => {
  startTracing({ agent: "customer-service-agent" });
  try {
    // Your existing agent logic (unchanged)
    const intent = await classifyIntent(userMessage);   // Not traced individually
    const context = await searchKnowledge(intent);      // Not traced individually
    const response = await generateResponse(context);   // Not traced individually
    return response;
  } finally {
    endTracing();
  }
};
```
</Tabs.Tab>
</Tabs>
**Simplified Approach:** You only need to add tracing to your entry point function; Handit.ai automatically traces the entire execution flow from there.

**Phase 1 Complete!** 🎉 You now have full observability, with automatic tracing of your entire agent execution flow from the entry point.

➡️ Want to dive deeper? Check out our detailed Tracing Quickstart for advanced features and best practices.
Now let's add automated evaluation to continuously assess quality across multiple dimensions.
- Go to Settings → Model Tokens
- Add your OpenAI or other model credentials
- These models will act as "judges" to evaluate responses
Create separate evaluators for each quality aspect. **Critical principle:** one evaluator = one quality dimension.
- Go to Evaluation → Evaluation Suite
- Click Create New Evaluator
**Example Evaluator 1: Response Completeness**

```text
You are evaluating whether an AI response completely addresses the user's question.
Focus ONLY on completeness - ignore other quality aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Missing major parts of the question
3-4 = Addresses some parts but incomplete
5-6 = Addresses most parts adequately
7-8 = Addresses all parts well
9-10 = Thoroughly addresses every aspect

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
```
**Example Evaluator 2: Accuracy Check**

```text
You are checking if an AI response contains accurate information.
Focus ONLY on factual accuracy - ignore other aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Contains obvious false information
3-4 = Contains questionable claims
5-6 = Mostly accurate with minor concerns
7-8 = Accurate information
9-10 = Completely accurate and verifiable

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
```
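The fixed `Score:` / `Reasoning:` output format keeps judge replies machine-readable. Handit parses evaluator output for you, but as an illustration of the contract, here is a minimal Python sketch (the `parse_evaluation` helper is hypothetical, not part of the SDK) that extracts and validates a score from such a reply:

```python
import re

def parse_evaluation(raw: str) -> dict:
    """Parse an LLM-judge reply in the 'Score: N / Reasoning: ...' format."""
    score_match = re.search(r"Score:\s*(\d+)", raw)
    reason_match = re.search(r"Reasoning:\s*(.+)", raw, re.DOTALL)
    if not score_match:
        raise ValueError("judge reply missing 'Score:' line")
    score = int(score_match.group(1))
    if not 1 <= score <= 10:
        raise ValueError(f"score {score} is outside the 1-10 scale")
    return {
        "score": score,
        "reasoning": reason_match.group(1).strip() if reason_match else "",
    }

result = parse_evaluation("Score: 8\nReasoning: Addresses all parts of the question.")
```

Keeping the format this strict is what makes scores aggregable into the quality trends shown below.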
- Go to Agent Performance
- Select your LLM node (e.g., "response-generator")
- Click Manage Evaluators in the menu
- Add your evaluators
View real-time evaluation results in:
- Tracing tab: Individual evaluation scores
- Agent Performance: Quality trends over time
Tracing Dashboard - Individual Evaluation Scores:

Agent Performance Dashboard - Quality Trends:

➡️ Want more sophisticated evaluators? Check out our detailed Evaluation Quickstart for advanced techniques.
Finally, let's enable automatic optimization that generates better prompts and provides proven improvements.
- Go to Settings → Model Tokens
- Select optimization model tokens
- Self-improving AI automatically activates once configured
- Review Recommendations in Release Hub
- Compare Performance between current and optimized prompts
- Mark as Production for prompts you want to deploy
- Fetch via SDK in your application
Fetch Optimized Prompts:
<Tabs items={["Python", "JavaScript"]} defaultIndex="0">
<Tabs.Tab>
```python
from handit_ai import HanditClient

# Initialize client
handit = HanditClient(api_key="your-api-key")

# Fetch current production prompt
optimized_prompt = handit.fetch_optimized_prompt(
    model_id="response-generator"
)

# Use in your LLM calls
response = your_llm_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": optimized_prompt},
        {"role": "user", "content": user_query}
    ]
)
```
</Tabs.Tab>
<Tabs.Tab>
```javascript
import { HanditClient } from '@handit.ai/handit-ai';

const handit = new HanditClient({ apiKey: 'your-api-key' });

// Fetch current production prompt
const optimizedPrompt = await handit.fetchOptimizedPrompt({
  modelId: 'response-generator'
});

// Use in your LLM calls
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: optimizedPrompt },
    { role: 'user', content: userQuery }
  ]
});
```
</Tabs.Tab>
</Tabs>
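Because the optimized prompt is fetched over the network, production code usually wants a fallback. Below is a minimal, SDK-agnostic sketch (the `get_system_prompt` helper and `DEFAULT_SYSTEM_PROMPT` are illustrative, not part of the Handit SDK) that degrades to a baked-in prompt when the fetch fails:

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful customer-service assistant."

def get_system_prompt(fetch_fn, model_id: str) -> str:
    """Return the optimized prompt, falling back to a baked-in default.

    fetch_fn is any callable that fetches a prompt by model id - e.g. a
    wrapper around the fetch_optimized_prompt call shown above.
    """
    try:
        prompt = fetch_fn(model_id)
        # Treat an empty or missing prompt the same as a failed fetch.
        return prompt or DEFAULT_SYSTEM_PROMPT
    except Exception:
        # Network or auth failure: degrade gracefully instead of crashing.
        return DEFAULT_SYSTEM_PROMPT
```

This way a transient outage only costs you the optimization delta, not the whole request.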
**Phase 3 Complete!** 🎉 You now have a self-improving AI that automatically detects quality issues, generates better prompts, tests them in the background, and provides proven improvements.

➡️ Want advanced optimization features? Check out our detailed Optimization Quickstart for CI/CD integration and deployment strategies.
Congratulations! You now have a complete AI observability and optimization system:
- Complete visibility into operations
- Real-time monitoring of all LLM calls and tools
- Detailed execution traces with timing and error tracking
- Automated quality assessment across multiple dimensions
- Real-time evaluation scores and trends
- Quality insights to identify improvement opportunities
- Automatic detection of quality issues
- AI-generated prompt optimizations
- Background A/B testing with statistical confidence
- Production-ready improvements delivered via SDK
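Handit runs the A/B tests for you, but to make "statistical confidence" concrete, here is an illustrative sketch of the kind of comparison involved: Welch's t statistic over 1-10 evaluation scores from two prompt variants (the scores and threshold below are made-up examples, not Handit's internal method):

```python
from statistics import mean, stdev
from math import sqrt

def welch_t(scores_a, scores_b):
    """Welch's t statistic comparing evaluation scores of two prompt variants."""
    na, nb = len(scores_a), len(scores_b)
    var_a, var_b = stdev(scores_a) ** 2, stdev(scores_b) ** 2
    # Positive t means variant B scores higher than variant A.
    return (mean(scores_b) - mean(scores_a)) / sqrt(var_a / na + var_b / nb)

current   = [6, 7, 6, 5, 7, 6, 6, 7]   # evaluator scores for the current prompt
candidate = [8, 9, 8, 7, 9, 8, 8, 9]   # scores for the optimized candidate

t = welch_t(current, candidate)
# A large positive t (well above ~2) suggests the improvement is not noise.
```

Only candidates whose improvement clears a significance threshold like this are worth promoting to production.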
- Join our Discord community for support
- Check out GitHub Issues for additional help
- Explore Tracing to monitor your AI agents
- Set up Evaluation to grade your AI outputs
- Configure Optimization for continuous improvement
- Tracing Documentation - Monitor AI agent performance
- Evaluation Documentation - Grade AI outputs automatically
- Optimization Documentation - Improve prompts continuously
- Visit our GitHub Issues page
Tracing Not Working?
- Verify your API key is correct and set as environment variable
- Ensure you're calling the tracing functions correctly
Evaluations Not Running?
- Confirm model tokens are valid and have sufficient credits
- Verify LLM nodes are receiving traffic
- Check evaluation percentages are > 0%
Optimizations Not Generating?
- Ensure evaluation data shows quality issues (scores below threshold)
- Verify optimization model tokens are configured
- Confirm sufficient evaluation data has been collected
Need Help?
- Visit our Support page
- Join our Discord community
- Check individual quickstart guides for detailed troubleshooting