---
title: Quickstart
description: Get started with Handit.ai's complete AI observability and optimization platform in under 30 minutes.
sidebarTitle: Quickstart
---

import { Callout } from "nextra/components"; import { Steps } from "nextra/components"; import { Tabs } from "nextra/components";

# Complete Handit.ai Quickstart

The Open Source Engine that Auto-Improves Your AI.
Handit evaluates every agent decision, auto-generates better prompts and datasets, A/B-tests the fix, and lets you control what goes live.

**What you'll build:** A fully observable, continuously evaluated, and automatically optimizing AI system that improves itself based on real production data.

## Overview: The Complete Journey

Here's what we'll accomplish in three phases:

### [Phase 1: AI Observability](#phase-1-ai-observability-5-minutes) ⏱️ 5 minutes

Set up comprehensive tracing to see inside your AI agents and understand what they're doing.

### [Phase 2: Quality Evaluation](#phase-2-quality-evaluation-10-minutes) ⏱️ 10 minutes

Add automated evaluation to continuously assess performance across multiple quality dimensions.

### [Phase 3: Self-Improving AI](#phase-3-self-improving-ai-15-minutes) ⏱️ 15 minutes

Enable automatic optimization that generates better prompts, tests them, and provides proven improvements.

**The Result**: Complete visibility into performance with automated optimization recommendations based on real production data.

## Prerequisites

Before we start, make sure you have:

  • A Handit.ai account (sign up at [beta.handit.ai](https://beta.handit.ai))
  • An AI agent or LLM application written in Python or JavaScript
  • API credentials for an LLM provider (e.g., OpenAI), used later for evaluation and optimization

## Phase 1: AI Observability (5 minutes)

Let's add comprehensive tracing to see exactly what your AI is doing.

### Step 1: Install the SDK

<Tabs items={["Python", "JavaScript"]} defaultIndex="0"> <Tabs.Tab>

```bash
pip install handit_ai
```

</Tabs.Tab> <Tabs.Tab>

```bash
npm i @handit.ai/handit-ai
```

</Tabs.Tab> </Tabs>

### Step 2: Get Your Integration Token

  1. Log into your Handit.ai Dashboard
  2. Go to **Settings** → **Integrations**
  3. Copy your integration token


### Step 3: Add Simplified Tracing

Now, let's add tracing to your main agent function using our simplified approach. You only need to instrument the entry point - no need to trace individual child functions.

<Tabs items={["Python", "JavaScript"]} defaultIndex="0"> <Tabs.Tab>

Simplified Python Approach - Just add the decorator to your entry point:

```python
# Auto-generated by handit-cli setup
from handit_ai import tracing, configure
import os

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

# Tracing added to your main agent function (entry point)
@tracing(agent="customer-service-agent")
async def process_customer_request(user_message: str):
    # Your existing agent logic (unchanged)
    intent = await classify_intent(user_message)      # Not traced individually
    context = await search_knowledge(intent)          # Not traced individually
    response = await generate_response(context)       # Not traced individually
    return response
```

For FastAPI endpoints, put the decorator below the endpoint:

```python
from handit_ai import tracing, configure
import os
from fastapi import FastAPI

configure(HANDIT_API_KEY=os.getenv("HANDIT_API_KEY"))

app = FastAPI()

@app.post("/process")
@tracing(agent="customer-service-agent")
async def process_customer_request(user_message: str):
    # Your existing agent logic (unchanged)
    intent = await classify_intent(user_message)      # Not traced individually
    context = await search_knowledge(intent)          # Not traced individually
    response = await generate_response(context)       # Not traced individually
    return response
```

</Tabs.Tab> <Tabs.Tab>

Simplified JavaScript Approach - Just wrap your entry point:

```javascript
// Auto-generated by handit-cli setup
import { configure, startTracing, endTracing } from '@handit.ai/handit-ai';

configure({
  HANDIT_API_KEY: process.env.HANDIT_API_KEY
});

// Tracing added to your main agent function (entry point)
export const processCustomerRequest = async (userMessage) => {
  startTracing({ agent: "customer-service-agent" });
  try {
    // Your existing agent logic (unchanged)
    const intent = await classifyIntent(userMessage);     // Not traced individually
    const context = await searchKnowledge(intent);        // Not traced individually
    const response = await generateResponse(context);     // Not traced individually
    return response;
  } finally {
    endTracing();
  }
};
```

</Tabs.Tab> </Tabs>

**Simplified Approach:** You only need to add tracing to your entry point function; Handit.ai automatically traces the entire execution flow from there.

**Phase 1 Complete!** 🎉 You now have full observability with automatic tracing of your entire agent execution flow from the entry point.

➡️ Want to dive deeper? Check out our detailed Tracing Quickstart for advanced features and best practices.

## Phase 2: Quality Evaluation (10 minutes)

Now let's add automated evaluation to continuously assess quality across multiple dimensions.

### Step 1: Connect Evaluation Models

  1. Go to **Settings** → **Model Tokens**
  2. Add your OpenAI or other model credentials
  3. These models will act as "judges" to evaluate responses


### Step 2: Create Focused Evaluators

Create separate evaluators for each quality aspect. Critical principle: One evaluator = one quality dimension.

  1. Go to **Evaluation** → **Evaluation Suite**
  2. Click Create New Evaluator

**Example Evaluator 1: Response Completeness**

```
You are evaluating whether an AI response completely addresses the user's question.

Focus ONLY on completeness - ignore other quality aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Missing major parts of the question
3-4 = Addresses some parts but incomplete
5-6 = Addresses most parts adequately
7-8 = Addresses all parts well
9-10 = Thoroughly addresses every aspect

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
```

**Example Evaluator 2: Accuracy Check**

```
You are checking if an AI response contains accurate information.

Focus ONLY on factual accuracy - ignore other aspects.

User Question: {input}
AI Response: {output}

Rate on a scale of 1-10:
1-2 = Contains obvious false information
3-4 = Contains questionable claims
5-6 = Mostly accurate with minor concerns
7-8 = Accurate information
9-10 = Completely accurate and verifiable

Output format:
Score: [1-10]
Reasoning: [Brief explanation]
```
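If you ever need to consume evaluator responses in your own tooling, the `Score:` / `Reasoning:` format above is easy to parse mechanically. A minimal, illustrative sketch in plain Python (`parse_evaluator_output` is not part of the Handit.ai SDK, just an example):

```python
import re

def parse_evaluator_output(text: str) -> dict:
    """Parse a 'Score: N / Reasoning: ...' evaluator response."""
    score_match = re.search(r"Score:\s*(\d+)", text)
    reasoning_match = re.search(r"Reasoning:\s*(.+)", text, re.DOTALL)
    if not score_match:
        raise ValueError("No score found in evaluator output")
    score = int(score_match.group(1))
    if not 1 <= score <= 10:
        raise ValueError(f"Score {score} is outside the 1-10 scale")
    return {
        "score": score,
        "reasoning": reasoning_match.group(1).strip() if reasoning_match else "",
    }

result = parse_evaluator_output("Score: 8\nReasoning: Addresses all parts well.")
```

Keeping the output format strict like this is exactly why each evaluator prompt ends with an explicit `Output format:` section.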


### Step 3: Associate Evaluators to Your LLM Nodes

  1. Go to **Agent Performance**
  2. Select your LLM node (e.g., "response-generator")
  3. Click **Manage Evaluators** in the menu
  4. Add your evaluators

### Step 4: Monitor Results

View real-time evaluation results in:

  • Tracing tab: Individual evaluation scores
  • Agent Performance: Quality trends over time

**Tracing Dashboard**: individual evaluation scores

**Agent Performance Dashboard**: quality trends over time

**Phase 2 Complete!** 🎉 Continuous evaluation is now running across multiple quality dimensions with real-time insights into performance trends.

➡️ Want more sophisticated evaluators? Check out our detailed Evaluation Quickstart for advanced techniques.

## Phase 3: Self-Improving AI (15 minutes)

Finally, let's enable automatic optimization that generates better prompts and provides proven improvements.

### Step 1: Connect Optimization Models

  1. Go to **Settings** → **Model Tokens**
  2. Select optimization model tokens
  3. Self-improving AI automatically activates once configured

**Automatic Activation**: Once optimization tokens are configured, the system automatically begins analyzing evaluation data and generating optimizations. No additional setup required!

### Step 2: Deploy Optimizations

  1. Review Recommendations in Release Hub
  2. Compare Performance between current and optimized prompts
  3. Mark as Production for prompts you want to deploy
  4. Fetch via SDK in your application


Fetch Optimized Prompts:

<Tabs items={["Python", "JavaScript"]} defaultIndex="0"> <Tabs.Tab>

```python
from handit_ai import HanditClient

# Initialize client
handit = HanditClient(api_key="your-api-key")

# Fetch current production prompt
optimized_prompt = handit.fetch_optimized_prompt(
    model_id="response-generator"
)

# Use in your LLM calls
response = your_llm_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": optimized_prompt},
        {"role": "user", "content": user_query}
    ]
)
```

</Tabs.Tab> <Tabs.Tab>

```javascript
import { HanditClient } from '@handit.ai/handit-ai';

const handit = new HanditClient({ apiKey: 'your-api-key' });

// Fetch current production prompt
const optimizedPrompt = await handit.fetchOptimizedPrompt({
  modelId: 'response-generator'
});

// Use in your LLM calls
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: optimizedPrompt },
    { role: 'user', content: userQuery }
  ]
});
```

</Tabs.Tab> </Tabs>
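A prompt fetch can fail at runtime (network issues, a missing token), so it is worth degrading gracefully to a baseline prompt rather than failing the request. A minimal sketch of that pattern, where `fetch_fn`, `get_prompt`, and `DEFAULT_PROMPT` are illustrative names standing in for the SDK call shown above:

```python
DEFAULT_PROMPT = "You are a helpful customer service assistant."

def get_prompt(fetch_fn, model_id: str, fallback: str = DEFAULT_PROMPT) -> str:
    """Return the optimized prompt, or the fallback if the fetch fails."""
    try:
        prompt = fetch_fn(model_id=model_id)
        # Guard against empty responses as well as exceptions
        return prompt if prompt else fallback
    except Exception:
        # Degrade gracefully instead of failing the user request
        return fallback

# Example with a stand-in fetcher that raises (simulating an outage)
def broken_fetch(model_id):
    raise ConnectionError("Handit.ai unreachable")

prompt = get_prompt(broken_fetch, "response-generator")
```

Caching the last successfully fetched prompt is a natural extension of this pattern if you call the fetch on every request.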

**Phase 3 Complete!** 🎉 You now have a self-improving AI that automatically detects quality issues, generates better prompts, tests them in the background, and provides proven improvements.

➡️ Want advanced optimization features? Check out our detailed Optimization Quickstart for CI/CD integration and deployment strategies.

## What You've Accomplished

Congratulations! You now have a complete AI observability and optimization system:

### ✅ Full Observability

  • Complete visibility into operations
  • Real-time monitoring of all LLM calls and tools
  • Detailed execution traces with timing and error tracking

### ✅ Continuous Evaluation

  • Automated quality assessment across multiple dimensions
  • Real-time evaluation scores and trends
  • Quality insights to identify improvement opportunities

### ✅ Self-Improving AI

  • Automatic detection of quality issues
  • AI-generated prompt optimizations
  • Background A/B testing with statistical confidence
  • Production-ready improvements delivered via SDK

## Next Steps

## Resources

**Ready to transform your AI?** Visit [beta.handit.ai](https://beta.handit.ai) to get started with the complete Handit.ai platform today.

## Troubleshooting

### Tracing Not Working?

  • Verify your API key is correct and set as an environment variable
  • Ensure you're using the tracing functions correctly (the `@tracing` decorator in Python; `startTracing`/`endTracing` in JavaScript)
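A quick way to confirm the key is actually visible to your process, with a hypothetical `check_handit_key` helper (illustrative only, not part of the SDK):

```python
import os

def check_handit_key() -> str:
    """Report whether HANDIT_API_KEY is usable from this process."""
    key = os.getenv("HANDIT_API_KEY")
    if not key:
        return "HANDIT_API_KEY is not set - export it before starting your app"
    if key != key.strip():
        return "HANDIT_API_KEY has leading/trailing whitespace - re-export it"
    return f"HANDIT_API_KEY is set ({len(key)} characters)"

print(check_handit_key())
```

Run this in the same shell or container your agent starts from; keys exported in a different session will not be inherited.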

### Evaluations Not Running?

  • Confirm model tokens are valid and have sufficient credits
  • Verify LLM nodes are receiving traffic
  • Check evaluation percentages are > 0%

### Optimizations Not Generating?

  • Ensure evaluation data shows quality issues (scores below threshold)
  • Verify optimization model tokens are configured
  • Confirm sufficient evaluation data has been collected

### Need Help?