# Adapter Examples

This directory contains examples demonstrating how to use the data source adapters available in the Eval Protocol system.
## Langfuse Adapter

Connects to Langfuse deployments to pull conversation traces and convert them to EvaluationRow format.
Features:
- Pull chat conversations and tool calling traces
- Filter by time ranges, tags, users, and sessions
- Preserve metadata from Langfuse traces
- Support for both cloud and self-hosted Langfuse instances
Prerequisites:

```bash
pip install 'eval-protocol[langfuse]'
```

Environment variables:

```bash
export LANGFUSE_PUBLIC_KEY="your_public_key"
export LANGFUSE_SECRET_KEY="your_secret_key"
export LANGFUSE_HOST="https://your-langfuse-instance.com"  # optional
export LANGFUSE_PROJECT_ID="your_project_id"               # optional
```
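With credentials in place, pulling rows takes only a few lines. A minimal sketch, assuming `create_langfuse_adapter()` picks up the `LANGFUSE_*` environment variables when called without arguments (the filter keywords match the examples later in this README):

```python
from datetime import datetime, timedelta

from eval_protocol.adapters.langfuse import create_langfuse_adapter

# Assumes credentials come from the LANGFUSE_* environment variables;
# pass them explicitly if the factory requires keyword arguments.
adapter = create_langfuse_adapter()

# Pull traces from the last 24 hours tagged "production".
rows = list(adapter.get_evaluation_rows(
    from_timestamp=datetime.now() - timedelta(hours=24),
    tags=["production"],
    limit=10,
))
print(f"Fetched {len(rows)} rows")
```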
## HuggingFace Adapter

Loads datasets from HuggingFace Hub and converts them to EvaluationRow format.

Features:
- Generic adapter with arbitrary transformation functions
- Built-in convenience functions for popular datasets (GSM8K, MATH, etc.)
- Support for local dataset files (JSON, JSONL, CSV)
- Full control over data transformation and formatting
- Support for dataset revisions/commits
Prerequisites:

```bash
pip install 'eval-protocol[huggingface]'
```
## Provider-Specific Tracing

Provider-specific IO references (input logging + output pulling) have moved under `examples/tracing/<provider>/`.

For Weave, see examples/tracing/weave/, which contains a focused converter.py illustrating how to map provider payloads to EP messages and metadata.

These examples are designed to be self-contained and usable as references for building or validating provider adapters.
## Running the Examples

```bash
# Run Langfuse example
python examples/adapters/langfuse_example.py

# Run HuggingFace example
python examples/adapters/huggingface_example.py

# Run GSM8K replacement example
python examples/adapters/gsm8k_replacement_example.py
```

```bash
# Set up Langfuse credentials
export LANGFUSE_PUBLIC_KEY="pk_..."
export LANGFUSE_SECRET_KEY="sk_..."
python examples/adapters/langfuse_example.py

# HuggingFace works without credentials for public datasets
python examples/adapters/huggingface_example.py
```
## Usage Patterns

Instead of using static JSONL files, use adapters to pull fresh data:

```python
# Old approach
input_dataset = ["development/gsm8k_sample.jsonl"]

# New approach with the HuggingFace adapter
from eval_protocol.adapters.huggingface import create_gsm8k_adapter

adapter = create_gsm8k_adapter()
evaluation_rows = list(adapter.get_evaluation_rows(split="test", limit=100))
```
```python
# Or, for complete control:
from eval_protocol.adapters.huggingface import create_huggingface_adapter

def custom_gsm8k_transform(row):
    return {
        'messages': [
            {'role': 'system', 'content': 'Your custom prompt'},
            {'role': 'user', 'content': row['question']},
        ],
        'ground_truth': row['answer'],
        'metadata': {'custom_field': 'value'},
    }

custom_adapter = create_huggingface_adapter(
    dataset_id="gsm8k",
    config_name="main",
    transform_fn=custom_gsm8k_transform,
)
```
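The custom adapter is consumed exactly like the convenience one; for instance (the `split` and `limit` arguments mirror the GSM8K call above):

```python
rows = list(custom_adapter.get_evaluation_rows(split="test", limit=50))
print(rows[0].messages[0].content)  # the custom system prompt from the transform
```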
Pull recent conversations from your production systems:

```python
from datetime import datetime, timedelta

from eval_protocol.adapters.langfuse import create_langfuse_adapter

adapter = create_langfuse_adapter(...)
recent_rows = list(adapter.get_evaluation_rows(
    from_timestamp=datetime.now() - timedelta(hours=24),
    tags=["production"],
))
```
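The features list also promises filtering by user and session. The keyword names below are assumptions for illustration only; check the adapter's signature for the real ones:

```python
# Hypothetical parameter names (user_id, session_id) -- verify against the
# Langfuse adapter's actual signature before relying on them.
user_rows = list(adapter.get_evaluation_rows(
    user_id="user_123",
    session_id="session_456",
    limit=20,
))
```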
Process datasets in manageable batches:

```python
batch_size = 100
for offset in range(0, 1000, batch_size):
    batch = list(adapter.get_evaluation_rows(
        limit=batch_size,
        offset=offset,
    ))
    # Process batch...
```
Combine an adapter with a reward function:

```python
from eval_protocol.adapters.huggingface import create_gsm8k_adapter
from eval_protocol.models import Message
from eval_protocol.rewards.math import math_reward

# Load dataset
adapter = create_gsm8k_adapter()
rows = list(adapter.get_evaluation_rows(limit=10))

# Run evaluation
for row in rows:
    # Add model response (you would generate this)
    row.messages.append(Message(role="assistant", content="..."))

    # Evaluate
    result = math_reward(messages=row.messages, ground_truth=row.ground_truth)
    print(f"Score: {result.score}")
```
```python
# Convert to training format
training_data = []
for row in adapter.get_evaluation_rows():
    training_data.append({
        "messages": [{"role": msg.role, "content": msg.content} for msg in row.messages],
        "ground_truth": row.ground_truth,
    })
```
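A natural follow-up is persisting the converted rows; a small sketch using only the standard library (the file name is illustrative):

```python
import json

# Write one JSON object per line (JSONL), a common fine-tuning format.
with open("training_data.jsonl", "w") as f:
    for example in training_data:
        f.write(json.dumps(example) + "\n")
```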
```python
# Compare models on recent production data
langfuse_adapter = create_langfuse_adapter(...)
recent_conversations = list(langfuse_adapter.get_evaluation_rows(
    from_timestamp=datetime.now() - timedelta(days=1),
    limit=100,
))

# Test both models on the same data
for row in recent_conversations:
    # Test model A and B, compare results...
    pass
```
For datasets on HuggingFace Hub, use the generic adapter with a transform function:

```python
from eval_protocol.adapters.huggingface import create_huggingface_adapter

def my_transform(row):
    return {
        'messages': [
            {'role': 'system', 'content': 'Your system prompt'},
            {'role': 'user', 'content': row['input_field']},
        ],
        'ground_truth': row['output_field'],
        'metadata': {'custom': row.get('metadata_field')},
    }

adapter = create_huggingface_adapter(
    dataset_id="your-dataset-id",
    transform_fn=my_transform,
    revision="main",  # optional
)
```
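Rows produced this way are regular EvaluationRow objects, with the same `messages` and `ground_truth` fields used throughout this README; a quick inspection sketch:

```python
rows = list(adapter.get_evaluation_rows(limit=5))
for row in rows:
    # Last input message paired with its expected answer.
    print(row.messages[-1].content, "->", row.ground_truth)
```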
## Creating Custom Adapters

See eval_protocol/adapters/CONTRIBUTING.md for detailed instructions on creating full custom adapters.

Quick template:

```python
from typing import Iterator

from eval_protocol.models import EvaluationRow, Message, InputMetadata

class MyCustomAdapter:
    def __init__(self, **config):
        # Initialize your data source connection
        pass

    def get_evaluation_rows(self, **kwargs) -> Iterator[EvaluationRow]:
        # Fetch data and convert to EvaluationRow format
        pass
```
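For a concrete (if hypothetical) starting point, here is a sketch of an adapter over a local JSONL file. The EvaluationRow and Message constructor arguments are assumptions inferred from how those objects are used elsewhere in this README; consult eval_protocol/adapters/CONTRIBUTING.md for the exact contract:

```python
import json
from typing import Iterator, Optional

from eval_protocol.models import EvaluationRow, Message

class JsonlFileAdapter:
    """Hypothetical adapter that reads rows from a local JSONL file."""

    def __init__(self, path: str):
        self.path = path

    def get_evaluation_rows(self, limit: Optional[int] = None, **kwargs) -> Iterator[EvaluationRow]:
        with open(self.path) as f:
            for i, line in enumerate(f):
                if limit is not None and i >= limit:
                    break
                record = json.loads(line)
                # Field names ("question", "answer") and the EvaluationRow
                # constructor arguments are assumptions -- adapt them to your
                # schema and to the actual model definitions.
                yield EvaluationRow(
                    messages=[Message(role="user", content=record["question"])],
                    ground_truth=record.get("answer"),
                )
```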
## Troubleshooting

- Import Errors: Make sure you have the right optional dependencies installed:

  ```bash
  pip install 'eval-protocol[langfuse]'      # for Langfuse
  pip install 'eval-protocol[huggingface]'   # for HuggingFace
  pip install 'eval-protocol[adapters]'      # for all adapters
  ```

- Authentication Errors: Check your environment variables and API keys.
- Network Errors: Verify connectivity to external services.
- Data Format Issues: Check that your data source has the expected fields.
Enable debug logging to see what's happening:

```python
import logging
logging.basicConfig(level=logging.DEBUG)

# Your adapter code here...
```

## Getting Help

- Check the main CONTRIBUTING.md for project setup
- Review adapter-specific documentation in eval_protocol/adapters/CONTRIBUTING.md
- Open an issue on GitHub for bugs or feature requests
- Join the community discussions for questions
## Contributing New Adapters

We welcome contributions of new adapters! Popular integrations that would be valuable:
- Database adapters: PostgreSQL, MongoDB, etc.
- API adapters: OpenAI Evals, Anthropic datasets, etc.
- File format adapters: Parquet, Excel, etc.
- Monitoring platform adapters: DataDog, New Relic, etc.
See the adapter contributing guide for detailed instructions.