Commit 8f3c0e3

Author: Dylan Huang

update README (#178)

* update README
* update
* simplify
1 parent c36c692 commit 8f3c0e3

1 file changed (+73, -6)

README.md

Lines changed: 73 additions & 6 deletions
@@ -6,7 +6,9 @@
 
 When you have multiple AI models to choose from—different versions, providers, or configurations—how do you know which one is best for your use case?
 
-## Quick Example
+## Quick Examples
+
+### Basic Model Comparison
 
 Compare models on a simple formatting task:
 
@@ -21,10 +23,10 @@ from eval_protocol.pytest import default_single_turn_rollout_processor, evaluati
             Message(role="user", content="Explain why evaluations matter for AI agents. Make it dramatic!"),
         ],
     ],
-    model=[
-        "fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct",
-        "openai/gpt-4",
-        "anthropic/claude-3-sonnet"
+    completion_params=[
+        {"model": "fireworks/accounts/fireworks/models/llama-v3p1-8b-instruct"},
+        {"model": "openai/gpt-4"},
+        {"model": "anthropic/claude-3-sonnet"}
     ],
     rollout_processor=default_single_turn_rollout_processor,
     mode="pointwise",
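
The hunk above replaces the flat `model=[...]` list of strings with `completion_params=[...]`, a list of dicts each carrying a `"model"` key. For call sites that still build a plain list of model names, a minimal sketch of the conversion might look like this. The helper name `to_completion_params` is hypothetical and not part of `eval_protocol`; the only thing taken from the diff is the `{"model": ...}` dict shape.

```python
from typing import Dict, List


def to_completion_params(models: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: wrap each model name in the {"model": ...} shape
    that the updated README passes to `completion_params`."""
    return [{"model": name} for name in models]


# Model names copied from the README example in this commit.
params = to_completion_params(["openai/gpt-4", "anthropic/claude-3-sonnet"])
print(params)
```

Keeping the conversion in one place means the old string lists can be migrated without hand-editing each entry.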
@@ -45,21 +47,86 @@ def test_bold_format(row: EvaluationRow) -> EvaluationRow:
     return row
 ```
 
+### Using Datasets
+
+Evaluate models on existing datasets:
+
+```python
+from eval_protocol.pytest import evaluation_test
+from eval_protocol.adapters.huggingface import create_gsm8k_adapter
+
+@evaluation_test(
+    input_dataset=["development/gsm8k_sample.jsonl"], # Local JSONL file
+    dataset_adapter=create_gsm8k_adapter(), # Adapter to convert data
+    completion_params=[
+        {"model": "openai/gpt-4"},
+        {"model": "anthropic/claude-3-sonnet"}
+    ],
+    mode="pointwise"
+)
+def test_math_reasoning(row: EvaluationRow) -> EvaluationRow:
+    # Your evaluation logic here
+    return row
+```
+
+## 🚀 Features
+
+- **Custom Evaluations**: Write evaluations tailored to your specific business needs
+- **Auto-Evaluation**: Stack-rank models using LLMs as judges with just model traces
+- **Model Context Protocol (MCP) Integration**: Build reinforcement learning environments and trigger user simulations for complex scenarios
+- **Consistent Testing**: Test across various models and configurations with a unified framework
+- **Resilient Runtime**: Automatic retries for unstable LLM APIs and concurrent execution for long-running evaluations
+- **Rich Visualizations**: Built-in pivot tables and visualizations for result analysis
+- **Data-Driven Decisions**: Make informed model deployment decisions based on comprehensive evaluation results
+
 ## 📚 Resources
 
 - **[Documentation](https://evalprotocol.io)** - Complete guides and API reference
 - **[Discord](https://discord.com/channels/1137072072808472616/1400975572405850155)** - Community discussions
+- **[GitHub](https://github.com/eval-protocol/python-sdk)** - Source code and examples
 
 ## Installation
 
 **This library requires Python >= 3.10.**
 
+### Basic Installation
+
 Install with pip:
 
-```
+```bash
 pip install eval-protocol
 ```
 
+### Recommended Installation with uv
+
+For better dependency management and faster installs, we recommend using [uv](https://docs.astral.sh/uv/):
+
+```bash
+# Install uv if you haven't already
+curl -LsSf https://astral.sh/uv/install.sh | sh
+
+# Install eval-protocol
+uv add eval-protocol
+```
+
+### Optional Dependencies
+
+Install with additional features:
+
+```bash
+# For Langfuse integration
+pip install 'eval-protocol[langfuse]'
+
+# For HuggingFace datasets
+pip install 'eval-protocol[huggingface]'
+
+# For all adapters
+pip install 'eval-protocol[adapters]'
+
+# For development
+pip install 'eval-protocol[dev]'
+```
+
 ## License
 
 [MIT](LICENSE)
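
The `test_math_reasoning` example added in this commit leaves its scoring step as `# Your evaluation logic here`. One way that step could be filled in is sketched below, as plain Python that does not touch `EvaluationRow`, since the diff does not show its fields. It assumes the reference solutions follow the GSM8K convention of ending with `#### <number>`, and it treats the last number in a model response as the final answer, which is only a heuristic. All three function names are hypothetical.

```python
import re
from typing import Optional

# Matches integers and decimals, with optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def reference_answer(solution: str) -> Optional[str]:
    """Return the number after the last '####' marker, or None if absent."""
    _, marker, tail = solution.rpartition("####")
    if not marker:
        return None
    match = _NUMBER.search(tail)
    return match.group().replace(",", "") if match else None


def predicted_answer(response: str) -> Optional[str]:
    """Heuristic: treat the last number in the response as the final answer."""
    matches = _NUMBER.findall(response)
    return matches[-1].replace(",", "") if matches else None


def score(response: str, solution: str) -> float:
    """Return 1.0 when the predicted and reference numbers agree, else 0.0."""
    ref = reference_answer(solution)
    pred = predicted_answer(response)
    if ref is None or pred is None:
        return 0.0
    return 1.0 if float(pred) == float(ref) else 0.0
```

A score like this would then be attached to the row inside the test function; how that is done depends on the `EvaluationRow` API, which this sketch deliberately leaves out.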
