Merged
@@ -1,9 +1,8 @@
# Evaluation
This is a work-in-progress tool for evaluating Semantic Workbench Assistants for quality.
# Data Generation

This is a tool for generating data for testing Semantic Workbench assistants.

## Automation and Data Generation
There is currently one part to this which is automation to populate a Workbench conversation automatically without human intervention.
The core functionality of this library is automation that populates a Workbench conversation without human intervention.
This is implemented using a specialized version of the guided conversation engine (GCE).
The GCE here focuses on the agenda and uses an exact resource constraint to force a long-running conversation.
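The "exact" resource constraint idea can be sketched as follows. This is a purely illustrative model, not the GCE's actual API; the class and method names here are assumptions:

```python
# Illustrative sketch (not the library's API): an "exact" resource
# constraint makes the conversation generator keep going for exactly
# N turns, preventing it from ending the conversation early.
from dataclasses import dataclass


@dataclass
class ExactResourceConstraint:
    total_turns: int
    used_turns: int = 0

    def consume(self) -> None:
        # Called once per simulated user/assistant exchange.
        self.used_turns += 1

    @property
    def must_continue(self) -> bool:
        # With an exact constraint the conversation may not terminate
        # until every allotted turn has been spent.
        return self.used_turns < self.total_turns


constraint = ExactResourceConstraint(total_turns=10)
while constraint.must_continue:
    constraint.consume()  # one simulated conversation turn
print(constraint.used_turns)  # 10
```

The point of forcing an exact turn count is to exercise the assistant over a long conversation rather than letting the agents wrap up after one or two exchanges.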

@@ -12,7 +11,7 @@ There is also a quick `generate_scenario.py` script that can be used to generate
### Setup

1. Run the workbench service locally (at http://127.0.0.1:3000), run an assistant service, and create the assistant you want to test.
2. Have LLM provider configured. Check [pydantic_ai_utils.py](./assistant_evaluations/pydantic_ai_utils.py) for an example of how it is configured for Pydantic AI.
2. Have an LLM provider configured. Check [pydantic_ai_utils.py](./assistant_data_gen/pydantic_ai_utils.py) for an example of how it is configured for Pydantic AI.
1. For example, create a `.env` file with your Azure OpenAI endpoint set as `ASSISTANT__AZURE_OPENAI_ENDPOINT=<your_endpoint>`
3. Create a configuration file. See [document_assistant_example_config.yaml](./configs/document_assistant_example_config.yaml) for an example.
1. The `scenarios` field is a list that allows you to specify multiple test scenarios (different conversation paths).
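A multi-scenario config along these lines might be modeled as below. This is a hedged sketch: the real schema lives in the library's config module, and every field name here other than the `scenarios` list and the `general` section (with its `assistant_name`) is an assumption for illustration:

```python
# Hedged sketch of modeling a config with multiple test scenarios.
# Field names beyond `general.assistant_name` and `scenarios` are
# assumptions, not the actual schema.
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str               # assumed field
    conversation_flow: str  # assumed field


@dataclass
class Config:
    assistant_name: str
    scenarios: list[Scenario]


# Stand-in for a parsed YAML document (a plain dict, as yaml.safe_load
# would return).
raw = {
    "general": {"assistant_name": "Document Assistant 7-7 v1"},
    "scenarios": [
        {"name": "web_search", "conversation_flow": "Ask for a web search on a topic."},
        {"name": "doc_edit", "conversation_flow": "Draft a document, then revise it."},
    ],
}

config = Config(
    assistant_name=raw["general"]["assistant_name"],
    scenarios=[Scenario(**s) for s in raw["scenarios"]],
)
```

Each entry in `scenarios` then drives one independent generated conversation against the assistant.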
@@ -33,4 +32,3 @@ python scripts/generate_scenario.py --config path/to/custom_config.yaml

### Recommendations
1. Be as specific as possible with your conversation flows. Generic conversation flows, or resource constraints that are too high, can lead to the agents getting stuck in a thank-you loop.

@@ -6,7 +6,7 @@
import yaml
from pydantic import BaseModel, Field

from assistant_evaluations.gce.gce_agent import ResourceConstraintMode
from assistant_data_gen.gce.gce_agent import ResourceConstraintMode


class ScenarioConfig(BaseModel):
@@ -40,7 +40,7 @@
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai.tools import ToolDefinition

from assistant_evaluations.gce.prompts import (
from assistant_data_gen.gce.prompts import (
AGENDA_SYSTEM_PROMPT,
CONVERSATION_SYSTEM_PROMPT,
FIRST_USER_MESSAGE,
@@ -49,7 +49,7 @@
TERMINATION_INSTRUCTIONS_EXACT,
TERMINATION_INSTRUCTIONS_MAXIMUM,
)
from assistant_evaluations.pydantic_ai_utils import create_model
from assistant_data_gen.pydantic_ai_utils import create_model


class ResourceConstraintMode(Enum):
@@ -1,5 +1,5 @@
general:
assistant_name: "Document Assistant 6-20 v1"
assistant_name: "Document Assistant 7-7 v1"
conversation_title: "GCE - Auto-Generated Conversation"
assistant_details: >-
The Document assistant you are talking with help you with things like web search and writing documents.
@@ -1,7 +1,7 @@
[project]
name = "assistant-evaluations"
name = "assistant_data_gen"
version = "0.1.0"
description = "Assistant evaluations"
description = "Assistant Data Generation"
authors = [{ name = "Semantic Workbench Team" }]
readme = "README.md"
requires-python = ">=3.11"
@@ -6,15 +6,15 @@
import time
from pathlib import Path

from assistant_evaluations.assistant_api import (
from assistant_data_gen.assistant_api import (
create_test_jwt_token,
get_all_messages,
get_assistant,
get_user_from_workbench_db,
poll_assistant_status,
)
from assistant_evaluations.config import EvaluationConfig
from assistant_evaluations.gce.gce_agent import (
from assistant_data_gen.config import EvaluationConfig
from assistant_data_gen.gce.gce_agent import (
Agenda,
GuidedConversationInput,
GuidedConversationState,
@@ -14,8 +14,8 @@
import asyncio
from pathlib import Path

from assistant_evaluations.config import EvaluationConfig
from assistant_evaluations.pydantic_ai_utils import create_model
from assistant_data_gen.config import EvaluationConfig
from assistant_data_gen.pydantic_ai_utils import create_model
from dotenv import load_dotenv
from liquid import render
from pydantic import BaseModel, Field