Scripts to create a Dbizi dataset and evaluate an assistant #227
Added 2 scripts in scripts/langsmith:
-dbizi_dataset.py (creates a dataset in LangSmith with 15 questions about dbizi)
-evaluate_assistant.py (uses an LLM-as-a-judge to evaluate a given lamb assistant against that dataset in LangSmith)
To try them, first set these LangSmith environment variables in backend/.env:
-LANGCHAIN_TRACING_V2=true
-LANGCHAIN_API_KEY=YOUR_LANGCHAIN_API_KEY
-LANGCHAIN_PROJECT=lamb-assistants
-LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
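Before running either script, it can help to confirm that these variables are set. The following is a minimal sketch, not part of this PR. It assumes the values from backend/.env have already been loaded into the process environment (for example by the shell or a dotenv loader), and the helper name `missing_langsmith_vars` is hypothetical:

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = (
    "LANGCHAIN_TRACING_V2",
    "LANGCHAIN_API_KEY",
    "LANGCHAIN_PROJECT",
    "LANGCHAIN_ENDPOINT",
)


def missing_langsmith_vars(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: `env` defaults to the current process environment,
    but any mapping can be passed in (useful for testing).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_langsmith_vars()` with no arguments checks the current environment; an empty list means all four variables are present.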
Then set these 2 variables in evaluate_assistant.py:
-JWT_TOKEN = "your_jwt_token_here"
-ASSISTANT_ID = 1 (replace 1 with the ID of the dbizi assistant in lamb)
Then, run the scripts:
-first, dbizi_dataset.py
-then, evaluate_assistant.py
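The run order above could be wrapped in a small driver. This is only a sketch under stated assumptions: it assumes the scripts are launched from the repository root and take no command-line arguments, and the names `build_commands` and `run_all` are hypothetical, not part of this PR:

```python
import subprocess
import sys
from pathlib import Path

# Directory and file names come from the description above.
# Assumption: the working directory is the repository root.
SCRIPTS_DIR = Path("scripts/langsmith")
SCRIPT_ORDER = ("dbizi_dataset.py", "evaluate_assistant.py")


def build_commands(python=sys.executable):
    """Return one command per script, in the order they should run."""
    return [[python, str(SCRIPTS_DIR / name)] for name in SCRIPT_ORDER]


def run_all():
    """Run each script in order and stop at the first failure."""
    for cmd in build_commands():
        # check=True raises CalledProcessError on a non-zero exit code,
        # so the evaluation never runs if dataset creation failed.
        subprocess.run(cmd, check=True)
```

Stopping at the first failure matters here because evaluate_assistant.py depends on the dataset that dbizi_dataset.py creates.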