This project enables you to run Braintrust evals as part of your CI/CD workflow in GitHub, using GitHub Actions. To use this action, include the following step in a workflow file:

    - name: Run Evals
      uses: braintrustdata/eval-action@v1
      with:
        api_key: ${{ secrets.BRAINTRUST_API_KEY }}
        runtime: node

You can configure the following variables:

- `api_key`: Your Braintrust API key.
- `root`: The root directory containing your evals (defaults to `'.'`). The root directory must have either `node` or `python` configured.
- `paths`: Specific paths, relative to the root, containing evals you'd like to run.
- `runtime`: Either `node` or `python`.
- `package_manager`: Either `npm`, `pnpm`, or `yarn` for a `node` runtime, or `pip` or `uv` for a `python` runtime.
- `use_proxy`: Either `true` or `false`. If set, `OPENAI_BASE_URL` will be set to `https://braintrustproxy.com/v1`, which will automatically cache repetitive LLM calls and run your evals faster. Defaults to `true`.
- `terminate_on_failure`: Either `true` or `false`. If set to `true`, the evaluation process will stop when an error occurs. Defaults to `false`.
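
As a rough illustration of how these variables combine, the step below sketches a Python configuration. The directory names (`my_eval_dir`, `evals`) are hypothetical placeholders, and the sketch assumes `paths` takes a single path relative to `root`; adjust both to match your repository.

```yaml
- name: Run Evals
  uses: braintrustdata/eval-action@v1
  with:
    api_key: ${{ secrets.BRAINTRUST_API_KEY }}
    runtime: python
    package_manager: uv          # or pip
    root: my_eval_dir            # hypothetical directory containing your evals
    paths: evals                 # hypothetical path, relative to root
    use_proxy: false             # opt out of the caching proxy (defaults to true)
    terminate_on_failure: true   # stop when an error occurs (defaults to false)
```

Any variable left out falls back to the default listed above.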

    name: Run pnpm evals

    on:
      push:
        # Uncomment to run only when files in the 'evals' directory change
        # paths:
        #   - "evals/**"

    permissions:
      pull-requests: write
      contents: read

    jobs:
      eval:
        name: Run evals
        runs-on: ubuntu-latest

        steps:
          - name: Checkout
            id: checkout
            uses: actions/checkout@v4
            with:
              fetch-depth: 0

          - name: Setup Node.js
            id: setup-node
            uses: actions/setup-node@v4
            with:
              node-version: 20

          - uses: pnpm/action-setup@v3
            with:
              version: 8

          - name: Install Dependencies
            id: install
            run: pnpm install

          - name: Run Evals
            uses: braintrustdata/eval-action@v1
            with:
              api_key: ${{ secrets.BRAINTRUST_API_KEY }}
              runtime: node
              root: my_eval_dir

> [!IMPORTANT]
> You must specify `permissions` for the action to leave comments on your PR. Without these permissions, you'll see GitHub API errors.

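
If you only want the evals to run when eval files change, the commented-out filter in the workflow above can be enabled. A minimal sketch of the trigger block, assuming your evals live under a top-level `evals` directory as the comment suggests:

```yaml
on:
  push:
    # Only run when files under the 'evals' directory change
    paths:
      - "evals/**"
```

Note that `paths` is a key under `push`, not a list item, so it takes no leading dash.
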
For fully configured templates, see the examples directory.

The action runs `braintrust eval` and collects experiment results, which are
posted as a comment in the PR alongside a link to Braintrust. For example:

| Score | Average | Improvements | Regressions |
|---|---|---|---|
| Levenshtein | 0.83 (+3pp) | 8 🟢 | 4 🔴 |
| Duration | 1s (0s) | 16 🟢 | 1 🔴 |