| 1 | +# Remote Eval Parameters: Contracts |
| 2 | + |
| 3 | +## SDK |
| 4 | + |
| 5 | +### Evaluator |
| 6 | + |
| 7 | +#### `parameters` |
| 8 | + |
| 9 | +A map from parameter name to parameter spec, declared in the evaluator definition. This is the source of truth for what parameters exist and what their defaults are. |
| 10 | + |
| 11 | +```pseudocode |
| 12 | +evaluator.parameters = { |
| 13 | + "model": { type: "model", default: "gpt-4", description: "Model to use" }, |
| 14 | + "temperature": { type: "data", default: 0.7, description: "Sampling temperature" }, |
| 15 | + "max_length": { type: "data", default: 100, description: "Max output length" } |
| 16 | +} |
| 17 | +``` |
| 18 | + |
| 19 | +Each parameter spec: |
| 20 | + |
| 21 | +| Field | Type | Required | Description | |
| 22 | +|-------|------|----------|-------------| |
| 23 | +| `default` | `any` | No | Value used when the `POST /eval` request does not include this parameter | |
| 24 | +| `description` | `string` | No | Human-readable description shown in the Playground UI | |
| 25 | +| `type` | `string` | No | Type hint — `"data"` (default), `"model"`, or `"prompt"`. See `parameter` entry under `GET /list` Response Format. | |
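
The spec table above can be illustrated with a minimal sketch that fills in the `type` hint when a spec omits it. The helper name `normalize_spec` and the choice to raise on an unrecognized `type` are assumptions made for illustration; the contract itself only states that `"data"` is the default.

```python
from typing import Any

# The three type hints named in the spec table.
ALLOWED_TYPES = {"data", "model", "prompt"}


def normalize_spec(spec: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the spec with `type` defaulted to "data".

    Hypothetical helper: rejecting unknown types is an assumption,
    not something the contract requires.
    """
    param_type = spec.get("type", "data")
    if param_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown parameter type: {param_type!r}")
    return {**spec, "type": param_type}


parameters = {
    "model": {"type": "model", "default": "gpt-4", "description": "Model to use"},
    # No explicit type: treated as "data".
    "temperature": {"default": 0.7, "description": "Sampling temperature"},
}
normalized = {name: normalize_spec(spec) for name, spec in parameters.items()}
```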
| 26 | + |
| 27 | +#### `task` |
| 28 | + |
| 29 | +A callable that optionally declares a `parameters` argument. When declared, it receives the merged parameter map (request values overlaid on evaluator defaults) as a plain string-keyed object. |
| 30 | + |
| 31 | +Tasks that do not declare `parameters` must continue to work unchanged — the SDK must not pass `parameters` to functions that don't accept it. |
| 32 | + |
| 33 | +**Side effect**: the merged `parameters` map is passed to the task function on every test case invocation during a `POST /eval` run. |
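
One way an SDK might honor the "only pass `parameters` when declared" rule is to inspect the callable's signature before invoking it. This is a sketch under assumptions: the helper names `accepts_parameters` and `call_task` are hypothetical, the task is assumed to take the test case input as its first argument, and `parameters` is assumed to be passable by keyword.

```python
import inspect
from typing import Any, Callable


def accepts_parameters(fn: Callable[..., Any]) -> bool:
    """True if `fn` declares an argument literally named `parameters`."""
    return "parameters" in inspect.signature(fn).parameters


def call_task(task: Callable[..., Any], case_input: Any, merged: dict[str, Any]) -> Any:
    """Invoke the task, passing the merged map only when it is declared."""
    if accepts_parameters(task):
        return task(case_input, parameters=merged)
    return task(case_input)


def legacy_task(case_input: str) -> str:
    # Predates parameters; must keep working unchanged.
    return case_input.upper()


def parameterized_task(case_input: str, parameters: dict[str, Any]) -> str:
    return f"{case_input}:{parameters['model']}"


merged = {"model": "gpt-4", "temperature": 0.7}
legacy_output = call_task(legacy_task, "pizza", merged)
parameterized_output = call_task(parameterized_task, "pizza", merged)
```

The same check could be reused for local scorers, since they follow the same contract with respect to `parameters`.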
| 34 | + |
| 35 | +#### `scorers` |
| 36 | + |
| 37 | +Local scorer functions follow the same contract as `task` with respect to parameters — they optionally declare `parameters` and receive the same merged map if they do. The SDK must not pass `parameters` to scorers that don't declare it. |
| 38 | + |
| 39 | +Remote scorers (sent by the Playground in the `POST /eval` request) also receive the merged parameters via the SDK's remote scorer invocation mechanism. |
| 40 | + |
| 41 | +**Side effect**: the merged `parameters` map is passed to every scorer function (local and remote) on every test case invocation during a `POST /eval` run. |
| 42 | + |
| 43 | +### Dev Server |
| 44 | + |
| 45 | +#### `GET /list` |
| 46 | + |
| 47 | +##### Request Format |
| 48 | + |
| 49 | +No body. Accepts both `GET` and `POST`. |
| 50 | + |
| 51 | +``` |
| 52 | +GET /list |
| 53 | +Authorization: Bearer <token> |
| 54 | +X-Bt-Org-Name: <org> |
| 55 | +``` |
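
For reference, a client could construct this request with the Python standard library roughly as follows. The base URL, token, and org values are placeholders, and `build_list_request` is a hypothetical helper; the request is only built here, not sent.

```python
import urllib.request


def build_list_request(base_url: str, token: str, org: str) -> urllib.request.Request:
    """Build (but do not send) a GET /list request with the two headers above."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/list",
        headers={
            "Authorization": f"Bearer {token}",
            "X-Bt-Org-Name": org,
        },
        method="GET",
    )


# Placeholder values for illustration only.
req = build_list_request("http://localhost:8300", "example-token", "example-org")
```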
| 56 | + |
| 57 | +##### Response Format |
| 58 | + |
| 59 | +``` |
| 60 | +HTTP 200 OK |
| 61 | +Content-Type: application/json |
| 62 | +``` |
| 63 | + |
| 64 | +Body: a JSON object keyed by evaluator name. Each evaluator's `parameters` field holds a `parameters` object (described below) serialized from the evaluator's `parameters` definition, or `null` if the evaluator defines no parameters. |
| 65 | + |
| 66 | +```json |
| 67 | +{ |
| 68 | + "food-classifier": { |
| 69 | + "scores": [{ "name": "exact_match" }], |
| 70 | + "parameters": { |
| 71 | + "type": "braintrust.staticParameters", |
| 72 | + "schema": { |
| 73 | + "model": { |
| 74 | + "type": "data", |
| 75 | + "schema": { "type": "string" }, |
| 76 | + "default": "gpt-4", |
| 77 | + "description": "Model to use" |
| 78 | + }, |
| 79 | + "temperature": { |
| 80 | + "type": "data", |
| 81 | + "schema": { "type": "number" }, |
| 82 | + "default": 0.7, |
| 83 | + "description": "Sampling temperature" |
| 84 | + } |
| 85 | + }, |
| 86 | + "source": null |
| 87 | + } |
| 88 | + }, |
| 89 | + "text-summarizer": { |
| 90 | + "scores": [], |
| 91 | + "parameters": null |
| 92 | + } |
| 93 | +} |
| 94 | +``` |
| 95 | + |
| 96 | +**`parameters` object:** |
| 97 | + |
| 98 | +| Field | Type | Description | |
| 99 | +|-------|------|-------------| |
| 100 | +| `type` | `string` | Always `"braintrust.staticParameters"` for inline (code-defined) parameters | |
| 101 | +| `schema` | `Record<string, parameter>` | Map of parameter name to definition | |
| 102 | +| `source` | `null` | Always `null` for static parameters. Non-null values reference remotely stored parameter definitions, which are out of scope for a baseline implementation. |
| 103 | + |
| 104 | +When the evaluator defines no parameters, set `"parameters": null` or omit the field. |
| 105 | + |
| 106 | +> **Note for existing SDK implementors**: Before the container format was introduced, some SDKs returned the `schema` map directly (i.e. `Record<string, parameter>`) rather than wrapping it in a `parameters` object with `type` and `source` fields. The container was introduced to distinguish static (inline) parameters from dynamic (remotely stored) ones. If you are updating an existing SDK, check whether it predates this format and update it accordingly. |
| 107 | + |
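
A consumer that must tolerate both shapes could normalize the legacy bare map into the container. The sketch below is a heuristic, not part of the contract: it assumes that in the legacy shape every top-level value is a parameter entry (an object), so a string-valued top-level `type` key signals the container format. `normalize_list_parameters` is a hypothetical name.

```python
from typing import Any, Optional

STATIC_TYPE = "braintrust.staticParameters"


def normalize_list_parameters(value: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Wrap a legacy bare schema map in the container; pass containers through."""
    if value is None:
        return None
    # Assumption: a string `type` at the top level means this is already a container.
    if isinstance(value.get("type"), str):
        return value
    return {"type": STATIC_TYPE, "schema": value, "source": None}


legacy = {"model": {"type": "data", "default": "gpt-4"}}
wrapped = normalize_list_parameters(legacy)
```
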
| 108 | +**`parameter` entry** (each value in `schema`): |
| 109 | + |
| 110 | +| Field | Type | Required | Description | |
| 111 | +|-------|------|----------|-------------| |
| 112 | +| `type` | `string` | Yes | `"data"` for generic values; `"model"` for a model picker; `"prompt"` for a prompt editor. For a baseline implementation, `"data"` is sufficient. | |
| 113 | +| `schema` | `object` | No | JSON Schema fragment describing the value shape. Set `type` to `"string"`, `"number"`, `"boolean"`, `"object"`, or `"array"` to match the parameter's value type. Used by the Playground to render appropriate input controls. Omit if the type is unknown or mixed. | |
| 114 | +| `default` | `any` | No | Default value. Should match the type described by `schema`. | |
| 115 | +| `description` | `string` | No | Human-readable description shown in the Playground UI. | |
| 116 | + |
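| 117 | +**Serialization**: each entry in `evaluator.parameters` maps to a `parameter` entry in the `schema` object. The parameter name becomes the key; the spec fields (`default`, `description`, `type`) are preserved as-is, except that `type` is written as `"data"` when the spec omits it, because `type` is required in a `parameter` entry. |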
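
The serialization step might look like the following sketch. Two things here are assumptions rather than contract requirements: the helper names, and the inference of the JSON Schema `type` from the default value's Python type (the contract does not say how `schema` is derived, only that it may be omitted when the type is unknown).

```python
from typing import Any, Optional


def json_schema_type(value: Any) -> Optional[str]:
    """Assumed mapping from a default value to a JSON Schema type name."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, (list, tuple)):
        return "array"
    return None


def serialize_parameters(parameters: Optional[dict[str, dict[str, Any]]]) -> Optional[dict[str, Any]]:
    """Build the `parameters` container for GET /list, or None when there are none."""
    if not parameters:
        return None
    schema: dict[str, Any] = {}
    for name, spec in parameters.items():
        entry: dict[str, Any] = {"type": spec.get("type", "data")}
        if "default" in spec:
            entry["default"] = spec["default"]
            inferred = json_schema_type(spec["default"])
            if inferred is not None:
                entry["schema"] = {"type": inferred}
        if "description" in spec:
            entry["description"] = spec["description"]
        schema[name] = entry
    return {"type": "braintrust.staticParameters", "schema": schema, "source": None}


container = serialize_parameters(
    {"temperature": {"default": 0.7, "description": "Sampling temperature"}}
)
```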
| 118 | + |
| 119 | +##### Error Responses |
| 120 | + |
| 121 | +| Status | Condition | |
| 122 | +|--------|-----------| |
| 123 | +| `401 Unauthorized` | Missing or invalid auth token | |
| 124 | + |
| 125 | +#### `POST /eval` |
| 126 | + |
| 127 | +##### Request Format |
| 128 | + |
| 129 | +``` |
| 130 | +POST /eval |
| 131 | +Content-Type: application/json |
| 132 | +Authorization: Bearer <token> |
| 133 | +X-Bt-Org-Name: <org> |
| 134 | +``` |
| 135 | + |
| 136 | +The `parameters` field in the request body carries the user's chosen values from the Playground UI: |
| 137 | + |
| 138 | +```json |
| 139 | +{ |
| 140 | + "name": "food-classifier", |
| 141 | + "data": { ... }, |
| 142 | + "parameters": { |
| 143 | + "model": "gpt-4o", |
| 144 | + "temperature": 0.9 |
| 145 | + } |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +| Field | Type | Required | Description | |
| 150 | +|-------|------|----------|-------------| |
| 151 | +| `parameters` | `Record<string, unknown>` | No | Parameter values chosen by the user. Keys match the evaluator's parameter names. Absent, `null`, and `{}` all mean no overrides were provided. | |
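
The overlay of request values on evaluator defaults can be sketched as below. `merge_parameters` is a hypothetical helper. Leaving out parameters that have neither a default nor an override, and passing through request keys that the evaluator does not declare, are assumptions; the contract does not specify either case.

```python
from typing import Any, Optional


def merge_parameters(
    specs: dict[str, dict[str, Any]],
    overrides: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Overlay request values on evaluator defaults."""
    merged = {name: spec["default"] for name, spec in specs.items() if "default" in spec}
    # Absent (None), null, and {} all mean "no overrides".
    merged.update(overrides or {})
    return merged


specs = {
    "model": {"type": "model", "default": "gpt-4"},
    "temperature": {"type": "data", "default": 0.7},
    "max_length": {"type": "data", "default": 100},
}
with_overrides = merge_parameters(specs, {"model": "gpt-4o", "temperature": 0.9})
defaults_only = merge_parameters(specs, None)
```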
| 152 | + |
| 153 | +See the [Dev Server specification](../server/specification.md) for the full `POST /eval` request schema (all fields beyond `parameters`). |
| 154 | + |
| 155 | +##### Response Format |
| 156 | + |
| 157 | +An SSE stream. The `parameters` field has no effect on the response format: progress, summary, and done events have the same structure with or without parameters. |
| 158 | + |
| 159 | +See the [Dev Server specification](../server/specification.md) for the full SSE event schema. |
| 160 | + |
| 161 | +**Side effect**: the merged parameters (request values overlaid on evaluator defaults) are forwarded to the task and all scorers on every test case invocation. Output values in the SSE stream reflect whatever the task produced using those parameters. |
| 162 | + |
| 163 | +##### Error Responses |
| 164 | + |
| 165 | +| Status | Condition | |
| 166 | +|--------|-----------| |
| 167 | +| `400 Bad Request` | `parameters` field is present but not a JSON object | |
| 168 | +| `401 Unauthorized` | Missing or invalid auth token | |
| 169 | +| `404 Not Found` | No evaluator registered with the given `name` | |
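
The error table can be read as a small decision function. In this sketch the function name, the `authorized` flag, and the order in which the checks run are assumptions; only the status codes and their conditions come from the table. Note that a `null` value for `parameters` is not an error, since it means no overrides were provided.

```python
from typing import Any, Optional


def eval_error_status(
    body: dict[str, Any],
    evaluator_names: set[str],
    authorized: bool,
) -> Optional[int]:
    """Return the HTTP error status for a POST /eval request, or None if it is valid."""
    if not authorized:
        return 401
    if body.get("name") not in evaluator_names:
        return 404
    params = body.get("parameters")
    # Present but not a JSON object (a list, string, number, ...) is a 400.
    if params is not None and not isinstance(params, dict):
        return 400
    return None


known = {"food-classifier", "text-summarizer"}
ok_status = eval_error_status({"name": "food-classifier", "parameters": None}, known, True)
bad_params_status = eval_error_status({"name": "food-classifier", "parameters": [1, 2]}, known, True)
```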
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## References |
| 174 | + |
| 175 | +- [Braintrust: Remote evals guide](https://www.braintrust.dev/docs/evaluate/remote-evals) |
| 176 | +- [Dev Server specification](../server/specification.md) — full `POST /eval` and `GET /list` schemas |