[Weave] Reference dataset when using eval logger (#2288)
## Description
Resolves DOCS-1234. Adds documentation demonstrating how to reference an
already published dataset to keep users from resubmitting their data
every time.
---------
Co-authored-by: Matt Linville <matt@linville.me>
weave/guides/evaluation/evaluation_logger.mdx
+55 lines changed: 55 additions & 0 deletions
@@ -319,6 +319,61 @@ While TypeScript doesn't have automatic cleanup with context managers, `logSummary`
### Link to an existing dataset

When you pass raw dataset rows as `inputs` to `log_prediction`, Weave re-ingests the data with every evaluation run. This stores duplicate data, which can waste space if the dataset is large or if many evaluations reuse it.

To avoid this duplication, publish your dataset to Weave before running any evaluations, then pass the published dataset's rows as `inputs`. Weave resolves published rows by internal reference instead of re-ingesting the data. This technique gives you the same linked experience as the standard [Evaluation framework](../core-types/evaluations), where each prediction links back to a specific dataset row in the Weave UI.

The following example publishes a dataset, links it to the `EvaluationLogger`, and then retrieves and iterates over it like any other dataset.

<Tabs>
<Tab title="Python">

```python
import weave
from weave import EvaluationLogger

weave.init("your-team-name/your-project-name")

# Publish the dataset (only needs to happen once)
dataset = weave.Dataset(
    name="my_eval_dataset",
    rows=[
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "What U.S. state is Seattle in?", "expected": "Washington"},
        {"question": "What country is Mount Fuji located in?", "expected": "Japan"},
    ],
)
// Publish the dataset (only needs to happen once)
const dataset = new Dataset({
    name: 'my_eval_dataset',
    rows: [
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "What U.S. state is Seattle in?", "expected": "Washington"},
        {"question": "What country is Mount Fuji located in?", "expected": "Japan"},
    ],
});
const datasetRef = await dataset.save();

// Retrieve the published dataset
const published = await datasetRef.get();
```

</Tab>
</Tabs>

### Get outputs before logging
You can first compute your model outputs, then separately log predictions and scores. This allows for better separation of evaluation and logging logic.
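For example, here is a minimal sketch of that pattern, assuming the same `EvaluationLogger` API as above (the model function and dataset rows are stand-ins):

```python
import weave
from weave import EvaluationLogger

weave.init("your-team-name/your-project-name")

def my_model(question: str) -> str:  # stand-in for your real model
    return "Paris"

rows = [{"question": "What is the capital of France?", "expected": "Paris"}]

# First compute all outputs...
outputs = [my_model(row["question"]) for row in rows]

# ...then log predictions and scores separately
eval_logger = EvaluationLogger(model="my_model", dataset="my_eval_dataset")
for row, output in zip(rows, outputs):
    pred = eval_logger.log_prediction(inputs=row, output=output)
    pred.log_score(scorer="correctness", score=output == row["expected"])
    pred.finish()
eval_logger.log_summary()
```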