Commit d4c2817

dbrian57 and mdlinville authored
[Weave] Reference dataset when using eval logger (#2288)
## Description

Resolves DOCS-1234. Adds documentation demonstrating how to reference an already published dataset to keep users from resubmitting their data every time.

Co-authored-by: Matt Linville <matt@linville.me>
1 parent eefc00f commit d4c2817

1 file changed: 55 additions & 0 deletions

File tree

weave/guides/evaluation/evaluation_logger.mdx

@@ -319,6 +319,61 @@ While TypeScript doesn't have automatic cleanup with context managers, `logSumma
### Link to an existing dataset

When you pass raw datasets as `inputs` to `log_prediction`, Weave re-ingests the data with every evaluation run. This stores duplicate data, which may waste space if the dataset is large or if many evaluations reuse it.

To avoid this duplication, publish your dataset to Weave before running any evaluations, then pass the published dataset's rows as `inputs`. Weave resolves published rows by internal reference instead of re-ingesting the data. This gives you the same linked experience as the standard [Evaluation framework](../core-types/evaluations), where each prediction links back to a specific dataset row in the Weave UI.

The following example publishes a dataset, then retrieves the published version so it can be iterated over like any other dataset.

<Tabs>
<Tab title="Python">
```python
import weave
from weave import EvaluationLogger

weave.init("your-team-name/your-project-name")

# Publish the dataset (only needs to happen once)
dataset = weave.Dataset(
    name="my_eval_dataset",
    rows=[
        {"question": "What is the capital of France?", "expected": "Paris"},
        {"question": "What U.S. state is Seattle in?", "expected": "Washington"},
        {"question": "In what country is Mount Fuji located?", "expected": "Japan"},
    ],
)
weave.publish(dataset)

# Retrieve the published dataset
dataset = weave.ref("my_eval_dataset").get()
```
</Tab>
<Tab title="TypeScript">
```typescript
import weave, {EvaluationLogger, Dataset} from 'weave';

await weave.init('your-team-name/your-project-name');

// Publish the dataset (only needs to happen once)
const dataset = new Dataset({
  name: 'my_eval_dataset',
  rows: [
    {question: 'What is the capital of France?', expected: 'Paris'},
    {question: 'What U.S. state is Seattle in?', expected: 'Washington'},
    {question: 'In what country is Mount Fuji located?', expected: 'Japan'},
  ],
});
const datasetRef = await dataset.save();

// Retrieve the published dataset
const published = await datasetRef.get();
```
</Tab>
</Tabs>

### Get outputs before logging

You can first compute your model outputs, then separately log predictions and scores. This allows for better separation of evaluation and logging logic.
