refactor: rename dataset to suite across codebase #943

@christso

Objective

Rename dataset → suite everywhere. An eval file is a test suite (lifecycle hooks, workspace setup/teardown, execution config), not a dataset (passive input/output pairs).

Context

The field has ping-ponged: dataset → eval_set (#678) → dataset (#814). The #814 rename to dataset cited "industry conventions (Braintrust, LangSmith, DeepEval)", but those platforms use "dataset" because their datasets really are just input/output pairs with no execution semantics. In agentv, eval files have:

  • before_all / after_all lifecycle hooks
  • before_each / after_each per-test hooks
  • Workspace setup/teardown and pooling
  • Execution config (trials, timeout, concurrency)
  • Target matrix evaluation

That's a test suite, not a dataset. suite is the accurate term.

Scope

~190 occurrences across ~27 files:

| Area | Files | Occurrences |
| packages/core/ | types, orchestrator, yaml-parser, jsonl-parser, schema, validator, trace | ~50 |
| apps/cli/ | artifact-writer, trace commands, pipeline, results/serve, tests | ~80 |
| apps/studio/ | api, types, Sidebar, RunDetail, routes, Breadcrumbs | ~110 |
| apps/web/ (docs) | TBD | TBD |
| examples/ | baseline JSONL files | ~42 files |

Wire format (JSONL results)

// Before
{"test_id":"foo","dataset":"my-eval","score":1.0,...}

// After
{"test_id":"foo","suite":"my-eval","score":1.0,...}
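
For updating the example baseline files, a one-off migration could look roughly like the sketch below. The helper name migrateJsonlLine is hypothetical and not part of agentv. It assumes each line is a flat JSON object and keeps an existing suite value if a line somehow carries both keys.

```typescript
// Hypothetical helper (not an agentv API): rewrites one JSONL result line
// from the legacy "dataset" key to "suite", preserving key order and all
// other fields.
function migrateJsonlLine(line: string): string {
  const record = JSON.parse(line) as Record<string, unknown>;
  // Lines without the legacy key are returned untouched.
  if (!("dataset" in record)) return line;

  const migrated: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    if (key === "dataset") {
      // If the line already has "suite", keep that value and drop "dataset".
      if (!("suite" in record)) migrated["suite"] = value;
    } else {
      migrated[key] = value;
    }
  }
  return JSON.stringify(migrated);
}

const legacyLine = '{"test_id":"foo","dataset":"my-eval","score":1}';
console.log(migrateJsonlLine(legacyLine));
// prints {"test_id":"foo","suite":"my-eval","score":1}
```

Renaming in place (rather than deleting and re-adding the key) keeps the field position stable, so baseline diffs stay small.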

Backward compatibility

Same pattern as the #814 rename:

  • JSONL parser accepts both suite and dataset (deprecated alias)
  • Zod schema accepts both field names
  • CLI --group-by dataset accepted as deprecated alias for --group-by suite
  • Pipeline bench/grade read manifest.suite ?? manifest.dataset
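
A minimal sketch of the alias handling above, assuming a simplified record shape. RawRecord, readSuite, and normalizeGroupBy are illustrative names, not existing agentv code, and the real parser goes through the Zod schema rather than a hand-written function.

```typescript
// Assumed minimal shape of a parsed JSONL record or manifest;
// the real types carry many more fields.
interface RawRecord {
  suite?: string;
  dataset?: string; // deprecated alias
}

// Reads the suite name, falling back to the deprecated "dataset" key.
// Mirrors the `manifest.suite ?? manifest.dataset` pattern.
function readSuite(record: RawRecord): string | undefined {
  return record.suite ?? record.dataset;
}

// Normalizes a --group-by value: "dataset" maps to "suite" with a warning,
// every other value passes through unchanged.
function normalizeGroupBy(
  value: string,
  warn: (msg: string) => void = console.warn,
): string {
  if (value === "dataset") {
    warn("--group-by dataset is deprecated; use --group-by suite");
    return "suite";
  }
  return value;
}

console.log(readSuite({ dataset: "my-eval" })); // prints my-eval
console.log(readSuite({ suite: "new", dataset: "old" })); // prints new
console.log(normalizeGroupBy("dataset", () => {})); // prints suite
```

Normalizing at the boundary (parser and CLI argument handling) means the rest of the code only ever sees suite.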

Internal types

// Before
interface EvalTest {
  readonly dataset?: string;
}
interface EvaluationResult {
  readonly dataset?: string;
}

// After
interface EvalTest {
  readonly suite?: string;
}
interface EvaluationResult {
  readonly suite?: string;
}

Studio UI

  • "Datasets" → "Suites" in headings, sidebar labels
  • Route /runs/:runId/dataset/:dataset → /runs/:runId/suite/:suite
  • DatasetSidebar → SuiteSidebar component rename (or keep generic)
  • API endpoint /api/runs/:filename/datasets → /suites
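
If old Studio links should keep working after the route rename (this issue does not require it), the mapping could be as small as the sketch below. toSuitePath is a hypothetical helper, and how it would plug into the Studio router is left open.

```typescript
// Hypothetical helper: maps a legacy Studio path to its renamed equivalent.
// Paths that do not match the legacy pattern are returned unchanged.
function toSuitePath(path: string): string {
  const match = /^\/runs\/([^/]+)\/dataset\/([^/]+)$/.exec(path);
  if (!match) return path;
  const [, runId, name] = match;
  return `/runs/${runId}/suite/${name}`;
}

console.log(toSuitePath("/runs/abc/dataset/my-eval"));
// prints /runs/abc/suite/my-eval
```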

YAML

The top-level name field needs no change, since it already names the suite. The dataset field on test cases becomes suite (optional grouping tag).

Acceptance Signals

  • EvalTest.suite and EvaluationResult.suite replace .dataset
  • JSONL output writes suite field
  • JSONL parser reads both suite and dataset (backward compat)
  • CLI commands use suite (--group-by suite, trace stats, etc.)
  • Studio UI labels say "Suites", routes use /suite/
  • Studio API endpoints renamed
  • Example baseline JSONL files updated
  • Docs updated on agentv.dev

Non-Goals

  • Changing the YAML name field (already neutral)
  • Changing the file extension (.eval.yaml is fine)
  • Removing backward-compat aliases in this PR (deprecate only)

Metadata

Labels

core: Anything pertaining to core functionality of AgentV
