feat: Eval benchmark repo sync to remote targets #1232

@christso

Objective

Let AgentV sync eval benchmark repos on demand via Studio UI, following an ArgoCD-style model where git is the source of truth and sync is an explicit action.

Current Problem

Today, getting a benchmark repo up to date is left to the caller. Users script ad-hoc clones/pulls per environment. There's no unified way to say "this benchmark lives in a git repo" and have Studio manage it.

Design

1. Add source to benchmark entries

Extend the existing benchmark registry at ~/.agentv/benchmarks.yaml with an optional source field. Values inside source are interpolated with the existing interpolateEnv() from packages/core/src/evaluation/interpolation.ts, so no new interpolation code is needed.

benchmarks:
  - id: eval-benchmarks
    name: Eval Benchmarks
    path: evals
    source:
      url: ${{ BENCHMARK_REPO_URL }}
      ref: ${{ BENCHMARK_REPO_REF:-main }}
    added_at: "2026-03-20T10:00:00Z"

  • source is optional. If absent, path is used as-is (current behaviour).
  • No sync field. If source exists, the benchmark is git-backed and syncable.
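
The "no sync field" rule above can be sketched as a type plus a single predicate. This is a minimal sketch: the interface names and isGitBacked() are hypothetical and not taken from the AgentV codebase. Only the field names mirror the YAML example.

```typescript
// Hypothetical shapes for illustration. Field names follow the YAML example;
// the interface names and the helper are assumptions, not existing AgentV code.
interface BenchmarkSource {
  url: string; // e.g. the resolved value of ${{ BENCHMARK_REPO_URL }}
  ref: string; // e.g. the resolved value of ${{ BENCHMARK_REPO_REF:-main }}
}

interface BenchmarkEntry {
  id: string;
  name: string;
  path: string;
  source?: BenchmarkSource; // absent means a plain local benchmark
  added_at: string;
}

// A benchmark is syncable exactly when it declares a source.
// There is no separate sync flag to keep consistent with it.
function isGitBacked(entry: BenchmarkEntry): boolean {
  return entry.source !== undefined;
}

const localEntry: BenchmarkEntry = {
  id: "local-benchmarks",
  name: "Local Benchmarks",
  path: "evals",
  added_at: "2026-03-20T10:00:00Z",
};

const gitEntry: BenchmarkEntry = {
  ...localEntry,
  id: "eval-benchmarks",
  name: "Eval Benchmarks",
  source: { url: "https://example.com/eval-benchmarks.git", ref: "main" },
};
```

With this shape, Studio could decide whether to render the "Sync" button by calling isGitBacked(entry) and nothing else.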

2. Sync as explicit action (ArgoCD model)

  • Studio UI: The Benchmarks screen shows a "Sync" button for git-backed benchmarks. Clicking it runs a one-shot git clone --depth 1 (first sync) or git pull --ff-only (subsequent syncs).
  • CLI: agentv benchmark sync <id> triggers the same one-shot sync.
  • Docker/CI: Run agentv benchmark sync as a pre-step in the container entrypoint or CI pipeline.

No background daemon. No continuous mode. No git-sync dependency.

3. Behaviour

State                      | Action
---------------------------|------------------------------------------------------
No source                  | Local benchmark. No sync button. Path used as-is.
Has source, first time     | git clone --depth 1 --filter=blob:none to path.
Has source, already cloned | git pull --ff-only from source.ref.
Docker/CI                  | agentv benchmark sync in entrypoint or script.
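
The state table above could be expressed as a pure function that decides which git invocation to run, leaving the actual process spawn to the caller. This is a sketch under assumptions: planSync() and its types are hypothetical, passing the ref via --branch on clone is an assumption (the table does not say how the ref is selected), and the remote is assumed to be named origin.

```typescript
// Hypothetical planner for illustration; names are not from the AgentV codebase.
interface SyncSource {
  url: string;
  ref: string;
}

type SyncPlan =
  | { kind: "none" }
  | { kind: "clone"; args: string[] }
  | { kind: "pull"; cwd: string; args: string[] };

function planSync(
  path: string,
  source: SyncSource | undefined,
  alreadyCloned: boolean,
): SyncPlan {
  // No source: local benchmark, nothing to sync.
  if (source === undefined) {
    return { kind: "none" };
  }
  // First sync: shallow, blobless clone into path.
  // Assumption: the ref is a branch or tag, since --branch does not accept commit SHAs.
  if (!alreadyCloned) {
    return {
      kind: "clone",
      args: ["clone", "--depth", "1", "--filter=blob:none", "--branch", source.ref, source.url, path],
    };
  }
  // Subsequent syncs: fast-forward only, run inside the existing checkout.
  // Assumption: the remote is named "origin", as git clone sets it by default.
  return { kind: "pull", cwd: path, args: ["pull", "--ff-only", "origin", source.ref] };
}
```

Keeping the decision separate from execution means the Studio button, the CLI command, and a container entrypoint can all share one code path and differ only in how they report the result.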

4. Interaction with existing eval workspace.repos

Individual .eval.yaml files can declare additional repos via workspace.repos. This is per-eval, inline, and unrelated to the benchmark registry. The two coexist:

  • Benchmark source — project-level "where does this benchmark live." Drives sync.
  • Eval workspace.repos — per-eval "this test needs this additional repo." Used at eval runtime.

Acceptance Criteria

  • A benchmark entry with source.url + source.ref can be synced to path via Studio UI "Sync" button or agentv benchmark sync.
  • agentv benchmark sync does git clone --depth 1 (first time) or git pull --ff-only (subsequent).
  • Existing benchmark entries without source continue to work unchanged.
  • ${{ ENV_VAR }} interpolation works in source.url and source.ref.
  • Studio benchmarks screen shows sync button for git-backed benchmarks.
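
The interpolation criterion could be exercised with checks along these lines. interpolateSketch() is a hypothetical stand-in written for this example, not the real interpolateEnv(), whose signature is not shown here. Two behaviours are assumptions: the default after :- applies when the variable is unset or empty (shell-style), and an unset variable with no default resolves to an empty string.

```typescript
// Hypothetical stand-in for illustration only; NOT the real interpolateEnv().
// Handles ${{ NAME }} and ${{ NAME:-default }} placeholders.
function interpolateSketch(
  value: string,
  env: Record<string, string | undefined>,
): string {
  const placeholder = /\$\{\{\s*([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*?))?\s*\}\}/g;
  return value.replace(
    placeholder,
    (_match: string, name: string, fallback: string | undefined) => {
      const resolved = env[name];
      // Assumption: unset or empty falls back to the default, as in shell.
      if (resolved !== undefined && resolved !== "") {
        return resolved;
      }
      // Assumption: no default means an empty string rather than an error.
      return fallback ?? "";
    },
  );
}

// Resolve both source fields from the YAML example against a given environment.
function resolveSource(env: Record<string, string | undefined>) {
  return {
    url: interpolateSketch("${{ BENCHMARK_REPO_URL }}", env),
    ref: interpolateSketch("${{ BENCHMARK_REPO_REF:-main }}", env),
  };
}
```

Passing the environment in as an argument, rather than reading a global, keeps the check deterministic in tests.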

Non-Goals

  • Continuous sync / background daemon.
  • git-sync dependency or binary distribution.
  • Two-way sync, conflict resolution, or write-back to source.
  • Auto-sync on interval.

Metadata

Labels: core (anything pertaining to core functionality of AgentV)
Status: Done