feat: zeph-bench benchmark harness #2827

@bug-ops

Description

Overview

Add zeph-bench: a crate gated behind the bench feature that runs Zeph against the standard AI-agent benchmarks LongMemEval, LOCOMO, FRAMES, tau-bench, and GAIA in a fully automated, reproducible manner.

Spec: .local/specs/zeph-bench/spec.md

Motivation

Zeph's persistent semantic memory and tool-use capabilities have no external, reproducible measurement. This epic adds the harness to demonstrate and regression-track those differentiators against standard leaderboards.

Architecture

  • New crate zeph-bench at Layer 4 (same tier as zeph-channels, zeph-tui)
  • BenchmarkChannel implements zeph-core::Channel — zero changes to agent core
  • Dedicated Qdrant collection prefix and SQLite DB per run — never touches production state
  • bench feature flag — excluded from full bundle
  • CLI: zeph bench list | download | run | show
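To make the "zero changes to agent core" point concrete, here is a minimal sketch of how BenchmarkChannel could drive the agent loop. The Channel trait below is a stand-in assumption (the real zeph-core::Channel trait and its method names may differ); the idea is only that the harness feeds scripted benchmark turns and records replies for scoring, through the same interface any other channel uses.

```rust
// Stand-in for zeph-core::Channel; the real trait's shape is an assumption here.
trait Channel {
    /// Deliver the next user turn to the agent loop, if any.
    fn next_message(&mut self) -> Option<String>;
    /// Receive the agent's reply so the harness can score it later.
    fn send_reply(&mut self, reply: String);
}

/// Feeds scripted benchmark turns and records replies for evaluation.
struct BenchmarkChannel {
    turns: std::vec::IntoIter<String>,
    replies: Vec<String>,
}

impl BenchmarkChannel {
    fn new(turns: Vec<String>) -> Self {
        Self { turns: turns.into_iter(), replies: Vec::new() }
    }
}

impl Channel for BenchmarkChannel {
    fn next_message(&mut self) -> Option<String> {
        self.turns.next()
    }
    fn send_reply(&mut self, reply: String) {
        self.replies.push(reply);
    }
}

fn main() {
    let mut ch = BenchmarkChannel::new(vec!["q1".into(), "q2".into()]);
    while let Some(msg) = ch.next_message() {
        // A real run would route `msg` through the agent; echoed here.
        ch.send_reply(format!("answer to {msg}"));
    }
    assert_eq!(ch.replies.len(), 2);
    println!("{}", ch.replies[0]);
}
```

Because the harness is just another Channel implementation, the agent core never learns it is being benchmarked, which is what keeps the measurement honest.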

Child Issues

  • Crate scaffold and BenchmarkChannel
  • CLI subcommand (zeph bench)
  • Memory isolation (Qdrant + SQLite reset per scenario)
  • Deterministic mode (temperature=0 override)
  • LongMemEval dataset loader and evaluator
  • JSON + Markdown result writer
  • Baseline comparison (--baseline flag)
  • Resume interrupted run (--resume flag)
  • LOCOMO dataset loader
  • FRAMES dataset loader
  • tau-bench dataset loader
  • GAIA dataset loader
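The memory-isolation child issue can be sketched as a naming scheme: each run gets its own Qdrant collection prefix and a throwaway SQLite file, so a bench run can never collide with production state. The zeph_bench_ prefix and function names below are illustrative assumptions, not the spec's.

```rust
use std::path::PathBuf;

/// Hypothetical per-run, per-scenario Qdrant collection name.
/// A distinct prefix guarantees bench data never lands in production collections.
fn bench_collection(run_id: &str, scenario: &str) -> String {
    format!("zeph_bench_{run_id}_{scenario}")
}

/// Hypothetical throwaway SQLite file for a single run, kept under the OS temp dir.
fn bench_sqlite_path(run_id: &str) -> PathBuf {
    std::env::temp_dir().join(format!("zeph_bench_{run_id}.sqlite"))
}

fn main() {
    let c = bench_collection("20240101", "longmemeval_s01");
    assert!(c.starts_with("zeph_bench_"));
    let p = bench_sqlite_path("20240101");
    assert!(p.to_string_lossy().contains("zeph_bench_"));
    println!("{c}");
}
```

Resetting between scenarios then reduces to dropping the scenario's collection and deleting the run's SQLite file, which is cheap and leaves no shared state behind.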

Acceptance Criteria

  • zeph bench run --dataset longmemeval completes end-to-end and produces valid results.json
  • Two identical runs produce identical scores (determinism)
  • Memory-enabled score >= memory-disabled score on LongMemEval
  • No writes to production Qdrant/SQLite during bench run
  • bench feature excluded from full build
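The determinism criterion is checkable mechanically: serialize each run's results and compare bytes. The results.json field names below are assumptions for illustration (the spec in .local/specs/zeph-bench/spec.md defines the real schema); serialization is hand-rolled here to keep the sketch dependency-free, where the real crate would likely use serde.

```rust
// Hypothetical results.json shape; field names are assumptions, not the spec's.
struct RunResult {
    dataset: String,
    score: f64,
    total: usize,
    correct: usize,
}

impl RunResult {
    /// Minimal hand-rolled JSON serialization for the sketch.
    fn to_json(&self) -> String {
        format!(
            "{{\"dataset\":\"{}\",\"score\":{:.4},\"correct\":{},\"total\":{}}}",
            self.dataset, self.score, self.correct, self.total
        )
    }
}

fn main() {
    // Stand-in for an actual bench run; a deterministic run ignores the seed-like arg.
    let run = |_attempt: u64| RunResult {
        dataset: "longmemeval".into(),
        correct: 37,
        total: 50,
        score: 37.0 / 50.0,
    };
    // Determinism criterion: two identical runs serialize identically.
    let (a, b) = (run(1).to_json(), run(2).to_json());
    assert_eq!(a, b);
    println!("{a}");
}
```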

Metadata

Labels

  • P2: High value, medium complexity
  • enhancement: New feature or request
  • epic: Milestone-level tracking issue
