Commit edc46d9

docs: prioritize trend --last workflow

1 parent da5573c

2 files changed

Lines changed: 31 additions & 27 deletions

apps/web/src/content/docs/docs/tools/trend.mdx

Lines changed: 3 additions & 1 deletion
@@ -17,13 +17,15 @@ Analyze the last 8 canonical runs in the current workspace:
 agentv trend --last 8
 ```
 
+This is the primary day-to-day workflow. In most cases, users should start with `--last`.
+
 Filter to one dataset and target:
 
 ```bash
 agentv trend --last 8 --dataset code-review --target claude-sonnet
 ```
 
-Point directly at run workspaces or `index.jsonl` manifests:
+Point directly at run workspaces or `index.jsonl` manifests when you need a specific historical slice or want a reproducible example:
 
 ```bash
 agentv trend \

examples/features/trend/README.md

Lines changed: 28 additions & 26 deletions
@@ -20,19 +20,23 @@ sample-runs/
 2026-03-15T10-00-00-000Z/index.jsonl
 ```
 
-These are canonical run directories with `index.jsonl`, so you can point `agentv trend` at them directly.
+These are canonical run directories with `index.jsonl`.
 
 ## End-User Flow
 
-From this directory, run:
+Most real users will run `trend` against their latest eval history with `--last`.
+
+To reproduce that flow from this example directory, first copy the sample runs into the normal runtime layout:
 
 ```bash
-bun ../../../apps/cli/src/cli.ts trend \
-sample-runs/2026-03-01T10-00-00-000Z \
-sample-runs/2026-03-08T10-00-00-000Z \
-sample-runs/2026-03-15T10-00-00-000Z \
---dataset code-review \
---target claude-sonnet
+mkdir -p .agentv/results/runs
+cp -R sample-runs/* .agentv/results/runs/
+```
+
+Then run:
+
+```bash
+bun ../../../apps/cli/src/cli.ts trend --last 3 --dataset code-review --target claude-sonnet
 ```
 
 Expected output:
@@ -56,9 +60,22 @@ Regression Gate: threshold=0.010 fail_on_degrading=false triggered=false
 
 Interpretation:
 
-- The command uses the matched intersection of test IDs across all runs.
-- Mean score declines each run, so the slope is negative.
-- The verdict is `degrading`.
+- The command auto-discovers the most recent three runs.
+- It filters to `dataset=code-review` and `target=claude-sonnet`.
+- It intersects matched test IDs across runs and detects a steady downward score trend.
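
As a rough illustration of the last bullet, the sketch below fits a least-squares line through one mean score per run and flags a decline steeper than a threshold. It is not AgentV's implementation: the helper names `trend_slope` and `is_degrading` and the use of run position (0, 1, 2, ...) as the x-axis are assumptions made for illustration. The 0.01 default mirrors the `threshold=0.010` shown in the Regression Gate line.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the agentv CLI.

# Read one mean score per line (oldest run first) and print the
# least-squares slope of score against run position 0, 1, 2, ...
trend_slope() {
  awk '
    NF { x = n; n++; sx += x; sy += $1; sxx += x * x; sxy += x * $1 }
    END {
      if (n < 2) exit 1   # a slope needs at least two runs
      printf "%.3f\n", (n * sxy - sx * sy) / (n * sxx - sx * sx)
    }'
}

# Exit 0 when the slope is negative and its magnitude exceeds the
# threshold (second argument, default 0.01); exit 1 otherwise.
is_degrading() {
  awk -v s="$1" -v t="${2:-0.01}" 'BEGIN { exit !(s < 0 && -s > t) }'
}
```

With three made-up means, `printf '%s\n' 0.90 0.85 0.80 | trend_slope` prints `-0.050`, and `is_degrading -0.050` succeeds because 0.05 exceeds 0.01. A CI step could use that exit status to decide whether to fail the build.
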
+
+## Explicit Inputs
+
+If you want to see the same analysis without copying files into `.agentv/results/runs/`, point `trend` at the sample runs directly:
+
+```bash
+bun ../../../apps/cli/src/cli.ts trend \
+sample-runs/2026-03-01T10-00-00-000Z \
+sample-runs/2026-03-08T10-00-00-000Z \
+sample-runs/2026-03-15T10-00-00-000Z \
+--dataset code-review \
+--target claude-sonnet
+```
 
 ## CI Gate Example
 
@@ -76,18 +93,3 @@ bun ../../../apps/cli/src/cli.ts trend \
 ```
 
 This exits with code `1` because the degrading slope magnitude exceeds `0.01`.
-
-## `--last` Workflow
-
-If you want to test the exact runtime layout used by `agentv eval`, copy the sample runs into `.agentv/results/runs/` first:
-
-```bash
-mkdir -p .agentv/results/runs
-cp -R sample-runs/* .agentv/results/runs/
-```
-
-Then run:
-
-```bash
-bun ../../../apps/cli/src/cli.ts trend --last 3 --dataset code-review --target claude-sonnet
-```
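
The `--last 3` flow in this commit relies on run directories named by timestamp, as in `sample-runs/`. Because those names sort lexically in chronological order, picking the newest N can be sketched with `sort` and `tail`. This only illustrates the idea; it is not how `agentv trend` discovers runs internally. The `latest_runs` helper is hypothetical, and it assumes a run is any directory that contains an `index.jsonl` and whose name has no newlines.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration; not part of the agentv CLI.

# Print the names of the N newest run directories, oldest first.
# Usage: latest_runs <count> [runs_dir]
latest_runs() {
  local count="$1"
  local runs_dir="${2:-.agentv/results/runs}"
  local dir
  for dir in "$runs_dir"/*/; do
    # Skip non-run directories and the unexpanded glob when nothing matches.
    [ -f "${dir}index.jsonl" ] || continue
    basename "$dir"
  done | LC_ALL=C sort | tail -n "$count"
}
```

After copying the sample runs into `.agentv/results/runs/`, `latest_runs 3` would list the three sample timestamps in chronological order.
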
