bug: claude-cli provider crashes after ~4 tests in combined eval run (SessionStart hook error)

## Summary

When running multiple eval files in a single `agentv eval run` invocation against the `claude-cli` target, the provider crashes after ~4 tests with a `SessionStart` hook error. All subsequent tests fail with the same error.

## Reproduction

```bash
# This fails after ~4 tests:
agentv eval run --target claude-cli --workers 1 "evals/hivespec/*.eval.yaml"

# This works — each invocation gets a fresh session:
agentv eval run --target claude-cli --workers 1 evals/hivespec/hs-claim.eval.yaml
agentv eval run --target claude-cli --workers 1 evals/hivespec/hs-explore.eval.yaml
# etc.
```

## Error

```json
{"type":"system","subtype":"hook_started","hook_id":"...","hook_name":"SessionStart:startup","hook_event":"SessionStart","session_id":"..."}
```

Claude CLI exits with code 1. The error is a raw JSON event from the session hook, not a proper error message.

## Observed behavior

- Tests 1-4: all PASS (1.000)
- Tests 5-17: all ERROR with the same SessionStart hook failure
- When the same 17 tests are run as 5 separate invocations (one per eval file), all complete successfully

## Expected behavior

All 17 tests should complete when run in a single invocation. The claude-cli provider should clean up sessions between tests and handle hook errors gracefully.

## Investigation starting points

1. Claude CLI provider: \`packages/core/src/evaluation/providers/\` — find the claude-cli provider implementation
2. Check how sessions are created and torn down between test cases
3. Check if there's a session cleanup or process kill between tests
4. Compare with pi-cli provider which handles 17 sequential tests without issues

## Acceptance signals

- [ ] \`agentv eval run --target claude-cli --workers 1 "evals/hivespec/*.eval.yaml"\` completes all 17 tests without SessionStart errors
- [ ] Each test gets a fresh Claude CLI session (no state leakage between tests)
- [ ] If a SessionStart hook fails, the provider retries with a fresh session before marking the test as an execution error

## Non-goals

- Parallel Claude CLI sessions (workers > 1) — that's a separate concern
- Changing how Claude CLI hooks work — the fix should be in agentv's provider layer

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: claude-cli provider crashes after ~4 tests in combined eval run (SessionStart hook error) #830

Summary

Reproduction

Error

Observed behavior

Expected behavior

Investigation starting points

Acceptance signals

Non-goals

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

bug: claude-cli provider crashes after ~4 tests in combined eval run (SessionStart hook error) #830

Description

Summary

Reproduction

Error

Observed behavior

Expected behavior

Investigation starting points

Acceptance signals

Non-goals

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions