cron: run-top-query has no observability; failed runs are invisible until someone notices the demand row is still pending (#178)

## Problem

`#162` / PR `#173` fixed the cron's `maxDuration` alignment and added poison-query backoff. Good. But the cron still surfaces no telemetry: a hung outbound fetch, a 500 from `/api/agent`, a Socrata 5xx, or a `markAttemptFailed` storm shows up only in Vercel function logs (which nobody tails) and as a row that quietly stays `pending` in the demand queue.

For a system whose marketing claim is "ask a question, the agent answers it on its own schedule," the right SLO surface is "what % of cron picks ran to phase=done in the last 24h, and which queries are stuck in poison-backoff." We don't have that.

## Fix

Two additions in `app/lib/demand.ts` + `app/api/cron/run-top-query/route.ts`:

1. **Per-run record**: when the cron picks a query, INSERT a `cron_runs` row with `(query_hash, started_at, status='running')`. On success, UPDATE it to `status='done', finished_at, duration_ms`. On `markAttemptFailed`, UPDATE it to `status='failed', error_excerpt, finished_at, duration_ms`.
2. **Admin endpoint**: `GET /api/admin/cron-runs?limit=50` returns the newest rows, with an optional `?status=failed` filter. Add a small panel to `app/admin/AdminConsole.tsx` showing the last-24h success rate and the 5 most recent failures with their error excerpts.
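
A minimal sketch of the three helpers, assuming an in-memory array as a stand-in for the `cron_runs` table. The field types, the 200-character excerpt cap, and the helper signatures are assumptions for illustration; the real versions would go through whatever database client `app/lib/demand.ts` already uses.

```typescript
// Hypothetical row shape: names mirror the columns proposed above, types are assumed.
type CronRunStatus = "running" | "done" | "failed";

interface CronRun {
  id: number;
  queryHash: string;
  startedAt: number; // epoch milliseconds
  status: CronRunStatus;
  finishedAt?: number;
  durationMs?: number;
  errorExcerpt?: string;
}

// In-memory stand-in for the cron_runs table (assumption, not the real storage).
const cronRuns: CronRun[] = [];
let nextCronRunId = 1;

// Called when the cron picks a query: inserts a row with status='running'.
function recordCronRun(queryHash: string, now: number = Date.now()): number {
  const run: CronRun = { id: nextCronRunId++, queryHash, startedAt: now, status: "running" };
  cronRuns.push(run);
  return run.id;
}

// Called on success or from the failure path: closes the row out.
function finishCronRun(
  id: number,
  status: "done" | "failed",
  error?: string,
  now: number = Date.now(),
): void {
  const run = cronRuns.find((r) => r.id === id);
  if (!run) throw new Error(`unknown cron run ${id}`);
  run.status = status;
  run.finishedAt = now;
  run.durationMs = now - run.startedAt;
  if (status === "failed") {
    // Keep only an excerpt so one huge stack trace cannot bloat the row.
    run.errorExcerpt = (error ?? "unknown error").slice(0, 200);
  }
}

// Newest rows first, optionally filtered by status (backs the admin endpoint).
function listCronRuns(opts: { limit?: number; status?: CronRunStatus } = {}): CronRun[] {
  const limit = opts.limit ?? 50;
  const status = opts.status;
  return cronRuns
    .filter((r) => status === undefined || r.status === status)
    .sort((a, b) => b.startedAt - a.startedAt || b.id - a.id)
    .slice(0, limit);
}
```

With this shape, `listCronRuns({ status: "failed", limit: 5 })` would feed the "5 most recent failures" panel directly.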

Bonus: a single counter `cron_runs_failed_24h` exposed via `/api/cache-stats` (or a sibling endpoint) so the existing watchdog can alert when it crosses a threshold.
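
One way the 24h numbers could be derived, as a rough sketch. The `RunSummaryInput` shape, the function name, and the choice to divide by all picks in the window (so a run stuck in `running` lowers the rate) are assumptions, not settled design.

```typescript
// Hypothetical minimal view of a cron_runs row, enough to compute the summary.
interface RunSummaryInput {
  status: "running" | "done" | "failed";
  startedAt: number; // epoch milliseconds
}

interface CronRunSummary {
  cron_runs_failed_24h: number;
  done: number;
  running: number;
  // Fraction of picks in the window that reached 'done'; null when there were no picks.
  successRate: number | null;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function summarizeCronRuns(
  runs: RunSummaryInput[],
  now: number = Date.now(),
  windowMs: number = DAY_MS,
): CronRunSummary {
  // Only runs that started inside the trailing window count.
  const recent = runs.filter((r) => r.startedAt > now - windowMs && r.startedAt <= now);
  const done = recent.filter((r) => r.status === "done").length;
  const failed = recent.filter((r) => r.status === "failed").length;
  const running = recent.filter((r) => r.status === "running").length;
  return {
    cron_runs_failed_24h: failed,
    done,
    running,
    successRate: recent.length === 0 ? null : done / recent.length,
  };
}
```

The watchdog would then only need to compare `cron_runs_failed_24h` against its threshold; returning `null` for an empty window keeps "no picks at all" distinguishable from "0% success".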

## Code pointers

- `app/api/cron/run-top-query/route.ts:1-40` — entry, no telemetry
- `app/lib/demand.ts` — add `recordCronRun()`, `finishCronRun()`, `listCronRuns()`
- `app/api/admin/runs/route.ts` — pattern to copy for the new admin endpoint
- `app/admin/AdminConsole.tsx` — where the new panel lands
- `scripts/watchdog.sh` — could optionally curl the new counter

## Acceptance

- [ ] `cron_runs` table created via the existing migration mechanism
- [ ] Every cron tick that picks a query writes exactly one row with start + finish + status
- [ ] `/api/admin/cron-runs` returns the newest 50 rows by default and supports the `?status=` filter
- [ ] AdminConsole panel renders 24h success rate + 5 most recent failures
- [ ] One smoke test covers the row write on a successful + a failed path
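
To make the "exactly one row, always finished" criterion hard to violate, the route could funnel its work through a small wrapper. This is a sketch only: `RunRecorder`, `withCronRun`, and the 200-character cap are hypothetical names and values, and the real failure path may prefer to hook `markAttemptFailed` rather than a thrown error.

```typescript
// Hypothetical outcome and recorder types; the real ones would wrap the cron_runs helpers.
type RunOutcome = { status: "done" } | { status: "failed"; errorExcerpt: string };

interface RunRecorder {
  start(queryHash: string): number; // inserts the 'running' row, returns its id
  finish(id: number, outcome: RunOutcome): void; // closes the row out
}

// Runs `work` and guarantees the row is finished on both the success and failure paths.
async function withCronRun<T>(
  recorder: RunRecorder,
  queryHash: string,
  work: () => Promise<T>,
): Promise<T> {
  const id = recorder.start(queryHash);
  let result: T;
  try {
    result = await work();
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    recorder.finish(id, { status: "failed", errorExcerpt: message.slice(0, 200) });
    throw err; // rethrow so existing backoff handling still sees the failure
  }
  recorder.finish(id, { status: "done" });
  return result;
}
```

Because the recorder is injected, the smoke test can pass a fake that collects calls in an array and assert on one successful and one throwing `work` function without touching the database.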
