Skip to content

Dashboard enrichment follow-ups: pending-count index + DNS-error visibility #129

@jiashuoz

Description

@jiashuoz

Follow-up from #114 review. Two medium-severity items that don't block but are worth cleaning up.

1. Missing supporting index for the PendingCount subquery in `ListAgentsByUser`

The enriched-agent path runs five correlated subqueries per agent row (identity/store.go:584-630). Four of them hit good indexes; the PendingCount subquery does not.

```sql
(SELECT count(*) FROM messages
WHERE agent_id = a.id AND status = 'pending_approval')
```

Existing indexes that the planner can use:

  • `idx_messages_agent_created(agent_id, created_at DESC)` — leading column matches, but `status` is residual.
  • `idx_messages_pending_approval(approval_expires_at) WHERE status='pending_approval'` — partial, but leading column is wrong for an agent lookup.

In practice the planner falls back to `idx_messages_agent_created` and filters residually. For users with normal pending volumes this is fine. For users with many agents × growing archival pending volume, this could degrade list-agents latency over time.

Author already acknowledged the cost concern in #114's commit body ("Switch to denormalized columns if read cost ever bites"). Two cleaner fixes:

Option A (cheapest): add a composite index.
```sql
CREATE INDEX IF NOT EXISTS idx_messages_agent_pending
ON messages (agent_id)
WHERE status = 'pending_approval';
```
Partial index on the pending subset, leading column `agent_id` — exactly what the subquery wants. Idempotent.

Option B (denormalize): cache the count on `agent_identities` with triggers or a periodic refresh job. More moving parts; defer until A proves insufficient.

Recommend A as a 5-line migration.

2. `handleVerifyDomain` silently swallows DNS lookup errors

`internal/agent/api.go` — `checkDomainRecords` was reshaped in #114 to return a per-record diagnostic. Previously, `net.LookupTXT` failures short-circuited with HTTP 400 ("DNS lookup failed for X: ..."). Now the error is swallowed:

```go
if txts, err := net.LookupTXT(domain); err == nil {
// ... process records ...
}
// no else branch — err is dropped
```

Same pattern for `net.LookupMX`. If TXT is genuinely missing, the response is 412 with `mx/spf` all "missing" — fine for the UI which renders chips. But true infrastructure failures (network down, resolver returning SERVFAIL, etc.) now look identical to "TXT record missing" → 412 with same shape. CLI and SDK callers lose the distinction.

Fix: preserve the structured diagnostic but surface the lookup error. Two options:

Option A: add a top-level field.
```json
{
"mx": "missing",
"spf": "missing",
"dkim": "deferred",
"verified": false,
"dns_error": "lookup_failed: dial tcp ..." // optional, only set on err
}
```

Option B: tri-state per record.
```go
type RecordStatus string
const (
RecordFound RecordStatus = "found"
RecordMissing RecordStatus = "missing"
RecordError RecordStatus = "error" // new
RecordDeferred RecordStatus = "deferred"
)
```
Per-record granularity (TXT errors don't affect MX lookup status, etc.) — but more surface area.

Option A is the lower-friction choice. Also: log the actual `err` server-side regardless (the current code drops it on the floor).

Effort

Both small — A1 is a 5-line migration + index, A2 is ~10 lines of Go + a new optional field in the response struct. Could land as one PR.

Test coverage gaps

  • No test that exercises the real-DNS failure path on `handleVerifyDomain`. Hard without DNS mocking — acceptable as long as the new error field is at least visible to the operator via log.
  • No explicit negative-window test for `?window=-5` on `/api/dashboard/stats` (Atoi succeeds, `<= 0` falls back to default). Behavior is correct but unpinned.

Existing mitigations

  • The PR's commit body acknowledges the cost concern on (1) — this is documented, just unaddressed.
  • DKIM "deferred" is correctly handled end-to-end (backend hardcodes; UI hides the row) so (2) doesn't propagate to that surface.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions