Follow-up from #114 review. Two medium-severity items that don't block but are worth cleaning up.
1. Missing supporting index for the PendingCount subquery in `ListAgentsByUser`
The enriched-agent path runs five correlated subqueries per agent row (identity/store.go:584-630). Four of them hit good indexes; the PendingCount subquery does not.
```sql
(SELECT count(*) FROM messages
WHERE agent_id = a.id AND status = 'pending_approval')
```
Existing indexes that the planner can use:
- `idx_messages_agent_created(agent_id, created_at DESC)` — leading column matches, but `status` is residual.
- `idx_messages_pending_approval(approval_expires_at) WHERE status='pending_approval'` — partial, but leading column is wrong for an agent lookup.
In practice the planner falls back to `idx_messages_agent_created` and filters residually. For users with normal pending volumes this is fine. For users with many agents × growing archival pending volume, this could degrade list-agents latency over time.
Author already acknowledged the cost concern in #114's commit body ("Switch to denormalized columns if read cost ever bites"). Two cleaner fixes:
Option A (cheapest): add a composite index.
```sql
CREATE INDEX IF NOT EXISTS idx_messages_agent_pending
ON messages (agent_id)
WHERE status = 'pending_approval';
```
Partial index on the pending subset, leading column `agent_id` — exactly what the subquery wants. Idempotent.
Option B (denormalize): cache the count on `agent_identities` with triggers or a periodic refresh job. More moving parts; defer until A proves insufficient.
Recommend A as a 5-line migration.
2. `handleVerifyDomain` silently swallows DNS lookup errors
`internal/agent/api.go` — `checkDomainRecords` was reshaped in #114 to return a per-record diagnostic. Previously, `net.LookupTXT` failures short-circuited with HTTP 400 ("DNS lookup failed for X: ..."). Now the error is swallowed:
```go
if txts, err := net.LookupTXT(domain); err == nil {
// ... process records ...
}
// no else branch — err is dropped
```
Same pattern for `net.LookupMX`. If TXT is genuinely missing, the response is 412 with `mx/spf` all "missing" — fine for the UI which renders chips. But true infrastructure failures (network down, resolver returning SERVFAIL, etc.) now look identical to "TXT record missing" → 412 with same shape. CLI and SDK callers lose the distinction.
Fix: preserve the structured diagnostic but surface the lookup error. Two options:
Option A: add a top-level field.
```json
{
"mx": "missing",
"spf": "missing",
"dkim": "deferred",
"verified": false,
"dns_error": "lookup_failed: dial tcp ..." // optional, only set on err
}
```
Option B: tri-state per record.
```go
type RecordStatus string
const (
RecordFound RecordStatus = "found"
RecordMissing RecordStatus = "missing"
RecordError RecordStatus = "error" // new
RecordDeferred RecordStatus = "deferred"
)
```
Per-record granularity (TXT errors don't affect MX lookup status, etc.) — but more surface area.
Option A is the lower-friction choice. Also: log the actual `err` server-side regardless (the current code drops it on the floor).
Effort
Both small — A1 is a 5-line migration + index, A2 is ~10 lines of Go + a new optional field in the response struct. Could land as one PR.
Test coverage gaps
- No test that exercises the real-DNS failure path on `handleVerifyDomain`. Hard without DNS mocking — acceptable as long as the new error field is at least visible to the operator via log.
- No explicit negative-window test for `?window=-5` on `/api/dashboard/stats` (Atoi succeeds, `<= 0` falls back to default). Behavior is correct but unpinned.
Existing mitigations
- The PR's commit body acknowledges the cost concern on (1) — this is documented, just unaddressed.
- DKIM "deferred" is correctly handled end-to-end (backend hardcodes; UI hides the row) so (2) doesn't propagate to that surface.
Follow-up from #114 review. Two medium-severity items that don't block but are worth cleaning up.
1. Missing supporting index for the PendingCount subquery in `ListAgentsByUser`
The enriched-agent path runs five correlated subqueries per agent row (identity/store.go:584-630). Four of them hit good indexes; the PendingCount subquery does not.
```sql
(SELECT count(*) FROM messages
WHERE agent_id = a.id AND status = 'pending_approval')
```
Existing indexes that the planner can use:
In practice the planner falls back to `idx_messages_agent_created` and filters residually. For users with normal pending volumes this is fine. For users with many agents × growing archival pending volume, this could degrade list-agents latency over time.
Author already acknowledged the cost concern in #114's commit body ("Switch to denormalized columns if read cost ever bites"). Two cleaner fixes:
Option A (cheapest): add a composite index.
```sql
CREATE INDEX IF NOT EXISTS idx_messages_agent_pending
ON messages (agent_id)
WHERE status = 'pending_approval';
```
Partial index on the pending subset, leading column `agent_id` — exactly what the subquery wants. Idempotent.
Option B (denormalize): cache the count on `agent_identities` with triggers or a periodic refresh job. More moving parts; defer until A proves insufficient.
Recommend A as a 5-line migration.
2. `handleVerifyDomain` silently swallows DNS lookup errors
`internal/agent/api.go` — `checkDomainRecords` was reshaped in #114 to return a per-record diagnostic. Previously, `net.LookupTXT` failures short-circuited with HTTP 400 ("DNS lookup failed for X: ..."). Now the error is swallowed:
```go
if txts, err := net.LookupTXT(domain); err == nil {
// ... process records ...
}
// no else branch — err is dropped
```
Same pattern for `net.LookupMX`. If TXT is genuinely missing, the response is 412 with `mx/spf` all "missing" — fine for the UI which renders chips. But true infrastructure failures (network down, resolver returning SERVFAIL, etc.) now look identical to "TXT record missing" → 412 with same shape. CLI and SDK callers lose the distinction.
Fix: preserve the structured diagnostic but surface the lookup error. Two options:
Option A: add a top-level field.
```json
{
"mx": "missing",
"spf": "missing",
"dkim": "deferred",
"verified": false,
"dns_error": "lookup_failed: dial tcp ..." // optional, only set on err
}
```
Option B: tri-state per record.
```go
type RecordStatus string
const (
RecordFound RecordStatus = "found"
RecordMissing RecordStatus = "missing"
RecordError RecordStatus = "error" // new
RecordDeferred RecordStatus = "deferred"
)
```
Per-record granularity (TXT errors don't affect MX lookup status, etc.) — but more surface area.
Option A is the lower-friction choice. Also: log the actual `err` server-side regardless (the current code drops it on the floor).
Effort
Both small — A1 is a 5-line migration + index, A2 is ~10 lines of Go + a new optional field in the response struct. Could land as one PR.
Test coverage gaps
Existing mitigations