sync run: failed status comes back with no error info — agents can't tell rate-limit from auth from network

## Summary

`one --agent sync run <platform>` reports `status: "failed"` for individual profiles without surfacing **any** error context. Agents cannot distinguish a transient 429 from an auth failure, a `not_in_channel` error, or a malformed profile. The CLI's documented error handling (Retry-After, exponential backoff, adaptive throttle) appears to work internally, but neither its output nor its on-disk artifacts reflect any of it.

**Environment**
- `@withone/cli` 1.43.4
- Platform: slack (16 per-channel `conversations_<slug>` profiles + 1 `slack/users`)
- macOS 25.3.0 (darwin-arm64)

## Concrete repro

We ran a one-time pull across 16 channels (sequential — single `sync run slack` over 17 profiles total). 13 profiles completed (~880 records across 8.5 minutes; the slowest were `new_food_menu` 3m33s and `ordering` 3m32s — clearly hitting backoffs internally). The last 4 alphabetically failed:

```json
{
  "model": "conversations_receipts_dev",
  "recordsSynced": 0,
  "pagesProcessed": 0,
  "duration": "0s",
  "status": "failed",
  "deletedStale": 0,
  "statusCounts": {"active": 0, "archived": 0}
}
```

No `error` field. No HTTP status. No `Retry-After` value. No mention of which Slack API method failed. The same shape comes back for `sync list slack` — `status: "failed"` and nothing else.

A minute later, retrying just those 4 succeeded — `sun_may_24_moneka_arabic_jazz` took 1m31s for 21 records / 2 pages, which is consistent with several rounds of internal Retry-After backoff. **The only way to even guess "rate limit" was per-row timing math.**

Compounding evidence: `sync test slack/conversations_receipts_dev` against the same rate-limit window surfaces the real error cleanly:

```json
{
  "name": "single-page fetch",
  "ok": false,
  "detail": "HTTP 429: {\"ok\":false,\"error\":\"ratelimited\"}"
}
```

So the underlying signal is reachable — `sync test` propagates it. `sync run` swallows it.
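As a stopgap, an agent can recover the status from a `sync test` check whenever `sync run` reports a bare failure. The sketch below is a minimal one. It assumes the `detail` string always follows the `HTTP <status>: <body>` shape shown above, which we have only observed for this one failure, and `parse_sync_test_detail` is our own hypothetical helper, not part of the CLI.

```python
import json
import re
from typing import Optional, TypedDict


class ParsedDetail(TypedDict):
    http_status: int
    slack_error: Optional[str]


# Assumed shape: "HTTP <3-digit status>: <response body>", as seen in the output above.
_DETAIL_RE = re.compile(r"^HTTP (\d{3}): (.*)$", re.DOTALL)


def parse_sync_test_detail(detail: str) -> Optional[ParsedDetail]:
    """Extract the HTTP status and Slack error code from a `detail` string.

    Returns None when the string does not match the assumed shape.
    """
    match = _DETAIL_RE.match(detail)
    if match is None:
        return None
    slack_error = None
    try:
        body = json.loads(match.group(2))
        if isinstance(body, dict):
            slack_error = body.get("error")
    except json.JSONDecodeError:
        pass  # Body was not JSON; keep the status only.
    return {"http_status": int(match.group(1)), "slack_error": slack_error}


# The failing check from the `sync test` output above.
check = {
    "name": "single-page fetch",
    "ok": False,
    "detail": "HTTP 429: {\"ok\":false,\"error\":\"ratelimited\"}",
}
print(parse_sync_test_detail(check["detail"]))
# {'http_status': 429, 'slack_error': 'ratelimited'}
```

This only works while the failure is still reproducible, so it is not a substitute for `sync run` reporting the error itself.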

## What's missing — proposed fields on a failed result

```json
{
  "model": "...",
  "status": "failed",
  "error": {
    "phase": "list_fetch | enrich | transform | upsert | hook",
    "message": "HTTP 429: ratelimited",
    "httpStatus": 429,
    "retryAfter": 60,
    "lastSuccessfulPage": null,
    "context": { "actionId": "...", "url": "..." }
  }
}
```

At minimum, populate `error` with whatever `sync test` already returns when it hits the same failure.
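To illustrate why these fields matter, here is a rough sketch of the decision an agent could make once they exist. Everything under `error` is the proposed shape, not something the CLI returns today. The function name, the action labels, and the fallback wait times are our own assumptions. The error codes in `NON_RETRYABLE_CODES` are standard Slack Web API error strings, matched against `message` because Slack reports many of them with HTTP 200.

```python
from typing import Any, Mapping, Optional, Tuple

# Slack error codes that retrying will not fix.
NON_RETRYABLE_CODES = (
    "invalid_auth",
    "not_authed",
    "token_revoked",
    "token_expired",
    "missing_scope",
    "not_in_channel",
    "channel_not_found",
)

DEFAULT_RATE_LIMIT_WAIT = 60  # seconds; assumption used when retryAfter is absent
DEFAULT_SERVER_ERROR_WAIT = 30  # seconds; assumption


def plan_for_row(row: Mapping[str, Any]) -> Tuple[str, Optional[int]]:
    """Return (action, wait_seconds) for one `sync run` result row."""
    if row.get("status") != "failed":
        return ("none", None)

    error = row.get("error")
    if not isinstance(error, Mapping):
        # Current behaviour: a failed row with no error context.
        return ("investigate", None)

    status = error.get("httpStatus")
    message = str(error.get("message", ""))

    if status == 429:
        wait = error.get("retryAfter")
        return ("retry", wait if isinstance(wait, int) else DEFAULT_RATE_LIMIT_WAIT)
    if any(code in message for code in NON_RETRYABLE_CODES):
        return ("fix_credentials_or_channel", None)
    if isinstance(status, int) and status >= 500:
        return ("retry", DEFAULT_SERVER_ERROR_WAIT)
    return ("investigate", None)


proposed = {
    "model": "conversations_receipts_dev",
    "status": "failed",
    "error": {"phase": "list_fetch", "message": "HTTP 429: ratelimited",
              "httpStatus": 429, "retryAfter": 60},
}
print(plan_for_row(proposed))  # ('retry', 60)
```

With today's output every failed row falls into the `investigate` branch, which is the gap this issue describes.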

## Documented but missing on-disk artifacts

The mem/sync guide describes:
```
.one/sync/
  events/{platform}_{model}.jsonl     # change event logs (if onChange: "log")
  logs/{platform}.log                 # cron run logs
```

After ~30 minutes of repeated `sync run slack` calls (one cron-style and several manual runs that hit failures), `.one/sync/logs/` does not exist on this machine. The `cron run logs` line in the docs implies this is only populated by cron-mode runs; if so, the docs are misleading — operators reading the doc will expect run logs to land there regardless. Either:
- Have `sync run` write to `.one/sync/logs/<platform>.log` unconditionally, or
- Update the docs to make it crystal clear that manual runs leave no trace.

## Operational impact for our use case

We're building a per-channel mirror of Slack for an FDE prototype (`one --agent sync run slack` over ~16 channels, eventually scheduled `--every 5m`). Sync schedules are operationally fragile without:
1. Per-failure error context, and
2. `dateFilter`-driven incremental fetches. We have since added these, but our manually authored profiles had no `dateFilter` by default, so every run was a full re-paginate that multiplied rate-limit pressure.

A scheduled tick that silently fails because of rate limits looks identical to one that fails because of an expired token, a renamed channel, or a withone outage.

## Mitigations we applied locally

1. Added `dateFilter: {param: "oldest", location: "query", format: "unix"}` to all conversation profiles so subsequent runs only fetch since `last_synced`.
2. Sequenced our runs with a ~60s cooldown between full re-fetches during the prototype to stay under Slack Tier-3 limits.
3. Wrote a tail-based "watch the timing" heuristic to guess at rate-limit-induced failures (0 records with a duration of 0s = likely auth/scope; N records with a long duration = likely backoffs).

Without visibility into the actual error, every operational decision about the sync engine becomes a guess.
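For reference, the timing heuristic from mitigation 3 boils down to roughly the following. This is a simplified sketch, not our exact script. It assumes `duration` is always formatted like `0s` or `3m33s`, and the 60-second "slow" threshold is an arbitrary cut-off we picked.

```python
import re
from typing import Any, Mapping, Optional

# Assumed duration shapes: "<seconds>s" or "<minutes>m<seconds>s".
_DURATION_RE = re.compile(r"^(?:(\d+)m)?(\d+)s$")

SLOW_SECONDS = 60  # arbitrary threshold for "probably spent time in backoff"


def duration_seconds(text: str) -> Optional[int]:
    """Convert a duration string such as "3m33s" to seconds, or None if unparseable."""
    match = _DURATION_RE.match(text)
    if match is None:
        return None
    return int(match.group(1) or 0) * 60 + int(match.group(2))


def guess_cause(row: Mapping[str, Any]) -> str:
    """Guess why a row failed or ran slowly, using only record count and timing."""
    seconds = duration_seconds(str(row.get("duration", "")))
    records = row.get("recordsSynced", 0)
    if row.get("status") == "failed" and records == 0 and seconds == 0:
        return "likely auth/scope"
    if seconds is not None and records > 0 and seconds >= SLOW_SECONDS:
        return "likely rate-limit backoff"
    return "unknown"


failed_row = {"model": "conversations_receipts_dev", "recordsSynced": 0,
              "duration": "0s", "status": "failed"}
print(duration_seconds("3m33s"))  # 213
print(guess_cause(failed_row))    # likely auth/scope
```

Note that the failed row from the repro lands in the auth/scope bucket under this rule, even though the same profile succeeded on retry a minute later and `sync test` pointed at a 429. The heuristic can easily be wrong, which is why we would rather read the cause from the result row.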

---

Related: I filed [#138](https://github.com/withoneai/cli/issues/138) earlier today for the four embedded-postgres plugin issues that block `mem init` on darwin-arm64. This issue is about runtime observability for the sync engine — independent surface.

Happy to send a small PR to plumb the error through `sync run`'s result rows if a maintainer can point me at the right place in the bundled (or source) code.
