Summary
one --agent sync run <platform> reports status: "failed" for individual profiles without surfacing any error context. Agents cannot tell a transient 429 apart from an auth failure, a not_in_channel failure, or a malformed-profile failure. The CLI's documented error handling (Retry-After, exponential backoff, adaptive throttle) appears to work internally, but neither its output nor its on-disk artifacts reflect any of it.
Environment
- @withone/cli 1.43.4
- Platform: slack (16 per-channel conversations_<slug> profiles + 1 slack/users)
- macOS 25.3.0 (darwin-arm64)
Concrete repro
We ran a one-time pull across 16 channels (sequential — single sync run slack over 17 profiles total). 13 profiles completed (~880 records across 8.5 minutes; the slowest were new_food_menu 3m33s and ordering 3m32s — clearly hitting backoffs internally). The last 4 alphabetically failed:
{
  "model": "conversations_receipts_dev",
  "recordsSynced": 0,
  "pagesProcessed": 0,
  "duration": "0s",
  "status": "failed",
  "deletedStale": 0,
  "statusCounts": {"active": 0, "archived": 0}
}
No error field. No HTTP status. No Retry-After value. No mention of which Slack API method failed. The same shape comes back for sync list slack — status: "failed" and nothing else.
A minute later, retrying just those 4 succeeded — sun_may_24_moneka_arabic_jazz took 1m31s for 21 records / 2 pages, which is consistent with several rounds of internal Retry-After backoff. The only way to even guess "rate limit" was per-row timing math.
Compounding evidence: sync test slack/conversations_receipts_dev against the same rate-limit window surfaces the real error cleanly:
{
  "name": "single-page fetch",
  "ok": false,
  "detail": "HTTP 429: {\"ok\":false,\"error\":\"ratelimited\"}"
}
So the underlying signal is reachable — sync test propagates it. sync run swallows it.
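Because sync test already exposes the upstream response in detail, a consumer could split that string into an HTTP status and a Slack error code. A minimal sketch, assuming the detail string always follows the "HTTP <status>: <body>" shape seen in the single sample above; parse_detail and ParsedDetail are hypothetical names and not part of the CLI.

```python
import json
import re
from typing import Optional, TypedDict


class ParsedDetail(TypedDict):
    http_status: Optional[int]
    slack_error: Optional[str]


# Assumed shape, inferred from one sample: "HTTP <status>: <JSON body>".
_DETAIL_RE = re.compile(r"^HTTP (\d{3}):\s*(.*)$", re.DOTALL)


def parse_detail(detail: str) -> ParsedDetail:
    """Split a sync test detail string into an HTTP status and a Slack error code."""
    match = _DETAIL_RE.match(detail)
    if match is None:
        return {"http_status": None, "slack_error": None}
    status = int(match.group(1))
    try:
        body = json.loads(match.group(2))
    except json.JSONDecodeError:
        body = {}  # body was not JSON; keep the status only
    slack_error = body.get("error") if isinstance(body, dict) else None
    return {"http_status": status, "slack_error": slack_error}


sample = 'HTTP 429: {"ok":false,"error":"ratelimited"}'
print(parse_detail(sample))
# prints {'http_status': 429, 'slack_error': 'ratelimited'}
```

The same two values are what the failed sync run rows would need to carry for an agent to act on them.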
What's missing — proposed fields on a failed result
{
  "model": "...",
  "status": "failed",
  "error": {
    "phase": "list_fetch | enrich | transform | upsert | hook",
    "message": "HTTP 429: ratelimited",
    "httpStatus": 429,
    "retryAfter": 60,
    "lastSuccessfulPage": null,
    "context": { "actionId": "...", "url": "..." }
  }
}
At minimum, populate error with whatever sync test already returns when it hits the same failure.
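To illustrate what the proposed error object would enable, here is a rough sketch of how an agent might branch on it. It assumes the field names proposed above (httpStatus, retryAfter, message), none of which exist in 1.43.4. next_action, the 60-second fallback, and the list of non-retryable Slack error codes are illustrative choices, not CLI behavior.

```python
from typing import Any, Dict

# Slack error codes that will not clear on retry. Illustrative, not exhaustive.
PERMANENT_SLACK_ERRORS = (
    "invalid_auth",
    "token_revoked",
    "not_in_channel",
    "channel_not_found",
)

FALLBACK_RETRY_SECONDS = 60  # hypothetical default when retryAfter is absent


def next_action(row: Dict[str, Any]) -> str:
    """Return "ok", "retry in Ns", "fix: <message>", or "unknown" for a result row."""
    if row.get("status") != "failed":
        return "ok"
    error = row.get("error")
    if not error:
        # What 1.43.4 returns today: nothing to branch on.
        return "unknown"
    if error.get("httpStatus") == 429:
        wait = error.get("retryAfter") or FALLBACK_RETRY_SECONDS
        return f"retry in {wait}s"
    message = error.get("message", "")
    auth_status = error.get("httpStatus") in (401, 403)
    if auth_status or any(code in message for code in PERMANENT_SLACK_ERRORS):
        return f"fix: {message}"
    return "unknown"


proposed = {
    "model": "conversations_receipts_dev",
    "status": "failed",
    "error": {"phase": "list_fetch", "message": "HTTP 429: ratelimited",
              "httpStatus": 429, "retryAfter": 60},
}
print(next_action(proposed))  # prints "retry in 60s"
```

With today's output every failed row takes the "unknown" branch, which is the gap this issue describes.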
Documented but missing on-disk artifacts
The mem/sync guide describes:
.one/sync/
  events/{platform}_{model}.jsonl   # change event logs (if onChange: "log")
  logs/{platform}.log               # cron run logs
After ~30 minutes of repeated sync run slack calls (one cron-style and several manual runs that hit failures), .one/sync/logs/ does not exist on this machine. The cron run logs line in the docs implies this is only populated by cron-mode runs; if so, the docs are misleading — operators reading the doc will expect run logs to land there regardless. Either:
- Have sync run write to .one/sync/logs/<platform>.log unconditionally, or
- Update the docs to make it clear that manual runs leave no trace.
Operational impact for our use case
We're building a per-channel mirror of Slack for an FDE prototype (one --agent sync run slack over ~16 channels, eventually scheduled --every 5m). Without:
- Per-failure error context, and
- dateFilter-driven incremental fetches (which we just added; their absence by default in our manually authored profiles meant every run was a full re-paginate, multiplying rate-limit pressure),
…sync schedules are operationally fragile. A scheduled tick that silently fails because of rate limits looks identical to a tick that fails because of an expired token, a renamed channel, or a withone outage.
Mitigations we applied locally
- Added dateFilter: {param: "oldest", location: "query", format: "unix"} to all conversation profiles so subsequent runs only fetch since last_synced.
- Sequenced our runs with a ~60s cooldown between full re-fetches during the prototype to stay under Slack Tier-3 limits.
- Wrote a tail-based "watch the timing" heuristic to guess at rate-limit-induced failures (0 records with a 0s duration = likely auth/scope; N records with an unusually long duration = likely backoffs).
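For reference, the timing heuristic amounts to something like the sketch below. It is not our exact script: guess, duration_seconds, and the 20 seconds-per-page threshold are hypothetical. The duration parser assumes only the two formats seen in this report ("0s" and "3m33s").

```python
import re
from typing import Any, Dict

# Durations seen in sync run output so far: "0s", "1m31s", "3m33s".
_DURATION_RE = re.compile(r"^(?:(\d+)m)?(\d+)s$")

SLOW_SECONDS_PER_PAGE = 20  # hypothetical threshold for "unusually long"


def duration_seconds(text: str) -> int:
    """Convert a duration such as "3m33s" to seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


def guess(row: Dict[str, Any]) -> str:
    """Guess a cause from timing alone. This is a guess, not a diagnosis."""
    seconds = duration_seconds(row["duration"])
    pages = row["pagesProcessed"]
    if row.get("status") == "failed" and row["recordsSynced"] == 0 and seconds == 0:
        return "likely auth/scope"
    if pages > 0 and seconds / pages > SLOW_SECONDS_PER_PAGE:
        return "likely backoffs"
    return "no signal"


failed_row = {"model": "conversations_receipts_dev", "recordsSynced": 0,
              "pagesProcessed": 0, "duration": "0s", "status": "failed"}
# Status omitted here: this report does not show the value used on success.
retried_row = {"model": "sun_may_24_moneka_arabic_jazz", "recordsSynced": 21,
               "pagesProcessed": 2, "duration": "1m31s"}

print(guess(failed_row))   # prints "likely auth/scope"
print(guess(retried_row))  # prints "likely backoffs"
```

Note that the failed rows from the repro land in the auth/scope bucket even though they succeeded on retry a minute later and sync test reported HTTP 429 in the same window, so the heuristic misclassifies exactly the case we care about.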
Without visibility into the actual error, every operational decision about the sync engine becomes a guess.
Related: I filed #138 earlier today for the four embedded-postgres plugin issues that block mem init on darwin-arm64. This issue is about runtime observability for the sync engine — independent surface.
Happy to send a small PR to plumb the error through sync run's result rows if a maintainer can point me at the right place in the bundled (or source) code.