F.13: Crosspost scheduler + log/moderation GC #20

Open
kh0pper wants to merge 1 commit into f12-cross-app-bridging from f13-crosspost-scheduler-gc

@kh0pper kh0pper commented Apr 12, 2026

Summary

Follow-up after Phase 2. Completes F.12.2's queued-crosspost story by adding the dispatcher that fires entries whose scheduled_at has passed, and sweeps stale rows from crosspost_log and moderation_actions.

What lands

  • Schema: crosspost_log.transformed_payload_json column via addColumnIfMissing. crow_crosspost now stores the transformed payload on insert. Legacy F.12 rows without stored payloads are gracefully marked status='manual' with an explanatory error.
  • servers/gateway/crossposting/scheduler.js (new):
    • Publish loop (15s): scans crosspost_log for status='ready' OR (status='queued' AND scheduled_at <= now). Batches up to 20 rows/tick; in-flight set prevents overlap.
    • Publishers: mastodon, gotosocial, crow-blog (direct DB insert since blog is in-process). Each uses its bundle's env vars (MASTODON_URL + MASTODON_ACCESS_TOKEN, etc.) — bypasses MCP entirely, which is simpler and survives transport changes.
    • MANUAL_TARGETS: pixelfed / peertube / funkwhale (media-heavy, need binary data we don't store) and writefreely / lemmy / matrix-dendrite (text-but-need-context — collection alias / community_id / room_id). Scheduler marks these status='manual' so the operator completes the publish with local knowledge.
    • GC loop (1h, kicked on start): prunes crosspost_log rows >30 days old in terminal statuses only. Also sweeps moderation_actions past their F.11 expires_at into status='expired' — closing the TTL loop F.11 documented but didn't wire.
    • CROW_DISABLE_CROSSPOST_SCHEDULER=1 for testing.
  • servers/gateway/index.js: starts the scheduler alongside the existing orchestrator pipeline runner.
  • skills/crow-crosspost.md + CLAUDE.md: docs updated to reflect auto-publish for text targets.
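
The publish loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in scheduler.js: the function names (`isDue`, `selectBatch`, `tickOnce`), the in-memory `rows` array, and the row fields `id`, `status`, and `scheduled_at` as ISO strings are all assumptions made for the example. The real module reads from crosspost_log.

```javascript
// Hypothetical sketch; names and row shape are assumptions, not the PR's code.
const BATCH_LIMIT = 20; // "up to 20 rows/tick" from the description

// A row is due when it is 'ready', or 'queued' with scheduled_at in the past.
function isDue(row, nowMs) {
  if (row.status === "ready") return true;
  return (
    row.status === "queued" &&
    row.scheduled_at != null &&
    Date.parse(row.scheduled_at) <= nowMs
  );
}

// Pick up to BATCH_LIMIT due rows, skipping ids an earlier tick still holds.
function selectBatch(rows, inFlight, nowMs) {
  const batch = [];
  for (const row of rows) {
    if (batch.length >= BATCH_LIMIT) break;
    if (inFlight.has(row.id) || !isDue(row, nowMs)) continue;
    batch.push(row);
  }
  return batch;
}

// One tick: claim rows, publish each, and always release the claim afterwards.
async function tickOnce(rows, inFlight, publish, nowMs = Date.now()) {
  const batch = selectBatch(rows, inFlight, nowMs);
  for (const row of batch) inFlight.add(row.id);
  await Promise.all(
    batch.map(async (row) => {
      try {
        await publish(row);
        row.status = "published";
      } catch (err) {
        row.status = "error";
        row.error = String(err && err.message).slice(0, 200);
      } finally {
        inFlight.delete(row.id);
      }
    })
  );
  return batch.map((row) => row.id);
}
```

Releasing the claim in `finally` is what keeps a slow or failing publish from permanently blocking a row; a tick that starts while the previous one is still running simply skips the claimed ids.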

Design notes

  • Publishers bypass MCP and call the target REST APIs directly. No in-process MCP client; no layering round-trip for something that's fundamentally HTTP.
  • status='manual' is not a failure — it's the scheduler saying "I don't have what I need to publish autonomously." The operator completes these and calls crow_crosspost_mark_published to close the audit trail.
  • GC preserves in-flight queued/ready rows past the 30-day cutoff. In practice they should never get that old, but if the operator forgot to cancel one, GC won't silently delete it.
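
To make the "manual is not a failure" distinction concrete, here is a hedged sketch of how a row might be routed. The target names come from the description; the function `routeRow`, the row field `target`, the reason strings, and the handling of unknown targets are assumptions for illustration only.

```javascript
// Hypothetical sketch; the routing function and reason strings are assumptions.
const MANUAL_TARGETS = new Map([
  ["pixelfed", "media target: binary data is not stored"],
  ["peertube", "media target: binary data is not stored"],
  ["funkwhale", "media target: binary data is not stored"],
  ["writefreely", "needs a collection alias"],
  ["lemmy", "needs a community_id"],
  ["matrix-dendrite", "needs a room_id"],
]);
const AUTO_TARGETS = new Set(["mastodon", "gotosocial", "crow-blog"]);

// Decide what the scheduler does with one crosspost_log row.
function routeRow(row) {
  if (MANUAL_TARGETS.has(row.target)) {
    return { action: "manual", reason: MANUAL_TARGETS.get(row.target) };
  }
  if (!row.transformed_payload_json) {
    // Legacy F.12 rows were inserted before the payload column existed.
    return { action: "manual", reason: "no stored payload (legacy row)" };
  }
  if (AUTO_TARGETS.has(row.target)) {
    return { action: "publish" };
  }
  // Assumed behaviour for targets this sketch does not know about.
  return { action: "error", reason: `unknown target: ${row.target}` };
}
```

In this sketch a manual row carries a reason instead of an error state, so the operator can see what was missing before calling crow_crosspost_mark_published.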

Test plan

  • node --check on all files
  • node scripts/init-db.js lands the new column cleanly
  • One-tick test: media-target row → status='manual'; legacy no-payload row → status='manual' with error note
  • Publishers throw clean errors when tokens aren't set
  • Blog publisher round-trip (DB insert → cleanup)
  • npm run check passes
  • End-to-end against live mastodon/gotosocial (deferred — needs real federated instances)
  • GC tick against a populated log (manually run in a test DB OK; live GC waits for operational data)

Remaining punch list (next PR)

  • Writefreely / lemmy / matrix-dendrite publishers (need target-specific context — probably pulled from a default config setting or a crosspost_rules row).
  • Pixelfed / peertube / funkwhale media publishers (need binary storage — scheduler would need to hold the upload file addressably).

Rollout position

Stacked on F.12. Independent follow-up — can merge anytime after F.12.

🤖 Generated with Claude Code

Follow-up after Phase 2. Completes F.12.2's queued-crosspost story by
adding the dispatcher that fires entries whose scheduled_at has passed,
and sweeps stale rows from crosspost_log and moderation_actions.

**What lands**

- scripts/init-db.js — adds `transformed_payload_json` column to
  `crosspost_log` via `addColumnIfMissing`. Scheduler needs the
  transformed payload to publish; earlier F.12 rows (no stored payload)
  are marked status='manual' so the operator can handle them by hand.
- servers/sharing/server.js — `crow_crosspost` now stores the
  transformed payload in the new column on insert. Tool surface is
  unchanged; rows produced after this PR are auto-publishable.
- servers/gateway/crossposting/scheduler.js — NEW module:
    • Publish loop (every 15s): scans crosspost_log for
      status='ready' OR (status='queued' AND scheduled_at <= now).
      Batches up to 20 rows per tick; in-flight set prevents overlap.
    • Publishers: mastodon (/api/v1/statuses with Bearer token),
      gotosocial (same shape), crow-blog (direct DB insert since
      blog is in-process). Each reads its bundle's env vars
      (MASTODON_URL + MASTODON_ACCESS_TOKEN, etc.).
    • MANUAL_TARGETS set for media-heavy bundles (pixelfed/peertube/
      funkwhale) + text-but-needs-context (writefreely, lemmy,
      matrix-dendrite) — scheduler marks these status='manual' so
      operator can complete the publish with local knowledge (which
      collection, which community, which room).
    • GC loop (every 1h, kicked on start): prunes crosspost_log rows
      >30 days old in terminal statuses (published/cancelled/error/
      manual; queued/ready are preserved). Also sweeps
      moderation_actions past their F.11 expires_at into
      status='expired' — closing the TTL loop the F.11 docs promised.
    • Success raises a low-priority Crow notification; errors raise
      high-priority with the 200-char error message.
    • CROW_DISABLE_CROSSPOST_SCHEDULER=1 for testing / disabling.
- servers/gateway/index.js — starts the scheduler alongside the
  existing orchestrator pipeline runner.
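
As a rough illustration of the Mastodon publisher, the sketch below builds a `POST /api/v1/statuses` request with a Bearer token from the env vars named above. The function names, the payload shape (`{ status }`), the error wording, and the injectable `fetchImpl` parameter are assumptions for the example; the actual publisher may differ.

```javascript
// Hypothetical sketch; function names and payload shape are assumptions.

// Build the HTTP request for a status post, failing early on missing config.
function buildMastodonRequest(env, payload) {
  const base = env.MASTODON_URL;
  const token = env.MASTODON_ACCESS_TOKEN;
  if (!base) throw new Error("MASTODON_URL is not set");
  if (!token) throw new Error("MASTODON_ACCESS_TOKEN is not set");
  return {
    url: new URL("/api/v1/statuses", base).toString(),
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ status: payload.status }),
    },
  };
}

// Send the request; fetchImpl defaults to the global fetch (Node 18+).
async function publishToMastodon(env, payload, fetchImpl = fetch) {
  const { url, init } = buildMastodonRequest(env, payload);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`mastodon publish failed: HTTP ${res.status}`);
  return res.json();
}
```

A caller would pass `process.env` and the parsed transformed payload, for example `publishToMastodon(process.env, JSON.parse(row.transformed_payload_json))`. Splitting request construction from sending keeps the missing-token path testable without a network call.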

**Design notes**

- Publishers bypass MCP entirely and call the target app's REST API
  directly. Simpler than bootstrapping an in-process MCP client for
  each publish, and survives MCP transport changes cleanly.
- The "manual" status is not a failure. It's the scheduler saying "I
  can't do this automatically because I don't have the binary payload
  or the target-specific context." The operator completes these by
  calling `<app>_upload_track` / `pf_post_photo` / `pt_upload_video`
  with the transformed payload as a starting point, then
  `crow_crosspost_mark_published` closes the audit trail.
- GC only deletes from terminal statuses. In-flight queued/ready rows
  are preserved past the 30-day cutoff — in practice they should never
  be that old, but if the operator forgot to cancel one, GC won't
  delete it and lose it silently.
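
The GC rules above can be expressed as two small predicates. This is a sketch under stated assumptions: the function names, the use of a `created_at` timestamp to measure row age, and the check against an existing `status` on moderation_actions rows are hypothetical; only the terminal status list, the 30-day cutoff, and the `expires_at` column come from this PR.

```javascript
// Hypothetical sketch; function names and the created_at field are assumptions.
const TERMINAL_STATUSES = new Set(["published", "cancelled", "error", "manual"]);
const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// crosspost_log: prune only old rows that are in a terminal status.
function isPrunable(row, nowMs) {
  return (
    TERMINAL_STATUSES.has(row.status) &&
    nowMs - Date.parse(row.created_at) > RETENTION_MS
  );
}

// moderation_actions: flip to 'expired' once expires_at has passed.
function shouldExpire(action, nowMs) {
  return (
    action.status !== "expired" &&
    action.expires_at != null &&
    Date.parse(action.expires_at) <= nowMs
  );
}
```

Because `queued` and `ready` are absent from the terminal set, a forgotten scheduled row stays in the log no matter how old it gets.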

**Verified**

- node --check on all modified + new files
- node scripts/init-db.js runs cleanly; new column lands
- Inserted a media-target row + a legacy no-payload row; one tick
  of publishTick marks both status='manual' with the right reason
- Mastodon + GoToSocial publishers throw clean errors when their
  tokens aren't set
- Blog publisher round-trips (DB insert → cleanup)
- npm run check passes

**Remaining F.12 follow-ups in the punch list**

- Writefreely / lemmy / matrix-dendrite publishers (need target-
  specific context: collection_alias / community_id / room_id —
  either inferred from a default config setting or fetched from a
  crosspost_rules row). Shipping without these keeps the blast radius
  of F.13 small.
- Pixelfed / peertube / funkwhale media publishers (need binary
  storage: the scheduler would need to hold the upload file somewhere
  addressable). Deferred; operators who want auto-media-crossposting
  drive the upload themselves and call `crow_crosspost_mark_published`.