From ca3026e0134503819d3dce0a0710fbde513f8061 Mon Sep 17 00:00:00 2001
From: Jeremy Blankenship
Date: Thu, 21 May 2026 06:53:30 -0500
Subject: [PATCH 1/2] Align Mintlify docs with current control-plane implementation.

Fix deployment install paths, expand API/config reference for v0.3.0 surfaces, and document multi-worker routing and store-backend semantics verified against enoch-agentic-research-system.

Co-authored-by: Cursor
---
 concepts/worker-gate.mdx     |  2 +-
 configuration/overview.mdx   | 46 +++++++++++++++++++++--
 configuration/worker.mdx     |  4 +-
 current-runtime-snapshot.mdx |  2 +-
 deployment.mdx               |  6 +--
 guides/alerts.mdx            | 11 ++++--
 reference/api-endpoints.mdx  | 73 +++++++++++++++++++++++++++++++++++-
 release-notes.mdx            |  9 +++++
 8 files changed, 139 insertions(+), 14 deletions(-)

diff --git a/concepts/worker-gate.mdx b/concepts/worker-gate.mdx
index b423a01..3902c6c 100644
--- a/concepts/worker-gate.mdx
+++ b/concepts/worker-gate.mdx
@@ -15,7 +15,7 @@ and project decision artifact contract, see
 The code models process snapshots, process info, telemetry samples, run records, and gate callbacks. Configuration supports `sample_interval_sec`, `idle_sustain_sec`, CPU and GPU idle thresholds, VRAM delta thresholds, workload profiles, and stale-project process reaper settings.
-Supported workload classes in the config model are `inference_eval`, `training`, `control_plane`, and `agent_harness`.
+Supported workload classes in the config model are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`. The `cpu_only` and `gpu_required` classes map to machine-target routing through `workload_machine_targets` and `worker_targets`.
 ## Gate idea
diff --git a/configuration/overview.mdx b/configuration/overview.mdx
index d9fc6fc..aa84b1e 100644
--- a/configuration/overview.mdx
+++ b/configuration/overview.mdx
@@ -34,13 +34,17 @@ Use dry-run dispatch until you have intentionally enabled and tested live dispat
 Configure sampling, sustain windows, CPU/GPU thresholds, VRAM delta limits, workload profiles, and maximum wait after idle through the `sample_interval_sec`, `idle_sustain_sec`, `cpu_idle_threshold_pct`, `gpu_idle_*`, `vram_delta_threshold_mib`, `workload_profiles`, and `max_wait_after_idle_sec` fields.
-Supported workload classes are `inference_eval`, `training`, `control_plane`, and `agent_harness`.
+Supported workload classes are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`.
 ## Optional integrations
 - Pushover queue alerts: `pushover_alerts_enabled`, `pushover_app_token`, `pushover_user_key`, `pushover_api_url`, `queue_alert_*`.
+- Queue pump (timer-driven dispatch): `queue_pump_enabled`, `queue_pump_followup_launch_enabled`, `queue_pump_paper_draft_enabled`.
+- Multi-worker routing: `worker_targets`, `workload_machine_targets` (maps workload classes such as `cpu_only` and `gpu_required` to named worker targets).
+- Store backend: `control_plane_store_backend` (`sqlite`, `supabase_readonly`, or `supabase`), `enoch_core_store_backend`, `supabase_database_url`. Production on `enoch-core` uses `supabase` with local Postgres; the setting name is a compatibility adapter label.
 - Paper writer: `paper_writer_provider`, `paper_writer_base_url`, `paper_writer_model`, `paper_writer_api_key`, `paper_writer_*` tuning, and fallback.
 - Evidence sync: `paper_evidence_sync_enabled`, `paper_evidence_sync_ssh_host`, `paper_evidence_sync_remote_root`, `paper_evidence_sync_timeout_sec`.
+- Route observability (private diagnostics): `route_observability_enabled`, `route_observability_log_path`, `route_observability_slow_ms`, `route_observability_memory_warn_rss_mib`.
 ## Deprecated aliases
@@ -77,7 +81,7 @@ Older private prototypes used `omx_inbound_bearer_token`, `n8n_callback_url`, `n
-  Workload class applied when a dispatch request does not specify one. Valid values are `inference_eval`, `training`, `control_plane`, and `agent_harness`. Each class maps to a threshold profile; `inference_eval` applies stricter idle requirements (longer sustain, lower CPU threshold) than `training`.
+  Workload class applied when a dispatch request does not specify one. Valid values are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`. Each class maps to a threshold profile; `inference_eval` applies stricter idle requirements (longer sustain, lower CPU threshold) than `training`.
@@ -118,6 +122,20 @@ The following JSON mirrors the repository `config.example.json` baseline at the
       "gpu_idle_avg_threshold_pct": 10.0,
       "gpu_idle_peak_threshold_pct": 20.0,
       "vram_delta_threshold_mib": 1024
+    },
+    "cpu_only": {
+      "idle_sustain_sec": 300,
+      "cpu_idle_threshold_pct": 20.0,
+      "gpu_idle_avg_threshold_pct": 10.0,
+      "gpu_idle_peak_threshold_pct": 20.0,
+      "vram_delta_threshold_mib": 1024
+    },
+    "gpu_required": {
+      "idle_sustain_sec": 180,
+      "cpu_idle_threshold_pct": 35.0,
+      "gpu_idle_avg_threshold_pct": 10.0,
+      "gpu_idle_peak_threshold_pct": 20.0,
+      "vram_delta_threshold_mib": 1024
     }
   },
   "max_wait_after_idle_sec": 43200,
@@ -131,10 +149,27 @@ The following JSON mirrors the repository `config.example.json` baseline at the
     "vllm",
     "sglang"
   ],
-  "completion_callback_url": "https://automation.example.com/webhook/enoch-control-plane-wake-ready",
+  "completion_callback_url": "https://automation.example.com/webhook/enoch-worker-callback",
   "completion_callback_token": "replace-with-callback-token",
   "completion_callback_timeout_sec": 120,
+  "worker_wake_gate_url": "http://worker.example:8787",
+  "worker_wake_gate_bearer_token": "replace-with-worker-token",
+  "worker_targets": {
+    "gb10": {
+      "wake_gate_url": "http://gb10-worker.example:8787",
+      "bearer_token": "replace-with-gb10-worker-token",
+      "role": "gpu_worker"
+    }
+  },
+  "workload_machine_targets": {
+    "cpu_only": "cpu-proxmox-1",
+    "gpu_required": "gb10"
+  },
   "log_events": true,
+  "live_dispatch_enabled": false,
+  "queue_pump_enabled": false,
+  "queue_pump_followup_launch_enabled": false,
+  "queue_pump_paper_draft_enabled": false,
   "pushover_alerts_enabled": false,
   "pushover_app_token": "",
   "pushover_user_key": "",
@@ -149,7 +184,10 @@ The following JSON mirrors the repository `config.example.json` baseline at the
   "paper_writer_timeout_sec": 180,
   "paper_writer_temperature": 0.2,
   "paper_writer_max_tokens": 12000,
-  "paper_writer_fallback_enabled": true
+  "paper_writer_fallback_enabled": true,
+  "control_plane_store_backend": "sqlite",
+  "enoch_core_store_backend": "control_plane",
+  "legacy_notion_api_enabled": false
 }
 ```
diff --git a/configuration/worker.mdx b/configuration/worker.mdx
index 515304d..a219f52 100644
--- a/configuration/worker.mdx
+++ b/configuration/worker.mdx
@@ -13,8 +13,10 @@ For current runtime topology and worker-gate boundaries, see
 - `worker_wake_gate_url` — base URL for the worker-gate API. The field name remains for compatibility.
 - `worker_wake_gate_bearer_token` — bearer token the control plane uses when calling the worker.
+- `worker_targets` — named worker endpoints with per-target URLs, tokens, roles, and optional memory floors.
+- `workload_machine_targets` — maps workload classes such as `cpu_only` and `gpu_required` to a `worker_targets` key.
-For local development, set both `worker_wake_gate_url` and `completion_callback_url` to the local service. For production, use distinct tokens and hostnames.
+For local development, set both `worker_wake_gate_url` and `completion_callback_url` to the local service. For production with multiple workers, configure `worker_targets` and route dispatches through `workload_machine_targets` or explicit `machine_target` in preflight requests.
 ## Evidence sync fields
diff --git a/current-runtime-snapshot.mdx b/current-runtime-snapshot.mdx
index b9d1692..5c94eae 100644
--- a/current-runtime-snapshot.mdx
+++ b/current-runtime-snapshot.mdx
@@ -54,7 +54,7 @@ paper. It is not a broad queue drain.
 `write_needed` means decision-gated positive paper work only. Negative/non-positive runs become no-paper rows, not papers to write. `finalize_needed` is automation/package work for an existing draft.
-`publish_ready` means finalized package missing a corpus import ledger row.
+`publish_ready` means required evidence paths and a finalized package exist, but the corpus import ledger row is still missing.
 ## Callback/reconnect behavior
diff --git a/deployment.mdx b/deployment.mdx
index 9f59cb1..4fc39ff 100644
--- a/deployment.mdx
+++ b/deployment.mdx
@@ -49,13 +49,13 @@ The helper can copy the checkout into `/opt`, create config and state directorie
 ```bash
 sudo scripts/install-control-plane.sh \
-  --prefix /opt/enoch-agentic-research-system \
-  --config-dir /etc/enoch \
+  --prefix /opt/enoch-control-plane \
+  --config-dir /etc/enoch-control-plane \
   --state-dir /var/lib/enoch-control-plane \
   --user enoch
 ```
-Edit `/etc/enoch/config.json` before enabling the service. Replace every placeholder token and URL.
+The script defaults to those paths when you omit the flags. Edit `/etc/enoch-control-plane/config.json` before enabling the service. Replace every placeholder token and URL.
 ## Configure required secrets
diff --git a/guides/alerts.mdx b/guides/alerts.mdx
index 06e6c39..96ab288 100644
--- a/guides/alerts.mdx
+++ b/guides/alerts.mdx
@@ -33,10 +33,15 @@ systemctl list-timers enoch-queue-alert-check.timer
 Run a one-shot check:
 ```bash
-sudo ENOCH_CONFIG=/etc/enoch/config.json \
-  /opt/enoch-agentic-research-system/deploy/enoch_queue_alert_check.py
+sudo ENOCH_CONFIG=/etc/enoch-control-plane/config.json \
+  /opt/enoch-control-plane/deploy/enoch_queue_alert_check.py
 ```
 ## Dispatch pump note
-The queue alert checker can also participate in queue pumping when configured. Treat that as live automation: test status, preflight, and dry-run dispatch first.
+When `"queue_pump_enabled": true`, the queue alert checker can also dispatch queued work when the lane is safe. Optional flags control follow-up launch and paper drafting:
+
+- `queue_pump_followup_launch_enabled` — dry-run and launch one bounded follow-up when no queued candidate exists (defaults off).
+- `queue_pump_paper_draft_enabled` — draft one eligible paper before dispatch (defaults off; compatibility path).
+
+Treat queue pumping as live automation: test status, preflight, and dry-run dispatch first.
diff --git a/reference/api-endpoints.mdx b/reference/api-endpoints.mdx
index 497f47d..0772461 100644
--- a/reference/api-endpoints.mdx
+++ b/reference/api-endpoints.mdx
@@ -58,7 +58,7 @@ Response:
 ### `GET /control/health`
-Requires bearer authentication. Returns control-plane health and the configured SQLite database path.
+Requires bearer authentication. Returns control-plane health, the active store backend, and the configured database path (SQLite file path for local dev; Postgres adapter URL or label for production).
 ```bash
 curl -fsS \
@@ -66,6 +66,20 @@ curl -fsS \
   http://127.0.0.1:8787/control/health
 ```
+Response:
+
+```json
+{
+  "ok": true,
+  "service": "enoch-langgraph-control-plane",
+  "db_path": "/var/lib/enoch-control-plane/state/control_plane.sqlite3",
+  "store_backend": "sqlite",
+  "timestamp": "2025-05-01T12:00:00+00:00"
+}
+```
+
+Production on `enoch-core` sets `store_backend` to `supabase` with local Postgres; treat that value as a compatibility adapter label, not Supabase Cloud.
+
 ---
 ### `GET /control/api/status`
@@ -174,6 +188,26 @@ Returns route-observability health, if enabled.
 Returns the controller process RSS and memory warning status.
+### `GET /control/api/v1/automation-readiness`
+
+Returns long-haul automation readiness (`READY` or `BLOCKED`) with the first failing criterion when blocked.
+
+### `GET /control/api/v1/research-quality`
+
+Returns Research Facility quality signals and admission diagnostics for the operator dashboard.
+
+### `GET /control/api/v1/source-lineage`
+
+Returns source-lineage and dedupe context for intake and queue rows.
+
+### `GET /control/api/v1/projects`
+
+Returns a bounded project list with cursor pagination and search filters.
+
+### `POST /control/api/v1/followups/launch-next`
+
+Dry-runs or launches one bounded follow-up investigation from existing no-paper evidence. Used by queue-pump follow-up automation when enabled.
+
@@ -281,6 +315,35 @@ Response (`WorkerPreflightResponse`):
 }
 ```
+Optional request field `machine_target` selects a named entry from `worker_targets` instead of `wake_gate_url`.
+
+---
+
+### `POST /control/papers/draft-next`
+
+Drafts the next decision-gated positive paper candidate. Use `dry_run: true` first to preview eligibility without writing artifacts.
+
+```bash
+curl -fsS \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"dry_run": true, "requested_by": "operator"}' \
+  http://127.0.0.1:8787/control/papers/draft-next \
+  | python3 -m json.tool
+```
+
+Request body (`DraftNextRequest`):
+
+```json
+{
+  "dry_run": true,
+  "requested_by": "operator",
+  "force": false
+}
+```
+
+Response `action` values: `"noop"`, `"dry_run_draft"`, `"drafted"`. When `paper_evidence_sync_enabled` is true and local evidence is missing, live calls can return HTTP 424 with an `evidence_sync` detail object.
+
@@ -302,6 +365,8 @@ Returns `404` if neither a project record nor a queue item is found for `{projec
+Paper automation routes are also available under `/control/api/publication-automation/*` as aliases for `/control/api/paper-reviews/*`.
+
 ### `GET /control/api/papers`
 Returns the full paper queue as a paginated list.
@@ -547,6 +612,12 @@ curl -fsS \
   http://127.0.0.1:8787/control/api/intake/notion
 ```
+---
+
+### `POST /control/api/worker-callback`
+
+Receives idempotent worker completion callbacks from the worker gate. Callbacks use an `idempotency_key`; duplicate keys with the same payload are accepted, conflicting payloads return `409`.
+
diff --git a/release-notes.mdx b/release-notes.mdx
index 92a4961..f34311c 100644
--- a/release-notes.mdx
+++ b/release-notes.mdx
@@ -9,6 +9,15 @@ The canonical runtime changelog lives in the system repository at [`CHANGELOG.md
 For the current runtime topology and compatibility boundary, see [current runtime snapshot](/current-runtime-snapshot).
+## 0.3.0 — 2026-05-15
+
+- Added durable worker callback outbox and replay support for transient callback failures.
+- Added queue-alert auto-reconciliation for stale active lanes when a completed decision artifact is present.
+- Added Research Facility quality signals, post-prompt diagnostics, bounded-follow-up prioritization, and janitor candidate review support.
+- Tightened paper-production gates so only explicit positive decisions enter the write-needed lane.
+- Fixed stale active-lane handling, database connection lock release, janitor run-cycle integration, and corpus ledger idempotency.
+- Updated dependency locks and pinned GitHub Actions away from floating major tags.
+
 ## 0.2.0 — 2026-05-09
 - Package name changed to `enoch-control-plane`.

From b2cb57d72699e75ab6f6cec22682db9e25e3a3fb Mon Sep 17 00:00:00 2001
From: Jeremy Blankenship
Date: Thu, 21 May 2026 07:23:58 -0500
Subject: [PATCH 2/2] Fix PR accuracy issues found in v0.3.0 docs audit.

Align the example config with config.example.json, correct draft-next and rewrite-draft HTTP behavior, and satisfy docs validation for runtime topology links.

Co-authored-by: Cursor
---
 configuration/overview.mdx  | 19 ++++++++++++++-----
 guides/alerts.mdx           |  2 +-
 reference/api-endpoints.mdx |  8 ++++----
 3 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/configuration/overview.mdx b/configuration/overview.mdx
index aa84b1e..7cd1adf 100644
--- a/configuration/overview.mdx
+++ b/configuration/overview.mdx
@@ -90,7 +90,7 @@ Older private prototypes used `omx_inbound_bearer_token`, `n8n_callback_url`, `n
 ## Annotated example config
-The following JSON mirrors the repository `config.example.json` baseline at the time this docs page was audited. It includes optional/default-off sections such as live dispatch, Pushover alerts, evidence sync, and provider-backed paper writing. Copy it, remove fields you do not need, and replace every placeholder value before use.
+The following JSON mirrors the repository `config.example.json` baseline at the time this docs page was audited. It includes optional/default-off sections such as queue pump, Pushover alerts, route observability, and provider-backed paper writing. Copy it, remove fields you do not need, and replace every placeholder value before use.
 ```json
 {
@@ -159,6 +159,12 @@ The following JSON mirrors the repository `config.example.json` baseline at the
       "wake_gate_url": "http://gb10-worker.example:8787",
       "bearer_token": "replace-with-gb10-worker-token",
       "role": "gpu_worker"
+    },
+    "cpu-proxmox-1": {
+      "wake_gate_url": "http://cpu-proxmox-1.example:8787",
+      "bearer_token": "replace-with-cpu-worker-token",
+      "role": "cpu_worker",
+      "min_memory_available_mib": 16384
     }
   },
   "workload_machine_targets": {
@@ -166,10 +172,6 @@ The following JSON mirrors the repository `config.example.json` baseline at the
     "gpu_required": "gb10"
   },
   "log_events": true,
-  "live_dispatch_enabled": false,
-  "queue_pump_enabled": false,
-  "queue_pump_followup_launch_enabled": false,
-  "queue_pump_paper_draft_enabled": false,
   "pushover_alerts_enabled": false,
   "pushover_app_token": "",
   "pushover_user_key": "",
@@ -177,6 +179,8 @@ The following JSON mirrors the repository `config.example.json` baseline at the
   "queue_alert_cooldown_sec": 1800,
   "queue_alert_hang_after_sec": 3600,
   "queue_pump_enabled": false,
+  "queue_pump_followup_launch_enabled": false,
+  "queue_pump_paper_draft_enabled": false,
   "paper_writer_provider": "deterministic",
   "paper_writer_base_url": "https://api.synthetic.new/openai/v1",
   "paper_writer_model": "hf:zai-org/GLM-5.1",
@@ -185,8 +189,13 @@ The following JSON mirrors the repository `config.example.json` baseline at the
   "paper_writer_temperature": 0.2,
   "paper_writer_max_tokens": 12000,
   "paper_writer_fallback_enabled": true,
+  "route_observability_enabled": false,
+  "route_observability_log_path": "",
+  "route_observability_slow_ms": 1000,
+  "route_observability_memory_warn_rss_mib": 0,
   "control_plane_store_backend": "sqlite",
   "enoch_core_store_backend": "control_plane",
+  "supabase_database_url": "",
   "legacy_notion_api_enabled": false
 }
 ```
diff --git a/guides/alerts.mdx b/guides/alerts.mdx
index 96ab288..c710486 100644
--- a/guides/alerts.mdx
+++ b/guides/alerts.mdx
@@ -30,7 +30,7 @@ sudo systemctl enable --now enoch-queue-alert-check.timer
 systemctl list-timers enoch-queue-alert-check.timer
 ```
-Run a one-shot check:
+Run a one-shot check on the control-plane host (see [current runtime snapshot](/current-runtime-snapshot) for topology):
 ```bash
 sudo ENOCH_CONFIG=/etc/enoch-control-plane/config.json \
diff --git a/reference/api-endpoints.mdx b/reference/api-endpoints.mdx
index 0772461..a5f2415 100644
--- a/reference/api-endpoints.mdx
+++ b/reference/api-endpoints.mdx
@@ -315,7 +315,7 @@ Response (`WorkerPreflightResponse`):
 }
 ```
-Optional request field `machine_target` selects a named entry from `worker_targets` instead of `wake_gate_url`.
+Optional request field `machine_target` selects a named entry from `worker_targets`. When set, the control plane resolves it to that target's `wake_gate_url`, bearer token, and memory floor.
 ---
@@ -342,7 +342,7 @@ Request body (`DraftNextRequest`):
 }
 ```
-Response `action` values: `"noop"`, `"dry_run_draft"`, `"drafted"`. When `paper_evidence_sync_enabled` is true and local evidence is missing, live calls can return HTTP 424 with an `evidence_sync` detail object.
+Response `action` values: `"noop"`, `"dry_run_draft"`, `"drafted"`. Live calls skip candidates with missing evidence and return `action: "noop"` with skipped candidate details; they do not return HTTP 424.
@@ -365,7 +365,7 @@ Returns `404` if neither a project record nor a queue item is found for `{projec
-Paper automation routes are also available under `/control/api/publication-automation/*` as aliases for `/control/api/paper-reviews/*`.
+Paper automation routes are mounted at both `/control/api/publication-automation/*` and `/control/api/paper-reviews/*`. Prefer `publication-automation` in new integrations.
 ### `GET /control/api/papers`
@@ -429,7 +429,7 @@ Returns `404` if no matching item exists.
 ### `POST /control/api/paper-reviews/{paper_id}/rewrite-draft`
-Rewrites the paper draft artifacts for the given paper using the configured AI writer (e.g. GLM-5.1 via Synthetic.new). If `paper_evidence_sync_enabled` is `true`, the control plane syncs evidence from the worker before rewriting.
+Rewrites the paper draft artifacts for the given paper using the configured AI writer (e.g. GLM-5.1 via Synthetic.new). If `paper_evidence_sync_enabled` is `true`, the control plane syncs evidence from the worker before rewriting. When sync is enabled and local evidence is still missing, live calls return HTTP 424 with an `evidence_sync` detail object.
 ```bash
 curl -fsS \