Merged
2 changes: 1 addition & 1 deletion concepts/worker-gate.mdx
@@ -15,7 +15,7 @@ and project decision artifact contract, see

The code models process snapshots, process info, telemetry samples, run records, and gate callbacks. Configuration supports `sample_interval_sec`, `idle_sustain_sec`, CPU and GPU idle thresholds, VRAM delta thresholds, workload profiles, and stale-project process reaper settings.

Supported workload classes in the config model are `inference_eval`, `training`, `control_plane`, and `agent_harness`.
Supported workload classes in the config model are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`. The `cpu_only` and `gpu_required` classes map to machine-target routing through `workload_machine_targets` and `worker_targets`.
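
As a rough illustration of how the sustain window and thresholds named above could combine, here is a minimal sketch. The `Sample` and `Profile` shapes and the exact pass rule (every check evaluated over the trailing `idle_sustain_sec` window) are assumptions for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One telemetry sample (hypothetical shape, not the project's model)."""
    t_sec: float     # sample timestamp, seconds
    cpu_pct: float   # CPU utilisation, percent
    gpu_pct: float   # GPU utilisation, percent
    vram_mib: float  # VRAM in use, MiB


@dataclass(frozen=True)
class Profile:
    """Threshold profile using the documented config field names."""
    idle_sustain_sec: float
    cpu_idle_threshold_pct: float
    gpu_idle_avg_threshold_pct: float
    gpu_idle_peak_threshold_pct: float
    vram_delta_threshold_mib: float


def is_idle(samples: Sequence[Sample], profile: Profile) -> bool:
    """Return True if the trailing sustain window looks idle (assumed rule)."""
    if not samples:
        return False
    end = samples[-1].t_sec
    start = end - profile.idle_sustain_sec
    # Require history that covers the whole sustain window.
    if samples[0].t_sec > start:
        return False
    window = [s for s in samples if s.t_sec >= start]
    gpu = [s.gpu_pct for s in window]
    vram = [s.vram_mib for s in window]
    return (
        max(s.cpu_pct for s in window) <= profile.cpu_idle_threshold_pct
        and sum(gpu) / len(gpu) <= profile.gpu_idle_avg_threshold_pct
        and max(gpu) <= profile.gpu_idle_peak_threshold_pct
        and max(vram) - min(vram) <= profile.vram_delta_threshold_mib
    )
```

A shorter `sample_interval_sec` gives the window more samples to judge from. The real gate may weigh these signals differently.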

## Gate idea

57 changes: 52 additions & 5 deletions configuration/overview.mdx
@@ -34,13 +34,17 @@ Use dry-run dispatch until you have intentionally enabled and tested live dispat

Configure sampling, sustain windows, CPU/GPU thresholds, VRAM delta limits, workload profiles, and maximum wait after idle through the `sample_interval_sec`, `idle_sustain_sec`, `cpu_idle_threshold_pct`, `gpu_idle_*`, `vram_delta_threshold_mib`, `workload_profiles`, and `max_wait_after_idle_sec` fields.

Supported workload classes are `inference_eval`, `training`, `control_plane`, and `agent_harness`.
Supported workload classes are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`.

## Optional integrations

- Pushover queue alerts: `pushover_alerts_enabled`, `pushover_app_token`, `pushover_user_key`, `pushover_api_url`, `queue_alert_*`.
- Queue pump (timer-driven dispatch): `queue_pump_enabled`, `queue_pump_followup_launch_enabled`, `queue_pump_paper_draft_enabled`.
- Multi-worker routing: `worker_targets`, `workload_machine_targets` (maps workload classes such as `cpu_only` and `gpu_required` to named worker targets).
- Store backend: `control_plane_store_backend` (`sqlite`, `supabase_readonly`, or `supabase`), `enoch_core_store_backend`, `supabase_database_url`. Production on `enoch-core` sets the backend to `supabase` with local Postgres; that value is a compatibility adapter label.
- Paper writer: `paper_writer_provider`, `paper_writer_base_url`, `paper_writer_model`, `paper_writer_api_key`, `paper_writer_*` tuning, and fallback.
- Evidence sync: `paper_evidence_sync_enabled`, `paper_evidence_sync_ssh_host`, `paper_evidence_sync_remote_root`, `paper_evidence_sync_timeout_sec`.
- Route observability (private diagnostics): `route_observability_enabled`, `route_observability_log_path`, `route_observability_slow_ms`, `route_observability_memory_warn_rss_mib`.

## Deprecated aliases

@@ -77,7 +81,7 @@ Older private prototypes used `omx_inbound_bearer_token`, `n8n_callback_url`, `n
</ParamField>

<ParamField path="default_workload_class" type="string" default="inference_eval">
Workload class applied when a dispatch request does not specify one. Valid values are `inference_eval`, `training`, `control_plane`, and `agent_harness`. Each class maps to a threshold profile; `inference_eval` applies stricter idle requirements (longer sustain, lower CPU threshold) than `training`.
Workload class applied when a dispatch request does not specify one. Valid values are `unknown`, `cpu_only`, `gpu_required`, `inference_eval`, `training`, `control_plane`, and `agent_harness`. Each class maps to a threshold profile; `inference_eval` applies stricter idle requirements (longer sustain, lower CPU threshold) than `training`.
</ParamField>
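
The fallback described for `default_workload_class` can be sketched as follows. This is a minimal illustration: the `workload_class` request key and the validation step are assumptions, not the project's confirmed request schema.

```python
# Workload classes listed in the docs above.
VALID_WORKLOAD_CLASSES = frozenset({
    "unknown", "cpu_only", "gpu_required", "inference_eval",
    "training", "control_plane", "agent_harness",
})


def resolve_workload_class(request: dict, config: dict) -> str:
    """Use the request's class if present, else the configured default."""
    # "workload_class" is a hypothetical request field name.
    chosen = request.get("workload_class") or config.get(
        "default_workload_class", "inference_eval"
    )
    if chosen not in VALID_WORKLOAD_CLASSES:
        raise ValueError(f"unsupported workload class: {chosen!r}")
    return chosen
```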

<Tip>
@@ -86,7 +90,7 @@ Older private prototypes used `omx_inbound_bearer_token`, `n8n_callback_url`, `n

## Annotated example config

The following JSON mirrors the repository `config.example.json` baseline at the time this docs page was audited. It includes optional/default-off sections such as live dispatch, Pushover alerts, evidence sync, and provider-backed paper writing. Copy it, remove fields you do not need, and replace every placeholder value before use.
The following JSON mirrors the repository `config.example.json` baseline at the time this docs page was audited. It includes optional/default-off sections such as queue pump, Pushover alerts, route observability, and provider-backed paper writing. Copy it, remove fields you do not need, and replace every placeholder value before use.

```json
{
@@ -118,6 +122,20 @@ The following JSON mirrors the repository `config.example.json` baseline at the
"gpu_idle_avg_threshold_pct": 10.0,
"gpu_idle_peak_threshold_pct": 20.0,
"vram_delta_threshold_mib": 1024
},
"cpu_only": {
"idle_sustain_sec": 300,
"cpu_idle_threshold_pct": 20.0,
"gpu_idle_avg_threshold_pct": 10.0,
"gpu_idle_peak_threshold_pct": 20.0,
"vram_delta_threshold_mib": 1024
},
"gpu_required": {
"idle_sustain_sec": 180,
"cpu_idle_threshold_pct": 35.0,
"gpu_idle_avg_threshold_pct": 10.0,
"gpu_idle_peak_threshold_pct": 20.0,
"vram_delta_threshold_mib": 1024
}
},
"max_wait_after_idle_sec": 43200,
@@ -131,9 +149,28 @@ The following JSON mirrors the repository `config.example.json` baseline at the
"vllm",
"sglang"
],
"completion_callback_url": "https://automation.example.com/webhook/enoch-control-plane-wake-ready",
"completion_callback_url": "https://automation.example.com/webhook/enoch-worker-callback",
"completion_callback_token": "replace-with-callback-token",
"completion_callback_timeout_sec": 120,
"worker_wake_gate_url": "http://worker.example:8787",
"worker_wake_gate_bearer_token": "replace-with-worker-token",
"worker_targets": {
"gb10": {
"wake_gate_url": "http://gb10-worker.example:8787",
"bearer_token": "replace-with-gb10-worker-token",
"role": "gpu_worker"
},
"cpu-proxmox-1": {
"wake_gate_url": "http://cpu-proxmox-1.example:8787",
"bearer_token": "replace-with-cpu-worker-token",
"role": "cpu_worker",
"min_memory_available_mib": 16384
}
},
"workload_machine_targets": {
"cpu_only": "cpu-proxmox-1",
"gpu_required": "gb10"
},
"log_events": true,
"pushover_alerts_enabled": false,
"pushover_app_token": "",
@@ -142,14 +179,24 @@ The following JSON mirrors the repository `config.example.json` baseline at the
"queue_alert_cooldown_sec": 1800,
"queue_alert_hang_after_sec": 3600,
"queue_pump_enabled": false,
"queue_pump_followup_launch_enabled": false,
"queue_pump_paper_draft_enabled": false,
"paper_writer_provider": "deterministic",
"paper_writer_base_url": "https://api.synthetic.new/openai/v1",
"paper_writer_model": "hf:zai-org/GLM-5.1",
"paper_writer_api_key": "",
"paper_writer_timeout_sec": 180,
"paper_writer_temperature": 0.2,
"paper_writer_max_tokens": 12000,
"paper_writer_fallback_enabled": true
"paper_writer_fallback_enabled": true,
"route_observability_enabled": false,
"route_observability_log_path": "",
"route_observability_slow_ms": 1000,
"route_observability_memory_warn_rss_mib": 0,
"control_plane_store_backend": "sqlite",
"enoch_core_store_backend": "control_plane",
"supabase_database_url": "",
"legacy_notion_api_enabled": false
}
```
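
Because `workload_machine_targets` refers to `worker_targets` by key, a small consistency check can catch typos and leftover placeholders before the service starts. The following is a hedged sketch, not a tool shipped with the repository: `check_routing` is a hypothetical helper, and the `replace-with-` prefix test assumes the placeholder style used in the example above.

```python
PLACEHOLDER_PREFIX = "replace-with-"  # placeholder style used in the example config


def check_routing(config: dict) -> list[str]:
    """Return a list of problems; an empty list means routing looks consistent."""
    problems: list[str] = []
    targets = config.get("worker_targets", {})
    for name, target in targets.items():
        for key in ("wake_gate_url", "bearer_token"):
            value = target.get(key, "")
            if not value:
                problems.append(f"worker_targets.{name}.{key} is missing")
            elif str(value).startswith(PLACEHOLDER_PREFIX):
                problems.append(f"worker_targets.{name}.{key} is still a placeholder")
    for workload, name in config.get("workload_machine_targets", {}).items():
        if name not in targets:
            problems.append(
                f"workload_machine_targets.{workload} points at unknown target {name!r}"
            )
    return problems
```

Run it against the parsed JSON, for example `check_routing(json.load(open(path)))`, and refuse to enable live dispatch while the list is non-empty.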

4 changes: 3 additions & 1 deletion configuration/worker.mdx
@@ -13,8 +13,10 @@ For current runtime topology and worker-gate boundaries, see

- `worker_wake_gate_url` — base URL for the worker-gate API. The field name remains for compatibility.
- `worker_wake_gate_bearer_token` — bearer token the control plane uses when calling the worker.
- `worker_targets` — named worker endpoints with per-target URLs, tokens, roles, and optional memory floors.
- `workload_machine_targets` — maps workload classes such as `cpu_only` and `gpu_required` to a `worker_targets` key.

For local development, set both `worker_wake_gate_url` and `completion_callback_url` to the local service. For production, use distinct tokens and hostnames.
For local development, set both `worker_wake_gate_url` and `completion_callback_url` to the local service. For production with multiple workers, configure `worker_targets` and route dispatches through `workload_machine_targets` or explicit `machine_target` in preflight requests.

## Evidence sync fields

2 changes: 1 addition & 1 deletion current-runtime-snapshot.mdx
@@ -54,7 +54,7 @@ paper. It is not a broad queue drain.
`write_needed` means decision-gated positive paper work only.
Negative/non-positive runs become no-paper rows, not papers to write.
`finalize_needed` is automation/package work for an existing draft.
`publish_ready` means finalized package missing a corpus import ledger row.
`publish_ready` means required evidence paths and a finalized package exist, but the corpus import ledger row is still missing.

## Callback/reconnect behavior

6 changes: 3 additions & 3 deletions deployment.mdx
@@ -49,13 +49,13 @@ The helper can copy the checkout into `/opt`, create config and state directorie

```bash
sudo scripts/install-control-plane.sh \
--prefix /opt/enoch-agentic-research-system \
--config-dir /etc/enoch \
--prefix /opt/enoch-control-plane \
--config-dir /etc/enoch-control-plane \
--state-dir /var/lib/enoch-control-plane \
--user enoch
```

Edit `/etc/enoch/config.json` before enabling the service. Replace every placeholder token and URL.
The script defaults to those paths when you omit the flags. Edit `/etc/enoch-control-plane/config.json` before enabling the service. Replace every placeholder token and URL.

## Configure required secrets

13 changes: 9 additions & 4 deletions guides/alerts.mdx
@@ -30,13 +30,18 @@ sudo systemctl enable --now enoch-queue-alert-check.timer
systemctl list-timers enoch-queue-alert-check.timer
```

Run a one-shot check:
Run a one-shot check on the control-plane host (see [current runtime snapshot](/current-runtime-snapshot) for topology):

```bash
sudo ENOCH_CONFIG=/etc/enoch/config.json \
/opt/enoch-agentic-research-system/deploy/enoch_queue_alert_check.py
sudo ENOCH_CONFIG=/etc/enoch-control-plane/config.json \
/opt/enoch-control-plane/deploy/enoch_queue_alert_check.py
```

## Dispatch pump note

The queue alert checker can also participate in queue pumping when configured. Treat that as live automation: test status, preflight, and dry-run dispatch first.
With `"queue_pump_enabled": true`, the queue alert checker can also dispatch queued work once the lane is safe. Two optional flags control follow-up launch and paper drafting:

- `queue_pump_followup_launch_enabled` — dry-run and launch one bounded follow-up when no queued candidate exists (defaults off).
- `queue_pump_paper_draft_enabled` — draft one eligible paper before dispatch (defaults off; compatibility path).

Treat queue pumping as live automation: test status, preflight, and dry-run dispatch first.
75 changes: 73 additions & 2 deletions reference/api-endpoints.mdx
@@ -58,14 +58,28 @@ Response:

### `GET /control/health`

Requires bearer authentication. Returns control-plane health and the configured SQLite database path.
Requires bearer authentication. Returns control-plane health, the active store backend, and the configured database path (SQLite file path for local dev; Postgres adapter URL or label for production).

```bash
curl -fsS \
-H "Authorization: Bearer $TOKEN" \
http://127.0.0.1:8787/control/health
```

Response:

```json
{
"ok": true,
"service": "enoch-langgraph-control-plane",
"db_path": "/var/lib/enoch-control-plane/state/control_plane.sqlite3",
"store_backend": "sqlite",
"timestamp": "2025-05-01T12:00:00+00:00"
}
```

Production on `enoch-core` sets `store_backend` to `supabase` with local Postgres; treat that value as a compatibility adapter label, not Supabase Cloud.

---

### `GET /control/api/status`
@@ -174,6 +188,26 @@ Returns route-observability health, if enabled.

Returns the controller process RSS and memory warning status.

### `GET /control/api/v1/automation-readiness`

Returns long-haul automation readiness (`READY` or `BLOCKED`) with the first failing criterion when blocked.

### `GET /control/api/v1/research-quality`

Returns Research Facility quality signals and admission diagnostics for the operator dashboard.

### `GET /control/api/v1/source-lineage`

Returns source-lineage and dedupe context for intake and queue rows.

### `GET /control/api/v1/projects`

Returns a bounded project list with cursor pagination and search filters.

### `POST /control/api/v1/followups/launch-next`

Dry-runs or launches one bounded follow-up investigation from existing no-paper evidence. Used by queue-pump follow-up automation when enabled.

</Accordion>

<Accordion title="Queue Operations">
@@ -281,6 +315,35 @@ Response (`WorkerPreflightResponse`):
}
```

Optional request field `machine_target` selects a named entry from `worker_targets`. When set, the control plane resolves it to that target's `wake_gate_url`, bearer token, and memory floor.
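
The resolution step can be pictured with a short sketch. This assumes a lookup order that the docs do not spell out: an explicit `machine_target` first, then the `workload_machine_targets` entry for the workload class, then the single-worker `worker_wake_gate_url` fields. `ResolvedTarget` and `resolve_target` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResolvedTarget:
    wake_gate_url: str
    bearer_token: str
    min_memory_available_mib: Optional[int]  # None when no memory floor is set


def resolve_target(
    config: dict,
    machine_target: Optional[str] = None,
    workload_class: Optional[str] = None,
) -> ResolvedTarget:
    """Resolve a dispatch to one worker endpoint (assumed precedence)."""
    name = machine_target or config.get("workload_machine_targets", {}).get(workload_class)
    if name is None:
        # No named target applies: fall back to the single-worker fields.
        return ResolvedTarget(
            config["worker_wake_gate_url"],
            config["worker_wake_gate_bearer_token"],
            None,
        )
    try:
        target = config["worker_targets"][name]
    except KeyError:
        raise KeyError(f"unknown machine_target: {name!r}") from None
    return ResolvedTarget(
        target["wake_gate_url"],
        target["bearer_token"],
        target.get("min_memory_available_mib"),
    )
```

Failing loudly on an unknown name, rather than silently falling back to the default worker, keeps a typo from sending GPU work to the wrong machine.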

---

### `POST /control/papers/draft-next`

Drafts the next decision-gated positive paper candidate. Use `dry_run: true` first to preview eligibility without writing artifacts.

```bash
curl -fsS \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{"dry_run": true, "requested_by": "operator"}' \
http://127.0.0.1:8787/control/papers/draft-next \
| python3 -m json.tool
```

Request body (`DraftNextRequest`):

```json
{
"dry_run": true,
"requested_by": "operator",
"force": false
}
```

Response `action` values: `"noop"`, `"dry_run_draft"`, `"drafted"`. Live calls skip candidates with missing evidence and return `action: "noop"` with skipped candidate details; they do not return HTTP 424.

</Accordion>

<Accordion title="Project Operations">
@@ -302,6 +365,8 @@ Returns `404` if neither a project record nor a queue item is found for `{projec

<Accordion title="Paper Operations">

Paper automation routes are mounted at both `/control/api/publication-automation/*` and `/control/api/paper-reviews/*`. Prefer `publication-automation` in new integrations.

### `GET /control/api/papers`

Returns the full paper queue as a paginated list.
@@ -364,7 +429,7 @@ Returns `404` if no matching item exists.

### `POST /control/api/paper-reviews/{paper_id}/rewrite-draft`

Rewrites the paper draft artifacts for the given paper using the configured AI writer (e.g. GLM-5.1 via Synthetic.new). If `paper_evidence_sync_enabled` is `true`, the control plane syncs evidence from the worker before rewriting.
Rewrites the paper draft artifacts for the given paper using the configured AI writer (e.g. GLM-5.1 via Synthetic.new). If `paper_evidence_sync_enabled` is `true`, the control plane syncs evidence from the worker before rewriting. When sync is enabled and local evidence is still missing, live calls return HTTP 424 with an `evidence_sync` detail object.

```bash
curl -fsS \
@@ -547,6 +612,12 @@ curl -fsS \
http://127.0.0.1:8787/control/api/intake/notion
```

---

### `POST /control/api/worker-callback`

Receives idempotent worker completion callbacks from the worker gate. Each callback carries an `idempotency_key`: a repeated key with the same payload is accepted, while a repeated key with a conflicting payload returns `409`.
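
The accept-or-conflict rule can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `CallbackLedger` is a hypothetical name, the `200` success status is assumed (only `409` comes from the description above), and the real service presumably persists keys in its store.

```python
import hashlib
import json


class CallbackLedger:
    """In-memory stand-in for a durable callback store (illustrative only)."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    @staticmethod
    def _fingerprint(payload: dict) -> str:
        # Canonical JSON so key order does not change the fingerprint.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def accept(self, idempotency_key: str, payload: dict) -> int:
        """Return an HTTP-style status: 200 for new or replayed, 409 on conflict."""
        fingerprint = self._fingerprint(payload)
        # setdefault stores the first fingerprint and returns it on later calls.
        previous = self._seen.setdefault(idempotency_key, fingerprint)
        return 200 if previous == fingerprint else 409
```

Comparing a hash of the canonical payload lets a retried callback succeed even if the sender serializes fields in a different order.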

</Accordion>

<Accordion title="Worker Gate">
9 changes: 9 additions & 0 deletions release-notes.mdx
@@ -9,6 +9,15 @@ The canonical runtime changelog lives in the system repository at [`CHANGELOG.md
For the current runtime topology and compatibility boundary, see
[current runtime snapshot](/current-runtime-snapshot).

## 0.3.0 — 2026-05-15

- Added durable worker callback outbox and replay support for transient callback failures.
- Added queue-alert auto-reconciliation for stale active lanes when a completed decision artifact is present.
- Added Research Facility quality signals, post-prompt diagnostics, bounded-follow-up prioritization, and janitor candidate review support.
- Tightened paper-production gates so only explicit positive decisions enter the write-needed lane.
- Fixed stale active-lane handling, database connection lock release, janitor run-cycle integration, and corpus ledger idempotency.
- Updated dependency locks and pinned GitHub Actions away from floating major tags.

## 0.2.0 — 2026-05-09

- Package name changed to `enoch-control-plane`.