diff --git a/docs/contract/agent-interface.md b/docs/contract/agent-interface.md
index af41696..61f455f 100644
--- a/docs/contract/agent-interface.md
+++ b/docs/contract/agent-interface.md
@@ -385,6 +385,12 @@ real-LLM driver `agent_tools.py`):
 None of these are large lifts. The biggest addition is the `escurel` meta-skill, which is just one markdown file.

+> **Status (delivered).** Every row above now ships in the Rust
+> gateway: `neighbours`, `validate`, the CRDT write path + `update_page`
+> fallback, `run_stored_query` over MCP, the mandatory auto-shipped
+> `escurel` meta-skill, and `search(granularity='block'|'page')` with
+> the frontmatter `filter`.
+
 ## Decisions locked (2026-05-17)

 The four open questions above were resolved in the design
diff --git a/docs/operations.md b/docs/operations.md
index a692dea..ced4f96 100644
--- a/docs/operations.md
+++ b/docs/operations.md
@@ -18,12 +18,15 @@ on first boot of a fresh host (see [Node loss](#node-loss--fresh-host)).
 | `GET /healthz` | Liveness. Always `200 OK` while the process is up; dependency-free. Wire this to the Nomad/Consul check. |
 | `GET /readyz` | Readiness. `200` only when LaneStore + indexer + embedder are all up; `503` with a per-component JSON body otherwise. A degraded embedder (model failed to load) shows here as `{"components":{"embedder":false}}` — the process still serves liveness and read traffic. |
 | `GET /version` | The build version (`VERSION` / `ESCUREL_VERSION`). |
-| `GET /metrics` | Prometheus exposition. `escurel_up`, `escurel_requests_total{route,status}`, plus the OTel-exported request metrics. Scrape via the tailnet-only `escurel-metrics` Consul service. |
+| `GET /metrics` | Prometheus exposition on a **dedicated listener** (`ESCUREL_OBSERVABILITY_METRICS_LISTEN`, default `:9090`) — *not* the main HTTP port. Exposes `escurel_up`, `escurel_requests_total{route,status}`, and the per-tool families `escurel_tool_calls{tenant,tool,transport,status}`, `escurel_tool_latency_ms`, `escurel_live_sessions_open`, `escurel_audit_drift`. Scrape via the tailnet-only `escurel-metrics` Consul service. |

 Logs are structured JSON on stdout with `ts`, `level`, `msg`, `app`, `env`, `version`, `request_id` (per [`spec/platform.md`](spec/platform.md)). Every `/mcp` request carries an `X-Request-Id` (inbound header honoured,
-else a fresh ULID) threaded into a `mcp.request` span.
+else a fresh ULID) threaded into a `mcp.request` span (which also carries
+`transport` + `trace_id`). Each `tools/call` emits a `tool.completed`
+record adding `tenant`, `tool`, `subject`, `status`, and `duration_ms` —
+the per-call audit line.

 ## The admin surface

diff --git a/docs/spec/dx.md b/docs/spec/dx.md
index aed0944..d13b048 100644
--- a/docs/spec/dx.md
+++ b/docs/spec/dx.md
@@ -1,6 +1,6 @@
 # Downstream-app integration contract

-**Status:** Proposal. Locked items move into the table in [`README.md`](README.md#locked-design-decisions); open items live at the bottom of this file.
+**Status:** Delivered (contract honoured by `escurel-client` + `escurel-test-support`). Locked items move into the table in [`README.md`](README.md#locked-design-decisions); open items live at the bottom of this file.
 **Scope:** The contract escurel commits to for *applications built on top of escurel* — specifically, what their integration test harness can rely on. The rest of the spec describes the service from the operator's and implementer's seat; this doc describes it from the seat of someone wiring escurel into another product's tests.

 The motivating shape is concrete: a new application — frontend + backend — that uses escurel as its store and chains through triton (the DataZoo agent-ingress gateway) to its agents. The integration test the application's harness needs to write is:
@@ -35,14 +35,14 @@ The escurel workspace already contains the *primitives* a downstream test needs;
 | Typed MCP test client | Raw JSON-RPC `POST /mcp` in `tests/mcp.rs` (`call_tool`). | `McpTestClient` in `escurel-test-support`, wrapping `escurel-client`. |
 | Recipe for `escurel + X` chaining | Not present. | §"Chaining recipe" below. |

-The implementation of `escurel-test-support` and `escurel-client` is a separate milestone (see §"Implementation status"). This doc fixes the *contract* so the implementing PRs and the first consuming application can land in parallel.
+`escurel-test-support` and `escurel-client` are now implemented (see §"Implementation status"); this doc remains the *contract* both honour.

 ## Test-process façade

 A downstream test imports one crate (`escurel-test-support` as a `dev-dependency`) and uses one type to bring escurel up. The contract is:

 ```rust
-// not yet implemented; this is the committed shape.
+// the shipped shape (crates/escurel-test-support).
 pub struct EscurelProcess { /* opaque */ }
@@ -242,13 +242,13 @@ What it does **not** guarantee:

 ## Implementation status

-Not yet implemented. This document fixes the contract; the implementing milestone delivers:
+**Delivered.** All three pieces ship in the workspace:

-1. **`crates/escurel-test-support/`** — `EscurelProcess`, `Opts`, `AuthMode`, `FixtureBuilder`, `McpTestClient`. Reuses the helpers already in `tests/auth_quota.rs` and `tests/mcp.rs`.
-2. **`crates/escurel-client/`** — typed wrapper around `escurel-proto`'s tonic codegen, with HTTP and gRPC transports.
-3. **`examples/echo-app/`** (or a sibling repo) — a minimal application demonstrating the full chaining recipe above, with its `tests/e2e.rs` as the executable proof that the contract holds.
+1. **`crates/escurel-test-support/`** — `EscurelProcess`, `Opts`, `AuthMode`, `FixtureBuilder`, `McpTestClient`. Drives the gateway's own no-mock integration tests.
+2. **`crates/escurel-client/`** — typed wrapper around `escurel-proto`'s tonic codegen (exercised by `crates/escurel-client/tests/client_roundtrip.rs`).
+3. **`examples/echo-app/`** — a minimal application demonstrating the chaining recipe above, with its `tests/e2e.rs` as the executable proof that the contract holds.

-The order is `escurel-client` → `escurel-test-support` (which depends on it) → example app. The example app's `tests/e2e.rs` is the acceptance test for this contract: if it does not read roughly like the §"Chaining recipe" snippet above, the contract has drifted from the implementation and one of them needs to move.
+The dependency order is `escurel-client` → `escurel-test-support` (which depends on it) → example app. The example app's `tests/e2e.rs` is the acceptance test for this contract: if it drifts from the §"Chaining recipe" snippet above, the contract has diverged from the implementation and one of them needs to move.

 ## Open questions

diff --git a/docs/spec/platform.md b/docs/spec/platform.md
index c45d677..88f1de0 100644
--- a/docs/spec/platform.md
+++ b/docs/spec/platform.md
@@ -250,10 +250,20 @@ A small set of OTel-conventional metrics:
 | `escurel.storage_bytes` | gauge | `tenant`, `lane` (`markdown` / `duckdb` / `external_ducklake`) |
 | `escurel.audit_drift` | gauge | `tenant`, `category` (`mn-d` markdown-not-in-duckdb, `i-no-m` indexed-but-no-markdown) |

-Exported via OTLP **and** scraped at `/metrics` on a separate
-port (default `:9090`) for Prometheus operators who don't run
-an OTLP collector. The `/metrics` endpoint is a thin Prometheus
-text-format adapter over the same OTel metrics SDK.
+Scraped at `/metrics` on a dedicated listener (default `:9090`,
+tailnet-only — see [`operations.md`](../operations.md)). The live
+gateway renders these through a Prometheus registry, so the wire
+names are `_`-separated: `escurel.tool_calls` is exposed as
+`escurel_tool_calls`, etc. Trace spans are exported via OTLP;
+metric OTLP export is not yet wired.
+
+**Implemented today:** `escurel_tool_calls`,
+`escurel_tool_latency_ms`, `escurel_live_sessions_open`, and
+`escurel_audit_drift`, plus the gateway-level `escurel_up` and
+`escurel_requests_total{route,status}`. The remaining
+histograms/gauges in the table above (`write_lock_wait_ms`,
+`embed_batch_size`, `embed_queue_depth`, `storage_bytes`) are
+**reserved** — specified here, not yet populated.

 ### Logs