diff --git a/AGENTS.md b/AGENTS.md index 79e4fa71..065e28aa 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -73,32 +73,38 @@ Full diagram and concurrency model: [docs/dev/architecture.md](docs/dev/architec | **Lance docs index — fetch upstream Lance docs by problem domain** | **[docs/dev/lance.md](docs/dev/lance.md)** | | **Test coverage map — what's covered, what helpers to reuse, before-every-task checklist** | **[docs/dev/testing.md](docs/dev/testing.md)** | | Architecture, L1/L2 framing, concurrency model | [docs/dev/architecture.md](docs/dev/architecture.md) | -| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/storage.md](docs/user/concepts/storage.md) | -| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema-language.md](docs/user/schema/index.md) | -| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | [docs/user/schema-lint.md](docs/user/schema/lint.md) | -| `.gq` query language, MATCH/RETURN/ORDER, search funcs, mutations, IR ops, lint codes | [docs/user/query-language.md](docs/user/queries/index.md) | -| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/indexes.md](docs/user/search/indexes.md) | -| Embeddings (compiler + engine clients, env vars, `@embed`) | [docs/user/embeddings.md](docs/user/search/embeddings.md) | -| Branches, commit graph, snapshots, system branches | [docs/user/branches-commits.md](docs/user/branching/index.md) | -| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/transactions.md](docs/user/branching/transactions.md) | +| Storage layout, `__manifest` schema, URI schemes, S3 env vars | [docs/user/concepts/storage.md](docs/user/concepts/storage.md) | +| `.pg` schema language, types, constraints, annotations, migration planning | [docs/user/schema/index.md](docs/user/schema/index.md) | +| Schema-lint codes (`OG-XXX-NNN`), families, severity, suppression | 
[docs/user/schema/lint.md](docs/user/schema/lint.md) | +| `.gq` query language, MATCH/RETURN/ORDER, IR ops, lint codes | [docs/user/queries/index.md](docs/user/queries/index.md) | +| Mutations — insert/update/delete, D2, atomicity | [docs/user/mutations/index.md](docs/user/mutations/index.md) | +| Search funcs (`nearest`/`bm25`/`rrf`), hybrid ranking | [docs/user/search/index.md](docs/user/search/index.md) | +| Indexes (BTREE / inverted / vector / graph topology) | [docs/user/search/indexes.md](docs/user/search/indexes.md) | +| Embeddings (compiler + engine clients, env vars, `@embed`) | [docs/user/search/embeddings.md](docs/user/search/embeddings.md) | +| Concepts — what OmniGraph is, L1/L2 framing | [docs/user/concepts/index.md](docs/user/concepts/index.md) | +| Quickstart — init → load → query → branch | [docs/user/quickstart.md](docs/user/quickstart.md) | +| Branches, commit graph, system branches | [docs/user/branching/index.md](docs/user/branching/index.md) | +| Snapshots & time travel | [docs/user/branching/time-travel.md](docs/user/branching/time-travel.md) | +| Three-way merge and conflict kinds (user-facing) | [docs/user/branching/merge.md](docs/user/branching/merge.md) | +| Transactions and atomicity (per-query atomic; branches as multi-query transactions) | [docs/user/branching/transactions.md](docs/user/branching/transactions.md) | | Direct-publish write path (staging, D2, recovery sidecars; the former Run state machine) | [docs/dev/writes.md](docs/dev/writes.md) | | Three-way merge and conflict kinds | [docs/dev/merge.md](docs/dev/merge.md) | -| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/changes.md](docs/user/branching/changes.md) | +| Diff / change feed (`diff_between`, `diff_commits`) | [docs/user/branching/changes.md](docs/user/branching/changes.md) | | Query execution, mutation execution, bulk loader, `load` vs `ingest` | [docs/dev/execution.md](docs/dev/execution.md) | -| `optimize` (compaction) and `cleanup` (version GC) | 
[docs/user/maintenance.md](docs/user/operations/maintenance.md) | -| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/cluster.md](docs/user/clusters/index.md) | -| Cedar policy actions, scopes, CLI | [docs/user/policy.md](docs/user/operations/policy.md) | -| HTTP server endpoints, auth, error model, body limits | [docs/user/server.md](docs/user/operations/server.md) | -| CLI quick-start | [docs/user/cli.md](docs/user/cli/index.md) | -| CLI command surface and config schemas (`~/.omnigraph/config.yaml`, legacy `omnigraph.yaml`) | [docs/user/cli-reference.md](docs/user/cli/reference.md) | -| Audit / actor tracking | [docs/user/audit.md](docs/user/operations/audit.md) | -| Error taxonomy and result serialization | [docs/user/errors.md](docs/user/operations/errors.md) | +| `optimize` (compaction) and `cleanup` (version GC) | [docs/user/operations/maintenance.md](docs/user/operations/maintenance.md) | +| Cluster operator guide (deploy/manage clusters, approvals, recovery, serving) | [docs/user/clusters/index.md](docs/user/clusters/index.md) | +| Cedar policy actions, scopes, CLI | [docs/user/operations/policy.md](docs/user/operations/policy.md) | +| HTTP server endpoints, auth, error model, body limits | [docs/user/operations/server.md](docs/user/operations/server.md) | +| CLI quick-start | [docs/user/cli/index.md](docs/user/cli/index.md) | +| CLI command surface and config schemas (`~/.omnigraph/config.yaml`, legacy `omnigraph.yaml`) | [docs/user/cli/reference.md](docs/user/cli/reference.md) | +| Audit / actor tracking | [docs/user/operations/audit.md](docs/user/operations/audit.md) | +| Error taxonomy and result serialization | [docs/user/operations/errors.md](docs/user/operations/errors.md) | | Install (binary / Homebrew / source / channels) | [docs/user/install.md](docs/user/install.md) | | Deployment (binary / container / RustFS bootstrap / auth / build variants) | [docs/user/deployment.md](docs/user/deployment.md) | | CI / 
release workflows | [docs/dev/ci.md](docs/dev/ci.md) | | Code ownership (CODEOWNERS source of truth, roles, regeneration) | [docs/dev/codeowners.md](docs/dev/codeowners.md) | | Branch protection policy (declarative, applied via `scripts/apply-branch-protection.sh`) | [docs/dev/branch-protection.md](docs/dev/branch-protection.md) | -| Constants & tunables cheat sheet | [docs/user/constants.md](docs/user/reference/constants.md) | +| Constants & tunables cheat sheet | [docs/user/reference/constants.md](docs/user/reference/constants.md) | | Per-version release notes | [docs/releases/](docs/releases/) | --- @@ -257,7 +263,7 @@ omnigraph policy explain --actor act-alice --action change --branch main | Per-query atomic writes | — | In-memory `MutationStaging.pending` accumulator + `stage_*` / `commit_staged` per touched table at end-of-query + publisher CAS via `commit_with_expected` (single manifest commit per `mutate_as` / `load`); D₂ parse-time rule keeps inserts/updates and deletes from mixing | | Three-way row-level merge | — | `OrderedTableCursor` + `StagedTableWriter`, structured `MergeConflictKind` | | Change feeds | — | `diff_between` / `diff_commits` with manifest fast path + ID streaming | -| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. 
| +| Cedar policy | — | Per-graph actions plus server-scoped actions (see [docs/user/operations/policy.md](docs/user/operations/policy.md) for the current list), branch / target_branch / protected scopes, validate/test/explain CLI. **Engine-wide enforcement** (MR-722): every `_as` writer (`apply_schema_as`, `mutate_as`, `load_as` — the deprecated `ingest_as` shims route through it — `branch_create_as` / `branch_create_from_as`, `branch_delete_as`, `branch_merge_as`) calls `Omnigraph::enforce(action, scope, actor)` — HTTP, CLI, embedded SDK all hit the same gate. | | HTTP server | — | Axum, OpenAPI via utoipa, bearer auth (SHA-256, AWS Secrets Manager option), `authorize_request` at the HTTP boundary (resolves bearer→actor, applies admission control), NDJSON streaming export, **multi-graph mode (v0.6.0+) with cluster routes + read-only `GET /graphs` enumeration + per-graph + server-level Cedar policies. Multi-graph boots from a cluster directory (`--cluster`) or the legacy `omnigraph.yaml`; add/remove graphs via `cluster apply` (or by editing the legacy file) and restarting.** | | CLI with config | — | two-surface config (team `cluster.yaml` dir + per-operator `~/.omnigraph/config.yaml`; legacy `omnigraph.yaml` deprecated per RFC-008), aliases, multi-format output (json/jsonl/csv/kv/table) | | Audit / actor tracking | — | `_as` write APIs + actor map in commit graph | @@ -282,7 +288,7 @@ Rules: 7. **Re-verify before recommending.** If you cite a flag, env var, endpoint, or constant to the user or in code, grep for it in source first. Memory and docs go stale; the code is authoritative. 8. **Keep AGENTS.md short.** This file is always loaded into agent context, so every added line has a recurring context-window cost. Prefer pointers and terse invariants here; put detail in `docs/`. 9. **Keep AGENTS.md a map, not an encyclopedia.** New deep content goes into `docs/`. Add an entry to "Where to find each topic" instead of pasting prose into this file. 
The "Always-on rules" section is the exception — it's for invariants that should always be in scope. -10. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema-language.md](docs/user/schema/index.md), [docs/user/query-language.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality. +10. **Re-read on schema/query/IR changes.** Edits to `schema.pest`, `query.pest`, `ir/lower.rs`, `query/typecheck.rs`, or `query/lint.rs` should trigger a re-read of [docs/user/schema/index.md](docs/user/schema/index.md), [docs/user/queries/index.md](docs/user/queries/index.md), and [docs/dev/execution.md](docs/dev/execution.md) to confirm they still describe reality. 11. **Always make smaller commits.** Each commit does one thing, compiles, and passes tests; mechanical refactors land separately from the behavior changes they enable. 12. **Test-first for bug fixes.** When fixing an identified bug, write a regression test that reproduces the failure first. Confirm it fails against the current code with the predicted symptom (not an unrelated error). Then land the fix in a separate commit and confirm the test turns green. The test commit lands just before the fix commit so the red → green pair is visible in `git log` and a reviewer can check out the test commit alone and reproduce the failure. 13. **Correct by design over symptomatic patches.** When a bug surfaces, identify the root cause and make the fix correct by construction. Don't patch the symptom. If the design admits the bug class, the fix is to close the class, not to add a guard around the latest instance. A symptomatic patch is acceptable only as a stop-gap, with an explicit note in the commit message and a follow-up issue tracking the design fix. 
diff --git a/docs/user/branching/index.md b/docs/user/branching/index.md index 17d17b2b..a0f1a6e2 100644 --- a/docs/user/branching/index.md +++ b/docs/user/branching/index.md @@ -43,11 +43,9 @@ Notes: ## L2 — Snapshots & time travel -- `snapshot()` — current snapshot for the bound branch; cached. -- `snapshot_of(target)` — snapshot at a `ReadTarget` (branch | snapshot id). -- `snapshot_at_version(v: u64)` — historical snapshot from any manifest version. -- `entity_at(table_key, id, version)` — single-entity time travel without building a full snapshot. -- A `Snapshot` is a `(version, HashMap)` — cheap to build, snapshot-isolated cross-table reads. +Reading a branch at a past version, or a single entity at a past version, is +covered on the [time travel](time-travel.md) page. Merging branches and the +conflict kinds are on the [merge](merge.md) page. ## L2 — Internal system branches diff --git a/docs/user/branching/merge.md b/docs/user/branching/merge.md new file mode 100644 index 00000000..fde2fabe --- /dev/null +++ b/docs/user/branching/merge.md @@ -0,0 +1,47 @@ +# Merging Branches + +Merging integrates the changes on one branch into another. OmniGraph merges are +**three-way and row-level**: it compares both branches against their common +ancestor and merges each node/edge table row by row, then publishes the result as +**one atomic commit** across the whole graph. + +```bash +omnigraph branch merge review/2026-04-25 --into main s3://bucket/graph.omni +``` + +`branch merge <source> [--into <target>]` merges `<source>` into `<target>` +(default `main`). + +## Outcomes + +A merge resolves to one of three outcomes: + +- **Already up to date** — the target already contains every change on the source; + nothing to do. +- **Fast-forward** — the target has no changes the source lacks, so the target + simply advances to the source. +- **Merged** — both sides diverged; a new merge commit is created with two parents. 
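The three outcomes listed above can be sketched in terms of commit ancestry. This is a minimal illustration, assuming the outcome depends only on whether one branch head is reachable from the other (as in Git). The names `ancestors`, `merge_outcome`, and the `parents` map are hypothetical and are not OmniGraph APIs.

```python
def ancestors(parents, commit):
    """Return `commit` plus every commit reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return seen


def merge_outcome(parents, source, target):
    """Classify a merge of `source` into `target` (hypothetical helper)."""
    if source in ancestors(parents, target):
        # The target already contains every change on the source.
        return "already_up_to_date"
    if target in ancestors(parents, source):
        # The target has nothing the source lacks, so it can simply advance.
        return "fast_forward"
    # Both sides diverged: a merge commit with two parents is needed.
    return "merged"


# Toy commit graph: c2 and c3 both branch off c1.
parents = {"c1": [], "c2": ["c1"], "c3": ["c1"]}
print(merge_outcome(parents, "c1", "c2"))
print(merge_outcome(parents, "c2", "c1"))
print(merge_outcome(parents, "c2", "c3"))
```

Only the third case requires the row-level three-way comparison, which is where conflicts can arise.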
+ +## Conflicts + +When both branches changed the same data incompatibly, the merge fails with a +structured list of conflicts (the HTTP server returns `409` with a +`merge_conflicts[]` array). No partial result is published — the merge is +all-or-nothing. The conflict kinds are: + +| Kind | Meaning | +|---|---| +| `DivergentInsert` | The same id was inserted on both branches. | +| `DivergentUpdate` | The same row was updated differently on both branches. | +| `DeleteVsUpdate` | One side deleted a row the other side updated. | +| `OrphanEdge` | An edge references a node the other side deleted. | +| `UniqueViolation` | The merged result would violate a unique constraint. | +| `CardinalityViolation` | The merged result would violate an edge cardinality constraint. | +| `ValueConstraintViolation` | The merged result would violate a value constraint (enum/range). | + +Each conflict carries the table, the row id (when applicable), the kind, and a +message. Resolve conflicts by reconciling the two branches — typically by making +the conflicting change on one side and re-merging. + +See [branches & commits](index.md) for the branch and commit-DAG model, and +[changes](changes.md) for diffing two branches before you merge. diff --git a/docs/user/branching/time-travel.md b/docs/user/branching/time-travel.md new file mode 100644 index 00000000..e6bd52d5 --- /dev/null +++ b/docs/user/branching/time-travel.md @@ -0,0 +1,31 @@ +# Snapshots & Time Travel + +Every read in OmniGraph happens against a **snapshot** — a consistent, cross-table +view of the graph at one manifest version. A query holds one snapshot for its whole +lifetime, so it never sees a partial write from a concurrent commit (see +[transactions](transactions.md)). + +## Reading the past + +- **Current head** — by default a read targets the current head of the bound branch. +- **By snapshot id** — read a branch or a specific snapshot id (`--snapshot` on + `omnigraph read`). 
+- **By version** — reconstruct a historical snapshot from any past manifest version. +- **Single entity** — look up one entity at a past version without building a full + snapshot (cheaper when you only need one node or edge). + +Snapshots are cheap to build: a snapshot is just the set of visible sub-table +versions at a manifest version, so cross-table reads stay snapshot-isolated. + +## CLI + +```bash +# Run a query against a past snapshot +omnigraph read --query ./q.gq --name find --snapshot <snapshot-id> s3://bucket/graph.omni +``` + +Time travel composes with branches: every branch has its own version history, and +you can read any branch at any of its past versions. Commits and the commit DAG +that these versions correspond to are described in +[branches & commits](index.md); diffing two versions is on the +[changes](changes.md) page. diff --git a/docs/user/concepts/index.md b/docs/user/concepts/index.md new file mode 100644 index 00000000..8bc3d7ec --- /dev/null +++ b/docs/user/concepts/index.md @@ -0,0 +1,49 @@ +# Concepts + +OmniGraph is a typed property-graph engine built as a coordination layer over the +[Lance](https://lance.org) columnar storage format. It gives you a schema-checked +graph with vector, full-text, and graph queries in one runtime, plus Git-style +branches and commits across the whole graph. + +## The data model + +- A graph has **node types** and **edge types**, declared in a + [schema](../schema/index.md). +- Each node type and each edge type is stored as its **own Lance dataset** — + columnar, versioned, on local disk or object storage. +- A single `__manifest` table coordinates all of those datasets, so the graph has + one coherent version even though it spans many datasets. + +This split is what lets a graph commit be **atomic across every type at once**: a +publish flips every relevant dataset's version together in one manifest write, so +readers never see a half-applied change. See [storage](storage.md) for the layout. 
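The idea of one manifest write flipping several dataset versions together can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not OmniGraph's implementation: the `Manifest` class, its `publish` and `snapshot` methods, and the table keys are all hypothetical, and the compare-and-swap check stands in for whatever concurrency control the real publisher uses.

```python
class Manifest:
    """Toy manifest: maps each per-type dataset to its visible version."""

    def __init__(self):
        self.version = 0   # graph-level manifest version
        self.tables = {}   # dataset key -> dataset version

    def snapshot(self):
        # A reader copies the whole mapping at one manifest version,
        # so it never mixes table versions from two different publishes.
        return self.version, dict(self.tables)

    def publish(self, expected_version, updates):
        # Compare-and-swap: refuse to publish on top of a manifest
        # that moved since the writer last read it.
        if expected_version != self.version:
            raise RuntimeError("manifest moved; re-read and retry")
        new_tables = dict(self.tables)
        new_tables.update(updates)
        # Swapping the mapping and bumping the version is the single
        # step that makes every updated dataset visible together.
        self.tables = new_tables
        self.version += 1
        return self.version


manifest = Manifest()
manifest.publish(0, {"node:Person": 1, "edge:WorksAt": 1})
before = manifest.snapshot()
manifest.publish(1, {"node:Person": 2, "edge:WorksAt": 2})
print(before)
print(manifest.snapshot())
```

The snapshot taken before the second publish keeps seeing both datasets at version 1, which is the property the paragraph above describes as readers never seeing a half-applied change.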
+ +## Two layers: inherited vs. added + +Throughout the docs, capabilities are framed as **L1** (inherited from Lance) or +**L2** (added by OmniGraph): + +| | L1 — from Lance | L2 — added by OmniGraph | +|---|---|---| +| Storage | Columnar Arrow datasets on object storage | Per-type datasets coordinated as one graph | +| Versioning | Per-dataset versions + time travel | [Snapshots](../branching/time-travel.md) across all types at once | +| Branches | Per-dataset branches | [Graph-level branches](../branching/index.md), atomic across types | +| Commits | Per-dataset commits | [Commit DAG](../branching/index.md) for the whole graph; three-way [merge](../branching/merge.md) | +| Indexes | Scalar / vector / full-text indexes | Built per relevant column; graph topology index for traversal | +| Search | Vector + full-text primitives | [`nearest` / `bm25` / `rrf`](../search/index.md) in one query, plus graph traversal | +| Querying | — | The [`.gq` query language](../queries/index.md) and [`.pg` schema language](../schema/index.md) | + +## How the pieces fit + +- The **schema** (`.pg`) and **query** (`.gq`) languages are compiled to a typed + intermediate representation. +- The **engine** runs queries and mutations against Lance, coordinates the manifest, + maintains the commit graph, and builds indexes. +- The **CLI** ([`omnigraph`](../cli/index.md)) and the + **HTTP server** ([`operations/server.md`](../operations/server.md)) are two front + ends over the same engine, so embedded and remote behavior match. +- [Cedar policy](../operations/policy.md) enforcement is engine-wide — every writer + goes through the same authorization gate regardless of front end. + +For deployment-scale topics — multi-graph servers, control-plane operations, +recovery — see [clusters](../clusters/index.md). 
diff --git a/docs/user/index.md b/docs/user/index.md index c47b79ba..cabd98a0 100644 --- a/docs/user/index.md +++ b/docs/user/index.md @@ -12,6 +12,8 @@ start with install, then follow the section that matches your task. | Goal | Read | |---|---| | Install OmniGraph | [install.md](install.md) | +| Run the core loop end to end | [quickstart.md](quickstart.md) | +| Understand the model | [concepts/index.md](concepts/index.md) | | Run the CLI | [cli/index.md](cli/index.md) | | Look up every CLI flag and config field | [cli/reference.md](cli/reference.md) | @@ -21,8 +23,9 @@ start with install, then follow the section that matches your task. |---|---| | Write schemas (the `.pg` language) | [schema/index.md](schema/index.md) | | Read schema-lint diagnostic codes | [schema/lint.md](schema/lint.md) | -| Write queries and mutations (the `.gq` language) | [queries/index.md](queries/index.md) | -| Use vector / full-text / hybrid search | [search/indexes.md](search/indexes.md) | +| Write queries (the `.gq` language) | [queries/index.md](queries/index.md) | +| Write data — inserts, updates, deletes | [mutations/index.md](mutations/index.md) | +| Use vector / full-text / hybrid search | [search/index.md](search/index.md) | | Generate embeddings | [search/embeddings.md](search/embeddings.md) | | Build and use indexes | [search/indexes.md](search/indexes.md) | @@ -30,7 +33,9 @@ start with install, then follow the section that matches your task. 
| Goal | Read | |---|---| -| Work with branches, commits, and snapshots | [branching/index.md](branching/index.md) | +| Work with branches and commits | [branching/index.md](branching/index.md) | +| Read past versions (time travel) | [branching/time-travel.md](branching/time-travel.md) | +| Merge branches and resolve conflicts | [branching/merge.md](branching/merge.md) | | Coordinate multi-query workflows | [branching/transactions.md](branching/transactions.md) | | Read diffs and change feeds | [branching/changes.md](branching/changes.md) | @@ -56,6 +61,7 @@ start with install, then follow the section that matches your task. | Goal | Read | |---|---| +| Understand the model and L1/L2 framing | [concepts/index.md](concepts/index.md) | | Understand graph layout and URI support | [concepts/storage.md](concepts/storage.md) | | Look up constants and tunables | [reference/constants.md](reference/constants.md) | diff --git a/docs/user/mutations/index.md b/docs/user/mutations/index.md new file mode 100644 index 00000000..2602ae59 --- /dev/null +++ b/docs/user/mutations/index.md @@ -0,0 +1,52 @@ +# Mutations + +Write statements live inside a `query` declaration whose body is one or more +mutation statements (the [query language](../queries/index.md) covers the read +shape and shared declaration syntax). + +``` +query onboard($name: String, $title: String) { + insert Person { name: $name, title: $title } +} +``` + +An edge type is inserted the same way — its endpoint columns are just +properties in the assignment block (`insert WorksAt { person: $p, org: $o }`). + +## Statements + +- `insert <Type> { prop: <value>, … }` +- `update <Type> set { prop: <value>, … } where <predicate>` +- `delete <Type> where <predicate>` + +`<value>` is a literal, `$param`, or `now()`. + +## Atomicity + +A change query publishes **one commit** at the end of the query. Multiple +insert/update statements accumulate in memory and commit together — a mid-query +failure leaves the graph untouched. 
See [transactions](../branching/transactions.md) +for the per-query atomicity contract and [branches](../branching/index.md) for +multi-query workflows. + +## Inserts/updates and deletes cannot mix in one query + +A single change query must be **either insert/update-only or delete-only**. +Mixing the two is rejected at parse time, before any I/O: + +> `mutation '' on the same query mixes inserts/updates and deletes; split +> into separate mutations: (1) inserts and updates, then (2) deletes.` + +Run two separate queries instead — the inserts/updates first, then the deletes. +The restriction exists because inserts/updates and deletes commit through +different paths today, and mixing them in one query creates ordering hazards +(e.g. a same-row insert-then-delete, or a cascading delete of a just-inserted +edge). Keeping the two kinds in separate queries keeps each one atomic and +correct. + +## Bulk loading + +For loading data from files rather than inline statements, use +[`omnigraph load`](../cli/index.md) (`--mode overwrite|append|merge`) — it is the +single bulk-write command and applies the same schema validation and atomic +publish as inline mutations. diff --git a/docs/user/operations/audit.md b/docs/user/operations/audit.md index 845c2e06..7e8b24de 100644 --- a/docs/user/operations/audit.md +++ b/docs/user/operations/audit.md @@ -1,7 +1,46 @@ -# Audit / Actor tracking +# Audit & Actor Tracking -- `Omnigraph::audit_actor_id: Option` is the actor in effect. -- `_as` variants of every write API let callers override the actor: `mutate_as`, `load_as`, `branch_merge_as`, `apply_schema_as`, etc. -- Actor IDs are persisted on `GraphCommit.actor_id` with split storage in `_graph_commit_actors.lance` (the commit graph is split into `_graph_commits.lance` for the linkage and `_graph_commit_actors.lance` for the actor map). -- HTTP server uses the bearer-token actor automatically. 
The CLI resolves one actor chain everywhere: `--as` > legacy `cli.actor` in `omnigraph.yaml` > `operator.actor` in `~/.omnigraph/config.yaml` > none (RFC-007). -- Pre-v0.4.0 graphs also stored actor IDs on `RunRecord.actor_id` in `_graph_runs.lance` / `_graph_run_actors.lance`. The Run state machine was removed in MR-771; those files are inert post-v0.4.0. The v2→v3 manifest migration sweeps any stale `__run__*` branches on first write-open (MR-770); the inert dataset bytes remain until a `delete_prefix` primitive lands. +Every write in OmniGraph records **who made it**. The actor id is persisted on the +graph commit, so the commit history is an audit trail of which actor changed the +graph and when. + +## Where the actor comes from + +The actor is resolved differently depending on the front end, but it always lands +on the commit: + +- **HTTP server** — the actor is resolved **server-side from the bearer token**. A + client cannot set its own actor id; it is derived from the authenticated token. + See [policy](policy.md) for how tokens map to actors. +- **CLI / embedded** — the actor is self-declared through one resolution chain: + + 1. `--as <actor>` on the command, + 2. then `operator.actor` in `~/.omnigraph/config.yaml` (see the + [CLI reference](../cli/reference.md)), + 3. otherwise none. + +This difference is intentional: storage credentials imply a self-declared actor, +while a server resolves the actor from a token it trusts. + +## Reading the audit trail + +Actor ids are stored on each commit in the [commit graph](../branching/index.md). 
+List commits to see who made each change: + +```bash +omnigraph commit list graph.omni +``` + +System-initiated writes use reserved actor ids — for example, automatic recovery +of an interrupted write records `omnigraph:recovery`, so operator changes and +machine repairs are distinguishable in the history: + +```bash +omnigraph commit list --filter actor=omnigraph:recovery graph.omni +``` + +## What is tracked + +Every successful publish — load, change, branch merge, and schema apply — appends a +commit carrying the resolving actor. Because publishes are atomic, the actor on a +commit is exactly the actor responsible for that whole change. diff --git a/docs/user/queries/index.md b/docs/user/queries/index.md index 0942d50b..c00d1a9f 100644 --- a/docs/user/queries/index.md +++ b/docs/user/queries/index.md @@ -13,8 +13,11 @@ query ($p1: T1, $p2: T2?, …) Two body shapes: -- **Read**: `match { … } return { … } [order { … }] [limit N]` -- **Mutation**: one or more of `insert | update | delete` statements +- **Read**: `match { … } return { … } [order { … }] [limit N]` — covered on this page. +- **Mutation**: one or more of `insert | update | delete` statements — see [mutations](../mutations/index.md). + +Multi-modal search functions (`nearest`, `bm25`, `rrf`, …) used inside `match`, +`return`, and `order` are documented on the [search](../search/index.md) page. Param types reuse all schema scalars; trailing `?` makes a param optional. The compiler reserves `$__nanograph_now` for `now()`. @@ -25,21 +28,6 @@ Param types reuse all schema scalars; trailing `?` makes a param optional. The c - **Filter**: ` ` with operators `>=`, `<=`, `!=`, `>`, `<`, `=`, and string `contains`. - **Negation**: `not { clause+ }` — desugars to anti-join over the inner pipeline. 
-## Search clauses (multi-modal) - -Used inside MATCH or as expressions inside RETURN/ORDER: - -| Function | Purpose | Underlying Lance facility | -|---|---|---| -| `nearest($x.vec, $q)` | k-NN vector search (cosine) | Lance vector index (IVF / HNSW) | -| `search(field, q)` | Generic FTS | Inverted index | -| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | Inverted index | -| `match_text(field, q)` | Pattern match | Inverted index | -| `bm25(field, q)` | BM25 scoring | Inverted index | -| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default k=60) | OmniGraph fuses scored rankings | - -`nearest()` requires a `LIMIT`; the compiler resolves the query vector via the param map (or via the runtime embedding client when bound to a text input). - ## RETURN clause `return { [as ], … }` with expressions: @@ -48,7 +36,7 @@ Used inside MATCH or as expressions inside RETURN/ORDER: - Literals: string, int, float, bool, list - `now()` - Aggregates: `count`, `sum`, `avg`, `min`, `max` -- All search functions above (so you can return a score column) +- [Search functions](../search/index.md) (so you can return a score column) - `AliasRef` — re-use a previous projection alias ## ORDER & LIMIT @@ -58,21 +46,8 @@ Used inside MATCH or as expressions inside RETURN/ORDER: - **Total, deterministic order.** Rows with equal user-sort keys are broken by the bound entities' key columns (`.id`, ascending) appended as a final tie-break, so the result is a *total* order — reproducible across runs, and `order … limit N` returns a deterministic top-N even when ties straddle the cutoff. (Aggregate results have no entity-key columns; their group rows are already distinct on the projected group keys.) - **NULL placement** is *nulls-first ascending, nulls-last descending* (i.e. `nulls_first = !descending`): a NULL sorts as if smaller than any value. 
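The ordering rules above (NULLs sort as smaller than any value, and ties are broken by the entity id ascending) can be sketched as two stable sorts. This is an illustrative model only: `order_rows` and the row dictionaries are hypothetical, and the real engine sorts Arrow batches rather than Python dicts.

```python
def order_rows(rows, key, descending=False):
    """Sort rows by `key` with a total, deterministic order."""
    # Pass 1: establish the tie-break order (entity id, always ascending).
    by_id = sorted(rows, key=lambda row: row["id"])

    def null_low(row):
        value = row[key]
        # (False, ...) sorts before (True, ...), so NULL acts as the
        # smallest value; the 0 is a placeholder that is never compared
        # against a real value.
        return (value is not None, 0 if value is None else value)

    # Pass 2: stable sort on the user key. Python keeps equal-key rows in
    # their existing (id-ascending) order even when reverse=True, so
    # descending sorts place NULLs last and still break ties by id.
    return sorted(by_id, key=null_low, reverse=descending)


rows = [
    {"id": "c", "age": 30},
    {"id": "a", "age": None},
    {"id": "b", "age": 30},
    {"id": "d", "age": 25},
]
print([row["id"] for row in order_rows(rows, "age")])
print([row["id"] for row in order_rows(rows, "age", descending=True)])
```

Because the tie-break is part of the sort itself, taking the first N rows of either result gives the same top-N on every run.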
-## Mutation statements - -- `insert { prop: , … }` -- `update set { prop: , … } where ` -- `delete where ` - -`` is a literal, `$param`, or `now()`. Multi-statement mutations execute atomically (added in v0.2.0). - -### D₂ — mixed insert/update + delete is rejected at parse time - -A single mutation query must be **either insert/update-only or delete-only**. Mixed → rejected before any I/O with the message: - -> `mutation '' on the same query mixes inserts/updates and deletes; split into separate mutations: (1) inserts and updates, then (2) deletes. This restriction lifts when Lance exposes a two-phase delete API (tracked: MR-793 / Lance-upstream).` - -Reason: under the staged-write rewire (MR-794), inserts and updates accumulate in memory and commit at end-of-query, while deletes still inline-commit (Lance v6.0.1 has no public two-phase delete). Mixing creates ordering hazards (same-row insert→delete becomes a no-op because the staged insert isn't visible to delete; cascading deletes of just-inserted edges break referential integrity by silent design). Until the MR-A Lance v7 bump migrates `delete_where` to staged (`DeleteBuilder::execute_uncommitted` first ships in `v7.0.0-beta.10`), the parse-time rejection keeps both paths atomic and correct. See [docs/dev/writes.md](../../dev/writes.md), [docs/dev/lance.md](../../dev/lance.md), and [docs/dev/invariants.md](../../dev/invariants.md). +Write statements (`insert` / `update` / `delete`) are documented on the +[mutations](../mutations/index.md) page. ## IR (Intermediate Representation) diff --git a/docs/user/quickstart.md b/docs/user/quickstart.md new file mode 100644 index 00000000..b39ff1b2 --- /dev/null +++ b/docs/user/quickstart.md @@ -0,0 +1,81 @@ +# Quickstart + +This walks the core loop end to end: define a schema, initialize a graph, load +data, query it, and use a branch. It uses a local file-backed graph; swap the +path for an `s3://…` URI to run the same flow against object storage. 
+ +[Install](install.md) the `omnigraph` CLI first. + +## 1. Write a schema + +A schema (`.pg`) declares your node and edge types. Save this as `schema.pg`: + +``` +node Person { + name: String, + title: String?, +} +``` + +See the [schema language](schema/index.md) for types, constraints, and edges. + +## 2. Initialize the graph + +```bash +omnigraph init --schema schema.pg graph.omni +``` + +`init` creates an empty graph at the given URI with your schema applied. + +## 3. Load data + +`load` is the single bulk-write command. `--mode` is required +(`overwrite | append | merge`): + +```bash +omnigraph load --data people.jsonl --mode overwrite graph.omni +``` + +`people.jsonl` is newline-delimited JSON, one record per line. For finer-grained +or inline writes, see [mutations](mutations/index.md). + +## 4. Query + +Write a query (`.gq`) — save as `queries.gq`: + +```gq +query find_people($title: String) { + match { $p: Person { title: $title } } + return { $p.name } +} +``` + +Run it: + +```bash +omnigraph read --query queries.gq --name find_people \ + --params '{"title":"Engineer"}' --format table graph.omni +``` + +The [query language](queries/index.md) covers `match`/`return`/`order`, and +[search](search/index.md) covers vector and full-text search. + +## 5. Work on a branch + +Branches isolate changes until you merge them — Git-style, across the whole graph: + +```bash +omnigraph branch create review/new-hires graph.omni +omnigraph load --data new-hires.jsonl --mode append --branch review/new-hires graph.omni +# inspect the branch, then integrate it +omnigraph branch merge review/new-hires --into main graph.omni +``` + +See [branches & commits](branching/index.md) and [merging](branching/merge.md). + +## Next steps + +- [CLI reference](cli/reference.md) — every command and flag. +- [Schema language](schema/index.md) and [query language](queries/index.md). 
+- [Operating a cluster](clusters/index.md) and [running the server](operations/server.md) + for multi-graph, multi-user deployments. diff --git a/docs/user/search/index.md b/docs/user/search/index.md new file mode 100644 index 00000000..280e9e86 --- /dev/null +++ b/docs/user/search/index.md @@ -0,0 +1,48 @@ +# Search + +OmniGraph runs vector, full-text, and hybrid search in the same runtime as graph +traversal — a single [query](../queries/index.md) can combine a vector `nearest`, +a `bm25` text score, and an `Expand` traversal. Search functions are used inside +`match` (to filter), or as expressions inside `return` / `order` (to score and +rank). + +## Functions + +| Function | Purpose | Backing index | +|---|---|---| +| `nearest($x.vec, $q)` | k-NN vector search (cosine) | vector index (IVF / HNSW) | +| `search(field, q)` | Generic full-text search | inverted (FTS) index | +| `fuzzy(field, q [, max_edits])` | Levenshtein-tolerant text search | inverted index | +| `match_text(field, q)` | Pattern match | inverted index | +| `bm25(field, q)` | BM25 relevance scoring | inverted index | +| `rrf(rank_a, rank_b [, k])` | Reciprocal Rank Fusion of two rankings (default `k=60`) | fuses scored rankings | + +- `nearest()` requires a `limit`. The query vector is resolved from the param map, + or embedded from a text input at runtime via the configured + [embedding client](embeddings.md). +- Scores and ranks propagate as ordinary columns, so you can `return` a score and + `order` by it. + +## Hybrid ranking with `rrf` + +Reciprocal Rank Fusion combines two independent rankings (typically one vector and +one text) into a single fused ranking, without needing the two score scales to be +comparable. 
Rank each retrieval separately, then fuse: + +```gq +query hybrid($q: String) { + match { $d: Document { } } + return { + $d, + rrf( nearest($d.embedding, $q), bm25($d.body, $q) ) as score + } + order { score desc } + limit 10 +} +``` + +## Indexes and embeddings + +Search functions only work when the backing index exists — see +[indexes](indexes.md) for building vector and inverted indexes, and +[embeddings](embeddings.md) for generating the vectors `nearest` searches over.
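The fusion step described under "Hybrid ranking with `rrf`" can be sketched outside the query language. The sketch below uses the commonly published Reciprocal Rank Fusion formula, where each ranking contributes `1 / (k + rank)` for every item it contains. Only the default `k=60` comes from this page; that OmniGraph computes exactly this formula is an assumption, and `rrf_fuse` is a hypothetical helper name.

```python
def rrf_fuse(rank_a, rank_b, k=60):
    """Fuse two ranked lists of ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in (rank_a, rank_b):
        # Ranks are 1-based: the best hit in a ranking has rank 1.
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


vector_hits = ["a", "b", "c"]   # e.g. ids ordered by a vector search
text_hits = ["b", "c", "d"]     # e.g. ids ordered by a text score
print(rrf_fuse(vector_hits, text_hits))
```

Only rank positions enter the formula, which is why the two retrievals do not need comparable score scales.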