docs(user): add task guides — hybrid search, cluster on S3, review workflow (Phase 3b) #227
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| # Run a Cluster on S3 | ||
|
|
||
| This guide takes a cluster from a local config directory to a server that boots | ||
| **config-free from an object-storage bucket** — the bucket is the whole | ||
| deployment artifact. For the full control-plane reference, see | ||
| [operating a cluster](../clusters/index.md) and | ||
| [cluster config](../clusters/config.md). | ||
|
|
||
| ## 1. Declare the cluster | ||
|
|
||
| Lay out a config directory. The one S3-specific line is `storage:` — it puts the | ||
| state ledger, catalog, and graph data on the bucket instead of in the folder: | ||
|
|
||
| ``` | ||
| company-brain/ | ||
| ├── cluster.yaml | ||
| ├── people.pg | ||
| ├── queries/ | ||
| │ └── people.gq | ||
| └── base.policy.yaml | ||
| ``` | ||
|
|
||
| ```yaml | ||
| # cluster.yaml | ||
| version: 1 | ||
| storage: s3://my-bucket/clusters/company-brain # the deployment lives here | ||
| metadata: | ||
| name: company-brain | ||
| graphs: | ||
| knowledge: | ||
| schema: people.pg | ||
| queries: queries/ | ||
| policies: | ||
| base: | ||
| file: base.policy.yaml | ||
| applies_to: [knowledge] | ||
| ``` | ||
|
|
||
| Set the S3 credentials in the environment (for a non-AWS S3-compatible store such | ||
| as MinIO or RustFS, also set `AWS_ENDPOINT_URL_S3`): | ||
|
|
||
| ```bash | ||
| export AWS_ACCESS_KEY_ID=... AWS_SECRET_ACCESS_KEY=... AWS_REGION=us-east-1 | ||
| # export AWS_ENDPOINT_URL_S3=https://... # non-AWS S3-compatible stores | ||
| ``` | ||
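Before running any cluster command, it can help to confirm that the credentials are actually present in the environment. The following is a minimal sketch, not part of OmniGraph: `missing_s3_env` is a hypothetical helper name, and the list of variables is simply the three that the export line above sets.

```shell
# Hypothetical helper (not an omnigraph command): print the name of each
# required variable that is unset or empty in the current shell.
missing_s3_env() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # Read the variable whose name is stored in $name (names are fixed above).
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Report anything missing on stderr, one line per variable.
missing="$(missing_s3_env)"
if [ -n "$missing" ]; then
  printf 'missing S3 setting: %s\n' $missing >&2
fi
```

For a non-AWS store, `AWS_ENDPOINT_URL_S3` could be added to the list in the same way.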
|
|
||
| ## 2. Validate, plan, apply | ||
|
|
||
| `apply` is the only command that changes the world; `plan` previews it: | ||
|
|
||
| ```bash | ||
| omnigraph cluster validate --config company-brain # parse + typecheck | ||
| omnigraph cluster import --config company-brain # create the state ledger | ||
| omnigraph cluster plan --config company-brain # preview the diff | ||
| omnigraph cluster apply --config company-brain # converge onto the bucket | ||
| ``` | ||
|
|
||
| `apply` creates the graph at the derived root | ||
| (`s3://my-bucket/clusters/company-brain/graphs/knowledge.omni`), applies its | ||
| schema, and publishes the query and policy into the content-addressed catalog. | ||
| `converged: true` means there is nothing left to do — re-running `apply` is always | ||
| safe. | ||
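The derived root follows the pattern `<storage root>/graphs/<graph id>.omni`, as the example URI above shows. A small sketch of that derivation, assuming the pattern holds for every graph id; `graph_root` is a hypothetical helper, not an OmniGraph command:

```shell
# Hypothetical helper mirroring the layout described above:
#   <storage root>/graphs/<graph id>.omni
graph_root() {
  storage_root="${1%/}"   # drop one trailing slash, if present
  graph_id="$2"
  printf '%s/graphs/%s.omni\n' "$storage_root" "$graph_id"
}

graph_root s3://my-bucket/clusters/company-brain knowledge
# prints s3://my-bucket/clusters/company-brain/graphs/knowledge.omni
```

The printed URI is the one the data-plane commands in this guide take as their graph argument.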
|
|
||
| ## 3. Load data | ||
|
|
||
| The control plane manages *definitions*; rows go through the normal data plane. | ||
| Address the graph by its storage URI (the derived `graphs/<id>.omni` root): | ||
|
|
||
| ```bash | ||
| omnigraph load --data seed.jsonl --mode overwrite \ | ||
| s3://my-bucket/clusters/company-brain/graphs/knowledge.omni | ||
| ``` | ||
|
|
||
| ## 4. Serve config-free from the bucket | ||
|
|
||
| A serving host needs only the storage-root URI and credentials — no checkout of | ||
| the config repo: | ||
|
|
||
| ```bash | ||
| OMNIGRAPH_SERVER_BEARER_TOKENS_JSON='{"act-reader":"s3cret"}' \ | ||
| omnigraph-server --cluster s3://my-bucket/clusters/company-brain --bind 0.0.0.0:8080 | ||
| ``` | ||
|
|
||
| The server boots from the **applied revision** recorded in the ledger — never from | ||
| config that was merely written. Roll out a change by `apply`-ing again, then | ||
| restarting replicas. | ||
|
|
||
| ## 5. Maintain it | ||
|
|
||
| Storage maintenance runs out-of-band, addressed by cluster + graph name (it | ||
| resolves the graph's storage URI from the served state): | ||
|
|
||
| ```bash | ||
| omnigraph optimize --cluster company-brain --cluster-graph knowledge | ||
| omnigraph cleanup --cluster company-brain --cluster-graph knowledge --keep 10 --confirm | ||
| ``` | ||
|
|
||
| See [maintenance](../operations/maintenance.md) for what each command does. |
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,99 @@ | ||||||||||
| # Hybrid Search End to End | ||||||||||
|
|
||||||||||
| This guide builds a small document graph and runs a **hybrid** query that fuses | ||||||||||
| full-text (BM25) and vector (k-NN) rankings with Reciprocal Rank Fusion. You do | ||||||||||
| not build indexes by hand — the engine maintains them; a freshly loaded row is | ||||||||||
| searchable immediately. | ||||||||||
|
|
||||||||||
| See [search](../search/index.md) for the function reference and | ||||||||||
| [embeddings](../search/embeddings.md) for the full provider/env matrix. | ||||||||||
|
|
||||||||||
| ## 1. Schema | ||||||||||
|
|
||||||||||
| A document with a text body for full-text search and a vector for similarity. | ||||||||||
| `@embed("body")` tells the engine to embed the `body` text into `embedding` at | ||||||||||
| load time: | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| node Document { | ||||||||||
| title: String, | ||||||||||
| body: String, | ||||||||||
| embedding: Vector(768) @embed("body"), | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph init --schema schema.pg docs.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ## 2. Configure embeddings | ||||||||||
|
|
||||||||||
| Ingest-time embedding uses the engine's embedding client. Point it at your | ||||||||||
| provider (see [embeddings](../search/embeddings.md) for every variable): | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| export GEMINI_API_KEY=... # ingest-time document embeddings | ||||||||||
| # For local experimentation without a provider, deterministic mock vectors: | ||||||||||
| # export OMNIGRAPH_EMBEDDINGS_MOCK=1 NANOGRAPH_EMBEDDINGS_MOCK=1 | ||||||||||
| ``` | ||||||||||
|
Comment on lines +33 to +38
The guide configures only |
||||||||||
|
|
||||||||||
| If you would rather supply vectors yourself, drop `@embed` and include the | ||||||||||
| `embedding` array in each input record instead. | ||||||||||
|
|
||||||||||
| ## 3. Load | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph load --data docs.jsonl --mode overwrite docs.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Each row's `body` is embedded into `embedding` as it loads. The BM25 (full-text) | ||||||||||
| and vector indexes are maintained by the engine — there is no separate build step. | ||||||||||
|
|
||||||||||
| ## 4. Query — full-text, vector, then hybrid | ||||||||||
|
|
||||||||||
| Full-text only: | ||||||||||
|
|
||||||||||
| ```gq | ||||||||||
| query text_search($q: String) { | ||||||||||
| match { $d: Document { } } | ||||||||||
| return { $d.title, bm25($d.body, $q) as score } | ||||||||||
| order { score desc } | ||||||||||
| limit 10 | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Vector only (the query text is embedded at query time; `nearest` requires a | ||||||||||
| `limit`): | ||||||||||
|
|
||||||||||
| ```gq | ||||||||||
| query vector_search($q: String) { | ||||||||||
| match { $d: Document { } } | ||||||||||
| return { $d.title, nearest($d.embedding, $q) as score } | ||||||||||
| order { score desc } | ||||||||||
| limit 10 | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Hybrid — fuse both rankings with `rrf`: | ||||||||||
|
|
||||||||||
| ```gq | ||||||||||
| query hybrid($q: String) { | ||||||||||
| match { $d: Document { } } | ||||||||||
| return { | ||||||||||
| $d.title, | ||||||||||
| rrf( nearest($d.embedding, $q), bm25($d.body, $q) ) as score | ||||||||||
| } | ||||||||||
| order { score desc } | ||||||||||
| limit 10 | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Run it: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph read --query queries.gq --name hybrid \ | ||||||||||
| --params '{"q":"trends in AI safety"}' --format table docs.omni | ||||||||||
|
||||||||||
| ``` | ||||||||||
|
|
||||||||||
| `rrf` combines the two rankings without needing their score scales to match, so | ||||||||||
| you get a single fused ordering from a lexical signal and a semantic one. | ||||||||||
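To see why the score scales do not matter, here is a minimal sketch of Reciprocal Rank Fusion outside the engine. Each document scores the sum of `1 / (k + rank)` over the rankings it appears in, so only positions are used. The helper name `rrf_fuse`, the document ids, and `k = 60` (a commonly used constant) are assumptions for illustration; the guide does not state which constant or tie-breaking the engine's `rrf` uses.

```shell
# rrf_fuse K LIST_A LIST_B: each list is a space-separated ranking, best first.
# Prints "score id" lines, highest fused score first.
rrf_fuse() {
  k="$1"
  printf '%s\n%s\n' "$2" "$3" | awk -v k="$k" '
    {
      # Each input line is one ranking; the field position is the 1-based rank.
      for (rank = 1; rank <= NF; rank++) score[$rank] += 1 / (k + rank)
    }
    END {
      for (id in score) printf "%.5f %s\n", score[id], id
    }' | LC_ALL=C sort -k1,1nr -k2,2
}

# First list: a vector ranking; second list: a BM25 ranking.
rrf_fuse 60 "doc_a doc_b doc_c" "doc_b doc_c doc_a"
```

In this example `doc_b` comes out first: it is second in one list and first in the other, which beats `doc_a` (first and third).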
|
Comment on lines +98 to +99
The CLI renamed Same issue appears in |
||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,14 @@ | ||
| # Guides | ||
|
|
||
| Task-oriented walkthroughs that compose the building blocks from the reference | ||
| docs into real workflows. Each one is a runnable sequence of commands. | ||
|
|
||
| - [Hybrid search end to end](hybrid-search.md) — combine full-text and vector | ||
| search in one query. | ||
| - [Run a cluster on S3](cluster-on-s3.md) — go from a config directory to a | ||
| config-free server booting from a bucket. | ||
| - [Branch-based review workflow](review-workflow.md) — stage data on a branch, | ||
| review it, and merge. | ||
|
|
||
| New to OmniGraph? Start with the [quickstart](../quickstart.md) and | ||
| [concepts](../concepts/index.md) first. |
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,63 @@ | ||||||||||
| # Branch-Based Review Workflow | ||||||||||
|
|
||||||||||
| Branches let you stage changes off `main`, inspect them in isolation, and merge | ||||||||||
| only once they look right — Git-style, atomic across the whole graph. This guide | ||||||||||
| walks a typical "review an incoming batch before it hits main" flow. | ||||||||||
|
|
||||||||||
| See [branches & commits](../branching/index.md) and [merging](../branching/merge.md) | ||||||||||
| for the underlying model. | ||||||||||
|
|
||||||||||
| ## 1. Stage the batch on its own branch | ||||||||||
|
|
||||||||||
| Loading into a branch that does not exist is an error unless you pass `--from`, | ||||||||||
| which forks it from a base first. So one command both forks the branch and loads | ||||||||||
| into it: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph load --data batch.jsonl --mode merge \ | ||||||||||
| --branch review/2026-04-25 --from main graph.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| (Equivalently, create the branch first with | ||||||||||
| `omnigraph branch create review/2026-04-25 --from main graph.omni`, then `load` | ||||||||||
| without `--from`.) | ||||||||||
|
|
||||||||||
| `main` is untouched — the batch lives only on `review/2026-04-25`. | ||||||||||
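If batches arrive regularly, the branch name can be derived from the date instead of typed by hand. The sketch below only prints the staging command so it can be inspected first; `stage_cmd` is a hypothetical wrapper, and the `review/<date>` naming is just the convention used in this guide.

```shell
# Hypothetical wrapper: build the staging command for a date-stamped review
# branch and print it, without running it.
stage_cmd() {
  branch="review/$1"
  printf 'omnigraph load --data %s --mode merge --branch %s --from main %s\n' \
    "$2" "$branch" "$3"
}

# date +%F prints the current date as YYYY-MM-DD.
stage_cmd "$(date +%F)" batch.jsonl graph.omni
```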
|
|
||||||||||
| ## 2. Inspect the branch in isolation | ||||||||||
|
|
||||||||||
| Run any read query against the branch with `--branch`: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph read --query checks.gq --name count_by_type \ | ||||||||||
| --branch review/2026-04-25 --format table graph.omni | ||||||||||
|
||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Compare it against `main` by listing the branches and the review branch's commits: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph branch list graph.omni | ||||||||||
| omnigraph commit list --branch review/2026-04-25 graph.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ## 3. Merge when it looks right | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph branch merge review/2026-04-25 --into main graph.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| The merge is three-way and atomic. If both `main` and the branch changed the same | ||||||||||
| data incompatibly, the merge fails with a structured list of conflicts and | ||||||||||
| publishes nothing — resolve them and re-merge. See | ||||||||||
| [merging](../branching/merge.md) for the conflict kinds. | ||||||||||
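As a rough illustration of the three-way rule, consider a single value with a common ancestor (`base`), the version on `main` (`ours`), and the version on the branch (`theirs`). This is a generic sketch of three-way merging, not the engine's implementation; `three_way` is a hypothetical name, and the real conflict kinds are the ones listed in the merging reference.

```shell
# three_way BASE OURS THEIRS: merge one value; returns 1 on a conflict.
three_way() {
  base="$1"; ours="$2"; theirs="$3"
  if [ "$ours" = "$theirs" ]; then
    printf '%s\n' "$ours"        # both sides agree (or neither changed)
  elif [ "$ours" = "$base" ]; then
    printf '%s\n' "$theirs"      # only the branch changed it
  elif [ "$theirs" = "$base" ]; then
    printf '%s\n' "$ours"        # only main changed it
  else
    echo "conflict: base=$base ours=$ours theirs=$theirs" >&2
    return 1                     # both changed it differently: publish nothing
  fi
}

three_way "Ada" "Ada" "Ada L."                             # prints Ada L.
three_way "Ada" "A. L." "Ada L." || echo "merge refused"   # prints merge refused
```

The second call mirrors the behavior described above: an incompatible change on both sides yields a conflict rather than a silently chosen winner.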
|
|
||||||||||
| ## 4. Clean up | ||||||||||
|
|
||||||||||
| Once merged, delete the review branch: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| omnigraph branch delete review/2026-04-25 graph.omni | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Branch storage is reclaimed; if a transient error interrupts reclamation, the | ||||||||||
| [`cleanup`](../operations/maintenance.md) command sweeps the leftovers later. | ||||||||||
`OMNIGRAPH_EMBEDDINGS_MOCK` for the engine (ingest-time `@embed`) and `NANOGRAPH_EMBEDDINGS_MOCK` for the compiler (query-time text-to-vector for `nearest()`). Without an OpenAI-compatible key, any query that passes a string to `nearest()` will fail at runtime even if `load` succeeded.