diff --git a/.gitignore b/.gitignore index 557b3a08..3b9c7997 100644 --- a/.gitignore +++ b/.gitignore @@ -24,3 +24,4 @@ workflows/docs-audit/artifacts/** !workflows/docs-audit/artifacts/ workflows/docs-audit/state/** !workflows/docs-audit/state/ +scratch \ No newline at end of file diff --git a/docs/docs.json b/docs/docs.json index 2e9f8eb4..1beaca6e 100644 --- a/docs/docs.json +++ b/docs/docs.json @@ -264,6 +264,7 @@ }, { "group": "Data Platforms & Frameworks", + "expanded": true, "pages": [ "integrations/data/pydantic", "integrations/data/duckdb", @@ -275,8 +276,10 @@ }, { "group": "AI Platforms & Frameworks", + "expanded": true, "pages": [ "integrations/ai/agno", + "integrations/ai/hermes-agent", "integrations/ai/huggingface", "integrations/ai/langchain", "integrations/ai/llamaIndex", diff --git a/docs/integrations/ai/hermes-agent.mdx b/docs/integrations/ai/hermes-agent.mdx new file mode 100644 index 00000000..3e6b4c0c --- /dev/null +++ b/docs/integrations/ai/hermes-agent.mdx @@ -0,0 +1,327 @@ +--- +title: "Hermes Agent" +sidebarTitle: "Hermes Agent" +description: "Use LanceDB as a persistent, semantic memory backend for Hermes Agent. Get durable recall across sessions with vector and hybrid search." +--- + +[Hermes Agent](https://github.com/NousResearch/hermes-agent) is a self-hosted, open-source +personal agent from [Nous Research](https://nousresearch.com). You can talk to it from a +terminal UI or reach the same agent from Telegram, Discord, and Slack, and it exposes a +dedicated slot for external *memory providers* that run alongside its built-in notes. + +The [LanceDB memory plugin](https://github.com/lancedb/hermes-agent-memory) fills that slot. +It gives Hermes durable, semantic recall across sessions: state a preference or a project +convention once, and the agent can retrieve it weeks later in a brand-new session — even when +you ask for it in completely different words. 
Everything runs inside Hermes' own Python +process, storing a single LanceDB table on local disk. There's no memory server to operate. + + +**The mental model is clean** + +- Hermes owns the agent loop +- LanceDB manages the durable long-term memory and offers semantic recall. + + +## Why LanceDB fits agent memory + +Out of the box, Hermes remembers with a small curated notes file frozen into the system +prompt, plus lexical (keyword) search over past sessions. Both are useful, but keyword search +misses paraphrases of what you originally typed — the exact thing you need when recalling a +fact you phrased differently months ago. + +LanceDB is an embedded retrieval library, which makes it a natural fit here: + +- **No server to stand up** — it reads and writes a table on local disk, so the plugin ships + as a dependency rather than a service to operate. +- **One table holds everything** — content, metadata, and embeddings live together. A memory + becomes a structured row with a category, tags, timestamps, and provenance, not just a text + blob. +- **Query it any way you need** — vector similarity for meaning, BM25 full-text for exact + names and jargon, a hybrid of the two, or plain metadata filters to keep recall scoped to + the right workspace. +- **It scales up** — the same table abstraction carries over to larger LanceDB deployments + later, so the local setup is never a dead end. + +## Install and activate + + +Want to try this without touching your existing Hermes setup? Run everything in an isolated +profile: `hermes profile create demo`, then add `-p demo` to the commands below. When you're +done, `rm -rf ~/.hermes/profiles/demo` removes all trace. + + + + +Skip this if you already have Hermes installed. + +```bash +curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash +``` + + + +This shallow-clones the plugin into `~/.hermes/plugins/lancedb/`. 
+ +```bash +hermes plugins install lancedb/hermes-agent-memory +``` + + + +Hermes loads plugins inside its own Python interpreter, so the dependencies go *there* — not +into a separate virtualenv. (This interpreter is shared across profiles, so you only install +once.) + +```bash +uv pip install --python ~/.hermes/hermes-agent/venv/bin/python3 lancedb openai pyyaml +``` + + + +The plugin turns conversations into embeddings, so it needs an embeddings key. By default that +is OpenAI, so set `OPENAI_API_KEY` in your environment or in `~/.hermes/.env`. + + +Prefer a local or non-OpenAI model? The plugin uses an OpenAI-compatible client, so you can +point it at any compatible endpoint (OpenRouter, Ollama, vLLM, …) in your config — no code +change needed. See [Configuration](#configuration) below. + + + + +Switch memory on and pick this plugin: + +```bash +hermes memory setup # choose "lancedb" +``` + +Then confirm it's actually active before you start chatting — this is the one step worth not +skipping, because Hermes quietly falls back to its built-in notes if the provider isn't set: + +```bash +hermes memory status +``` + +```text +Memory status +──────────────────────────────────────── + Built-in: always active + Provider: lancedb + + Plugin: installed ✓ + Status: available ✓ +``` + +You want to see `Provider: lancedb` with both `installed ✓` and `available ✓`. + + + +## The memory tools + +Once activated, the agent has four tools for working with long-term memory: + +| Tool | What it does | +|:--|:--| +| `lancedb_recall` | Semantic (vector, the default) or hybrid search over your workspace memory. Returns matching facts with scores and provenance. | +| `lancedb_remember` | Stores a durable fact when you explicitly ask. Deduplicated by content hash, so remembering the same thing twice doesn't pile up rows. | +| `lancedb_read` | Fetches a single memory by ID, optionally with the original conversation messages it was distilled from. 
| +| `lancedb_forget` | Deletes safely: previews candidates first, then deletes by exact ID, so nothing disappears by accident. | + +Beyond these tools, the plugin also captures durable facts from your conversations +automatically — an auxiliary model distills them before context is compressed and again when a +session ends, so insights survive even when the raw messages are summarized away. + +## Walkthrough + +"_Teach it your project preferences_" + +Let's make this concrete with the pain we opened on: re-explaining your setup to the agent every session. +We'll save a convention once and then prove a brand-new session can recall it. This example will touch all four +tools along the way. + +### Remember + +Ask Hermes to commit a convention to long-term memory. Saying "remember in long-term memory" +makes sure it lands in the LanceDB store, which shows up as the `⚡ lancedb_r` (`lancedb_remember`) +line below: + +```text +● Remember in long-term memory: for this project I only use uv, never pip, and I always add type hints to Python functions. + + ┊ 🧠 memory +memory: "For this project, the user only uses uv for Python package management, never pip, and always adds type hints to Python functions." + ┊ ⚡ lancedb_r 0.0s + ─ ⚕ Hermes ──────────────────────────────────────────────────────────────── + + Remembered. I've stored that project convention: use uv only, never pip, and always add type hints to Python functions. +``` + +### Recall + +First, take Hermes' built-in notes out of the picture so recall can *only* come from LanceDB — +the two layers run side by side otherwise, and either could answer: + +```bash +hermes config set memory.memory_enabled false +``` + +Now start a fresh session (`/new`) and ask for the convention back, worded completely +differently from how you saved it: + +```text +● /new +● Look through your long-term memory for my Python project preferences. 
+ + ─ ⚕ Hermes ──────────────────────────────────────────────────────────────── + + I found one durable memory about Python project preferences: + + - Project Python tooling preference is uv; pip is not used; type hints are required. +``` + +This is the whole point in one exchange. You asked about your *preferences* using none of the +words you stored the fact with — no *uv*, no *pip*, no *type hints* — yet a fresh session, with +none of the earlier conversation in context, pulls it straight back. Semantic search matches on +meaning rather than exact wording, which is precisely where keyword search falls short. + + +Turn the built-in layer back on for everyday use with `hermes config set memory.memory_enabled true` +— the two memory layers happily coexist. We only disabled it here to prove the recall came from +LanceDB. + + +### Read + +You can also ask where a fact came from. Hermes attributes the answer to its stored memory +rather than guessing from a file in the repo (under the hood, `lancedb_read` can also return +the exact source messages a fact was distilled from): + +```text +● Where did that uv preference come from? + + ─ ⚕ Hermes ──────────────────────────────────────────────────────────────── + + It came from the project's persistent memory, not from a codebase file I inspected. + + Specifically, the recalled memory says: + - "For this project, the user only uses uv for Python package management, never pip, and always adds type hints to Python functions." +``` + +### Forget + +When a preference changes, ask Hermes to drop the old fact. The tool calls tell the whole +story: the two `⚡ lancedb_f` (`lancedb_forget`) lines are it previewing matches and then +deleting, and the trailing `⚡ lancedb_r` is it saving the replacement in the same breath: + +```text +● I've switched from uv to pixi. Forget the uv preference. 
+ + ┊ ⚡ lancedb_f uv preference python package management never pip type hints 0.3s + ┊ ⚡ lancedb_f 0.0s + ┊ ⚡ lancedb_r 0.2s + ─ ⚕ Hermes ──────────────────────────────────────────────────────────────── + + Done — I removed the uv preference from memory and replaced it with: + + - This project uses pixi for Python package management + - Type hints on Python functions are still preferred +``` + +Remember, recall, read, forget: four small operations that between them cover the entire +lifecycle of a durable memory. + +## Retrieval modes + +Recall ships in `vector` mode by default — pure semantic search, which is what survives the +paraphrasing you saw above. If you also need exact name or jargon matching, switch to `hybrid` +(vector + BM25) and choose how the two legs are fused: RRF, a vector-biased linear blend, or a +cross-encoder reranker. Mode is set per call; fusion is a config setting. + +```yaml +# ~/.hermes/config.yaml +plugins: + lancedb: + retrieval: + mode: hybrid # vector (default) | hybrid + reranker: + type: rrf # how the vector + BM25 legs are fused + # Swap RRF for a reranking pass (pulls in sentence-transformers + torch): + # type: cross-encoder + # model: cross-encoder/ettin-reranker-17m-v1 + # rerank_top_n: 50 +``` + +The cross-encoder is the one path that pulls in a local ML stack, so it stays opt-in. It +defaults to the compact 17M-parameter [ettin reranker](https://huggingface.co/cross-encoder/ettin-reranker-17m-v1). + +## Inspect the store + +Everything lives in one table named `memories` at `~/.hermes/lancedb/memories.lance`. 
Because +it's a plain LanceDB table, you can open it directly and see exactly what the agent has stored +— a `kind` column separates extracted `fact` rows from the raw `turn` rows they were drawn +from: + +```python +import lancedb + +db = lancedb.connect("~/.hermes/lancedb") +tbl = db.open_table("memories") +print(tbl.to_pandas()[["kind", "category", "content"]].head()) +``` + +## Configuration + +The plugin runs on sensible defaults once activated — you don't have to configure anything. +`~/.hermes/config.yaml` is purely for overrides. Two common ones: + +Use a cheaper model for the auxiliary fact-extraction calls: + +```yaml +# ~/.hermes/config.yaml +auxiliary: + lancedb_extraction: + provider: openrouter + model: google/gemini-3-flash +``` + +Point embeddings at a fully local endpoint (for example, Ollama) so nothing leaves your +machine: + +```yaml +# ~/.hermes/config.yaml +plugins: + lancedb: + embedding: + model: nomic-embed-text + base_url: http://localhost:11434/v1 + api_key_env: OLLAMA_API_KEY # any value works for local Ollama +``` + + +Changing the embedding model (or its dimension) against an existing store requires recreating +the table — the plugin fails loudly on a dimension mismatch rather than silently returning +nothing. Every option is documented in the plugin's [`default_config.yaml`](https://github.com/lancedb/hermes-agent-memory/blob/main/src/default_config.yaml). + + +## Benchmark + +On [LongMemEval-S](https://huggingface.co/datasets/xiaowu0162/longmemeval-cleaned), a +long-conversation QA benchmark, LanceDB's semantic recall clearly beat Hermes' built-in lexical +search (0.66 vs. 0.53 answer accuracy) by finding the right messages even when the question was +worded differently from the original conversation. 
For the full methodology, the +per-question-type breakdown, and a reproducible harness, see the +[blog post](https://www.lancedb.com/blog/semantic-memory-for-hermes-agent-with-lancedb) and the +[benchmark harness](https://github.com/lancedb/hermes-agent-memory/tree/main/benchmarks). + +## Why this works well + +- **It's local-first and embedded.** The LanceDB memory table lives on your disk with no server to run; + the plugin installs as a dependency of Hermes' own environment. +- **Recall survives paraphrasing.** Semantic search matches meaning, not spelling, which is the + failure mode that sinks keyword-only session search. +- **Memories are structured and traceable.** Each fact is a row with metadata and a link back + to the messages it came from, and `forget` always previews before it deletes. +- **Nothing about it is a dead end.** As your needs grow, the same table abstraction carries + over to LanceDB [Enterprise](/enterprise) for automatic compaction, reindexing, and scale. + +To try it, install the plugin, enable it with `hermes memory setup`, and run the kind of +workflow we walked through above. diff --git a/docs/training/torch.mdx b/docs/training/torch.mdx index 9a9dbe5b..c04d2760 100644 --- a/docs/training/torch.mdx +++ b/docs/training/torch.mdx @@ -17,13 +17,14 @@ The `Table` class in LanceDB implements a contract for a PyTorch import lancedb import torch import pyarrow as pa +from lancedb.util import tbl_to_tensor mem_db = lancedb.connect("memory://") table = mem_db.create_table("test_table", pa.table({"a": range(1000)})) # Any LanceDB table can be used as a PyTorch Dataset dataloader = torch.utils.data.DataLoader( - table, batch_size=1024, shuffle=True + table, batch_size=1024, shuffle=True, collate_fn=tbl_to_tensor ) for batch in dataloader: @@ -42,12 +43,17 @@ dataloader = torch.utils.data.DataLoader(permutation) ## Output Formats -By default, a `Table` data loader will emit a `pyarrow.RecordBatch`. 
To convert to a different format (such as a -`pytorch.Tensor`), you will need to provide a custom collate function. +By default, a `Table` data loader will emit Arrow data. `collate_fn` is PyTorch's batching hook: PyTorch calls it to +turn the fetched items into one batch. PyTorch's default collate function only knows how to combine tensors, NumPy +arrays, numbers, dicts, and lists, so it does not accept Arrow data directly. When using a `Table` directly, pass +LanceDB's `lancedb.util.tbl_to_tensor` helper as PyTorch's `collate_fn`; it converts numeric Arrow columns into a +column-major `torch.Tensor` with shape `(columns, rows)`. -The `Permutation` class is more flexible. By default, the output will be a list of dicts. This is the default output -format of standard data loaders and usually more convenient when you are getting started. However, there is a -significant performance penalty converting from Arrow, Lance's internal representation, to this default format. +`Permutation` works differently: its default output is a list of Python dicts, which PyTorch's default collate function +can batch into a dict of tensors. This is usually more convenient when you are getting started. However, there is a +significant performance penalty converting from Arrow, Lance's internal representation, to this default format. Use a +direct `Table` with `collate_fn` when you want Arrow-to-tensor conversion, or a `Permutation` when you want the default +PyTorch dict-of-tensors behavior. To address this, the `Permutation` class provides a set of builtin transform functions that can be applied to map the Arrow data in different ways. The `arrow` and `polars` formats will always avoid data copies. However, `numpy`, @@ -96,3 +102,84 @@ dataloader = torch.utils.data.DataLoader( for batch in dataloader: print(batch.schema) ``` + +## Using multiple DataLoader workers + +Set `num_workers > 0` to read from LanceDB in multiple PyTorch worker processes. 
LanceDB tables and `Permutation` objects are picklable, so each worker reopens the table after it starts.
+
+Prefer the `spawn` start method when using multiple workers: LanceDB uses internal threads, and forking a process with running threads is unsafe. See [the performance guide](/performance) for more multiprocessing guidance.
+
+```py Python icon=Python
+import torch
+from lancedb.permutation import Permutation
+
+permutation = Permutation.identity(table)
+dataloader = torch.utils.data.DataLoader(
+    permutation,
+    batch_size=1024,
+    shuffle=True,
+    num_workers=4,
+    multiprocessing_context="spawn",
+    persistent_workers=True,
+)
+```
+
+### Remote tables in DataLoader workers
+
+Remote LanceDB Enterprise tables (`db://...`) work the same way: workers reopen the table from the pickled connection state.
+
+```py Python icon=Python
+import lancedb
+import torch
+from lancedb.util import tbl_to_tensor
+
+db = lancedb.connect(
+    "db://my-database",
+    api_key="sk-...",
+    region="us-east-1",
+)
+table = db.open_table("my_table")
+
+dataloader = torch.utils.data.DataLoader(
+    table,
+    batch_size=512,
+    num_workers=4,
+    multiprocessing_context="spawn",
+    collate_fn=tbl_to_tensor,
+)
+```
+
+
+This sends the connection state, including the API key, to each worker. Use a connection factory if credentials should be loaded inside the worker or if your `client_config` contains a non-serializable `header_provider`.
+
+
+### Providing a custom connection factory
+
+`Permutation.with_connection_factory` lets each worker reopen the base table with custom logic. The factory takes the table name, returns a LanceDB table, and must be picklable.
+
+```py Python icon=Python
+import os
+import lancedb
+import torch
+from lancedb.permutation import Permutation
+
+def open_table(name: str):
+    db = lancedb.connect(
+        "db://my-database",
+        api_key=os.environ["LANCEDB_API_KEY"],
+        region="us-east-1",
+    )
+    return db.open_table(name)
+
+table = open_table("my_table")
+permutation = (
+    Permutation.identity(table)
+    .with_connection_factory(open_table)
+)
+dataloader = torch.utils.data.DataLoader(
+    permutation,
+    batch_size=512,
+    num_workers=4,
+    multiprocessing_context="spawn",
+)
+```
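
The requirement that the connection factory be picklable can be checked in the parent process before any workers start. The following is a minimal sketch using only the standard library, not part of LanceDB: `is_picklable`, `open_table_stub`, and `make_closure_factory` are hypothetical names introduced for illustration, and the stub does not connect to anything. It illustrates why a module-level function is a suitable factory while a closure or lambda is not.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if pickle.dumps accepts `obj` (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Functions are pickled by reference (module + qualified name);
        # lambdas and nested functions cannot be looked up that way.
        return False


def open_table_stub(name: str) -> str:
    """Hypothetical stand-in for a module-level factory such as `open_table`."""
    return f"table:{name}"


def make_closure_factory(prefix: str):
    """Build a nested function that captures `prefix`; pickle rejects it."""

    def factory(name: str) -> str:
        return f"{prefix}:{name}"

    return factory
```

In a regular script or module, a module-level function such as `open_table_stub` should pass this check, while `make_closure_factory("db")` and any lambda should fail it. Running a check like this on the factory before handing it to `with_connection_factory` would surface the problem up front rather than inside a spawned worker.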