Add agent memory search for OpenClaw and Hermes Agent using llama.cpp #2438

Open
merveenoyan wants to merge 4 commits into main from add-mem-llama

@merveenoyan commented Apr 27, 2026
Contributor

Note

Low Risk
Documentation-only changes that add configuration guidance for local embedding/memory search; no runtime or security-sensitive code is modified.

Overview
Adds a new Local Memory Search section to docs/hub/agents-local.md showing how to run an embedding model locally (via node-llama-cpp) and configure OpenClaw (agents.defaults.memorySearch.*) to use it.

Extends the Hermes Agent instructions with a local semantic search configuration snippet (auxiliary.session_search) and a hermes memory status check to verify local memory integration.

Reviewed by Cursor Bugbot for commit 083b470. Bugbot is set up for automated code reviews on this repo.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@merveenoyan

Contributor Author

@pcuenca @burtenshaw @gary149 can you take a look?

Comment thread docs/hub/agents-local.md Outdated
### Local Memory Search for OpenClaw

[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as:
You can run local embedding models with Llama.cpp for your agent's memory search. To do so, make sure to have node-llama-cpp.

Member

I don't use OpenClaw. Does this have to be the node version?

Contributor Author

not sure I understand your question :/

Member

why not just use llama.cpp, instead of node-llama-cpp? Is this an OpenClaw requirement?

Contributor Author

If you install it this way, you can use the one-liner below to bring up the llama server and use it for memory, instead of setting it up yourself.

Comment thread docs/hub/agents-local.md
Comment thread docs/hub/agents-local.md Outdated
Comment thread docs/hub/agents-local.md Outdated
Comment thread docs/hub/agents-local.md
Comment thread docs/hub/agents-local.md
merveenoyan and others added 3 commits May 13, 2026 12:03
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Updated instructions for running local embedding models with Llama.cpp and added npm installation command.
@merveenoyan requested a review from pcuenca June 4, 2026 14:59

@pcuenca left a comment

Member

Conceptually looks good, but it could be helpful if other Hermes / OC users could take a look. It'd be helpful to describe what this brings over the built-in memory search implementations; as far as I know harnesses have ways to look stuff up in memory without going through this process.

Comment thread docs/hub/agents-local.md
npm i node-llama-cpp
```

Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command.

Member

Suggested change
- Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command.
+ Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with the command below.

Comment thread docs/hub/agents-local.md
max_concurrency: 1
```

Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active.

Member

Suggested change
- Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active.
+ Check if this works, `none - built-in only` shows that no other memory plug-ins are used. The output below shows that local serving is active.

Comment thread docs/hub/agents-local.md

### Local Memory Search for Hermes Agent

Hermes Agent consumes semantic search models through endpoints. Once you get your preferred embedding model up on endpoint 8080 with llama.cpp or the inference engine of your choice, add the following to `~/.hermes/config.yaml`.
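
To make the endpoint-based setup above concrete, here is a minimal Python sketch of what semantic memory search over an embeddings endpoint involves. It is not Hermes Agent's implementation. It assumes the local server exposes an OpenAI-compatible `/v1/embeddings` route on port 8080 and returns items under `data` with `index` and `embedding` fields; the helper names `embed`, `cosine`, and `rank_by_cosine` are hypothetical.

```python
import json
import math
import urllib.request

# Assumption: an OpenAI-compatible embeddings route on the local server.
EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"


def embed(texts, url=EMBEDDINGS_URL):
    """POST texts to the embeddings endpoint and return one vector per text."""
    body = json.dumps({"input": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumed OpenAI-style shape: each item has an "index"; sort to match input order.
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_cosine(query_vec, memory_vecs):
    """Return memory indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, vec) for vec in memory_vecs]
    return sorted(range(len(memory_vecs)), key=lambda i: scores[i], reverse=True)
```

With toy vectors and no server running, `rank_by_cosine([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [0.7, 0.7]])` returns `[1, 2, 0]`. In a real setup the vectors would come from `embed(...)` on the query and on the stored memory entries.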

Member

How does memory search work if you don't do anything? Because it's able to look up things in memory by default, AFAIU. Here's my output from the command you mention below. I haven't done anything about semantic search, but the first lines match what you said:

```
Memory status
────────────────────────────────────────
  Built-in:  always active
  Provider:  (none — built-in only)

  Installed plugins:
    • byterover  (requires API key)
    • hindsight  (API key / local)
    • holographic  (local)
    • honcho  (API key / local)
    • mem0  (API key / local)
    • openviking  (API key / local)
    • retaindb  (API key / local)
    • supermemory  (requires API key)
```

3 participants