Add agent memory search for OpenClaw and Hermes Agent using llama.cpp by merveenoyan · Pull Request #2438 · huggingface/hub-docs

merveenoyan · 2026-04-27T11:55:32Z

Note

Low Risk
Documentation-only changes that add configuration guidance for local embedding/memory search; no runtime or security-sensitive code is modified.

Overview
Adds a new Local Memory Search section to docs/hub/agents-local.md showing how to run an embedding model locally (via node-llama-cpp) and configure OpenClaw (agents.defaults.memorySearch.*) to use it.

Extends the Hermes Agent instructions with a local semantic search configuration snippet (auxiliary.session_search) and a hermes memory status check to verify local memory integration.

Reviewed by Cursor Bugbot for commit 083b470. Bugbot is set up for automated code reviews on this repo.

HuggingFaceDocBuilderDev · 2026-04-27T11:57:43Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

merveenoyan · 2026-05-12T12:33:00Z

@pcuenca @burtenshaw @gary149 can you take a look?

pcuenca · 2026-05-13T09:08:18Z

+### Local Memory Search for OpenClaw

-[Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as:
+You can run local embedding models with Llama.cpp for your agent's memory search. To do so, make sure to have node-llama-cpp.


pcuenca: I don't use OpenClaw. Does this have to be the node version?

merveenoyan: Not sure I understand your question :/

pcuenca: Why not just use llama.cpp instead of node-llama-cpp? Is this an OpenClaw requirement?

merveenoyan: If you install it this way, you can use the one-liner below to bring up the llama server and use it for memory, instead of setting it up yourself.


pcuenca

Conceptually looks good, but it could be helpful if other Hermes / OC users could take a look. It'd be helpful to describe what this brings over the built-in memory search implementations; as far as I know harnesses have ways to look stuff up in memory without going through this process.

pcuenca · 2026-06-04T17:09:17Z

+npm i node-llama-cpp 
+```
+
+Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command.


Suggested change

-Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command.
+Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with the command below.

pcuenca · 2026-06-04T17:10:52Z

+    max_concurrency: 1
+```
+
+Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active.


Suggested change

-Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active.
+Check if this works, `none - built-in only` shows that no other memory plug-ins are used. The output below shows that local serving is active.

pcuenca · 2026-06-04T17:15:47Z


+### Local Memory Search for Hermes Agent
+
+Hermes Agent consumes semantic search models through endpoints. Once you get your preferred embedding model up on endpoint 8080 with llama.cpp or the inference engine of your choice, add the following to `~/.hermes/config.yaml`.
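For readers trying this out, here is a minimal sketch of how one might sanity-check such an endpoint before pointing Hermes at it. It assumes the server on port 8080 exposes an OpenAI-compatible `/v1/embeddings` route (llama.cpp's `llama-server` does when started in embedding mode); the helper names are illustrative and are not part of Hermes or llama.cpp.

```python
import json
import math
import urllib.request

# Assumption: an OpenAI-compatible embeddings route on local port 8080.
EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"


def build_request(texts, url=EMBEDDINGS_URL):
    """Build (but do not send) a POST request for a batch of texts."""
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Extract vectors from an OpenAI-style response, ordered by index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embed(texts, url=EMBEDDINGS_URL, timeout=30):
    """Send the request; requires the local embedding server to be running."""
    with urllib.request.urlopen(build_request(texts, url), timeout=timeout) as resp:
        return parse_embeddings(json.load(resp))
```

With the server up, `embed(["what did we decide about caching?", "cache decision"])` should return two vectors, and a high `cosine_similarity` between them suggests the model is serving sensible embeddings.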


How does memory search work if you don't do anything? Because it's able to look up things in memory by default, AFAIU. Here's my output from the command you mention below. I haven't done anything about semantic search, but the first lines match what you said:

Memory status
────────────────────────────────────────
Built-in: always active
Provider: (none — built-in only)
Installed plugins:
  • byterover (requires API key)
  • hindsight (API key / local)
  • holographic (local)
  • honcho (API key / local)
  • mem0 (API key / local)
  • openviking (API key / local)
  • retaindb (API key / local)
  • supermemory (requires API key)
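Since this status output looks the same with or without the semantic search config, a scripted check can only report what the status text says. The sketch below parses text shaped like the output pasted in this comment; the one-field-per-line layout and the function names are assumptions, not a documented Hermes interface.

```python
import re

# Sample text modeled on the output quoted in this thread (layout assumed).
SAMPLE = """Memory status
Built-in: always active
Provider: (none — built-in only)
Installed plugins:
  • byterover (requires API key)
  • holographic (local)
"""


def parse_memory_status(text):
    """Return (provider, plugins); plugins is a list of (name, note) tuples."""
    provider = None
    plugins = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Provider:"):
            provider = line.split(":", 1)[1].strip()
        elif line.startswith("•"):
            match = re.match(r"•\s*(\S+)\s*(?:\((.*)\))?", line)
            if match:
                plugins.append((match.group(1), match.group(2)))
    return provider, plugins


def uses_builtin_only(provider):
    """True when the Provider field reports no external memory provider."""
    return provider is not None and provider.startswith("(none")


provider, plugins = parse_memory_status(SAMPLE)
print(provider, uses_builtin_only(provider), [name for name, _ in plugins])
```

A built-in-only result here says nothing about whether local semantic search is active, which is the gap this comment points out.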

add docs

76e11dd

pcuenca reviewed May 13, 2026


merveenoyan and others added 3 commits May 13, 2026 12:03

Apply suggestions from code review

b7b7378

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

Update agents-local.md

a66972f

Enhance local memory search documentation for OpenClaw

083b470

Updated instructions for running local embedding models with Llama.cpp and added npm installation command.

merveenoyan requested a review from pcuenca June 4, 2026 14:59

pcuenca approved these changes Jun 4, 2026

merveenoyan wants to merge 4 commits into main from add-mem-llama


