Add agent memory search for OpenClaw and Hermes Agent using llama.cpp #2438
merveenoyan wants to merge 4 commits into
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@pcuenca @burtenshaw @gary149 can you take a look?
| ### Local Memory Search for OpenClaw | ||
| [Hermes](https://hermes-agent.nousresearch.com/) works locally with llama.cpp. Define a default config as: | ||
| You can run local embedding models with llama.cpp for your agent's memory search. To do so, make sure node-llama-cpp is installed. |
I don't use OpenClaw. Does this have to be the node version?
not sure I understand your question :/
why not just use llama.cpp, instead of node-llama-cpp? Is this an OpenClaw requirement?
If you install it this way, you can use the one-liner below to get the llama server up and use it for memory, instead of setting it up yourself.
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Updated instructions for running local embedding models with Llama.cpp and added npm installation command.
pcuenca
left a comment
Conceptually looks good, but it could be helpful if other Hermes / OC users could take a look. It'd be helpful to describe what this brings over the built-in memory search implementations; as far as I know harnesses have ways to look stuff up in memory without going through this process.
| npm i node-llama-cpp | ||
| ``` | ||
| Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command. |
| Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with below command. | |
| Here's an example snippet to run [quantized EmbeddingGemma-300M](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF?show_file_info=embeddinggemma-300M-Q8_0.gguf) locally for memory search. OpenClaw automatically downloads and serves the model with the command below. |
| max_concurrency: 1 | ||
| ``` | ||
| Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active. |
| Check if this works, `none - built-in only` shows that no other memory plug-ins are used. Below output shows that local serving is active. | |
| Check if this works, `none - built-in only` shows that no other memory plug-ins are used. The output below shows that local serving is active. |
| ### Local Memory Search for Hermes Agent | ||
| Hermes Agent consumes semantic search models through endpoints. Once you get your preferred embedding model up on endpoint 8080 with llama.cpp or the inference engine of your choice, add the following to `~/.hermes/config.yaml`. |
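To make "consumes semantic search models through endpoints" concrete, here is a minimal Python sketch of what a client of such an endpoint does: it requests embeddings and ranks stored memories by cosine similarity. It assumes the server on port 8080 exposes an OpenAI-compatible `/v1/embeddings` route (llama.cpp's `llama-server` does when started with embeddings enabled). The function names `embed` and `rank_memories` are hypothetical and exist only for illustration; this is not how Hermes Agent itself is implemented.

```python
import json
import math
import urllib.request

# Assumption: an OpenAI-compatible embeddings route on local port 8080.
EMBEDDINGS_URL = "http://localhost:8080/v1/embeddings"


def embed(texts, url=EMBEDDINGS_URL):
    """Request one embedding vector per input text from the local endpoint."""
    payload = json.dumps({"input": texts}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style body: {"data": [{"index": 0, "embedding": [...]}, ...]}
    ordered = sorted(body["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in ordered]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_memories(query, memories, embed_fn=embed):
    """Return (score, memory) pairs sorted from most to least similar."""
    vectors = embed_fn([query] + list(memories))
    query_vec, memory_vecs = vectors[0], vectors[1:]
    scored = [
        (cosine_similarity(query_vec, vec), memory)
        for vec, memory in zip(memory_vecs, memories)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

With a server running, `rank_memories("what editor does the user prefer?", notes)` would return the stored notes ordered by similarity to the query. The `embed_fn` parameter lets the ranking be exercised without a live server.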
How does memory search work if you don't do anything? Because it's able to look up things in memory by default, AFAIU. Here's my output from the command you mention below. I haven't done anything about semantic search, but the first lines match what you said:
Memory status
────────────────────────────────────────
Built-in: always active
Provider: (none — built-in only)
Installed plugins:
• byterover (requires API key)
• hindsight (API key / local)
• holographic (local)
• honcho (API key / local)
• mem0 (API key / local)
• openviking (API key / local)
• retaindb (API key / local)
• supermemory (requires API key)
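Since the `Provider:` line is what distinguishes built-in memory from an external provider, the check can also be scripted. The sketch below parses status text laid out like the output above; it assumes that layout is stable, which this thread does not confirm, and `parse_memory_status` and `uses_builtin_only` are hypothetical helper names.

```python
def parse_memory_status(text):
    """Extract the built-in state, provider line, and plugin list from status text."""
    status = {"built_in": None, "provider": None, "plugins": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("Built-in:"):
            status["built_in"] = line.split(":", 1)[1].strip()
        elif line.startswith("Provider:"):
            status["provider"] = line.split(":", 1)[1].strip()
        elif line.startswith("•"):
            # Plugin rows are assumed to look like "• name (details)".
            entry = line.lstrip("•").strip()
            name, _, details = entry.partition(" (")
            status["plugins"].append((name, details.rstrip(")")))
    return status


def uses_builtin_only(status):
    """True when the provider line reports that only built-in memory is active."""
    provider = status["provider"] or ""
    return "built-in only" in provider
```

The input text could be captured from the `hermes memory status` command named in the PR overview. A result where `uses_builtin_only` returns `True` matches the output pasted above, where no external memory provider is configured.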
Note
Low Risk
Documentation-only changes that add configuration guidance for local embedding/memory search; no runtime or security-sensitive code is modified.
Overview
Adds a new Local Memory Search section to `docs/hub/agents-local.md` showing how to run an embedding model locally (via `node-llama-cpp`) and configure OpenClaw (`agents.defaults.memorySearch.*`) to use it.
Extends the Hermes Agent instructions with a local semantic search configuration snippet (`auxiliary.session_search`) and a `hermes memory status` check to verify local memory integration.
Reviewed by Cursor Bugbot for commit 083b470.