This is the user-facing guide for running Hermes against Psionic.
In plain language:
- Psionic can now act as a real backend for Hermes on one consumer GPU.
- Hermes can talk to Psionic through the normal OpenAI-compatible
chat.completionspath. - Hermes can answer normally, call tools, read tool results, keep going across multiple turns, stream tool calls, and do a same-turn two-tool batch when required.
- Psionic is the backend here. Hermes is still a separate checkout. Psionic does not bundle a full Hermes product or CLI.
If you want the shortest honest answer to "can I command Hermes to do shit through Psionic?":
- yes, if you already have a Hermes checkout
- you point Hermes at a running
psionic-openai-server - you use
provider="custom"andapi_mode="chat_completions" - you set the model name to the GGUF basename Psionic is serving
The current retained consumer-GPU proof is:
- host:
archlinux - GPU:
RTX 4080 - model family: local
qwen3.5 - canonical direct compatibility result:
6/6
That means the current native Psionic Hermes lane can do all of the following:
- required tool call
- plain-text no-tool answer
- multi-turn tool loop with tool-result replay
- same-turn two-tool assistant response
- truthful refusal after an invalid tool result
- streamed tool-call turn
For the strict same-turn two-tool case, the acceptance bar is now stronger than "the model called both tools." The retained proof requires Hermes to:
- emit both tool calls in the same assistant turn
- receive both tool results back through Psionic
- produce a final grounded answer that actually uses those results
The currently proven practical user path is the tool-backed lane. In live validation on 2026-03-29:
- the repo-owned compatibility checker reran green at
6/6 - the same-turn two-tool row only passed because the final answer grounded on
the tool results as
Paris is sunny at 18C. Tokyo is rainy at 12C. - live Hermes tool-backed conversations against Psionic worked
- one ad hoc no-tool
9btext-summary run hit a local fallback400(unsupported JSON schema feature for local fallback: object schemas with more than 5 properties are not supported by the local fallback)
So if you want the reliable path today, use Hermes against Psionic for the
tool-backed chat.completions lane first, not as a claim that every open-ended
Hermes formatting path is already polished.
Canonical retained proof:
docs/HERMES_QWEN35_COMPATIBILITY.mddocs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md
If you just want to prove your local setup works end to end, run the repo-owned checker.
From the Psionic repo root:
PSIONIC_HERMES_ROOT=/abs/path/to/hermes \
PSIONIC_HERMES_PYTHON=/abs/path/to/hermes/.venv/bin/python \
PSIONIC_HERMES_QWEN35_MODEL_PATH=/abs/path/to/qwen3.5-2b-q8_0-registry.gguf \
scripts/release/check-psionic-hermes-qwen35-compatibility.shThat script will:
- build or reuse
psionic-openai-server - start a local Psionic OpenAI-compatible server
- point Hermes at it through
OPENAI_BASE_URL - run the retained compatibility matrix
- write a JSON receipt under
fixtures/qwen35/hermes/
If that passes, your local Hermes-on-Psionic lane is working.
Build the server:
cargo build -p psionic-serve --bin psionic-openai-server --releaseRun it on a Linux NVIDIA host:
./target/release/psionic-openai-server \
--backend cuda \
--host 127.0.0.1 \
--port 8095 \
-m /abs/path/to/qwen3.5-2b-q8_0-registry.ggufYou should then have:
- health check:
http://127.0.0.1:8095/health - OpenAI-compatible base URL:
http://127.0.0.1:8095/v1
The repo-owned compatibility probe uses Hermes like this:
from run_agent import AIAgent
agent = AIAgent(
base_url="http://127.0.0.1:8095/v1",
api_key="dummy",
provider="custom",
api_mode="chat_completions",
model="qwen3.5-2b-q8_0-registry.gguf",
enabled_toolsets=["<your_toolset_here>"],
max_iterations=8,
quiet_mode=True,
skip_context_files=True,
skip_memory=True,
)The important Psionic-specific pieces are:
base_urlprovider="custom"api_mode="chat_completions"api_key="dummy"modelmatching the basename of the GGUF Psionic is serving
The exact toolset names come from your Hermes checkout, not from Psionic.
Once Hermes is pointed at Psionic, you use Hermes the same basic way you would use it against any other OpenAI-compatible backend:
- ask a plain question and get a text answer
- ask for a tool-backed task and let Hermes call tools
- let Hermes loop across tool calls until it has enough information
Examples of the kinds of tasks the retained lane already proves:
- "Use the weather tool for Paris."
- "Use both weather tools now, then summarize both cities in one answer."
- "Use the Atlantis weather tool and then explain truthfully what happened."
This is not yet:
- a bundled Psionic-only Hermes product
- a one-command generic assistant launcher shipped by Psionic
- a claim that Psionic is faster than Ollama on every Hermes workload
- a claim that
llama.cppis already a clean comparator for this exactqwen3.5artifact lane
So the right mental model is:
- Psionic is now a working Hermes backend
- Hermes remains the agent/controller layer
- the integration is real and rerunnable
- the current remaining work is mostly benchmark, packaging, and output-polish work, not tool-loop correctness
If you want more than the quickstart:
- direct compatibility proof:
docs/HERMES_QWEN35_COMPATIBILITY.md - strict same-turn parallel proof:
docs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md - same-host Psionic vs Ollama benchmark:
docs/HERMES_BACKEND_BENCHMARK.md - serialized two-tool workflow:
docs/HERMES_QWEN35_SERIALIZED_TWO_CITY.md