Hermes On Psionic

This is the user-facing guide for running Hermes against Psionic.

In plain language:

Psionic can now act as a real backend for Hermes on one consumer GPU.
Hermes can talk to Psionic through the normal OpenAI-compatible chat.completions path.
Hermes can answer normally, call tools, read tool results, keep going across multiple turns, stream tool calls, and do a same-turn two-tool batch when required.
Psionic is the backend here. Hermes is still a separate checkout. Psionic does not bundle a full Hermes product or CLI.

If you want the shortest honest answer to "can I command Hermes to do shit through Psionic?":

yes, if you already have a Hermes checkout
you point Hermes at a running psionic-openai-server
you use provider="custom" and api_mode="chat_completions"
you set the model name to the GGUF basename Psionic is serving

What Works Today

The current retained consumer-GPU proof is:

host: archlinux
GPU: RTX 4080
model family: local qwen3.5
canonical direct compatibility result: 6/6

That means the current native Psionic Hermes lane can do all of the following:

required tool call
plain-text no-tool answer
multi-turn tool loop with tool-result replay
same-turn two-tool assistant response
truthful refusal after an invalid tool result
streamed tool-call turn

For the strict same-turn two-tool case, the acceptance bar is now stronger than "the model called both tools." The retained proof requires Hermes to:

emit both tool calls in the same assistant turn
receive both tool results back through Psionic
produce a final grounded answer that actually uses those results

The currently proven practical user path is the tool-backed lane. In live validation on 2026-03-29:

the repo-owned compatibility checker reran green at 6/6
the same-turn two-tool row only passed because the final answer grounded on the tool results as Paris is sunny at 18C. Tokyo is rainy at 12C.
live Hermes tool-backed conversations against Psionic worked
one ad hoc no-tool 9b text-summary run hit a local fallback 400 (unsupported JSON schema feature for local fallback: object schemas with more than 5 properties are not supported by the local fallback)

So if you want the reliable path today, use Hermes against Psionic for the tool-backed chat.completions lane first, not as a claim that every open-ended Hermes formatting path is already polished.

Canonical retained proof:

docs/HERMES_QWEN35_COMPATIBILITY.md
docs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md

Fastest Proof Path

If you just want to prove your local setup works end to end, run the repo-owned checker.

From the Psionic repo root:

PSIONIC_HERMES_ROOT=/abs/path/to/hermes \
PSIONIC_HERMES_PYTHON=/abs/path/to/hermes/.venv/bin/python \
PSIONIC_HERMES_QWEN35_MODEL_PATH=/abs/path/to/qwen3.5-2b-q8_0-registry.gguf \
scripts/release/check-psionic-hermes-qwen35-compatibility.sh

That script will:

build or reuse psionic-openai-server
start a local Psionic OpenAI-compatible server
point Hermes at it through OPENAI_BASE_URL
run the retained compatibility matrix
write a JSON receipt under fixtures/qwen35/hermes/

If that passes, your local Hermes-on-Psionic lane is working.

How You Actually Use It

1. Run Psionic as the backend

Build the server:

cargo build -p psionic-serve --bin psionic-openai-server --release

Run it on a Linux NVIDIA host:

./target/release/psionic-openai-server \
  --backend cuda \
  --host 127.0.0.1 \
  --port 8095 \
  -m /abs/path/to/qwen3.5-2b-q8_0-registry.gguf

You should then have:

health check: http://127.0.0.1:8095/health
OpenAI-compatible base URL: http://127.0.0.1:8095/v1

2. Point Hermes at that server

The repo-owned compatibility probe uses Hermes like this:

from run_agent import AIAgent

agent = AIAgent(
    base_url="http://127.0.0.1:8095/v1",
    api_key="dummy",
    provider="custom",
    api_mode="chat_completions",
    model="qwen3.5-2b-q8_0-registry.gguf",
    enabled_toolsets=["<your_toolset_here>"],
    max_iterations=8,
    quiet_mode=True,
    skip_context_files=True,
    skip_memory=True,
)

The important Psionic-specific pieces are:

base_url
provider="custom"
api_mode="chat_completions"
api_key="dummy"
model matching the basename of the GGUF Psionic is serving

The exact toolset names come from your Hermes checkout, not from Psionic.

3. Send Hermes a real task

Once Hermes is pointed at Psionic, you use Hermes the same basic way you would use it against any other OpenAI-compatible backend:

ask a plain question and get a text answer
ask for a tool-backed task and let Hermes call tools
let Hermes loop across tool calls until it has enough information

Examples of the kinds of tasks the retained lane already proves:

"Use the weather tool for Paris."
"Use both weather tools now, then summarize both cities in one answer."
"Use the Atlantis weather tool and then explain truthfully what happened."

What This Is Not Yet

This is not yet:

a bundled Psionic-only Hermes product
a one-command generic assistant launcher shipped by Psionic
a claim that Psionic is faster than Ollama on every Hermes workload
a claim that llama.cpp is already a clean comparator for this exact qwen3.5 artifact lane

So the right mental model is:

Psionic is now a working Hermes backend
Hermes remains the agent/controller layer
the integration is real and rerunnable
the current remaining work is mostly benchmark, packaging, and output-polish work, not tool-loop correctness

Best Next Docs

If you want more than the quickstart:

direct compatibility proof: docs/HERMES_QWEN35_COMPATIBILITY.md
strict same-turn parallel proof: docs/HERMES_QWEN35_PARALLEL_ATTRIBUTION.md
same-host Psionic vs Ollama benchmark: docs/HERMES_BACKEND_BENCHMARK.md
serialized two-tool workflow: docs/HERMES_QWEN35_SERIALIZED_TWO_CITY.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hermes On Psionic

What Works Today

Fastest Proof Path

How You Actually Use It

1. Run Psionic as the backend

2. Point Hermes at that server

3. Send Hermes a real task

What This Is Not Yet

Best Next Docs

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Hermes On Psionic

What Works Today

Fastest Proof Path

How You Actually Use It

1. Run Psionic as the backend

2. Point Hermes at that server

3. Send Hermes a real task

What This Is Not Yet

Best Next Docs