fix(hermes): unblock Hermes Agent inference end-to-end (NS-324)#1

Draft
shannonsands wants to merge 3 commits into
feat/nemohermes-reference-blueprint from
ns324/hermes-inference-fixes

@shannonsands

Summary

Three surgical fixes that make the Hermes Agent integration in NemoClaw actually serve a chat completion. Before these fixes, every request returned HTTP 403 "connection not allowed by policy". This is the first time the integration has worked end-to-end since the v0.11.0 bump.

Linear: NS-324

Commits (read in order)

1. managed_inference network policy

agents/hermes/policy-additions.yaml had no entry for inference.local — the OpenShell gateway's virtual provider router. So every Hermes-to-provider call was rejected at the L7 proxy before it could reach the upstream provider. Added a managed_inference block mirroring the OpenClaw baseline but with Hermes-side binaries (/usr/local/bin/hermes, /usr/bin/python3{,.11}, curl).

2. Wire missing Dockerfile build args

NEMOCLAW_PROVIDER_KEY was already declared as ARG/ENV. NEMOCLAW_PRIMARY_MODEL_REF and NEMOCLAW_INFERENCE_API were set by the onboard pipeline (src/lib/onboard.ts → resolveSandboxBuildArgs) but never declared in the Hermes Dockerfile, so the runtime image saw blank values regardless of what onboard configured.

3. Map NEMOCLAW_PROVIDER_KEY to Hermes provider with credential placeholder

agents/hermes/generate-config.ts previously hardcoded provider: "custom" regardless of the chosen provider. That works for OpenAI-compatible flows because Hermes's custom path speaks OpenAI Chat Completions, but it breaks Anthropic, because OpenShell's anthropic-prod provider only accepts the Anthropic Messages format (POST /v1/messages).

The fix has two parts:

a. Provider mapping

function mapProvider(providerKey: string): string {
  switch (providerKey) {
    case "anthropic": return "anthropic";
    case "openai":    return "openai";
    case "inference":           // gemini/nvidia/custom-OpenAI
    case "custom":              // legacy/default
    default:          return "custom";
  }
}

b. Credential placeholder

Hermes's provider adapters (e.g. anthropic_adapter.py) read the per-provider env var (e.g. ANTHROPIC_API_KEY) before sending and short-circuit with "credentials missing" if empty — even when the actual outbound call goes through OpenShell's L7 proxy with a substituted credential. Fix: emit ANTHROPIC_API_KEY=openshell:resolve:env:ANTHROPIC_API_KEY placeholder in .env. The proxy rewrites the real header at egress (verified empirically — any non-empty x-api-key works, the proxy overrides it). Same pattern as messaging tokens.
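The placeholder emission described above could look roughly like the following. This is a minimal sketch, not the code from generate-config.ts: only the PROVIDER_API_KEY_ENV name, the ANTHROPIC_API_KEY variable, and the openshell:resolve:env: prefix come from this PR; the OPENAI_API_KEY entry and the credentialPlaceholderLine helper are assumptions made for illustration.

```typescript
// Hypothetical sketch of the per-provider credential placeholder.
// Only the anthropic entry is confirmed by the PR text; the openai
// entry is assumed by analogy.
const PROVIDER_API_KEY_ENV: Record<string, string> = {
  anthropic: "ANTHROPIC_API_KEY",
  openai: "OPENAI_API_KEY", // assumption: not named in the PR
};

// Returns the .env line for a provider key, or null when no line is
// emitted (the custom / inference paths have no fixed env var name).
function credentialPlaceholderLine(providerKey: string): string | null {
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_API_KEY_ENV, providerKey)) {
    return null;
  }
  const envVar = PROVIDER_API_KEY_ENV[providerKey];
  // The value is a marker the proxy resolves at egress, not a real secret.
  return envVar + "=openshell:resolve:env:" + envVar;
}
```

The design point is that the value only has to be non-empty to get past the adapter's in-process check; the proxy is what supplies the real credential.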

4 new test cases in test/hermes-generate-config.test.ts cover anthropic / openai / inference / legacy-custom mappings (5 tests pass).

Verification

End-to-end verified against Anthropic — POST http://127.0.0.1:8642/v1/chat/completions to a freshly-onboarded sandbox returns real Claude Opus 4.7 responses through inference.local.

{"id": "chatcmpl-...", "object": "chat.completion", "model": "claude-opus-4-7",
 "choices": [{"message": {"role": "assistant", "content": "pong"}}],
 "usage": {"prompt_tokens": 14842, "completion_tokens": 6}}
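For anyone reproducing the check above, a request of roughly this shape should be enough. This is a sketch under assumptions: the URL is the one quoted in this PR, but the "ping" prompt, the model field in the request, the absence of an auth header, and the helper names (buildPingRequest, sendPing, FetchLike) are all illustrative and not taken from the repository.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a
// particular runtime's typings; the global fetch can be passed in.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

interface ChatRequest {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Builds an OpenAI-style chat completion request against the sandbox.
// The prompt text and model value are assumptions for illustration.
function buildPingRequest(model: string): ChatRequest {
  return {
    url: "http://127.0.0.1:8642/v1/chat/completions",
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: model,
      messages: [{ role: "user", content: "ping" }],
    }),
  };
}

// Sends the request and fails loudly on a non-2xx status, which is how
// the pre-fix 403 would surface.
async function sendPing(fetchImpl: FetchLike, model: string): Promise<unknown> {
  const req = buildPingRequest(model);
  const res = await fetchImpl(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error("chat completion failed with HTTP " + res.status);
  }
  return res.json();
}
```

On a runtime with a global fetch, sendPing(fetch, "claude-opus-4-7") would issue the call.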

Out of scope (deliberately)

  • Nous Portal as a NemoClaw provider — separate ticket NS-322. That's the fast-follow PR. Will add the nous provider option in src/lib/onboard-providers.ts, the nous mapping in mapProvider(), the OAuth handoff, and the corresponding policy hosts (portal.nousresearch.com, inference-api.nousresearch.com, and the 5× <vendor>-gateway.nousresearch.com for the Tool Gateway).

  • socat workaround removal — Hermes v2026.4.23 supports API_SERVER_HOST=0.0.0.0 env override (the comment in start.sh saying "Hermes binds to 127.0.0.1 regardless of config (upstream bug)" is stale; the env-var override has worked since at least v2026.4.8). Removing it cleanly requires auto-generating an API_SERVER_KEY (Hermes refuses 0.0.0.0 binding without one), wiring gateway-token for Hermes (currently OpenClaw-only — reads openclaw.json), and coordinating dashboard-port detection between nemoclaw and the start script (currently PUBLIC_PORT=8642 is hardcoded). Filing a follow-up ticket.

Other observations (filing follow-ups, not in scope here)

  1. nemoclaw rebuild doesn't detect Dockerfile.base content drift: agent-onboard.ts skips rebuilding the base image if it exists locally. When the HA pin in Dockerfile.base was bumped to v2026.4.23, users with cached older bases stayed on the old version even after rebuild. Workaround: docker rmi … && nemoclaw rebuild. CI rebuilds and pushes :latest on Dockerfile.base changes so production users get it via docker pull automatically; only local devs hit this.

  2. Port forward dies during rebuild: openshell forward list shows the gateway port dead after every rebuild. Manual forward stop + forward start recovers. Probably a known race in nemoclaw's lifecycle.

  3. _skill_commands.clear() monkey-patch in agents/hermes/plugin/__init__.py — depends on hermes-agent#17670 landing for a public API replacement. Will reland once that PR is merged.

… routing

All Hermes inference traffic goes through https://inference.local — the
OpenShell gateway's virtual provider router that terminates TLS, swaps
in the host-side credential for the selected provider, and forwards to
the real backend.

The existing policy-additions.yaml had no entry for inference.local,
so every chat completion was rejected with HTTP 403 'connection not
allowed by policy' before reaching OpenShell at all. Direct verification
against api.anthropic.com / inference-api.nvidia.com worked for those
specific hosts, but the inference.local short-circuit (which is the
actual runtime path for every Hermes-to-provider call under NemoClaw)
was unreachable.

Mirrors the equivalent block in nemoclaw-blueprint/policies/openclaw-sandbox.yaml
but with Hermes-side binaries (/usr/local/bin/hermes, /usr/bin/python3,
/usr/bin/python3.11, plus curl for diagnostics).

Verified via end-to-end Anthropic chat completion: real Claude response
through inference.local after this entry is in place.
…PI build args

The onboard pipeline (src/lib/onboard.ts -> resolveSandboxBuildArgs) sets
these on every sandbox build:

  NEMOCLAW_PROVIDER_KEY=anthropic         (or openai/gemini/inference/...)
  NEMOCLAW_PRIMARY_MODEL_REF=<provider>/<model>
  NEMOCLAW_INFERENCE_API=<api flavour>    (anthropic-messages, openai-completions, ...)

NEMOCLAW_PROVIDER_KEY was already declared as ARG and ENV. The other two
were missing, so the runtime image saw blank values regardless of what
the onboard wizard configured. NEMOCLAW_INFERENCE_API in particular is
how generate-config.ts decides whether the OpenShell proxy should
expect Anthropic Messages format vs OpenAI Chat Completions; without
it, every non-default sandbox spoke the wrong protocol shape upstream.

This commit only declares the ARGs and promotes them to ENV. The
generate-config.ts changes that consume them are in the next commit.
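A quick way to see the "blank values" symptom is to list which of the three variables are unset inside the runtime image. The helper below is a hypothetical diagnostic, not part of this commit; only the three variable names come from the text above.

```typescript
// The three build args the onboard pipeline sets (names from this PR).
const INFERENCE_BUILD_VARS: string[] = [
  "NEMOCLAW_PROVIDER_KEY",
  "NEMOCLAW_PRIMARY_MODEL_REF",
  "NEMOCLAW_INFERENCE_API",
];

// Hypothetical helper: returns the names of variables that are missing
// or blank in the given environment map.
function blankInferenceVars(env: Record<string, string | undefined>): string[] {
  return INFERENCE_BUILD_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

Called as blankInferenceVars(process.env) in a Node process inside the sandbox, an image built without the new ARG/ENV declarations would be expected to report the last two names, and a correctly wired image an empty list.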
…tial placeholder

generate-config.ts previously hardcoded `provider: "custom"` in the
generated Hermes config.yaml regardless of which provider the user
selected during onboard. This works for OpenAI-compatible providers
because Hermes's "custom" path speaks OpenAI Chat Completions, which
matches what OpenShell's openai-prod / nvidia-prod / gemini-api
adapters accept. It is wrong for Anthropic — the OpenShell
anthropic-prod provider only accepts Anthropic Messages format
(POST /v1/messages), which Hermes only emits when its config has
`provider: anthropic`.

Concretely, an Anthropic-onboard sandbox produced HTTP 400/403 errors
from the OpenShell proxy on every chat completion attempt. The proxy
saw an OpenAI-shaped POST /chat/completions request hitting
inference.local and rejected it because "no compatible inference route
available". This was the second blocker after the missing
managed_inference policy.

Changes:

1. Add a mapProvider() function that turns NEMOCLAW_PROVIDER_KEY (set by
   src/lib/onboard-providers.ts -> getSandboxInferenceConfig) into the
   correct Hermes-side provider value:

     anthropic              -> provider: anthropic
     openai                 -> provider: openai
     inference (gemini/nvidia/custom-OpenAI) -> provider: custom
     custom (legacy/default)                  -> provider: custom

2. Add PROVIDER_API_KEY_ENV to emit a per-provider credential placeholder
   line in the generated .env. Hermes's provider adapters read the env
   var (e.g. anthropic_adapter.py reads ANTHROPIC_API_KEY) before
   sending the request and short-circuit with an in-process
   "credentials missing" error if the env var is empty — even when the
   actual outbound call goes through OpenShell's L7 proxy with a
   substituted credential.

   We satisfy the in-process check by writing
   ANTHROPIC_API_KEY=openshell:resolve:env:ANTHROPIC_API_KEY (etc).
   The proxy rewrites the real header at egress, identical to the
   pattern already used for messaging tokens. Verified empirically:
   any non-empty x-api-key header is overridden by the proxy before
   leaving the cluster, so the placeholder string itself never reaches
   the upstream provider.

   Custom / inference providers do not get an emitted line because
   they do not have a fixed credential env var name in Hermes; that
   path expects either no key (proxy-handled) or a per-user override
   passed through onboard.

3. Add four parameterised test cases covering anthropic / openai /
   inference / legacy-custom mappings. Each verifies both the YAML
   provider value and the .env credential placeholder (or its
   absence). The existing Telegram DM test still asserts
   provider: "custom" for the no-NEMOCLAW_PROVIDER_KEY default, so
   onboard flows that don't set the var keep their current
   behaviour.
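The four mappings and their expected placeholders can be written as a single table, which is roughly what a parameterised test iterates over. The sketch below is illustrative only: it is not the code in test/hermes-generate-config.test.ts, the OPENAI_API_KEY name is an assumption, and the functions under test are passed in as parameters so no real signatures are implied.

```typescript
interface MappingCase {
  providerKey: string;        // value of NEMOCLAW_PROVIDER_KEY
  provider: string;           // expected Hermes provider in config.yaml
  placeholder: string | null; // expected .env line, or null for none
}

const MAPPING_CASES: MappingCase[] = [
  { providerKey: "anthropic", provider: "anthropic",
    placeholder: "ANTHROPIC_API_KEY=openshell:resolve:env:ANTHROPIC_API_KEY" },
  { providerKey: "openai", provider: "openai",
    // assumption: the OpenAI env var name is not spelled out in the PR
    placeholder: "OPENAI_API_KEY=openshell:resolve:env:OPENAI_API_KEY" },
  { providerKey: "inference", provider: "custom", placeholder: null },
  { providerKey: "custom", provider: "custom", placeholder: null },
];

// Runs every case against the supplied functions and returns a list of
// human-readable failures (empty when all cases pass).
function failedMappingCases(
  mapProvider: (key: string) => string,
  placeholderFor: (key: string) => string | null
): string[] {
  const failures: string[] = [];
  for (const c of MAPPING_CASES) {
    if (mapProvider(c.providerKey) !== c.provider) {
      failures.push(c.providerKey + ": provider");
    }
    if (placeholderFor(c.providerKey) !== c.placeholder) {
      failures.push(c.providerKey + ": placeholder");
    }
  }
  return failures;
}
```

Checking the provider value and the placeholder in the same row keeps the two halves of the fix from drifting apart.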

Verified end-to-end: a freshly onboarded Anthropic sandbox now
returns real Claude responses through
POST http://127.0.0.1:8642/v1/chat/completions.
@shannonsands shannonsands changed the base branch from main to feat/nemohermes-reference-blueprint April 30, 2026 04:58