feat(proxy): external backend auth + graceful /props fallback #104

Open
tcsenpai wants to merge 1 commit into antoinezambelli:main from tcsenpai:feat/external-backend-auth

tcsenpai commented Jun 3, 2026

TL;DR

Until now, the Forge proxy only worked with local backends (llama-server, Ollama). With this change it also works with remote APIs (Xiaomi, OpenAI, etc.): you can pass an API key and point it at any OpenAI-compatible endpoint.

Before: forge-proxy --backend ollama --model mistral
After: forge-proxy --backend-url https://api.example.com/v1 --api-key sk-xxx --model gpt-4

What changed

Three small fixes that make the proxy usable with hosted APIs:

1. API key authentication (--api-key)

Added an api_key parameter and threaded it through the full stack:

CLI (--api-key)
  -> ProxyServer (api_key)
    -> LlamafileClient (api_key)
      -> httpx headers (Authorization: Bearer sk-xxx)

This is the minimum viable change. No token refresh, no header injection, just a Bearer token passed to the backend.
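
The last hop of that chain could look roughly like the sketch below. It is illustrative only and is not the code in this PR: `build_backend_headers` is a hypothetical helper name, and the real client may assemble its headers differently. The returned dict is the kind of mapping that would be passed as `headers=` to an httpx client.

```python
from typing import Dict, Optional


def build_backend_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding a Bearer token only when a key is given.

    Hypothetical helper for illustration; not taken from the PR diff.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Hosted OpenAI-compatible APIs generally expect this header form.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

With `api_key=None` no Authorization header is sent, which is what keeps local llama-server/Ollama setups working as before.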

2. Graceful /props fallback

The proxy calls GET /props to auto-detect context length. Hosted APIs return 404 on this endpoint, which crashed the proxy. Now the lookup returns None on a 404 and the proxy falls back to the budget-tokens default.

Before: crash on startup
After: works with any backend, auto-detect is best-effort
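
A minimal sketch of the best-effort behaviour, under stated assumptions: the function names and the `DEFAULT_BUDGET_TOKENS` value are hypothetical, the `default_generation_settings.n_ctx` layout is assumed from llama-server's /props response, and raising on other non-200 statuses is a choice made for this example rather than something this PR specifies.

```python
from typing import Any, Mapping, Optional

DEFAULT_BUDGET_TOKENS = 8192  # hypothetical fallback value, not taken from the PR


def context_length_from_props(
    status_code: int, body: Optional[Mapping[str, Any]]
) -> Optional[int]:
    """Interpret a GET /props response; None means 'could not auto-detect'."""
    if status_code == 404:
        # Backend has no /props endpoint (typical for hosted APIs).
        return None
    if status_code != 200:
        # Assumption for this sketch: other failures are surfaced, not hidden.
        raise RuntimeError(f"unexpected /props status: {status_code}")
    # Assumed llama-server layout: {"default_generation_settings": {"n_ctx": ...}}
    settings = (body or {}).get("default_generation_settings") or {}
    n_ctx = settings.get("n_ctx")
    return n_ctx if isinstance(n_ctx, int) and n_ctx > 0 else None


def effective_budget(
    detected: Optional[int], default: int = DEFAULT_BUDGET_TOKENS
) -> int:
    """Use the detected context length when available, else the default."""
    return detected if detected is not None else default
```

Keeping the status interpretation separate from the HTTP call makes the 404 path easy to unit-test without a running backend.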

3. Model name resolution fix

Path("mimo-v2.5").stem returns "mimo-v2" because Python sees .5 as a file extension. Fixed by only stripping .gguf and .llamafile suffixes, not arbitrary ones.

Before: model name gets truncated for any model with a dot
After: plain model names pass through unchanged
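
The fix can be illustrated as follows. This is a sketch of the idea, not the PR's actual diff: `resolve_model_name` and `MODEL_FILE_SUFFIXES` are hypothetical names, and the case-insensitive suffix match is an assumption.

```python
from pathlib import Path

# Only these file extensions are treated as strippable (hypothetical constant name).
MODEL_FILE_SUFFIXES = (".gguf", ".llamafile")


def resolve_model_name(model: str) -> str:
    """Strip a known model-file extension; leave plain model names untouched."""
    lowered = model.lower()
    for suffix in MODEL_FILE_SUFFIXES:
        if lowered.endswith(suffix):
            # Looks like a model file: drop the directory part and the extension.
            return Path(model).name[: -len(suffix)]
    # Not a model file: pass through unchanged, dots included.
    return model
```

For comparison, `Path("mimo-v2.5").stem` gives `"mimo-v2"`, while `resolve_model_name("mimo-v2.5")` gives `"mimo-v2.5"`.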

Usage example

# Remote API with auth
forge-proxy \
  --backend-url https://api.example.com/v1 \
  --api-key sk-your-key-here \
  --model gpt-4 \
  --port 8083

# Point your client at it
curl http://localhost:8083/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'

Why this matters

The README says forge works with "Ollama, llama-server, Llamafile, vLLM, and Anthropic." In practice, though, anyone wanting to add guardrails in front of a hosted API (OpenAI, Xiaomi, Mistral, etc.) had no way to pass auth. This removes that barrier.

Testing

  • Chat completion (non-streaming)
  • Streaming (SSE)
  • Tool calling with guardrails
  • /props 404 handled gracefully
  • Model name with dots preserved (mimo-v2.5)
  • Backwards compatible (api_key=None by default)

- Add --api-key CLI flag for Bearer token authentication
- Propagate api_key through ProxyServer → LlamafileClient → httpx headers
- Handle 404 on /props endpoint gracefully (return None instead of crashing)
  for compatibility with non-llama.cpp backends (Xiaomi, vLLM, OpenAI)
- Fix model name resolution: only strip .gguf/.llamafile extensions, not
  arbitrary suffixes like .5 in model names (e.g. mimo-v2.5)

This enables forge proxy to sit in front of hosted OpenAI-compatible APIs
with authentication, not just local llama-server/Ollama instances.