feat(proxy): external backend auth + graceful /props fallback by tcsenpai · Pull Request #104 · antoinezambelli/forge

tcsenpai · 2026-06-03T12:00:57Z

TL;DR

Forge proxy only worked with local models (llama-server, Ollama). Now it works with remote APIs too (Xiaomi, OpenAI, etc). You can pass an API key and point it at any OpenAI-compatible endpoint.

Before: forge-proxy --backend ollama --model mistral
After: forge-proxy --backend-url https://api.example.com/v1 --api-key sk-xxx --model gpt-4

What changed

Three small fixes that make the proxy usable with hosted APIs:

1. API key authentication (`--api-key`)

Added api_key parameter through the full stack:

CLI (--api-key)
  -> ProxyServer (api_key)
    -> LlamafileClient (api_key)
      -> httpx headers (Authorization: Bearer sk-xxx)

This is the minimum viable change. No token refresh, no header injection, just a Bearer token passed to the backend.
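
The chain above ends in a single header. Here is a minimal sketch of that last step, assuming a hypothetical helper named `build_auth_headers` (the PR's actual code may be structured differently). With `api_key=None` no Authorization header is added, which matches the backwards-compatible default:

```python
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers to send to the backend.

    Hypothetical helper for illustration only; not taken from the PR diff.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Bearer scheme, as expected by OpenAI-compatible endpoints.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```

The resulting dict could then be handed to the HTTP client, for example as the `headers=` argument of an `httpx.AsyncClient`.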

2. Graceful `/props` fallback

The proxy calls GET /props to auto-detect context length. Hosted APIs return 404 on this endpoint, which crashed the proxy on startup. Now the lookup returns None on a 404 and the proxy falls back to the budget-tokens default.

Before: crash on startup
After: works with any backend, auto-detect is best-effort
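
A rough sketch of the best-effort behaviour described here. The function names, the injectable `fetch` callable, and the `default_generation_settings.n_ctx` field layout are assumptions made for illustration, not the PR's actual code:

```python
from typing import Any, Callable, Dict, Optional, Tuple

# A fetcher takes a path and returns (status_code, parsed_json_or_None).
Fetcher = Callable[[str], Tuple[int, Optional[Dict[str, Any]]]]


def detect_context_length(fetch: Fetcher) -> Optional[int]:
    """Best-effort context-length detection via GET /props."""
    status, body = fetch("/props")
    if status != 200 or not body:
        # Hosted APIs typically answer 404 here; treat it as "unknown".
        return None
    # Assumed response layout; adjust to what the backend really returns.
    settings = body.get("default_generation_settings") or {}
    n_ctx = settings.get("n_ctx")
    return n_ctx if isinstance(n_ctx, int) and n_ctx > 0 else None


def effective_budget(fetch: Fetcher, budget_tokens: int) -> int:
    """Use the detected context length, else the configured default."""
    detected = detect_context_length(fetch)
    return detected if detected is not None else budget_tokens
```

Passing the fetcher in as a parameter keeps the fallback logic testable without a live backend; in the proxy it would wrap the real HTTP call.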

3. Model name resolution fix

Path("mimo-v2.5").stem returns "mimo-v2" because pathlib treats everything after the last dot (.5) as a file suffix. Fixed by only stripping .gguf and .llamafile suffixes, not arbitrary ones.

Before: model name gets truncated for any model with a dot
After: plain model names pass through unchanged
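
To make the difference concrete, here is a small sketch of suffix-aware name resolution. `resolve_model_name` and `KNOWN_SUFFIXES` are hypothetical names chosen for this example, and the case-insensitive match is an assumption:

```python
from pathlib import Path

# Only these file extensions are stripped (per the PR description).
KNOWN_SUFFIXES = (".gguf", ".llamafile")


def resolve_model_name(model: str) -> str:
    """Strip a known model-file extension; leave other dots alone."""
    name = Path(model).name  # drop any directory part, keep the full filename
    lowered = name.lower()
    for suffix in KNOWN_SUFFIXES:
        if lowered.endswith(suffix):
            return name[: -len(suffix)]
    return name


# The old behaviour, for comparison: pathlib treats ".5" as a suffix.
old_name = Path("mimo-v2.5").stem
new_name = resolve_model_name("mimo-v2.5")
```

Here `old_name` is the truncated "mimo-v2", while `new_name` keeps the full "mimo-v2.5"; a path such as "models/mistral-7b.gguf" still resolves to "mistral-7b".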

Usage example

# Remote API with auth
forge-proxy \
  --backend-url https://api.example.com/v1 \
  --api-key sk-your-key-here \
  --model gpt-4 \
  --port 8083

# Point your client at it
curl http://localhost:8083/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4","messages":[{"role":"user","content":"hello"}]}'

Why this matters

The README says forge works with "Ollama, llama-server, Llamafile, vLLM, and Anthropic." But in practice, anyone wanting to add guardrails in front of a hosted API (OpenAI, Xiaomi, Mistral, etc) had no way to pass auth. This removes that barrier.

Testing

Chat completion (non-streaming)
Streaming (SSE)
Tool calling with guardrails
/props 404 handled gracefully
Model name with dots preserved (mimo-v2.5)
Backwards compatible (api_key=None by default)

- Add --api-key CLI flag for Bearer token authentication
- Propagate api_key through ProxyServer → LlamafileClient → httpx headers
- Handle 404 on /props endpoint gracefully (return None instead of crashing) for compatibility with non-llama.cpp backends (Xiaomi, vLLM, OpenAI)
- Fix model name resolution: only strip .gguf/.llamafile extensions, not arbitrary suffixes like .5 in model names (e.g. mimo-v2.5)

This enables forge proxy to sit in front of hosted OpenAI-compatible APIs with authentication, not just local llama-server/Ollama instances.

tcsenpai wants to merge 1 commit into antoinezambelli:main from tcsenpai:feat/external-backend-auth
