feat(proxy): external backend auth + graceful /props fallback #104
Open
tcsenpai wants to merge 1 commit into
- Add --api-key CLI flag for Bearer token authentication
- Propagate api_key through ProxyServer → LlamafileClient → httpx headers
- Handle 404 on /props endpoint gracefully (return None instead of crashing) for compatibility with non-llama.cpp backends (Xiaomi, vLLM, OpenAI)
- Fix model name resolution: only strip .gguf/.llamafile extensions, not arbitrary suffixes like .5 in model names (e.g. mimo-v2.5)

This enables forge proxy to sit in front of hosted OpenAI-compatible APIs with authentication, not just local llama-server/Ollama instances.
TL;DR
Forge proxy previously worked only with local backends (llama-server, Ollama). Now it works with remote APIs too (Xiaomi, OpenAI, etc.). You can pass an API key and point it at any OpenAI-compatible endpoint.
Before:
    forge-proxy --backend ollama --model mistral
After:
    forge-proxy --backend-url https://api.example.com/v1 --api-key sk-xxx --model gpt-4
What changed
Three small fixes that make the proxy usable with hosted APIs:
1. API key authentication (--api-key)
Added an api_key parameter through the full stack: ProxyServer → LlamafileClient → httpx headers.
This is the minimum viable change. No token refresh, no header injection, just a Bearer token passed to the backend.
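As a rough illustration of the last hop in that chain, the sketch below builds the header dict that would be handed to the HTTP client. The helper name `build_backend_headers` and the `Content-Type` default are assumptions made for this example, not code taken from the PR.

```python
from typing import Dict, Optional


def build_backend_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for the backend (hypothetical helper).

    The Authorization header is only added when a key was supplied, so
    local backends without auth keep working unchanged.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Plain Bearer token: no refresh logic, no extra header injection.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


# With httpx, a dict like this can be passed once at client construction,
# e.g. httpx.AsyncClient(base_url=..., headers=build_backend_headers(key)).
```

Setting the header once on the client, rather than per request, keeps the key out of the individual call sites.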
2. Graceful /props fallback
The proxy calls GET /props to auto-detect context length. Hosted APIs return 404 on this endpoint, which crashed the proxy. Now it returns None and falls back to the budget-tokens default.
Before: crash on startup
After: works with any backend, auto-detect is best-effort
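The best-effort behaviour can be sketched as two small functions. This is a sketch under assumptions, not the PR's code: the function names are hypothetical, the `default_generation_settings.n_ctx` field is assumed from llama.cpp's server response, and raising on other error statuses is a choice made for this example.

```python
from typing import Any, Mapping, Optional


def context_length_from_props(
    status_code: int, payload: Optional[Mapping[str, Any]]
) -> Optional[int]:
    """Interpret a GET /props response (hypothetical helper).

    Returns None when the backend does not expose /props (404) or when
    the payload carries no usable context length.
    """
    if status_code == 404:
        # Hosted OpenAI-compatible APIs typically lack this endpoint.
        return None
    if status_code >= 400:
        # Assumption: other errors are still treated as real failures.
        raise RuntimeError(f"/props failed with status {status_code}")
    settings = (payload or {}).get("default_generation_settings") or {}
    n_ctx = settings.get("n_ctx")  # assumed llama.cpp field name
    return n_ctx if isinstance(n_ctx, int) and n_ctx > 0 else None


def effective_budget(detected: Optional[int], default_budget: int) -> int:
    """Fall back to the configured budget when auto-detect yields nothing."""
    return detected if detected is not None else default_budget
```

Keeping the "no answer" case as None, instead of a sentinel number, makes the fallback to the default explicit at the call site.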
3. Model name resolution fix
Path("mimo-v2.5").stem returns "mimo-v2" because Python sees .5 as a file extension. Fixed by only stripping .gguf and .llamafile suffixes, not arbitrary ones.
Before: model name gets truncated for any model with a dot
After: plain model names pass through unchanged
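A minimal sketch of the suffix-aware approach follows. The function name `resolve_model_name`, the case-insensitive match, and the reduction of file paths to their base name are assumptions for this example; the PR's actual implementation may differ.

```python
from pathlib import Path

# Only these extensions are treated as model-file suffixes.
MODEL_FILE_SUFFIXES = (".gguf", ".llamafile")


def resolve_model_name(model: str) -> str:
    """Strip a known model-file extension; leave every other name untouched."""
    lowered = model.lower()
    for suffix in MODEL_FILE_SUFFIXES:
        if lowered.endswith(suffix):
            # File path: keep the base name without the known extension.
            return Path(model).name[: -len(suffix)]
    # Plain model id such as "mimo-v2.5": pass through unchanged.
    return model


# The old behaviour, for comparison: pathlib treats ".5" as an extension.
truncated = Path("mimo-v2.5").stem  # "mimo-v2"
```

Matching against an explicit allow-list means a dot in a version number is never mistaken for an extension.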
Usage example
Why this matters
The README says forge works with "Ollama, llama-server, Llamafile, vLLM, and Anthropic." But in practice, anyone wanting to add guardrails in front of a hosted API (OpenAI, Xiaomi, Mistral, etc.) had no way to pass auth. This removes that barrier.
Testing
/props 404 handled gracefully