feat(scripts): add OpenAI-compatible shim for Claude Code OAuth subscription#85
feat(scripts): add OpenAI-compatible shim for Claude Code OAuth subscription#85jarrettj wants to merge 1 commit into
Conversation
…ription
Adds `scripts/claude_oai_shim.py` — a FastAPI app exposing
`/v1/chat/completions` that routes through the official
`claude-agent-sdk`, letting any OpenAI-speaking client (litellm, DSPy,
etc.) use a Claude Code subscription instead of a metered Anthropic
Console API key.
Why: `evolve_skill` and friends default to OpenAI clients via litellm.
If you already pay for Claude Code, you'd rather not provision a second
billing account just to run evolutions.
Usage:
pip install -e ".[claude-shim]"
python scripts/claude_oai_shim.py &
export OPENAI_BASE_URL=http://127.0.0.1:8765/v1
export OPENAI_API_KEY=anything
python -m evolution.skills.evolve_skill \
--skill github-code-review \
--optimizer-model openai/claude-opus-4-7 \
--eval-model openai/claude-haiku-4-5
Verified end-to-end: completes a 5-iteration MIPROv2 run against
github-code-review, optimizer (opus) + eval (haiku) both routed through
the shim, ~29 min wall time, score 54.28%.
Adds a `claude-shim` optional dependency group so the three runtime
deps (claude-agent-sdk, fastapi, uvicorn) don't bloat the default
install.
Caveats:
- Streaming not yet supported (stream: true is buffered)
- Token usage counts are best-effort
- Users should review Claude Code's acceptable-use terms before
automating against it (note included in README)
jarrettj
left a comment
There was a problem hiding this comment.
Security Review
Found 1 Critical and 3 Warning issues that should be addressed before merge. See inline comments for details.
| Severity | Count |
|---|---|
| 🔴 Critical | 1 |
| 3 | |
| 💡 Suggestion | 4 |
Note: Cannot request changes on own PR — submitting as comment review.
|
|
||
| from __future__ import annotations | ||
|
|
||
| import asyncio |
There was a problem hiding this comment.
import asyncio is unused throughout the file. This will cause a ruff lint failure in CI (F401 'asyncio' imported but unused). Remove this import.
| return {"ok": True} | ||
|
|
||
|
|
||
| @app.post("/v1/chat/completions") |
There was a problem hiding this comment.
🔴 Critical: No authentication on /v1/chat/completions. Any process on the local machine can call this endpoint and consume the owner's Claude Code subscription without restriction. While the server binds to 127.0.0.1 (mitigating remote network access), add at minimum an optional API key check via SHIM_API_KEY env var validated against the Authorization header. Example:
import os
from fastapi import Depends, HTTPException, Security
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
_bearer = HTTPBearer(auto_error=False)
def _verify(creds: HTTPAuthorizationCredentials = Security(_bearer)):
key = os.environ.get('SHIM_API_KEY')
if key and (not creds or creds.credentials != key):
raise HTTPException(status_code=401, detail='Unauthorized')This prevents malicious local processes or supply-chain compromises from silently draining the subscription.
|
|
||
| text_chunks: list[str] = [] | ||
| usage = {"input_tokens": 0, "output_tokens": 0} | ||
| async for message in query(prompt=prompt, options=options): |
There was a problem hiding this comment.
claude-agent-sdk async call. Auth failures, network errors, rate limits, or SDK version mismatches will bubble up as raw 500 tracebacks. Clients like litellm expect an OpenAI-shaped error envelope. Wrap the async for block in try/except Exception as exc and return:
{"error": {"message": "...", "type": "server_error", "code": null}}| # ResultMessage carries usage at the end | ||
| u = getattr(message, "usage", None) | ||
| if u: | ||
| usage["input_tokens"] = u.get("input_tokens", 0) or 0 |
There was a problem hiding this comment.
u.get('input_tokens', 0) assumes usage is a dict. If claude-agent-sdk returns an object (dataclass or Pydantic model) for ResultMessage.usage, this raises AttributeError. Prefer getattr(u, 'input_tokens', None) or 0 to handle both shapes safely.
| @app.post("/v1/chat/completions") | ||
| async def chat_completions(request: Request) -> JSONResponse: | ||
| body = await request.json() | ||
| messages = body.get("messages", []) |
There was a problem hiding this comment.
💡 Suggestion: No validation that messages is non-empty. An empty list sends an empty string to Claude, wasting quota. Add an early-exit 422:
if not messages:
from fastapi import HTTPException
raise HTTPException(status_code=422, detail='messages must be non-empty')|
|
||
|
|
||
| @app.post("/v1/chat/completions") | ||
| async def chat_completions(request: Request) -> JSONResponse: |
There was a problem hiding this comment.
💡 Suggestion: No request-size guard. A caller can send an arbitrarily large messages array, consuming significant Claude Code quota in one request. Consider a configurable cap on total message content length before forwarding to the SDK.
| resolved = _resolve_model(model) | ||
| prompt, system = _messages_to_prompt(messages) | ||
|
|
||
| options = ClaudeAgentOptions(model=resolved, system_prompt=system) if system else ClaudeAgentOptions(model=resolved) |
There was a problem hiding this comment.
💡 Suggestion: Duplicate ClaudeAgentOptions constructor calls. Simplify:
kw = {'model': resolved}
if system:
kw['system_prompt'] = system
options = ClaudeAgentOptions(**kw)| { | ||
| "index": 0, | ||
| "message": {"role": "assistant", "content": response_text}, | ||
| "finish_reason": "stop", |
There was a problem hiding this comment.
💡 Suggestion: finish_reason is hardcoded to 'stop'. If the SDK signals truncation or content-filter stop this is silently swallowed. Map the SDK result status to 'stop', 'length', or 'content_filter' as appropriate.
Code Review Summary — PR #85 (feat/claude-oauth-shim)Verdict: Changes Requested (posted as COMMENT — cannot self-approve/block own PR) 🔴 Critical (1)[scripts/claude_oai_shim.py:114] No authentication on The endpoint has zero authentication. Any process running on the local machine can freely call it and consume the owner's Claude Code subscription quota. While Fix: Add an optional
|
jarrettj
left a comment
There was a problem hiding this comment.
Code Review — feat(scripts): add OpenAI-compatible shim for Claude Code OAuth subscription
Overall the idea is sound and the implementation is mostly clean, but there are several issues that need to be addressed before merging.
Critical / Must-Fix
1. import asyncio is unused (will fail ruff/flake8 CI)
asyncio is imported on line 53 but never referenced anywhere in the file. The async for loop uses query() from the SDK directly — Python's async machinery doesn't require an explicit asyncio import here. This will cause a lint failure in the CI pipeline (ruff is configured in this repo).
2. usage access pattern is fragile — u.get(...) assumes a dict, not an object
Lines 138-139: u.get("input_tokens", 0) will AttributeError at runtime if message.usage is a dataclass/object (which is the typical shape for SDK response types) rather than a plain dict. Should use getattr(u, "input_tokens", 0) consistently with how content and text are accessed elsewhere in the same loop.
3. No error handling in chat_completions — any SDK failure returns a bare 500
If query() raises (e.g. auth expired, rate limited, network error), FastAPI will return a bare 500 with no JSON body. OpenAI-compatible clients expect an error response shaped as {"error": {"message": "...", "type": "...", "code": ...}}. A try/except block with an appropriate JSONResponse(status_code=...) is needed.
Warnings / Should-Fix
4. _resolve_model silently maps all unknown model names to sonnet
If a caller passes "gpt-4" or "gpt-4o" (common when reusing existing DSPy config), the shim silently routes to sonnet with no warning. The README says to use claude-* names but there is no enforcement at the API layer. Consider logging a warning or returning a 400 for unrecognized families.
5. stream: true is silently ignored
The README and docstring mention this limitation, but the endpoint itself never reads body.get("stream") — it simply ignores the field. A well-behaved shim should at minimum log a warning. Clients that assert on stream behavior (some LiteLLM retry paths) may produce unexpected results.
6. model field in response echoes the requested model, not the resolved model
Line 148 returns "model": model (the raw OpenAI-style string from the request). Returning the actual resolved Claude model string would be more transparent and easier to debug.
7. Credentials path typo in README
Line 23 of README_claude_shim.md: ~/.claude/.credentials.json — double dot before credentials. The actual Claude Code credentials path is ~/.claude/credentials.json (single dot directory, no leading dot on the file).
Suggestions / Nice-to-Have
8. Port is hardcoded — no --port/--host CLI arguments
uvicorn.run(app, host="127.0.0.1", port=8765) is fixed. A simple argparse or click argument for --port and optionally --host would make this more composable (e.g. for CI environments or Docker).
9. No tests
Given the repo already has pytest + pytest-asyncio in dev deps, even a minimal test (mock query(), assert response shape) would add meaningful coverage and prevent regressions.
10. claude-opus-4-1-20250805 date stamp will go stale
The MODEL_MAP hardcodes a date-stamped opus model. Claude model IDs change; this will silently route to an outdated or deprecated model once Anthropic updates the version. Consider documenting the update cadence or sourcing the model ID from an environment variable / config.
Looks Good
- Optional dependency group
claude-shiminpyproject.tomlis correctly scoped and won't affect existing installs. - The
_messages_to_prompt()function correctly handles both string and list-of-block content formats, covering the OpenAI vision/tools content shape. - Localhost-only binding (
127.0.0.1) is a good default for a dev-only tool — avoids accidental exposure. - The
uuid4ID generation and timestamp in the response correctly mimics the OpenAI response envelope. - Documentation is clear about acceptable-use caveats.
Suggested verdict: REQUEST_CHANGES — items 1–3 are blocking (unused import breaks CI, usage dict access will AttributeError in production, missing error handling degrades client compatibility). Items 4–7 are strongly recommended before merge.
(Note: PR #300 was requested for review but does not exist in this repository — the highest PR number is #85. This review covers PR #85, the most recent open PR.)
Code Review Summary — PR #85:
|
| # | Issue | File | Line |
|---|---|---|---|
| 1 | import asyncio is unused — will fail ruff CI |
claude_oai_shim.py |
53 |
| 2 | u.get("input_tokens") assumes usage is a dict; SDK objects will AttributeError — use getattr(u, ...) instead |
claude_oai_shim.py |
138-139 |
| 3 | No try/except in chat_completions — any SDK error returns a bare 500 without the OpenAI error envelope {"error": {...}} |
claude_oai_shim.py |
127 |
Warnings (should fix)
| # | Issue |
|---|---|
| 4 | _resolve_model silently maps gpt-4, gpt-4o, etc. to sonnet — no log warning or 400 response |
| 5 | stream: true field is never read — silently ignored without any warning to the caller |
| 6 | Response model field echoes the raw requested name, not the resolved Claude model ID |
| 7 | README typo: ~/.claude/.credentials.json should be ~/.claude/credentials.json (extra leading dot) |
Suggestions (nice-to-have)
- Add
--port/--hostCLI args tomain()for composability - Add at least one
pytesttest (mockquery(), verify response shape) claude-opus-4-1-20250805will go stale — consider an env var override
Looks good
pyproject.tomloptional group is correctly scoped_messages_to_prompt()handles both string and list-of-block content127.0.0.1-only binding is the right default for a dev tool- UUID + timestamp response envelope matches OpenAI shape
- Acceptable-use caveat is prominently documented
Verdict: REQUEST_CHANGES — items 1–3 are blocking before merge.
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review — PR #85
Verdict: Would Request Changes — 2 bugs, 3 warnings, 2 suggestions (self-review; GitHub doesn't allow REQUEST_CHANGES on your own PR).
🔴 Bugs (fix before merge)
- claude_oai_shim.py:139 —
u.get()assumesusageis a dict; SDK may return a typed object →AttributeErrorat runtime - claude_oai_shim.py:118 —
stream=truesilently ignored; clients expecting SSE will hang or parse incorrectly
⚠️ Warnings
- claude_oai_shim.py:120 — no error handling around
query()— SDK exceptions return raw 500 with full traceback - pyproject.toml:34 —
claude-agent-sdk>=0.1.0is too loose; a breaking 0.x bump will pull in incompatible code silently - README_claude_shim.md:43 — example uses
claude-opus-4-7but MODEL_MAP resolves opus toclaude-opus-4-1-20250805
💡 Suggestions
- Port 8765 hard-coded with no
--portCLI arg orCLAUDE_SHIM_PORTenv var - No unit tests for
_resolve_model()or_messages_to_prompt()
✅ Looks Good
- Clean docstring with usage, caveats, and model routing table
from __future__ import annotations+ typed signatures throughout- Correct
if __name__ == '__main__'guard - Optional dep group
claude-shimkeeps the default install slim - All 139 existing tests pass; ruff: clean
Reviewed by Hermes Agent
| u = getattr(message, "usage", None) | ||
| if u: | ||
| usage["input_tokens"] = u.get("input_tokens", 0) or 0 | ||
| usage["output_tokens"] = u.get("output_tokens", 0) or 0 |
There was a problem hiding this comment.
🔴 Bug: u.get() assumes usage is a plain dict. The claude-agent-sdk may return a typed object (Pydantic model, dataclass, etc.) — calling .get() on it raises AttributeError at runtime whenever actual usage data is returned.
Suggested fix:
u = getattr(message, 'usage', None)
if u:
def _tok(obj, key):
if isinstance(obj, dict):
return obj.get(key, 0) or 0
return getattr(obj, key, 0) or 0
usage['input_tokens'] = _tok(u, 'input_tokens')
usage['output_tokens'] = _tok(u, 'output_tokens')| async def chat_completions(request: Request) -> JSONResponse: | ||
| body = await request.json() | ||
| messages = body.get("messages", []) | ||
| model = body.get("model", "claude-sonnet-4-5") |
There was a problem hiding this comment.
🔴 Bug: stream=True is silently downgraded to a buffered JSON response. OpenAI-compatible clients (litellm, DSPy, openai-python) that set stream=True expect a text/event-stream SSE response. Returning plain JSON will cause them to hang waiting for chunks or mis-parse the response.
Short-term fix — return an explicit 400 so callers know streaming isn't supported yet:
if body.get('stream'):
return JSONResponse(status_code=400, content={
'error': {'message': 'Streaming is not yet supported by this shim.', 'type': 'invalid_request_error'}
})| messages = body.get("messages", []) | ||
| model = body.get("model", "claude-sonnet-4-5") | ||
|
|
||
| resolved = _resolve_model(model) |
There was a problem hiding this comment.
query() call. Any SDK exception (auth failure, model not found, rate-limit) will surface as a raw 500 with a full Python traceback in the response body.
Wrap the handler:
try:
# ... existing code ...
except Exception as exc:
return JSONResponse(
status_code=500,
content={'error': {'message': str(exc), 'type': 'internal_error'}}
)| "darwinian-evolver", | ||
| ] | ||
| claude-shim = [ | ||
| "claude-agent-sdk>=0.1.0", |
There was a problem hiding this comment.
claude-agent-sdk>=0.1.0 has no upper bound. For an early-stage 0.x package, any minor release can introduce breaking API changes. Recommend:
"claude-agent-sdk>=0.1.0,<1.0",| --skill github-code-review \ | ||
| --optimizer-model openai/claude-opus-4-7 \ | ||
| --eval-model openai/claude-haiku-4-5 \ | ||
| --iterations 5 |
There was a problem hiding this comment.
--optimizer-model openai/claude-opus-4-7, but MODEL_MAP resolves all opus-* names to claude-opus-4-1-20250805. Either update MODEL_MAP to use claude-opus-4-7 (or the latest opus alias), or update this example to match what the shim actually resolves.
Code Review Summary — PR #85Verdict: Would Request Changes (2 bugs, 3 warnings, 2 suggestions)
🔴 Bugs
|
jarrettj
left a comment
There was a problem hiding this comment.
Hermes Agent Review
Found 3 warnings and 3 suggestions. Core logic is solid — model routing, message flattening, and the OpenAI response shape are all correct. All 139 existing tests pass. Three issues worth fixing before this lands in main: an unused import, missing error handling on query(), and a potential AttributeError on the usage object. See inline comments.
|
|
||
| from __future__ import annotations | ||
|
|
||
| import asyncio |
There was a problem hiding this comment.
asyncio is imported but never used — ruff flags this as F401. Remove this import line.
| resolved = _resolve_model(model) | ||
| prompt, system = _messages_to_prompt(messages) | ||
|
|
||
| options = ClaudeAgentOptions(model=resolved, system_prompt=system) if system else ClaudeAgentOptions(model=resolved) |
There was a problem hiding this comment.
💡 Suggestion: The conditional is unnecessary if ClaudeAgentOptions accepts system_prompt=None. Most SDKs treat None as 'no system prompt'. Simplify to:
options = ClaudeAgentOptions(model=resolved, system_prompt=system)|
|
||
| text_chunks: list[str] = [] | ||
| usage = {"input_tokens": 0, "output_tokens": 0} | ||
| async for message in query(prompt=prompt, options=options): |
There was a problem hiding this comment.
query() call. If the claude-agent-sdk raises (auth failure, rate limit, network error), FastAPI returns a raw 500 traceback instead of an OpenAI-compatible error body. Clients that parse {"error": {"message": ...}} will get a confusing response.
Suggested fix:
try:
async for message in query(prompt=prompt, options=options):
...
except Exception as exc:
return JSONResponse(
status_code=500,
content={"error": {"message": str(exc), "type": "internal_error"}},
)| # ResultMessage carries usage at the end | ||
| u = getattr(message, "usage", None) | ||
| if u: | ||
| usage["input_tokens"] = u.get("input_tokens", 0) or 0 |
There was a problem hiding this comment.
u.get() assumes usage is a dict, but the claude-agent-sdk may return a Usage dataclass or named object — calling .get() on it raises AttributeError at runtime.
Use getattr instead:
usage["input_tokens"] = getattr(u, "input_tokens", 0) or 0
usage["output_tokens"] = getattr(u, "output_tokens", 0) or 0(Or confirm the SDK always returns a dict here and add a comment noting that.)
| from fastapi.responses import JSONResponse | ||
|
|
||
| MODEL_MAP = { | ||
| "opus": "claude-opus-4-1-20250805", |
There was a problem hiding this comment.
💡 Suggestion: Model versions are hardcoded and will go stale as Claude releases new versions. Consider env-var overrides so users can pin to newer versions without editing source:
import os
MODEL_MAP = {
"opus": os.getenv("CLAUDE_SHIM_OPUS", "claude-opus-4-1-20250805"),
"sonnet": os.getenv("CLAUDE_SHIM_SONNET", "claude-sonnet-4-5"),
"haiku": os.getenv("CLAUDE_SHIM_HAIKU", "claude-haiku-4-5"),
}| @app.post("/v1/chat/completions") | ||
| async def chat_completions(request: Request) -> JSONResponse: | ||
| body = await request.json() | ||
| messages = body.get("messages", []) |
There was a problem hiding this comment.
💡 Suggestion: stream: true is silently ignored. A client requesting streaming will receive a non-streaming response with no indication the mode was not honoured. Consider a warning log so this is observable:
import logging
if body.get("stream"):
logging.getLogger(__name__).warning(
"stream=true requested but not supported — returning buffered response"
)
Code Review Summary — PR #85Verdict: Comments (non-blocking) — 3 warnings, 3 suggestions. Core implementation is correct; two runtime bugs should be fixed before this sees production use.
|
Adds
scripts/claude_oai_shim.py— a FastAPI app exposing/v1/chat/completionsthat routes through the officialclaude-agent-sdk, letting any OpenAI-speaking client (litellm, DSPy, etc.) use a Claude Code subscription instead of a metered Anthropic Console API key.Addresses issue #9 (use Anthropic Max plan subscription instead of API keys).
Changes
scripts/claude_oai_shim.py: FastAPI OpenAI-compat shimscripts/README_claude_shim.md: usage documentationpyproject.toml: addsclaude-shimoptional dependency group