feat: transparent image fallback to Anthropic for non-Anthropic backends #21
Open
BenSheridanEdwards wants to merge 4 commits into
Routes image turns from the configured backend (DeepSeek/OpenRouter) to
api.anthropic.com so Claude Code's vision capabilities work even when
the active backend can't process image content blocks. The wire-side
model name is swapped to the canonical Claude name on outbound (so
Anthropic accepts the request) and back to the backend name on the
inbound response (so Claude Code never sees the swap and the TUI keeps
showing the backend model).
Components:
- launch_claude now starts the proxy and routes Claude Code through it
  (mirrors what --remote already did). Deliberately does not touch
  ANTHROPIC_AUTH_TOKEN; whatever credential Claude Code already carries
  (OAuth bearer from `claude login`, an explicit token, etc.) flows
  through and is what Anthropic sees on the image-reroute path.
  start_proxy is a shared helper that sets PROXY_PID/PROXY_PORT/PROXY_LOG
  as script globals; it must not be called via $(), because command
  substitution runs in a subshell and the EXIT trap depends on PROXY_PID
  reaching the parent shell. SCRIPT_DIR is symlink-resolved so deepclaude
  works when installed via a ~/.local/bin symlink.
- proxy/start-proxy.js legacy mode accepts an optional [defaultMode]
  third arg, threaded through as `defaultMode` so state.mode resolves to
  e.g. `deepseek` instead of `_single` and MODEL_REMAP[state.mode] fires.
- proxy/model-proxy.js:
  - containsImageBlock walks tool_result.content[] recursively because
    Claude Code's Read tool wraps a returned PNG in tool_result rather
    than at the top of the message.
  - REVERSE_MODEL_REMAP is derived from MODEL_REMAP at module load.
    Many-to-one collisions (claude-opus-4-6 + claude-opus-4-7 both map
    to deepseek-v4-pro) collapse last-write-wins.
  - On forceAnthropicForImage, the proxy: (a) leaves the client auth
    header intact, (b) swaps body.model backend-name → canonical
    claude-* so Anthropic recognizes it, (c) drops `thinking` and
    `context_management` to avoid Anthropic 400s on
    clear_thinking_* strategies, (d) strips ALL thinking blocks
    (foreign backends emit signed-but-invalid ones).
  - UsageNormalizer (SSE) extended with optional modelRewrite, applied
    to message.model in message_start events to swap the canonical
    name back to the backend name on the wire to the client.
  - Non-streaming JSON responses get the same model-name swap on the
    response body before it's forwarded.
  - Cost tracking: image-rerouted turns bucket under `anthropic_max`
    (cost: 0, since Max is subscription quota); anthropic_equivalent
    still computed via PRICING_PER_M.anthropic so savings is truthful.
  - Single body parse in the request handler; outbound and content
    encoding stripped on proxy-mutated paths to avoid client-side
    ZlibError.
Disable the whole feature with DEEPCLAUDE_IMAGE_FALLBACK=off.
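
For reviewers, the recursive detection described under proxy/model-proxy.js can be sketched roughly as below. This is a minimal illustration, not the code in the PR: the function body is an assumption based only on the description above, and the block shapes (`type: "image"`, `type: "tool_result"` with a nested `content` array) follow the Anthropic Messages format.

```javascript
// Hypothetical sketch; the real containsImageBlock in proxy/model-proxy.js may differ.
// Returns true if any block in `content` is an image, descending into
// tool_result.content[] at any depth.
function containsImageBlock(content) {
  // Plain-string content (or a missing field) cannot carry an image block.
  if (!Array.isArray(content)) return false;
  return content.some((block) => {
    if (!block || typeof block !== "object") return false;
    if (block.type === "image") return true;
    // The Read tool returns a PNG wrapped inside a tool_result block.
    if (block.type === "tool_result") return containsImageBlock(block.content);
    return false;
  });
}
```

A top-level-only check (`content.some(b => b.type === "image")`) would return false for the Read-tool case, which is why the walk recurses.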
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
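
As an illustration of the last-write-wins reverse map and the two model-name swaps described in the commit above, here is a rough sketch. Only the two `claude-opus-*` → `deepseek-v4-pro` entries come from the PR text; the helper names (`buildReverseRemap`, `swapModelOutbound`, `rewriteMessageStart`) and the `{ from, to }` shape of `modelRewrite` are assumptions made for the example.

```javascript
// Illustrative forward map keyed by mode; the real MODEL_REMAP has more entries.
const MODEL_REMAP = {
  deepseek: {
    "claude-opus-4-6": "deepseek-v4-pro",
    "claude-opus-4-7": "deepseek-v4-pro",
  },
};

// Invert each mode's map. Object.entries yields string keys in insertion
// order, so a later key overwrites an earlier one with the same backend name.
function buildReverseRemap(remap) {
  const reverse = {};
  for (const [mode, forward] of Object.entries(remap)) {
    reverse[mode] = {};
    for (const [claudeName, backendName] of Object.entries(forward)) {
      reverse[mode][backendName] = claudeName;
    }
  }
  return reverse;
}

const REVERSE_MODEL_REMAP = buildReverseRemap(MODEL_REMAP);

// Outbound (hypothetical helper): backend name -> canonical Claude name.
function swapModelOutbound(body, mode) {
  const canonical = (REVERSE_MODEL_REMAP[mode] || {})[body.model];
  return canonical ? { ...body, model: canonical } : body;
}

// Inbound (hypothetical helper): rewrite message.model in a parsed SSE
// message_start payload so the client sees the backend name again.
function rewriteMessageStart(event, modelRewrite) {
  const isStart = event.type === "message_start" && event.message;
  if (isStart && event.message.model === modelRewrite.from) {
    return { ...event, message: { ...event.message, model: modelRewrite.to } };
  }
  return event;
}
```

With the map above, `deepseek-v4-pro` reverses to `claude-opus-4-7` because that key is written last.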
So plain `deepclaude` works from a symlink in PATH without users needing to chmod +x after every checkout/branch switch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed narration and code-walking commentary; kept the non-obvious-WHY notes (clear_thinking_* mismatch, foreign-backend signed thinking blocks, ZlibError on mutated bodies, EXIT-trap preservation, last-write-wins on REVERSE_MODEL_REMAP collisions, must-not-be-$()-on-start_proxy). Architectural reasoning moves to the PR description where it belongs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without this, attaching one image silently pinned every subsequent turn
to Anthropic Max OAuth — the TUI kept showing the cheap backend while
the user's Max quota bled out, defeating the cost-savings premise of
the product.
Two changes:
1. forceAnthropicForImage now triggers only on images in the LATEST
message (a fresh attachment, or a Read tool_result that just came
back), not anywhere in conversation history.
2. On non-Anthropic routes, walk parsed.messages and replace every
image content block (including those nested in tool_result.content[])
with a text placeholder. Text follow-ups now route back to DeepSeek
and work from the assistant's prior textual description of the image.
Trade-off: a question that genuinely needs to look at the pixels again
("what color is the third building from the left?") will not have
access to the image after the strip. The conversation can re-Read the
file if needed, which produces a fresh image in the latest message and
routes that single turn back to Anthropic.
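
The history strip in change 2 could look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: the helper names are hypothetical, and the placeholder text `[image omitted]` is taken from the PR description.

```javascript
// Hypothetical sketch of replacing image blocks with a text placeholder.
const IMAGE_PLACEHOLDER_TEXT = "[image omitted]";

// Replace image blocks in one content array, descending into tool_result.
// Returns the new content plus how many images were replaced.
function stripImagesFromContent(content) {
  if (!Array.isArray(content)) return { content, stripped: 0 };
  let stripped = 0;
  const next = content.map((block) => {
    if (block && block.type === "image") {
      stripped += 1;
      return { type: "text", text: IMAGE_PLACEHOLDER_TEXT };
    }
    if (block && block.type === "tool_result") {
      const inner = stripImagesFromContent(block.content);
      stripped += inner.stripped;
      return { ...block, content: inner.content };
    }
    return block;
  });
  return { content: next, stripped };
}

// Apply the strip to every message without mutating the input array.
function stripStaleImages(messages) {
  let total = 0;
  const out = messages.map((msg) => {
    const { content, stripped } = stripImagesFromContent(msg.content);
    total += stripped;
    return { ...msg, content };
  });
  return { messages: out, stripped: total };
}
```

Returning the count makes it straightforward for the caller to log each strip.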
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 7, 2026
BenSheridanEdwards added a commit to BenSheridanEdwards/DeepClaude that referenced this pull request May 7, 2026
DeepSeek's anthropic-compat endpoint 400s with:
  The `content[].thinking` in the thinking mode must be passed back to
  the API.
…when the request body has `thinking: { type: "enabled", ... }` at
the top level but the messages don't carry thinking content blocks.
Background: foreign-backend thinking blocks are invalid against
Anthropic's signing key, so the proxy strips them from messages on
isModelCall. But it left the top-level `thinking` config in place,
creating the contradictory state DeepSeek rejects.
Fix: drop both `thinking` and `context_management` for isModelCall
routes (mirrors what the image-fallback path on PR aattaran#21 already does on
forceAnthropicForImage). Backends like DeepSeek don't honor Anthropic's
extended-thinking config anyway, so dropping it costs nothing and
fixes the 400.
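
A minimal sketch of the fix, assuming the request body has already been parsed into a plain object. The function name is hypothetical, and treating `redacted_thinking` as a thinking block is an assumption; the commit only says thinking blocks are stripped.

```javascript
// Hypothetical sketch: keep the top-level thinking config and the message
// content consistent by dropping both before forwarding to the backend.
function dropThinkingState(body) {
  const next = { ...body };
  // Top-level config that would contradict the stripped messages.
  delete next.thinking;
  delete next.context_management;
  next.messages = (body.messages || []).map((msg) => {
    if (!Array.isArray(msg.content)) return msg;
    // Assumption: redacted_thinking blocks are removed along with thinking.
    const content = msg.content.filter(
      (block) => !(block && (block.type === "thinking" || block.type === "redacted_thinking"))
    );
    return { ...msg, content };
  });
  return next;
}
```

A message made up only of thinking blocks would end with an empty content array here; how the real proxy handles that case is not stated in the commit.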
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DeepSeek's anthropic-compat endpoint returns 400 on image content blocks; OpenRouter and Fireworks reject or silently drop the pixels. Translating images to text up front loses the detail vision is for.
This PR detects an image in the latest message of the outbound request body and reroutes that single request to `api.anthropic.com`. The model name is swapped on the wire: backend name (`deepseek-v4-pro`) → canonical Claude name (`claude-opus-4-7`) outbound, reversed on the response, so Anthropic recognises the model and Claude Code keeps showing the configured backend in its TUI throughout. Once the image is no longer in the latest message, subsequent text turns route back to the configured backend and stale images in conversation history are stripped, so the cost savings hold for follow-ups.

`launch_claude` now starts the proxy in path so the default `deepclaude` flow benefits; previously only `--remote` did. Image-rerouted turns bucket under a new `anthropic_max` cost key with `cost: 0` (Max consumes subscription quota, not per-token).

Disable with `DEEPCLAUDE_IMAGE_FALLBACK=off`.

Demo
The session shows: TUI banner reads `deepseek-v4-pro[1m] · Claude Max`; "what model is this?" answers `deepseek-v4-pro` (text turn → DeepSeek); a screenshot is described accurately (image turn → Anthropic, swapped on the wire); a follow-up referencing the image still self-identifies as `deepseek-v4-pro` (response-side swap-back, and the follow-up has routed back to DeepSeek with stripped history).

Design notes
The proxy doesn't touch the client auth header on image-reroute. Whatever Claude Code is already carrying (OAuth bearer from `claude login`, an explicit `ANTHROPIC_AUTH_TOKEN`, etc.) is what authenticates at Anthropic. Other turns continue to the configured backend with the backend's own key, which the proxy holds privately (passed via argv to `start-proxy.js`). The two credentials never cross.

Why the on-the-wire swap, not env-var canonical names. A simpler approach would be to set `ANTHROPIC_DEFAULT_*_MODEL` to canonical Claude names and let the proxy do the forward remap to backend names. That works, but it makes the TUI lie permanently: Claude Code reads those env vars and renders "Opus 4.7" in the welcome banner regardless of where requests actually go. The on-the-wire swap keeps the env vars as backend names so the TUI stays truthful for the dominant case (text turns to DeepSeek), and only the image-reroute moment uses Claude names, invisibly.

Why two swaps, not one. Outbound only would leave Claude Code seeing `claude-opus-4-7` in the response payload, mismatched against the `deepseek-v4-pro` it sent. The inbound swap on `message.model` (in SSE `message_start` events and non-streaming JSON bodies) closes the loop so Claude Code's view is consistent end-to-end.

Stripping stale images is product-critical, not an optimization. Without it, attaching one image would silently pin every subsequent turn in the conversation to Anthropic Max: the TUI keeps advertising the cheap backend while the user's Max quota bleeds out. That defeats the cost-savings premise of the product. So:
- `forceAnthropicForImage` only fires when an image is in the latest message (a fresh attachment, or a `Read` tool_result that just returned).
- On non-Anthropic routes, the proxy walks `parsed.messages` and replaces every image content block (including those nested inside `tool_result.content[]`) with the text placeholder `[image omitted]`. Each strip is logged.

The trade-off: a question that genuinely needs to look at the pixels again ("what color is the third building from the left?") won't have visual access to the image after the strip. The conversation can `Read` the file again, which produces a fresh image in the latest message and routes that single turn to Anthropic.

Detection walks `tool_result.content[]` recursively. Claude Code's `Read` tool wraps a returned PNG inside a `tool_result` rather than at the top of the message. A naive top-level check would miss the most common case.

`MODEL_REMAP` reverses last-write-wins. `claude-opus-4-6` and `claude-opus-4-7` both forward-map to `deepseek-v4-pro`, so the reverse picks `claude-opus-4-7` (the most recent key). When Anthropic ships an Opus bump, updating `MODEL_REMAP` automatically updates the reverse, with no separate maintenance.

Drops `thinking` and `context_management` on image-reroute. Anthropic 400s with `clear_thinking_*` strategies if `thinking` isn't enabled; foreign backends emit signed-but-invalid thinking blocks that don't survive crossing into Anthropic's signing domain.

Test plan
- `deepseek-v4-pro`, response correct.
- `[image→anthropic, deepseek-v4-pro→claude-opus-4-7]`, response describes the image accurately, TUI still shows `deepseek-v4-pro`.
- `/_proxy/cost`: `deepseek` bucket grows on text turns; `anthropic_max` bucket grows only on image turns themselves, not on subsequent text follow-ups.

Closes #12. Supersedes #19.