
feat: transparent image fallback to Anthropic for non-Anthropic backends #21

Open
BenSheridanEdwards wants to merge 4 commits into aattaran:main from BenSheridanEdwards:feat/image-fallback


@BenSheridanEdwards BenSheridanEdwards commented May 7, 2026

DeepSeek's anthropic-compat endpoint returns 400 on image content blocks; OpenRouter and Fireworks reject or silently drop the pixels. Translating images to text up front loses the detail vision is for.

This PR detects an image in the latest message of the outbound request body and reroutes that single request to api.anthropic.com. The model name is swapped on the wire — backend name (deepseek-v4-pro) → canonical Claude name (claude-opus-4-7) outbound, reversed on the response — so Anthropic recognises the model and Claude Code keeps showing the configured backend in its TUI throughout. Once the image is no longer in the latest message, subsequent text turns route back to the configured backend and stale images in conversation history are stripped, so the cost savings hold for follow-ups.

launch_claude now starts the proxy and routes Claude Code through it, so the default deepclaude flow benefits; previously only --remote did. Image-rerouted turns bucket under a new anthropic_max cost key with cost: 0 (Max consumes subscription quota and is not billed per token).

Disable with DEEPCLAUDE_IMAGE_FALLBACK=off.
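
The routing decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `imageFallbackEnabled` and `pickUpstream` are hypothetical names, `api.deepseek.com` is only an example backend host, and treating any value other than `off` as "enabled" is an assumption.

```javascript
// Sketch only. DEEPCLAUDE_IMAGE_FALLBACK and api.anthropic.com come from the
// PR description; every function and parameter name here is hypothetical.
function imageFallbackEnabled(env) {
  // Assumption: any value other than "off" (case-insensitive) leaves the feature on.
  return String(env.DEEPCLAUDE_IMAGE_FALLBACK || "").toLowerCase() !== "off";
}

function pickUpstream({ backendHost, latestMessageHasImage, env }) {
  // Reroute only this single request, and only when the feature is enabled.
  const forceAnthropicForImage =
    imageFallbackEnabled(env) && Boolean(latestMessageHasImage);
  return {
    host: forceAnthropicForImage ? "api.anthropic.com" : backendHost,
    forceAnthropicForImage,
  };
}
```

Because the decision is recomputed per request, a text turn that follows an image turn falls back to `backendHost` without any extra state.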

Demo

The session shows: TUI banner reads deepseek-v4-pro[1m] · Claude Max; "what model is this?" answers deepseek-v4-pro (text turn → DeepSeek); a screenshot is described accurately (image turn → Anthropic, swapped on the wire); a follow-up referencing the image still self-identifies as deepseek-v4-pro (response-side swap-back, and the follow-up has routed back to DeepSeek with stripped history).

Design notes

The proxy doesn't touch the client auth header on image-reroute. Whatever Claude Code is already carrying — OAuth bearer from claude login, an explicit ANTHROPIC_AUTH_TOKEN, etc. — is what authenticates at Anthropic. Other turns continue to the configured backend with the backend's own key, which the proxy holds privately (passed via argv to start-proxy.js). The two credentials never cross.
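
As an illustration of how the two credentials can stay separate, here is a minimal sketch. `buildUpstreamHeaders` and its options are hypothetical, and sending the backend key in `x-api-key` is an assumption; the actual proxy may attach it differently.

```javascript
// Sketch only: names are hypothetical, and the header used for the backend
// key (x-api-key) is an assumption, not taken from the PR.
function buildUpstreamHeaders(clientHeaders, { forceAnthropicForImage, backendKey }) {
  const headers = { ...clientHeaders };
  if (forceAnthropicForImage) {
    // Image reroute: whatever credential the client sent goes to Anthropic as-is.
    return headers;
  }
  // Backend route: remove the client's credential and attach the proxy-held key.
  delete headers["authorization"];
  delete headers["x-api-key"];
  headers["x-api-key"] = backendKey;
  return headers;
}
```

Node lowercases incoming header names, which is why the sketch deletes the lowercase forms only.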

Why the on-the-wire swap, not env-var canonical names. A simpler approach would be to set ANTHROPIC_DEFAULT_*_MODEL to canonical Claude names and let the proxy do the forward remap to backend names. That works, but it makes the TUI lie permanently — Claude Code reads those env vars and renders "Opus 4.7" in the welcome banner regardless of where requests actually go. The on-the-wire swap keeps the env vars as backend names so the TUI stays truthful for the dominant case (text turns to DeepSeek), and only the image-reroute moment uses Claude names — invisibly.

Why two swaps, not one. Outbound only would leave Claude Code seeing claude-opus-4-7 in the response payload, mismatched against the deepseek-v4-pro it sent. The inbound swap on message.model (in SSE message_start events and non-streaming JSON bodies) closes the loop so Claude Code's view is consistent end-to-end.
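
A rough sketch of the two swaps, assuming flat lookup tables from one model name to the other. `swapModel` and `rewriteSseDataLine` are hypothetical names; the SSE helper handles one complete `data:` line and leaves chunk buffering to the caller.

```javascript
// Sketch only: helper names are hypothetical. `mapping` is a plain object
// such as { "deepseek-v4-pro": "claude-opus-4-7" } (outbound) or its reverse.
function lookup(mapping, name) {
  return Object.prototype.hasOwnProperty.call(mapping, name) ? mapping[name] : undefined;
}

// Used for the outbound request body and for non-streaming JSON responses.
function swapModel(payload, mapping) {
  const target = payload && lookup(mapping, payload.model);
  return target ? { ...payload, model: target } : payload;
}

// Used for streaming responses: only message_start carries message.model.
function rewriteSseDataLine(line, reverse) {
  if (!line.startsWith("data:")) return line;
  let event;
  try {
    event = JSON.parse(line.slice(5));
  } catch {
    return line; // not JSON, pass through untouched
  }
  if (!event || event.type !== "message_start" || !event.message) return line;
  const target = lookup(reverse, event.message.model);
  if (!target) return line;
  event.message.model = target;
  return "data: " + JSON.stringify(event);
}
```

Lines that are not `message_start` events are returned unchanged, so the rest of the stream is forwarded as received.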

Stripping stale images is product-critical, not an optimization. Without it, attaching one image would silently pin every subsequent turn in the conversation to Anthropic Max — the TUI keeps advertising the cheap backend while the user's Max quota bleeds out. That defeats the cost-savings premise of the product. So:

  • forceAnthropicForImage only fires when an image is in the latest message (a fresh attachment, or a Read tool_result that just returned).
  • On any non-Anthropic route, the proxy walks parsed.messages and replaces every image content block (including those nested inside tool_result.content[]) with the text placeholder [image omitted]. Each strip is logged.
  • Text follow-ups go back to the backend and work from the assistant's prior textual description of the image.
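
The strip step in the list above might look something like this. It is a sketch, not the PR's code: `stripImagesFromContent` and `stripStaleImages` are hypothetical names, and the log line format is invented for illustration; only the `[image omitted]` placeholder text comes from the description.

```javascript
// Sketch only: function names and the log format are hypothetical.
function stripImagesFromContent(content, onStrip) {
  if (!Array.isArray(content)) return content; // plain-string content has no blocks
  return content.map((block) => {
    if (block && block.type === "image") {
      onStrip();
      return { type: "text", text: "[image omitted]" };
    }
    if (block && block.type === "tool_result" && Array.isArray(block.content)) {
      // Read tool results nest the image one level down.
      return { ...block, content: stripImagesFromContent(block.content, onStrip) };
    }
    return block;
  });
}

function stripStaleImages(messages, log = console.log) {
  let stripped = 0;
  const cleaned = messages.map((message) => ({
    ...message,
    content: stripImagesFromContent(message.content, () => {
      stripped += 1;
    }),
  }));
  if (stripped > 0) log(`[proxy] stripped ${stripped} stale image block(s)`);
  return cleaned;
}
```

The sketch returns new message objects and leaves the parsed request untouched, which keeps the strip easy to test in isolation.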

The trade-off: a question that genuinely needs to look at the pixels again ("what color is the third building from the left?") won't have visual access to the image after the strip. The conversation can Read the file again, which produces a fresh image in the latest message and routes that single turn to Anthropic.

Detection walks tool_result.content[] recursively. Claude Code's Read tool wraps a returned PNG inside a tool_result rather than at the top of the message. A naive top-level check would miss the most common case.
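
A minimal sketch of the recursive check, assuming the Anthropic Messages content-block shape. `containsImageBlock` is the name used in the commit message, but this body is an illustration rather than the PR's implementation, and `latestMessageHasImage` is a hypothetical helper.

```javascript
// Sketch only: an illustrative body for containsImageBlock; the helper
// latestMessageHasImage is a hypothetical name.
function containsImageBlock(content) {
  if (!Array.isArray(content)) return false; // string content cannot hold images
  return content.some((block) => {
    if (!block || typeof block !== "object") return false;
    if (block.type === "image") return true;
    // Recurse into tool results, where the Read tool places a returned PNG.
    if (block.type === "tool_result") return containsImageBlock(block.content);
    return false;
  });
}

function latestMessageHasImage(messages) {
  if (!Array.isArray(messages) || messages.length === 0) return false;
  const latest = messages[messages.length - 1];
  return Boolean(latest) && containsImageBlock(latest.content);
}
```

Checking only the last message is what keeps an image earlier in the history from triggering the reroute again.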

MODEL_REMAP reverses last-write-wins. claude-opus-4-6 and claude-opus-4-7 both forward-map to deepseek-v4-pro, so the reverse picks claude-opus-4-7 (the most recent key). When Anthropic ships an Opus bump, updating MODEL_REMAP automatically updates the reverse — no separate maintenance.
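
For illustration, the derivation could be sketched like this. The nesting of `MODEL_REMAP` by mode is an assumption inferred from the `MODEL_REMAP[state.mode]` lookup mentioned in the commit message, and `buildReverseRemap` is a hypothetical name.

```javascript
// Sketch only: the shape of MODEL_REMAP (mode -> claude name -> backend name)
// is an assumption; buildReverseRemap is a hypothetical helper.
const MODEL_REMAP = {
  deepseek: {
    "claude-opus-4-6": "deepseek-v4-pro",
    "claude-opus-4-7": "deepseek-v4-pro",
  },
};

function buildReverseRemap(remap) {
  const reverse = {};
  for (const [mode, forward] of Object.entries(remap)) {
    reverse[mode] = {};
    for (const [claudeName, backendName] of Object.entries(forward)) {
      // Later keys overwrite earlier ones: last write wins on collisions.
      reverse[mode][backendName] = claudeName;
    }
  }
  return reverse;
}

const REVERSE_MODEL_REMAP = buildReverseRemap(MODEL_REMAP);
```

"Most recent" here means last in insertion order, so a newer Opus entry has to be added below the older ones for the reverse map to pick it up.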

Drops thinking and context_management on image-reroute. Anthropic 400s with clear_thinking_* strategies if thinking isn't enabled; foreign backends emit signed-but-invalid thinking blocks that don't survive crossing into Anthropic's signing domain.
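
A sketch of that cleanup, under stated assumptions: `prepareForAnthropic` is a hypothetical name, and treating `redacted_thinking` blocks the same as `thinking` blocks is an assumption (the commit message only says all thinking blocks are stripped).

```javascript
// Sketch only: prepareForAnthropic is a hypothetical name. Filtering
// redacted_thinking alongside thinking is an assumption.
function prepareForAnthropic(body) {
  // Drop the top-level config keys; keep everything else.
  const { thinking, context_management, ...rest } = body;
  const messages = (rest.messages || []).map((message) => {
    if (!Array.isArray(message.content)) return message;
    return {
      ...message,
      // Remove thinking blocks signed by a foreign backend.
      content: message.content.filter(
        (block) =>
          !(block && (block.type === "thinking" || block.type === "redacted_thinking"))
      ),
    };
  });
  return { ...rest, messages };
}
```

A message whose content consisted only of thinking blocks would end up with an empty array here; a real implementation would likely need to handle that case.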

Test plan

  • Plain text turn: routes to DeepSeek, TUI shows deepseek-v4-pro, response correct.
  • Image turn (Claude Code Read tool on a PNG): proxy log shows [image→anthropic, deepseek-v4-pro→claude-opus-4-7], response describes the image accurately, TUI still shows deepseek-v4-pro.
  • Follow-up text turn after an image: stale images stripped (logged in proxy), routes back to DeepSeek, response continues coherently from the assistant's prior description.
  • /_proxy/cost: deepseek bucket grows on text turns; anthropic_max bucket grows only on image turns themselves, not on subsequent text follow-ups.
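
The bucketing that the last test item checks can be sketched as below. `recordUsage` and the entry fields other than `cost` and `anthropic_equivalent` are hypothetical, and the prices are placeholder numbers chosen for easy arithmetic, not real rates.

```javascript
// Sketch only: PRICING_PER_M is named in the commit message, but these
// values are PLACEHOLDERS, not real prices. recordUsage is hypothetical.
const PRICING_PER_M = {
  anthropic: { input: 10, output: 50 },
  deepseek: { input: 1, output: 2 },
};

function recordUsage(totals, { bucket, inputTokens, outputTokens }) {
  const priceWith = (rates) =>
    (inputTokens * rates.input + outputTokens * rates.output) / 1e6;
  if (!totals[bucket]) {
    totals[bucket] = { requests: 0, cost: 0, anthropic_equivalent: 0 };
  }
  const entry = totals[bucket];
  entry.requests += 1;
  // Image-rerouted turns draw on subscription quota, so they cost 0 here.
  entry.cost += bucket === "anthropic_max" ? 0 : priceWith(PRICING_PER_M[bucket]);
  // Always track what the turn would have cost at Anthropic per-token rates.
  entry.anthropic_equivalent += priceWith(PRICING_PER_M.anthropic);
  return totals;
}
```

Recording `anthropic_equivalent` on every turn is what lets the savings figure stay comparable across both buckets.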

Closes #12. Supersedes #19.

BenSheridanEdwards and others added 3 commits May 7, 2026 13:11
Routes image turns from the configured backend (DeepSeek/OpenRouter) to
api.anthropic.com so Claude Code's vision capabilities work even when
the active backend can't process image content blocks. The wire-side
model name is swapped to the canonical Claude name on outbound (so
Anthropic accepts the request) and back to the backend name on the
inbound response (so Claude Code never sees the swap and the TUI keeps
showing the backend model).

Components:

- launch_claude now starts the proxy and routes Claude Code through it
  (mirrors what --remote already did). Deliberately does not touch
  ANTHROPIC_AUTH_TOKEN — whatever credential Claude Code already carries
  (OAuth bearer from `claude login`, an explicit token, etc.) flows
  through and is what Anthropic sees on the image-reroute path.
  start_proxy is a shared helper that sets PROXY_PID/PROXY_PORT/PROXY_LOG
  as script globals; must not be called via $() — the EXIT trap depends
  on PROXY_PID reaching the parent shell. SCRIPT_DIR is symlink-resolved
  so deepclaude works when installed via ~/.local/bin symlink.

- proxy/start-proxy.js legacy mode accepts an optional [defaultMode]
  third arg, threaded through as `defaultMode` so state.mode resolves to
  e.g. `deepseek` instead of `_single` and MODEL_REMAP[state.mode] fires.

- proxy/model-proxy.js:
    - containsImageBlock walks tool_result.content[] recursively because
      Claude Code's Read tool wraps a returned PNG in tool_result rather
      than at the top of the message.
    - REVERSE_MODEL_REMAP is derived from MODEL_REMAP at module load.
      Many-to-one collisions (claude-opus-4-6 + claude-opus-4-7 both map
      to deepseek-v4-pro) collapse last-write-wins.
    - On forceAnthropicForImage, the proxy: (a) leaves the client auth
      header intact, (b) swaps body.model backend-name → canonical
      claude-* so Anthropic recognizes it, (c) drops `thinking` and
      `context_management` to avoid Anthropic 400s on
      clear_thinking_* strategies, (d) strips ALL thinking blocks
      (foreign backends emit signed-but-invalid ones).
    - UsageNormalizer (SSE) extended with optional modelRewrite, applied
      to message.model in message_start events to swap the canonical
      name back to the backend name on the wire to the client.
    - Non-streaming JSON responses get the same model-name swap on the
      response body before it's forwarded.
    - Cost tracking: image-rerouted turns bucket under `anthropic_max`
      (cost: 0, since Max is subscription quota); anthropic_equivalent
      still computed via PRICING_PER_M.anthropic so savings is truthful.
    - Single body parse in the request handler; outbound and content
      encoding stripped on proxy-mutated paths to avoid client-side
      ZlibError.

Disable the whole feature with DEEPCLAUDE_IMAGE_FALLBACK=off.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
So plain `deepclaude` works from a symlink in PATH without users
needing to chmod +x after every checkout/branch switch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removed narration and code-walking commentary; kept the
non-obvious-WHY notes (clear_thinking_* mismatch, foreign-backend
signed thinking blocks, ZlibError on mutated bodies, EXIT-trap
preservation, last-write-wins on REVERSE_MODEL_REMAP collisions,
must-not-be-$()-on-start_proxy). Architectural reasoning moves to
the PR description where it belongs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@BenSheridanEdwards BenSheridanEdwards changed the title feat: image fallback to Anthropic via on-the-wire model name swap feat: transparent image fallback to Anthropic for non-Anthropic backends May 7, 2026
Without this, attaching one image silently pinned every subsequent turn
to Anthropic Max OAuth — the TUI kept showing the cheap backend while
the user's Max quota bled out, defeating the cost-savings premise of
the product.

Two changes:

1. forceAnthropicForImage now triggers only on images in the LATEST
   message (a fresh attachment, or a Read tool_result that just came
   back), not anywhere in conversation history.

2. On non-Anthropic routes, walk parsed.messages and replace every
   image content block (including those nested in tool_result.content[])
   with a text placeholder. Text follow-ups now route back to DeepSeek
   and work from the assistant's prior textual description of the image.

Trade-off: a question that genuinely needs to look at the pixels again
("what color is the third building from the left?") will not have
access to the image after the strip. The conversation can re-Read the
file if needed, which produces a fresh image in the latest message and
routes that single turn back to Anthropic.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BenSheridanEdwards added a commit to BenSheridanEdwards/DeepClaude that referenced this pull request May 7, 2026
DeepSeek's anthropic-compat endpoint 400s with:

  The `content[].thinking` in the thinking mode must be passed back to
  the API.

…when the request body has `thinking: { type: "enabled", ... }` at
the top level but the messages don't carry thinking content blocks.

Background: foreign-backend thinking blocks are invalid against
Anthropic's signing key, so the proxy strips them from messages on
isModelCall. But it left the top-level `thinking` config in place,
creating the contradictory state DeepSeek rejects.

Fix: drop both `thinking` and `context_management` for isModelCall
routes (mirrors what the image-fallback path on PR aattaran#21 already does on
forceAnthropicForImage). Backends like DeepSeek don't honor Anthropic's
extended-thinking config anyway, so dropping it costs nothing and
fixes the 400.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

deepseek image support implementation
