fix(volcengine): support multimodal embedding models by rangerTao · Pull Request #4998 · QuantumNous/new-api

rangerTao · 2026-05-20T12:15:04Z

Related Issue

Summary

This PR makes Volcengine multimodal embedding models (e.g. doubao-embedding-vision-251215) work end-to-end in new-api by fixing two independent bugs.

Changes

1. Route vision/multimodal models to dedicated endpoint

relay/channel/volcengine/adaptor.go:

GetRequestURL for RelayModeEmbeddings now routes vision/multimodal models to /api/v3/embeddings/multimodal and keeps text-only models on /embeddings.
New helper isVolcengineMultimodalEmbedding(modelName) does the keyword match (case-insensitive: must contain embedding AND vision/multimodal).
Both UpstreamModelName and OriginModelName are checked so model_mapping aliases keep working.
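
The detection and routing described above can be sketched as follows. This is a minimal illustration based only on the description in this PR, not the actual diff: `isVolcengineMultimodalEmbedding` is named in the PR, while `embeddingPath` and its two string parameters are hypothetical stand-ins for the logic inside `GetRequestURL`.

```go
package main

import (
	"fmt"
	"strings"
)

// isVolcengineMultimodalEmbedding reports whether a model name looks like a
// Volcengine multimodal embedding model: it must contain "embedding" and
// either "vision" or "multimodal", compared case-insensitively.
func isVolcengineMultimodalEmbedding(modelName string) bool {
	lower := strings.ToLower(modelName)
	if !strings.Contains(lower, "embedding") {
		return false
	}
	return strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal")
}

// embeddingPath is a hypothetical helper (not from the PR) that picks the
// upstream path. Checking both names means a model_mapping alias still
// routes correctly when only one of the two names carries the keywords.
func embeddingPath(originModelName, upstreamModelName string) string {
	if isVolcengineMultimodalEmbedding(upstreamModelName) ||
		isVolcengineMultimodalEmbedding(originModelName) {
		return "/api/v3/embeddings/multimodal"
	}
	return "/api/v3/embeddings"
}

func main() {
	// Alias on the client side, real vision model upstream.
	fmt.Println(embeddingPath("my-alias", "doubao-embedding-vision-251215"))
	// Plain-text model stays on the standard path.
	fmt.Println(embeddingPath("Doubao-embedding", "Doubao-embedding"))
}
```

Requiring "embedding" in the name keeps vision chat models (whose names may contain "vision" but not "embedding") off the multimodal embeddings path.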

2. Use multimodal input shape in dashboard "Test channel" button

controller/channel-test.go:

Hard-coded Input=["hello world"] was rejected by the multimodal endpoint with "we could not parse the JSON body of your request".
Now emits Input=[{type:"text", text:"hello world"}] for vision/multimodal models, matching the endpoint contract.
Plain-text embedding models (Doubao-embedding, text-embedding-3-*, m3e, bge-*, etc.) keep the legacy string-array shape.
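
To make the two input shapes concrete, here is a minimal sketch that serializes both. The `embeddingTestRequest` struct and `buildEmbeddingTestRequest` function are hypothetical stand-ins for the project's `dto.EmbeddingRequest` and `buildTestRequest`; only the JSON shapes are taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// embeddingTestRequest is a hypothetical stand-in for dto.EmbeddingRequest;
// only the two fields relevant to the test payload are modelled.
type embeddingTestRequest struct {
	Model string `json:"model"`
	Input any    `json:"input"`
}

// buildEmbeddingTestRequest picks the input shape from the model name:
// typed parts for vision/multimodal models, a plain string array otherwise.
func buildEmbeddingTestRequest(model string) embeddingTestRequest {
	lower := strings.ToLower(model)
	if strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal") {
		return embeddingTestRequest{
			Model: model,
			Input: []any{map[string]any{"type": "text", "text": "hello world"}},
		}
	}
	return embeddingTestRequest{Model: model, Input: []string{"hello world"}}
}

// mustJSON marshals v and panics on error; acceptable in a short sketch.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(buildEmbeddingTestRequest("doubao-embedding-vision-251215")))
	fmt.Println(mustJSON(buildEmbeddingTestRequest("text-embedding-3-small")))
}
```

Note that `encoding/json` sorts map keys, so the typed part is emitted as `{"text":...,"type":...}`; key order does not matter to a JSON parser.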

Behavior Matrix

| Model name | Path before | Test btn before | Path after | Test btn after |
| --- | --- | --- | --- | --- |
| `Doubao-embedding` (text) | `/embeddings` ✅ | ✅ | (unchanged) | (unchanged) |
| `doubao-embedding-text-240715` | `/embeddings` ✅ | ✅ | (unchanged) | (unchanged) |
| `doubao-embedding-large-text-240915` | `/embeddings` ✅ | ✅ | (unchanged) | (unchanged) |
| `doubao-embedding-vision-241215` | `/embeddings` ❌ 400 | ❌ 400 | `/embeddings/multimodal` ✅ | ✅ multimodal input |
| `doubao-embedding-vision-251215` | `/embeddings` ❌ 400 | ❌ 400 | `/embeddings/multimodal` ✅ | ✅ multimodal input |
| `-multimodal` (future) | `/embeddings` ❌ | ❌ | `/embeddings/multimodal` ✅ | ✅ |

Tests

Validated locally with a real Volcengine multimodal endpoint:

# Direct call: 200 + full embedding vector
curl http://newapi/v1/embeddings \
  -H "Authorization: Bearer <token>" -H "Content-Type: application/json" \
  -d '{"model":"doubao-embedding-vision-251215","input":[{"type":"text","text":"测试"}]}'

# Dashboard "Test channel" button: ✅ 200

Plain-text embedding channels (Doubao-embedding, OpenAI text-embedding-3-small, m3e) verified unchanged.

Compatibility

✅ No breaking changes for existing text embedding channels
✅ No DB migration required — detection is by model name only
✅ Works with model_mapping aliases (both OriginModelName and UpstreamModelName are checked)
✅ "Test channel" button auto-adapts based on model name

Summary by CodeRabbit

Release Notes

New Features
- Added automatic detection and enhanced support for multimodal and vision-capable embedding models.
Bug Fixes
- Improved request handling and endpoint routing for specialized embedding model types to ensure proper processing.

Volcengine multimodal embedding models (e.g. doubao-embedding-vision-251215) require POST /api/v3/embeddings/multimodal and reject the standard /api/v3/embeddings path with a 400 InvalidParameter error:

"the requested model doubao-embedding-vision-251215 does not support this api"

Detect by model-name keywords (case-insensitive: contains "embedding" AND contains "vision" or "multimodal"). Both UpstreamModelName and OriginModelName are checked so model_mapping aliases still work. Pure-text embedding models continue to use the standard endpoint.

…ding channels

The follow-up fix for the multimodal endpoint routing exposed a second bug in the channel test path: controller/channel-test.go always emits Input=["hello world"] for any embedding model, but Volcengine's /api/v3/embeddings/multimodal endpoint rejects flat string arrays with:

"we could not parse the JSON body of your request"

So the dashboard "Test channel" button still failed for vision embedding models even after routing was corrected.

Detect vision/multimodal embedding models by name keyword (case-insensitive) and emit the typed-parts shape that matches the multimodal endpoint:

{"input": [{"type": "text", "text": "hello world"}]}

Plain text embedding models (Doubao-embedding, doubao-embedding-text-*, m3e, bge-*, OpenAI text-embedding-3-*, etc.) keep using the legacy string array shape.

coderabbitai · 2026-05-20T12:15:21Z

Walkthrough

Fixes Volcengine multimodal embedding models by adding detection logic for vision/multimodal model names and routing requests to the dedicated multimodal endpoint with the correct typed-parts payload format.

Changes

Volcengine multimodal embedding support

| Layer / File(s) | Summary |
| --- | --- |
| Multimodal model detection helper `relay/channel/volcengine/adaptor.go` | `isVolcengineMultimodalEmbedding` helper detects model names containing both `embedding` and (`vision` or `multimodal`) using case-insensitive matching. |
| Adaptor endpoint routing for multimodal embeddings `relay/channel/volcengine/adaptor.go` | `GetRequestURL` checks the upstream model name and routes multimodal embedding requests to `/api/v3/embeddings/multimodal` instead of the standard embeddings path. |
| Test request payload format for multimodal models `controller/channel-test.go` | `buildTestRequest` detects multimodal models in the embedding path and emits `EmbeddingRequest` with `Input` formatted as typed parts instead of plain string array. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

Calcium-Ion

Poem

🐰 A rabbit hops through Volcengine's dream,
Where vision models split the stream—
Multimodal magic, typed with care,
Embedding wisdom, floating fair! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(volcengine): support multimodal embedding models' accurately and concisely summarizes the primary change: adding support for Volcengine multimodal embedding models. |
| Linked Issues check | ✅ Passed | The PR fully addresses both bugs from issue `#4997`: routing multimodal models to the correct endpoint in adaptor.go and emitting correct input shape in channel-test.go. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the two identified bugs in issue `#4997` with no unrelated modifications detected. |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@controller/channel-test.go`:
- Around line 761-778: The multimodal typed-parts payload is being applied for
any model name containing "vision" or "multimodal" (variable model / lower) but
must be limited to Volcengine channel only; update the conditional that builds
the dto.EmbeddingRequest to also check the channel variable (e.g., channel ==
"volcengine" or whatever canonical channel constant is used in this file) so the
typed-parts input is emitted only for Volcengine multimodal models and not for
other providers.


📥 Commits

Reviewing files that changed from the base of the PR and between 20d3e73 and bbbd8d8.

📒 Files selected for processing (2)

controller/channel-test.go
relay/channel/volcengine/adaptor.go

coderabbitai · 2026-05-20T12:22:38Z

+		// Volcengine multimodal embedding endpoint (/api/v3/embeddings/multimodal)
+		// requires input to be an array of typed parts (e.g. [{type:"text",text:"..."}]),
+		// not a plain string array. Sending the standard `["hello world"]` shape
+		// against vision/multimodal models triggers a 400 from the upstream
+		// "we could not parse the JSON body of your request" — so emit the
+		// multimodal-compatible shape when the model name signals it.
+		lower := strings.ToLower(model)
+		if strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal") {
+			return &dto.EmbeddingRequest{
+				Model: model,
+				Input: []any{
+					map[string]any{
+						"type": "text",
+						"text": "hello world",
+					},
+				},
+			}
+		}


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for embedding model definitions that contain vision/multimodal keywords
# across different channel adaptors to assess cross-provider impact
rg -nC3 --type=go 'ModelList.*=.*\[' relay/channel/ | \
  rg -i 'embedding.*(vision|multimodal)|(vision|multimodal).*embedding'

Repository: QuantumNous/new-api

Length of output: 45

🌐 Web query:

Do OpenAI, Anthropic, or other major LLM providers support typed-parts input format [{type:"text", text:"..."}] for multimodal embedding requests?

💡 Result:

As of May 20, 2026, support for the specific typed-parts input format [{type: "text", text: "..."},...] varies significantly among major LLM providers [1][2][3].

Google Gemini API

The Gemini API uses a native multimodal structure that supports the requested format for embedding requests. Specifically, the Gemini Embedding 2 model accepts a content structure consisting of parts [3][4]. Users can pass multiple input types (text, images, etc.) within the parts array of a Content object to generate an aggregated embedding [3]:

{ "model": "models/gemini-embedding-2", "content": { "parts": [ {"text": "Your text here"}, {"inlineData": {"mimeType": "image/jpeg", "data": "..."}} ] } }

OpenAI API

OpenAI's official Embeddings API does not support a multimodal typed-parts format [1][5]. The input parameter for the embeddings endpoint is strictly limited to a string, an array of strings, or an array of token integers [1]. While OpenAI supports multimodal inputs (like images and text) in their Chat Completions API (using the same typed-parts format you referenced), this format is not compatible with their dedicated embedding endpoints [1][5][6].

Anthropic API

Anthropic's Claude API does not currently offer a standalone multimodal embedding endpoint that supports a typed-parts input format [2]. While Anthropic's Messages API uses the [{type: "text", text: "..."}, {type: "image",...}] format for chat interactions [7][6], it does not provide native support for embedding requests in this format [2][8]. Users seeking multimodal embeddings typically rely on third-party models or specialized services like Voyage AI, which also use standard input arrays rather than the typed-parts block format for their embedding endpoints [2].

Citations:

1: https://developers.openai.com/api/docs/api-reference/embeddings

2: https://console.anthropic.com/docs/en/build-with-claude/embeddings

3: https://ai.google.dev/gemini-api/docs/embeddings

4: https://pub.towardsai.net/google-just-killed-five-embedding-models-with-one-heres-what-actually-happens-4cda368ae871

5: https://developers.openai.com/api/docs/guides/embeddings

6: https://platform.claude.com/docs/en/build-with-claude/vision

7: https://tessl.io/registry/tessl/pypi-anthropic/files/docs/common-tasks/multimodal-input.md

8: Feature Request: Support embeddings as a native input modality in the Messages API anthropics/anthropic-sdk-python#1351

🏁 Script executed:

# Find where buildTestRequest is defined and called
rg -n 'buildTestRequest' --type=go

Repository: QuantumNous/new-api

Length of output: 308

🏁 Script executed:

# Search for embedding model definitions across all channel implementations
fd --type=f --extension=go 'relay/channel' | head -20

Repository: QuantumNous/new-api

Length of output: 421

🏁 Script executed:

# Check specifically for Volcengine embedding model definition
rg -n 'embedding' relay/channel/ --type=go -i | grep -i volcengine

Repository: QuantumNous/new-api

Length of output: 1811

🏁 Script executed:

# See what embedding models are defined per channel type
rg -n 'embedding' relay/channel/ --type=go -i -B2 -A2 | head -100

Repository: QuantumNous/new-api

Length of output: 7110

🏁 Script executed:

# Look at the buildTestRequest function to understand the full context
sed -n '682,810p' controller/channel-test.go

Repository: QuantumNous/new-api

Length of output: 4068

🏁 Script executed:

# Check if channel.Type is used elsewhere in this file
rg -n 'channel.Type' controller/channel-test.go | head -20

Repository: QuantumNous/new-api

Length of output: 874

🏁 Script executed:

# Verify what constant.ChannelTypeVolcEngine is
rg -n 'ChannelTypeVolcEngine' --type=go | head -10

Repository: QuantumNous/new-api

Length of output: 1062

🏁 Script executed:

# Check if channel parameter is available and used in buildTestRequest
sed -n '682,750p' controller/channel-test.go | grep -E '(channel|Channel)'

Repository: QuantumNous/new-api

Length of output: 173

🏁 Script executed:

# Search for any other embedding models with vision/multimodal in model constants
rg -n "vision|multimodal" relay/channel/ --type=go -i

Repository: QuantumNous/new-api

Length of output: 2363

🏁 Script executed:

# Double-check: are there any other channels with embedding capability and vision/multimodal models?
rg -n 'embedding' relay/channel/ --type=go -l | xargs -I {} sh -c 'echo "=== {} ===" && rg -i "vision|multimodal" {}'

Repository: QuantumNous/new-api

Length of output: 2263

🏁 Script executed:

# Check if Gemini has embedding support and what format it uses
rg -A10 'ConvertEmbeddingRequest' relay/channel/gemini/ --type=go

Repository: QuantumNous/new-api

Length of output: 872

🏁 Script executed:

# Verify what models are in Gemini constants
rg -n 'embedding' relay/channel/gemini/ --type=go -i

Repository: QuantumNous/new-api

Length of output: 3397

🏁 Script executed:

# Check Gemini's ConvertEmbeddingRequest to see how it handles the input format
sed -n '196,260p' relay/channel/gemini/adaptor.go

Repository: QuantumNous/new-api

Length of output: 2395

🏁 Script executed:

# Verify if there are any Gemini multimodal embedding models already defined
rg -n 'embedding' relay/channel/gemini/constant.go --type=go -B2 -A2

Repository: QuantumNous/new-api

Length of output: 346

Scope the multimodal typed-parts format to Volcengine only.

This logic emits the typed-parts input shape [{type:"text", text:"hello world"}] for ANY embedding model whose name contains "vision" or "multimodal", regardless of channel type. While this format is correct for Volcengine's /api/v3/embeddings/multimodal endpoint, it is Volcengine-specific and would break other providers if they add multimodal embedding models in the future. For example, Gemini has embedding models and uses a completely different input structure (`{content: {parts: [{text: "..."}]}}`); applying the Volcengine format to a future Gemini multimodal embedding model would cause failures.

The channel parameter is available in this function and is already used elsewhere in the file. Add a channel-type check to limit this behavior to Volcengine:

Suggested fix

 		lower := strings.ToLower(model)
-		if strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal") {
+		// Volcengine multimodal embeddings require typed-parts input format
+		if channel != nil && channel.Type == constant.ChannelTypeVolcEngine &&
+			(strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal")) {
 			return &dto.EmbeddingRequest{
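
The effect of the channel-scoped condition can be checked in isolation with a small sketch. Everything here is a hypothetical stand-in: the `channel` struct, the `channelType` constants (whose values are arbitrary and do not reflect the project's real constants), and `useTypedPartsInput` are illustrative names, not code from the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// channelType and its values are hypothetical; the real project uses
// constant.ChannelTypeVolcEngine with its own numeric value.
type channelType int

const (
	channelTypeOther channelType = iota
	channelTypeVolcEngine
)

// channel is a hypothetical stand-in for the project's channel model.
type channel struct {
	Type channelType
}

// useTypedPartsInput reports whether the test request should use the
// typed-parts input shape: only for Volcengine channels, and only when the
// model name contains "vision" or "multimodal" (case-insensitive).
func useTypedPartsInput(ch *channel, model string) bool {
	if ch == nil || ch.Type != channelTypeVolcEngine {
		return false
	}
	lower := strings.ToLower(model)
	return strings.Contains(lower, "vision") || strings.Contains(lower, "multimodal")
}

func main() {
	volc := &channel{Type: channelTypeVolcEngine}
	other := &channel{Type: channelTypeOther}

	fmt.Println(useTypedPartsInput(volc, "doubao-embedding-vision-251215"))
	fmt.Println(useTypedPartsInput(other, "some-multimodal-embedding"))
	fmt.Println(useTypedPartsInput(nil, "doubao-embedding-vision-251215"))
}
```

With the guard in place, a model on another provider whose name happens to contain "multimodal" keeps the plain string-array input, and a nil channel falls back to the same default.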


taoliang added 2 commits May 20, 2026 20:14


fix(volcengine): support multimodal embedding models #4998
rangerTao wants to merge 2 commits into QuantumNous:main from rangerTao:pr/volcengine-multimodal-embedding
