feat: add OpenAI-compatible /v1/audio/speech and /v1/models endpoints #656
neuron-tech-ai wants to merge 2 commits into
Clients using the openai SDK can point at voicebox without code changes. Model mapping: tts-1→Kokoro, tts-1-hd→Qwen 1.7B, gts-4o-mini-tts→Qwen 0.6B. Voice resolution checks profiles by name first, falls back to Kokoro voice IDs.
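A minimal sketch of the alias table and the two-step voice lookup described above, not the PR's actual code. The engine labels, the `profiles` mapping, the `kokoro_voices` set, and both function names are assumptions for illustration, and the third alias uses the `gpt-4o-mini-tts` spelling that the review comments ask for.

```python
# Alias table as described in the PR text. The engine labels on the right are
# illustrative strings, not Voicebox's real engine identifiers.
MODEL_ALIASES = {
    "tts-1": "kokoro",
    "tts-1-hd": "qwen-1.7b",
    "gpt-4o-mini-tts": "qwen-0.6b",
}


def resolve_engine(model: str) -> str:
    """Map an OpenAI model name to a local engine label."""
    try:
        return MODEL_ALIASES[model]
    except KeyError:
        raise ValueError(f"unknown model: {model!r}") from None


def resolve_voice(voice: str, profiles: dict, kokoro_voices: set) -> dict:
    """Check saved profiles by name first, then fall back to Kokoro voice IDs.

    `profiles` (name -> profile data) and `kokoro_voices` (known voice IDs)
    are hypothetical inputs standing in for whatever storage the backend uses.
    """
    if voice in profiles:
        return {"source": "profile", "profile": profiles[voice]}
    if voice in kokoro_voices:
        return {"source": "kokoro", "voice_id": voice}
    raise ValueError(f"unknown voice: {voice!r}")
```

Checking profiles first means a user-defined profile can shadow a built-in voice that has the same name, which is the order the description implies.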
📝 Walkthrough
This PR adds OpenAI-compatible text-to-speech API endpoints to the backend. It introduces a new
Changes
OpenAI-compatible TTS API
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Actionable comments posted: 5
🧹 Nitpick comments (2)
backend/routes/openai_compat.py (2)
125-125: ⚡ Quick win: Language is hardcoded to "en".
The language parameter is hardcoded to "en", which prevents users from generating speech in other languages even if the underlying models support them.
Consider making language configurable through the request schema or inferring it from the input text.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/openai_compat.py` at line 125, The code currently hardcodes language="en" which prevents non-English TTS; make the language configurable by accepting a language field from the request schema (or infer it from the input text) and pass that value instead of the literal "en". Update the request parsing/validation to include a language parameter (e.g., request.json().get("language") or the request model used in the route) and fall back to a sensible default (e.g., "en") if absent; replace the hardcoded language="en" in the TTS call with the validated variable (the same identifier used where language="en" appears). Ensure input-based inference is optional: if language not provided, attempt language detection on the text and set the language variable before invoking the TTS function.
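One way to read the suggestion above is a small helper that prefers a client-supplied language and otherwise falls back to the current default. This is only a sketch: the `language` request field does not exist in the PR, and the helper name is made up for illustration.

```python
from typing import Optional

DEFAULT_LANGUAGE = "en"  # matches the value currently hardcoded in the route


def effective_language(requested: Optional[str]) -> str:
    """Return the language to pass to the TTS call.

    `requested` stands in for a hypothetical optional `language` field on the
    request schema; blank or missing values fall back to the default.
    """
    normalized = (requested or "").strip().lower()
    return normalized or DEFAULT_LANGUAGE
```

The route would then pass `language=effective_language(...)` instead of the literal `"en"`. Automatic detection from the input text is left out here because it would need an extra dependency.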
164-164: 💤 Low value: Documentation inconsistency: kokoro check mentioned but not implemented.
The comment states "or the engine is kokoro" (line 164) but there's no actual check for the kokoro engine before doing the profile lookup. The fallback happens for any engine when profile lookup fails, not specifically for kokoro.
Either remove this part of the comment or add an explicit engine check if that's the intended behavior.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/routes/openai_compat.py` at line 164, The comment in openai_compat.py incorrectly says "or the engine is kokoro" without an actual kokoro check; either remove that phrase from the comment or implement an explicit engine check so the fallback only triggers for kokoro. Concretely, locate the profile lookup/fallback logic that uses the engine variable (the code doing "profile lookup" and the subsequent fallback) and either (A) edit the comment to drop "or the engine is kokoro", or (B) add a conditional that checks if engine == "kokoro" (or the canonical kokoro identifier used elsewhere) before applying the kokoro-specific fallback, ensuring the fallback behavior matches the documented text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/routes/openai_compat.py`:
- Line 55: There is a typo in the model identifier: replace every occurrence of
"gts-4o-mini-tts" with the correct "gpt-4o-mini-tts" in the OpenAI compatibility
model mapping and any places that reference it; specifically update the mapping
entry (the dict that defines model aliases/IDs) and the other referenced usage
that currently points to "gts-4o-mini-tts" so callers can resolve the valid
model id "gpt-4o-mini-tts".
- Around line 185-187: The bare "except Exception: pass" in the voice prompt
creation path silently swallows errors; update the handler around
create_voice_prompt_for_profile so it logs the exception (include the exception
message and stack trace via process/logger.error or logging.exception) and only
suppresses expected exceptions (e.g., catch ValueError or the specific custom
exceptions you anticipate) while re-raising or letting unexpected exceptions
propagate; ensure you reference the create_voice_prompt_for_profile call and the
surrounding profile lookup code when making these changes.
- Around line 121-129: The SpeechRequest schema accepts a speed parameter but it
is never forwarded to the TTS pipeline; update the call to generate_chunked in
the handler (the call that currently passes tts_model, request.input,
voice_prompt, language, seed, instruct, trim_fn) to include speed=request.speed
so the downstream function receives client speed requests; if generate_chunked
or the backend (tts_model) does not support speed, instead detect a non-default
request.speed and either log/warn or raise a clear error indicating speed
control is unsupported by the implementation.
- Around line 190-191: The current fallback returns {"voice_id": kokoro_voice}
which mismatches backend expectations; update the return to emit engine-specific
keys using the local engine variable and kokoro_voice (and
_OPENAI_VOICE_TO_KOKORO) so Qwen gets {"preset_voice_id": kokoro_voice}, Kokoro
gets {"kokoro_voice": kokoro_voice} (or {"preset_voice_id": kokoro_voice} as its
alternate), and all other engines default to {"preset_voice_id": kokoro_voice};
locate the code that sets kokoro_voice in backend/routes/openai_compat.py and
replace the single {"voice_id": ...} return with a conditional branch on engine
to return the appropriate dict keys.
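The exception-handling and fallback-key comments above could be combined roughly as follows. This is a sketch under assumptions: `create_prompt` stands in for `create_voice_prompt_for_profile`, whose real signature is not shown in this review, and the choice of `LookupError` and `ValueError` as the "expected" failures is a guess.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def fallback_voice_prompt(engine: str, kokoro_voice: str) -> dict:
    """Return engine-specific keys, as the review suggests."""
    if engine == "kokoro":
        return {"kokoro_voice": kokoro_voice}
    # Qwen and any other engine use the preset key.
    return {"preset_voice_id": kokoro_voice}


def build_voice_prompt(
    engine: str,
    voice: str,
    kokoro_voice: str,
    create_prompt: Callable[[str], dict],
) -> dict:
    """Try the profile-based prompt first, then fall back to a preset voice.

    `create_prompt` is a hypothetical stand-in for the profile lookup call.
    """
    try:
        return create_prompt(voice)
    except (LookupError, ValueError):
        # Log with traceback instead of silently swallowing the error.
        logger.exception("Profile prompt failed for %r; using preset voice", voice)
        return fallback_voice_prompt(engine, kokoro_voice)
```

Any exception outside the listed types propagates to the caller, so an unexpected failure surfaces instead of being hidden by the fallback.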
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f4bebfd4-97dd-4e9b-8edb-e702f1de304b
📒 Files selected for processing (2)
backend/routes/__init__.py
backend/routes/openai_compat.py
audio, sample_rate = await generate_chunked(
    tts_model,
    request.input,
    voice_prompt,
    language="en",
    seed=None,
    instruct=request.instructions,
    trim_fn=trim_fn,
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if generate_chunked supports a speed parameter
# Find the function definition
ast-grep --pattern $'def generate_chunked($$$) {
$$$
}'
# Also check for async def
ast-grep --pattern $'async def generate_chunked($$$) {
$$$
}'
Repository: jamiepine/voicebox
Length of output: 44
🏁 Script executed:
# First, let's see the directory structure and find the target file
cd backend && ls -la
Repository: jamiepine/voicebox
Length of output: 1709
🏁 Script executed:
# Find where generate_chunked is defined or imported
rg "generate_chunked" --type python -A 2 -B 2
Repository: jamiepine/voicebox
Length of output: 93
🏁 Script executed:
# Read the specific file to verify the code at lines 121-129
cat -n backend/routes/openai_compat.py | head -150 | tail -50
Repository: jamiepine/voicebox
Length of output: 1965
🏁 Script executed:
# Search for generate_chunked import/definition using rg with correct flags
rg "generate_chunked" --type-list | head -20
Repository: jamiepine/voicebox
Length of output: 452
🏁 Script executed:
# Try with correct file matching
rg "generate_chunked" -g "*.py" -A 2 -B 2
Repository: jamiepine/voicebox
Length of output: 3423
🏁 Script executed:
# Read the imports and class definitions at the start of openai_compat.py
cat -n backend/routes/openai_compat.py | head -90
Repository: jamiepine/voicebox
Length of output: 3190
🏁 Script executed:
# Search for SpeechRequest definition
rg "class SpeechRequest" -g "*.py" -A 10
Repository: jamiepine/voicebox
Length of output: 762
🏁 Script executed:
# Read the generate_chunked function definition
cat -n backend/utils/chunked_tts.py | head -100
Repository: jamiepine/voicebox
Length of output: 3409
🏁 Script executed:
# Also check the full function to see what parameters it accepts
cat backend/utils/chunked_tts.py
Repository: jamiepine/voicebox
Length of output: 9367
Critical: speed parameter is accepted but never used.
The SpeechRequest schema accepts a speed parameter (line 79), but it is never passed to generate_chunked or any downstream function. This means clients setting speed will see no effect, breaking OpenAI API compatibility expectations.
Pass speed=request.speed to generate_chunked, or if the underlying backend doesn't support speed control, document this limitation and raise an error or warning when a non-default speed is requested.
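If the backend turns out not to support speed control, the second option above (reject non-default values) might look like this. It is a sketch only: the exception class and function name are hypothetical, and 1.0 is assumed to be the schema's default speed.

```python
import math
from typing import Optional

DEFAULT_SPEED = 1.0  # assumed default of the request schema's speed field


class UnsupportedSpeedError(ValueError):
    """Raised when a client asks for a speed the backend cannot honor."""


def check_speed(speed: Optional[float]) -> None:
    """Accept a missing or default speed; reject anything else explicitly."""
    if speed is None or math.isclose(speed, DEFAULT_SPEED):
        return
    raise UnsupportedSpeedError(
        f"speed={speed} is not supported; only {DEFAULT_SPEED} is accepted"
    )
```

In a route handler this error would typically be translated into a 4xx response, so the client learns that the parameter was ignored rather than silently receiving normal-speed audio.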
- Fix model ID typo: 'gts-4o-mini-tts' -> 'gpt-4o-mini-tts' (valid OpenAI ID)
- Log exception in voice-prompt fallback instead of bare 'except: pass'
- Return engine-specific dict keys in fallback voice prompt (kokoro_voice vs preset_voice_id)
- Document response_format and speed limitations in module docstring
Addressed all five CodeRabbit review comments:
Adds two endpoints that mirror the OpenAI TTS API:
- POST /v1/audio/speech: accepts the same request shape as OpenAI's TTS endpoint (model, input, voice, response_format, speed)
- GET /v1/models: returns available Voicebox models in OpenAI's model list format
Why this matters: any application already integrated with OpenAI TTS can point at a local Voicebox instance by changing only the base URL and API key, with no code changes on the client side. This makes Voicebox a drop-in local replacement for OpenAI TTS in existing workflows, scripts, and tools.
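To illustrate the "only the base URL changes" claim, here is a standard-library sketch that builds (but does not send) a speech request. The host, port, and function name are assumptions. A client using the openai SDK would get the same effect by setting its base URL to the local server.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str,
    api_key: str,
    text: str,
    voice: str = "alloy",
    model: str = "tts-1",
) -> urllib.request.Request:
    """Build a POST to <base_url>/audio/speech with an OpenAI-style body.

    `base_url` is expected to end in /v1, e.g. a hypothetical local
    Voicebox address such as http://localhost:8000/v1.
    """
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Switching between OpenAI and a local instance then means changing only the `base_url` and `api_key` arguments.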
Summary by CodeRabbit
New Features