feat: add OpenAI-compatible /v1/audio/speech and /v1/models endpoints #656

Open
neuron-tech-ai wants to merge 2 commits into jamiepine:main from neuron-tech-ai:feat/openai-compat-api


@neuron-tech-ai neuron-tech-ai commented May 14, 2026

Adds two endpoints that mirror the OpenAI TTS API:

  • POST /v1/audio/speech — accepts the same request shape as OpenAI's TTS endpoint (model, input, voice, response_format, speed)
  • GET /v1/models — returns available Voicebox models in OpenAI's model list format

Why this matters: any application already integrated with OpenAI TTS can point at a local Voicebox instance by changing only the base URL and API key — no code changes on the client side. This makes Voicebox a drop-in local replacement for OpenAI TTS in existing workflows, scripts, and tools.
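
A minimal sketch of what such a client call could look like, using only the Python standard library. The host and port in `BASE_URL` and the placeholder bearer token are assumptions for illustration; the PR does not state where Voicebox listens or whether it checks the key. The request body uses the five fields listed above.

```python
import json
import urllib.request

# Assumption: a local Voicebox instance reachable at this address.
BASE_URL = "http://localhost:8000"  # hypothetical host and port


def build_speech_request(
    text: str, voice: str = "alloy", model: str = "tts-1"
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "wav",
        "speed": 1.0,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key; whether the server validates it is not stated in the PR.
            "Authorization": "Bearer local-placeholder",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from Voicebox")
    # Sending needs a running server, so the network call stays commented out:
    # with urllib.request.urlopen(req) as resp, open("speech.wav", "wb") as out:
    #     out.write(resp.read())
    print(req.get_method(), req.full_url)
```

With the `openai` SDK the equivalent change would be limited to the client's base URL and API key, which is the drop-in behaviour the description claims.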

Summary by CodeRabbit

New Features

  • Added OpenAI-compatible text-to-speech API endpoints for generating audio from text with support for multiple voices and customizable speech parameters.
  • Added endpoint to retrieve available text-to-speech models.

Clients using the openai SDK can point at voicebox without code changes.
Model mapping: tts-1→Kokoro, tts-1-hd→Qwen 1.7B, gts-4o-mini-tts→Qwen 0.6B.
Voice resolution checks profiles by name first, falls back to Kokoro voice IDs.
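
The mapping and lookup order described in this commit message can be sketched roughly as follows. This is an illustration, not the PR's code: the tuple layout, the engine strings, and the helper names `resolve_model` and `resolve_voice` are assumptions, and the third model ID is written with the spelling the review later corrects it to (`gpt-4o-mini-tts`).

```python
from typing import Optional

# Illustrative mapping from OpenAI model IDs to (engine, model size).
# Only the pairings come from the commit message; the value shape is assumed.
MODEL_MAP: dict[str, tuple[str, Optional[str]]] = {
    "tts-1": ("kokoro", None),
    "tts-1-hd": ("qwen", "1.7B"),
    "gpt-4o-mini-tts": ("qwen", "0.6B"),
}


def resolve_model(model_id: str) -> tuple[str, Optional[str]]:
    """Return (engine, size) for a supported model ID, else raise ValueError."""
    try:
        return MODEL_MAP[model_id]
    except KeyError:
        raise ValueError(f"Unsupported model: {model_id!r}") from None


def resolve_voice(
    name: str, profiles: dict[str, str], builtin: dict[str, str], default: str
) -> tuple[str, str]:
    """Profiles win on a case-insensitive name match; otherwise use a built-in ID."""
    lowered = name.lower()
    for profile_name, profile_id in profiles.items():
        if profile_name.lower() == lowered:
            return ("profile", profile_id)
    return ("builtin", builtin.get(lowered, default))
```

The `profiles` dictionary stands in for the database lookup, and `builtin` for the OpenAI-voice-to-Kokoro table; both are passed in here so the sketch stays self-contained.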
Contributor

coderabbitai Bot commented May 14, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c5c70f1c-d49f-499b-a991-bf18f791f1cc

📥 Commits

Reviewing files that changed from the base of the PR and between 3046860 and 1fcc81e.

📒 Files selected for processing (1)
  • backend/routes/openai_compat.py
📝 Walkthrough

This PR adds OpenAI-compatible text-to-speech API endpoints to the backend. It introduces a new openai_compat module with POST /v1/audio/speech and GET /v1/models routes, supporting model and voice name mappings, and voice resolution via database profiles or built-in Kokoro voice IDs.

Changes

OpenAI-compatible TTS API

| Layer | File(s) | Summary |
| --- | --- | --- |
| Request contract and model/voice mappings | `backend/routes/openai_compat.py` | Module documentation, OpenAI model ID to engine/model-size mappings, OpenAI voice name to Kokoro voice ID mappings, and SpeechRequest schema with model, input, voice, response format, speed, and instructions fields. |
| Speech endpoint and voice resolution | `backend/routes/openai_compat.py` | POST /v1/audio/speech handler validates the model, resolves the voice via case-insensitive database VoiceProfile lookup with fallback to built-in voice ID mapping, loads the engine, generates chunked audio with trimming, normalizes output, and returns WAV bytes. _resolve_voice_prompt helper performs voice lookup with cached prompt creation and default voice fallback. |
| Models endpoint and router registration | `backend/routes/openai_compat.py`, `backend/routes/__init__.py` | GET /v1/models endpoint returns list of supported model IDs. Router is imported and registered with the FastAPI application via app.include_router(). |
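
For orientation, a response in OpenAI's model list format might be assembled along these lines. The function name `list_models`, the `owned_by` value, and the fixed list of IDs are assumptions for illustration; the PR's actual handler is not shown in this thread.

```python
import time
from typing import Optional

# Assumed set of IDs, taken from the model mapping described in the PR.
SUPPORTED_MODEL_IDS = ["tts-1", "tts-1-hd", "gpt-4o-mini-tts"]


def list_models(created: Optional[int] = None) -> dict:
    """Return a payload shaped like OpenAI's GET /v1/models response."""
    timestamp = int(time.time()) if created is None else created
    return {
        "object": "list",
        "data": [
            {
                "id": model_id,
                "object": "model",
                "created": timestamp,
                "owned_by": "voicebox",  # hypothetical owner label
            }
            for model_id in SUPPORTED_MODEL_IDS
        ],
    }
```

In a FastAPI route the dictionary could be returned directly, since FastAPI serializes plain dicts to JSON.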

🎯 2 (Simple) | ⏱️ ~12 minutes

🐰 New routes hop in with voice and cheer,
OpenAI-compatible TTS draws near,
Models map and voices resolve,
A speech endpoint to smoothly evolve!
WAV bytes flowing, profiles in sight—
The API now talks OpenAI-right! 🎙️

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly and specifically describes the main changes: addition of OpenAI-compatible API endpoints for speech generation and model listing. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (2)
backend/routes/openai_compat.py (2)

125-125: ⚡ Quick win

Language is hardcoded to "en".

The language parameter is hardcoded to "en", which prevents users from generating speech in other languages even if the underlying models support them.

Consider making language configurable through the request schema or inferring it from the input text.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/openai_compat.py` at line 125, The code currently hardcodes
language="en" which prevents non-English TTS; make the language configurable by
accepting a language field from the request schema (or infer it from the input
text) and pass that value instead of the literal "en". Update the request
parsing/validation to include a language parameter (e.g.,
request.json().get("language") or the request model used in the route) and fall
back to a sensible default (e.g., "en") if absent; replace the hardcoded
language="en" in the TTS call with the validated variable (the same identifier
used where language="en" appears). Ensure input-based inference is optional: if
language not provided, attempt language detection on the text and set the
language variable before invoking the TTS function.
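
One way the suggestion above could look, as a sketch only. A dataclass stands in for the route's real request model, and the optional `language` field, the `effective_language` helper, and the `"en"` default are assumptions; `language` is not part of OpenAI's speech request shape, so it would be a Voicebox-specific extension. Text-based language detection is left out.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LANGUAGE = "en"


@dataclass
class SpeechRequestSketch:
    """Stand-in for the route's request schema, reduced to the relevant fields."""

    model: str
    input: str
    voice: str
    language: Optional[str] = None  # hypothetical extension field


def effective_language(request: SpeechRequestSketch) -> str:
    """Use the caller's language when given, otherwise fall back to the default."""
    language = (request.language or "").strip().lower()
    return language or DEFAULT_LANGUAGE
```

The value returned by `effective_language` would replace the literal `"en"` in the TTS call, so existing clients that never send the field keep today's behaviour.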

164-164: 💤 Low value

Documentation inconsistency: kokoro check mentioned but not implemented.

The comment states "or the engine is kokoro" (line 164) but there's no actual check for the kokoro engine before doing the profile lookup. The fallback happens for any engine when profile lookup fails, not specifically for kokoro.

Either remove this part of the comment or add an explicit engine check if that's the intended behavior.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/openai_compat.py` at line 164, The comment in openai_compat.py
incorrectly says "or the engine is kokoro" without an actual kokoro check;
either remove that phrase from the comment or implement an explicit engine check
so the fallback only triggers for kokoro. Concretely, locate the profile
lookup/fallback logic that uses the engine variable (the code doing "profile
lookup" and the subsequent fallback) and either (A) edit the comment to drop "or
the engine is kokoro", or (B) add a conditional that checks if engine ==
"kokoro" (or the canonical kokoro identifier used elsewhere) before applying the
kokoro-specific fallback, ensuring the fallback behavior matches the documented
text.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/routes/openai_compat.py`:
- Line 55: There is a typo in the model identifier: replace every occurrence of
"gts-4o-mini-tts" with the correct "gpt-4o-mini-tts" in the OpenAI compatibility
model mapping and any places that reference it; specifically update the mapping
entry (the dict that defines model aliases/IDs) and the other referenced usage
that currently points to "gts-4o-mini-tts" so callers can resolve the valid
model id "gpt-4o-mini-tts".
- Around line 185-187: The bare "except Exception: pass" in the voice prompt
creation path silently swallows errors; update the handler around
create_voice_prompt_for_profile so it logs the exception (include the exception
message and stack trace via process/logger.error or logging.exception) and only
suppresses expected exceptions (e.g., catch ValueError or the specific custom
exceptions you anticipate) while re-raising or letting unexpected exceptions
propagate; ensure you reference the create_voice_prompt_for_profile call and the
surrounding profile lookup code when making these changes.
- Around line 121-129: The SpeechRequest schema accepts a speed parameter but it
is never forwarded to the TTS pipeline; update the call to generate_chunked in
the handler (the call that currently passes tts_model, request.input,
voice_prompt, language, seed, instruct, trim_fn) to include speed=request.speed
so the downstream function receives client speed requests; if generate_chunked
or the backend (tts_model) does not support speed, instead detect a non-default
request.speed and either log/warn or raise a clear error indicating speed
control is unsupported by the implementation.
- Around line 190-191: The current fallback returns {"voice_id": kokoro_voice}
which mismatches backend expectations; update the return to emit engine-specific
keys using the local engine variable and kokoro_voice (and
_OPENAI_VOICE_TO_KOKORO) so Qwen gets {"preset_voice_id": kokoro_voice}, Kokoro
gets {"kokoro_voice": kokoro_voice} (or {"preset_voice_id": kokoro_voice} as its
alternate), and all other engines default to {"preset_voice_id": kokoro_voice};
locate the code that sets kokoro_voice in backend/routes/openai_compat.py and
replace the single {"voice_id": ...} return with a conditional branch on engine
to return the appropriate dict keys.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f4bebfd4-97dd-4e9b-8edb-e702f1de304b

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and 3046860.

📒 Files selected for processing (2)
  • backend/routes/__init__.py
  • backend/routes/openai_compat.py

Comment thread backend/routes/openai_compat.py Outdated
Comment thread backend/routes/openai_compat.py
Comment on lines +121 to +129
audio, sample_rate = await generate_chunked(
    tts_model,
    request.input,
    voice_prompt,
    language="en",
    seed=None,
    instruct=request.instructions,
    trim_fn=trim_fn,
)
Contributor


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check if generate_chunked supports a speed parameter

# Find the function definition
ast-grep --pattern $'def generate_chunked($$$) {
  $$$
}'

# Also check for async def
ast-grep --pattern $'async def generate_chunked($$$) {
  $$$
}'

Repository: jamiepine/voicebox

Length of output: 44


🏁 Script executed:

# First, let's see the directory structure and find the target file
cd backend && ls -la

Repository: jamiepine/voicebox

Length of output: 1709


🏁 Script executed:

# Find where generate_chunked is defined or imported
rg "generate_chunked" --type python -A 2 -B 2

Repository: jamiepine/voicebox

Length of output: 93


🏁 Script executed:

# Read the specific file to verify the code at lines 121-129
cat -n backend/routes/openai_compat.py | head -150 | tail -50

Repository: jamiepine/voicebox

Length of output: 1965


🏁 Script executed:

# Search for generate_chunked import/definition using rg with correct flags
rg "generate_chunked" --type-list | head -20

Repository: jamiepine/voicebox

Length of output: 452


🏁 Script executed:

# Try with correct file matching
rg "generate_chunked" -g "*.py" -A 2 -B 2

Repository: jamiepine/voicebox

Length of output: 3423


🏁 Script executed:

# Read the imports and class definitions at the start of openai_compat.py
cat -n backend/routes/openai_compat.py | head -90

Repository: jamiepine/voicebox

Length of output: 3190


🏁 Script executed:

# Search for SpeechRequest definition
rg "class SpeechRequest" -g "*.py" -A 10

Repository: jamiepine/voicebox

Length of output: 762


🏁 Script executed:

# Read the generate_chunked function definition
cat -n backend/utils/chunked_tts.py | head -100

Repository: jamiepine/voicebox

Length of output: 3409


🏁 Script executed:

# Also check the full function to see what parameters it accepts
cat backend/utils/chunked_tts.py

Repository: jamiepine/voicebox

Length of output: 9367


Critical: speed parameter is accepted but never used.

The SpeechRequest schema accepts a speed parameter (line 79), but it is never passed to generate_chunked or any downstream function. This means clients setting speed will see no effect, breaking OpenAI API compatibility expectations.

Pass speed=request.speed to generate_chunked, or if the underlying backend doesn't support speed control, document this limitation and raise an error or warning when a non-default speed is requested.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routes/openai_compat.py` around lines 121 - 129, The SpeechRequest
schema accepts a speed parameter but it is never forwarded to the TTS pipeline;
update the call to generate_chunked in the handler (the call that currently
passes tts_model, request.input, voice_prompt, language, seed, instruct,
trim_fn) to include speed=request.speed so the downstream function receives
client speed requests; if generate_chunked or the backend (tts_model) does not
support speed, instead detect a non-default request.speed and either log/warn or
raise a clear error indicating speed control is unsupported by the
implementation.
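
The second option in this finding (warn or fail when speed control is unavailable) could be sketched as below. The names `check_speed` and `UnsupportedSpeedError`, the logger name, and the `strict` switch are all hypothetical; the sketch assumes the backend cannot change speed, which matches what the thread reports.

```python
import logging
import math

logger = logging.getLogger("openai_compat_sketch")  # hypothetical logger name

DEFAULT_SPEED = 1.0


class UnsupportedSpeedError(ValueError):
    """Raised when a non-default speed is requested in strict mode."""


def check_speed(speed: float, strict: bool = False) -> bool:
    """Return True if the requested speed can be honoured as-is.

    A non-default speed is either rejected (strict) or logged and ignored.
    """
    if math.isclose(speed, DEFAULT_SPEED):
        return True
    if strict:
        raise UnsupportedSpeedError(
            f"speed={speed} requested, but speed control is not supported"
        )
    logger.warning("Ignoring unsupported speed=%s; using %s", speed, DEFAULT_SPEED)
    return False
```

In a route handler the strict branch would typically be translated into an HTTP 400 response, while the lenient branch keeps existing clients working at the cost of silently normal-speed audio.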

Comment thread backend/routes/openai_compat.py Outdated
Comment thread backend/routes/openai_compat.py Outdated
- Fix model ID typo: 'gts-4o-mini-tts' -> 'gpt-4o-mini-tts' (valid OpenAI ID)
- Log exception in voice-prompt fallback instead of bare 'except: pass'
- Return engine-specific dict keys in fallback voice prompt (kokoro_voice vs preset_voice_id)
- Document response_format and speed limitations in module docstring
@neuron-tech-ai
Author

Addressed all five CodeRabbit review comments:

  1. Model ID typo: Fixed gts-4o-mini-tts → gpt-4o-mini-tts in both the module docstring and the _MODEL_MAP dict.

  2. response_format silently ignored: Added documentation in the module docstring noting that WAV is always returned and other formats are not yet supported. Raising a 400 for unsupported formats would break existing callers that pass the default; the limitation is now explicitly documented.

  3. speed parameter not forwarded: Added documentation in the module docstring noting that speed is accepted for schema compatibility but not yet forwarded to backends. A follow-up implementation PR can wire this through once backend support is added.

  4. Bare except Exception: pass: Replaced with except Exception as e that logs a warning (with voice name and exception message) before falling through to the built-in voice.

  5. Wrong fallback voice_prompt dict keys: Fixed the fallback return to use engine-specific keys: Kokoro gets {"kokoro_voice": kokoro_voice}, all others (qwen, qwen_custom_voice) get {"preset_voice_id": kokoro_voice}.
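
Items 4 and 5 together amount to a pattern like the following. This is a reconstruction from the description above, not the committed code: `create_prompt` is a hypothetical stand-in for the profile-based prompt builder, and the function and logger names are invented for the sketch. The dictionary keys are the ones named in item 5.

```python
import logging
from typing import Callable

logger = logging.getLogger("openai_compat_sketch")  # hypothetical logger name


def fallback_voice_prompt(engine: str, kokoro_voice: str) -> dict:
    """Return the built-in voice under the key the given engine expects."""
    if engine == "kokoro":
        return {"kokoro_voice": kokoro_voice}
    return {"preset_voice_id": kokoro_voice}


def resolve_voice_prompt(
    engine: str,
    voice_name: str,
    kokoro_voice: str,
    create_prompt: Callable[[str], dict],
) -> dict:
    """Try the profile-based prompt first; log and fall back on failure."""
    try:
        return create_prompt(voice_name)
    except Exception as exc:
        # Log instead of silently passing, then use the built-in voice.
        logger.warning(
            "Voice prompt for %r failed (%s); using built-in voice", voice_name, exc
        )
        return fallback_voice_prompt(engine, kokoro_voice)
```

Catching `Exception` broadly mirrors what the reply describes; the review had suggested narrowing it to expected exception types, which would be a stricter variant of the same structure.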
