Conversation

@Pavanmanikanta98 (Contributor) commented Nov 29, 2025

Fixes #3530

Key Changes:

  • Updated Profile: Added openai_audio_input_encoding: Literal['base64', 'uri'] to OpenAIModelProfile.
    • 'base64' (default): Maintains strict OpenAI compliance.
    • 'uri': Enables Data URI formatting for providers like Qwen Omni.
  • Updated Model Logic: Modified OpenAIChatModel._map_user_prompt to respect this setting.
    • For BinaryContent: Uses item.data_uri when encoding is 'uri'.
    • For AudioUrl: Manually constructs the Data URI with the correct MIME type (e.g., audio/mpeg for mp3) when
      encoding is 'uri'.
  • New Tests: Added tests/models/test_openai_audio.py covering both default and URI encoding scenarios for both binary content and audio URLs.
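
The difference between the two encodings can be sketched as follows; `encode_audio` is a hypothetical helper for illustration, not the actual pydantic-ai code:

```python
import base64

def encode_audio(data: bytes, media_type: str, encoding: str = 'base64') -> str:
    """Hypothetical helper showing the two encodings discussed above."""
    b64 = base64.b64encode(data).decode('utf-8')
    if encoding == 'uri':
        # Data URI form, required by some OpenAI-compatible providers (e.g. Qwen Omni)
        return f'data:{media_type};base64,{b64}'
    # Plain base64 payload: the strict OpenAI Chat Completions form
    return b64

print(encode_audio(b'\x00\x01\x02', 'audio/mpeg', 'uri'))  # data:audio/mpeg;base64,AAEC
```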

@Pavanmanikanta98 (Contributor, Author) commented

@DouweM, for the Qwen Omni integration specifically, I'd like to follow your suggestion and handle the Data URI requirement via a dedicated provider rather than changing the shared qwen_model_profile.

Concretely, my plan is:

  • Add a new provider class for Qwen's OpenAI-compatible Chat Completions endpoint (e.g. QwenOpenAIProvider), which implements its own model_profile(self, model_name: str).
  • That model_profile will start from the standard OpenAI profile (e.g. openai_model_profile(model_name)) and then update it to set openai_chat_audio_input_encoding='uri'.
  • Users who want to talk to Qwen Omni via an OpenAI‑style API would instantiate OpenAIChatModel with this provider and the Qwen Omni base URL, and they’d automatically get Data URI audio, while other Qwen providers keep the default base64 behavior.
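
A minimal sketch of that plan, using stand-in types rather than the real pydantic-ai classes (field and class names follow the discussion above but are assumptions):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class OpenAIModelProfile:  # stand-in for the real profile class
    openai_chat_audio_input_encoding: str = 'base64'

def openai_model_profile(model_name: str) -> OpenAIModelProfile:
    # Stand-in for the standard OpenAI profile factory
    return OpenAIModelProfile()

class QwenProvider:
    def model_profile(self, model_name: str) -> OpenAIModelProfile:
        profile = openai_model_profile(model_name)
        # Qwen's OpenAI-compatible endpoint expects Data URI audio input
        return replace(profile, openai_chat_audio_input_encoding='uri')

profile = QwenProvider().model_profile('qwen-omni-turbo')
print(profile.openai_chat_audio_input_encoding)  # uri
```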

@DouweM (Collaborator) commented Dec 2, 2025

@Pavanmanikanta98 Thanks, makes sense. It should be just QwenProvider, and we should also support the qwen: model name prefix, update the docs, etc. See https://ai.pydantic.dev/models/openai/#openai-compatible-models for examples; anywhere those are referenced in the code, we should add a branch for qwen as well.
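
The qwen: prefix support mentioned here can be sketched as a lookup-table branch; the mapping below is illustrative, not the actual pydantic-ai dispatch code:

```python
# Illustrative prefix-to-provider mapping; the real dispatch lives in
# pydantic-ai's model inference logic, and the names here are assumptions.
KNOWN_PREFIXES = {
    'openai': 'OpenAIProvider',
    'deepseek': 'DeepSeekProvider',
    'qwen': 'QwenProvider',  # the new branch requested in this review
}

def provider_for(model: str) -> tuple[str, str]:
    """Split a 'prefix:model-name' string and resolve the provider."""
    prefix, sep, name = model.partition(':')
    if not sep or prefix not in KNOWN_PREFIXES:
        raise ValueError(f'unknown model: {model!r}')
    return KNOWN_PREFIXES[prefix], name

print(provider_for('qwen:qwen-omni-turbo'))  # ('QwenProvider', 'qwen-omni-turbo')
```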

@Pavanmanikanta98 (Contributor, Author) commented Dec 4, 2025

Hi @DouweM,
I've addressed your feedback: renamed to openai_chat_audio_input_encoding, used item.media_type instead of hardcoded mapping, and added QwenProvider with automatic Omni audio encoding. All tests pass.
Ready for review.

@Pavanmanikanta98 (Contributor, Author) commented Dec 8, 2025

Hi @DouweM, following up on this when you have time. Ready for review.


### Qwen

To use Qwen models via the OpenAI-compatible API from [Alibaba Cloud DashScope](https://www.alibabacloud.com/help/doc-detail/2712576.html), you can set the `QWEN_API_KEY` (or `DASHSCOPE_API_KEY`) environment variable and use [`QwenProvider`][pydantic_ai.providers.qwen.QwenProvider] by name:

@DouweM (Collaborator) commented:

  • Should the provider be named AlibabaProvider instead, and the prefix alibaba:, as that's the platform name, while Qwen is a model family? Or DashScopeProvider?
  • Let's link to the product page rather than the docs: https://www.alibabacloud.com/en/product/modelstudio
  • Let's support only 1 env var, likely ALIBABA_API_KEY or DASHSCOPE_API_KEY


The `QwenProvider` uses the international DashScope compatible endpoint `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` by default.

When using **Qwen Omni** models (e.g. `qwen-omni-turbo`), this provider automatically handles audio input using the Data URI format required by the DashScope API.

@DouweM (Collaborator) commented:

We can drop this, the user will assume everything just works, we don't have to explain the specific things we did to make it so

```python
elif item.is_audio:
    assert item.format in ('wav', 'mp3')
    audio = InputAudio(data=base64.b64encode(item.data).decode('utf-8'), format=item.format)
    profile = OpenAIModelProfile.from_profile(self.profile)
```

@DouweM (Collaborator) commented:
Let's do this just once at the top of the method

```python
profile = OpenAIModelProfile.from_profile(self.profile)
if profile.openai_chat_audio_input_encoding == 'uri':
    mime_type = item.media_type or f'audio/{downloaded_item["data_type"]}'
    data_uri = f'data:{mime_type};base64,{downloaded_item["data"]}'
```

@DouweM (Collaborator) commented:
We shouldn't need to do this ourselves, we can call download_item with data_format='base64_uri'

```python
@property
def base_url(self) -> str:
    # Using the international endpoint by default as it's more standard for global users
    # Users in China region can override this via passing `openai_client` or implementing logic to check region
```

@DouweM (Collaborator) commented:
Can we take a base_url argument? And mention this in the docs

@DouweM changed the title from "Add configurable audio encoding for OpenAI models (Data URI support)" to "Support audio on Alibaba Cloud Qwen Omni" on Dec 9, 2025
Pavanmanikanta98 pushed a commit to Pavanmanikanta98/pydantic-ai that referenced this pull request Dec 11, 2025
…r Omni models

- Rename QwenProvider to DashScopeProvider with dashscope: prefix
- Use single DASHSCOPE_API_KEY environment variable
- Add base_url argument to DashScopeProvider constructor
- Refactor audio mapping to fetch profile once at method top
- Use download_item with base64_uri format for AudioUrl
- Remove Qwen-specific mentions from docstrings
- Update documentation with product page link and base_url example
- Add comprehensive tests for DashScopeProvider

Addresses all maintainer feedback from PR pydantic#3596
Fixes pydantic#3530
...

### DashScope

@DouweM (Collaborator) commented:
We should mention it on docs/models/overview.md, docs/index.md and README.md where we mention all the other providers


@DouweM (Collaborator) suggested a change:

```diff
-### DashScope
+### Alibaba Cloud DashScope
```

@DouweM changed the title from "Support audio on Alibaba Cloud Qwen Omni" to "Add Alibaba Cloud DashScopeProvider and support audio input for Qwen Omni" on Dec 12, 2025
@Pavanmanikanta98 (Contributor, Author) commented

Hi @DouweM,

Addressed the feedback. Ready for review.


### DashScope

To use Qwen models via [Alibaba Cloud DashScope](https://www.alibabacloud.com/en/product/modelstudio), you can set the `DASHSCOPE_API_KEY` environment variable and use [`DashScopeProvider`][pydantic_ai.providers.dashscope.DashScopeProvider] by name:

@DouweM (Collaborator) commented:
Is DashScopeProvider really the most appropriate/recognizable name? I see DashScope mentioned only once on https://www.alibabacloud.com/en/product/modelstudio, so maybe it should just be AlibabaProvider?

@Pavanmanikanta98 (Contributor, Author) commented

Hi @DouweM,

I'm happy to rename the provider to AlibabaProvider if that's the preferred direction for discovery, but I wanted to share my reasoning for choosing DashScopeProvider initially, backed by the official SDK and documentation:

  1. Official SDK & Identity: The official Python SDK is explicitly named dashscope (https://pypi.org/project/dashscope/), and the official documentation refers to the API usage as "DashScope" (e.g., "First API Call to Qwen": https://www.alibabacloud.com/help/en/model-studio/first-api-call-to-qwen).
  2. Service vs. Platform: "DashScope" is the specific name of the OpenAI-compatible API service we are connecting to. This aligns with other provider naming conventions like BedrockProvider (wrapping Amazon Bedrock, not AmazonProvider) and AzureProvider (wrapping Azure OpenAI, not MicrosoftProvider).
  3. Environment Variable: The standard API keys generated in the console are explicitly for DashScope (DASHSCOPE_API_KEY), so DashScopeProvider aligns with the configuration users will already have.

If you still prefer AlibabaProvider for better recognizability, I can definitely make the switch! I just wanted to clarify that DashScopeProvider was chosen to match the specific API service name.

Let me know what you think!

@DouweM (Collaborator) commented Dec 15, 2025

@Pavanmanikanta98 It's a bit confusing/inconsistent, but I interpret these 2 sentences on https://www.alibabacloud.com/help/en/model-studio/first-api-call-to-qwen to mean that they actually prefer the platform to be called "Alibaba Cloud Model Studio", and DashScope is just the SDK:

> Alibaba Cloud Model Studio lets you call large language models (LLMs) through OpenAI compatible interfaces or the DashScope SDK.

(this seems to contradict your saying that "DashScope is the specific name of the OpenAI-compatible API service we are connecting to", as they here say "OpenAI compatible interfaces or the DashScope SDK")

> # Replace YOUR_DASHSCOPE_API_KEY with your Alibaba Cloud Model Studio API key.

(not "DashScope API key")

But then again the URLs all include "dashscope" as well, so clearly it's not just the SDK. So maybe it's the name of their AI inference platform, but also they don't want to actually call it that in marketing material (anymore?) and say "Alibaba Cloud Model Studio" instead?

I think the fact that https://www.alibabacloud.com/help/en/model-studio/what-is-model-studio only mentions "dashscope" in the code example is the deciding factor here. Someone who reads that page and wants to look for Pydantic AI support seems far more likely to scan a list for "Alibaba Cloud Model Studio" or (in one word) "Alibaba" than "DashScope".

So yeah please make the rename, and where we can have the full name (in docs etc), we can say "Alibaba Cloud Model Studio (DashScope)".

Of course Alibaba does a lot more, but it's similar to HerokuProvider and VercelProvider that in Pydantic AI context clearly refer to their AI inference features.

Azure and Bedrock are a little different because those are established brand names in their own right; if Alibaba still called it Alibaba DashScope AI or something, DashScope would've been fine.

All of that is to say, please make the change 😄

@Pavanmanikanta98 (Contributor, Author) commented

@DouweM

I've renamed DashScopeProvider to AlibabaProvider (using the alibaba: prefix) and updated the documentation to reference "Alibaba Cloud Model Studio (DashScope)" as discussed. DASHSCOPE_API_KEY is preserved for consistency with the official docs. Ready for review.


### Alibaba Cloud Model Studio (DashScope)

To use Qwen models via [Alibaba Cloud Model Studio (DashScope)](https://www.alibabacloud.com/en/product/modelstudio), you can set the `DASHSCOPE_API_KEY` environment variable and use [`AlibabaProvider`][pydantic_ai.providers.alibaba.AlibabaProvider] by name:

@DouweM (Collaborator) commented:
Can we additionally support ALIBABA_API_KEY please? Like for Google we support GOOGLE_ and GEMINI_ both. That'll be easier to recognize in a .env file for someone looking for the token that's used by AlibabaProvider who's not familiar with the DashScope name

pavan added 10 commits December 17, 2025 22:22
This commit introduces `openai_audio_input_encoding` to `OpenAIModelProfile`, allowing users to choose between `'base64'` (default) and `'uri'` encoding for audio inputs. This addresses compatibility issues with providers like Qwen Omni that require Data URI format for audio data.

Key changes:
- Added `openai_audio_input_encoding` to `OpenAIModelProfile`.
- Updated `OpenAIChatModel._map_user_prompt` to respect the configured encoding for `BinaryContent` and `AudioUrl`.
- Added new tests in `tests/models/test_openai_audio.py` covering both encoding modes.
…i models

- Add QwenProvider for DashScope OpenAI-compatible API
- Rename openai_audio_input_encoding to openai_chat_audio_input_encoding
- Use item.media_type for Data URI MIME types instead of hardcoded mapping
- Automatically set Data URI audio encoding for Qwen Omni models
- Add comprehensive tests for QwenProvider and audio encoding
- Add Qwen documentation section to OpenAI-compatible models docs

Fixes pydantic#3530
- Include 'qwen' in the model inference options for compatibility with Qwen models.
- Set up environment variable for Qwen API key in test_examples.py to facilitate testing.

This enhances the integration of Qwen models within the existing framework.
- Add tests for initializing QwenProvider with `openai_client` and `http_client` to ensure full branch coverage.
…r Omni models

- Rename QwenProvider to DashScopeProvider with dashscope: prefix
- Use single DASHSCOPE_API_KEY environment variable
- Add base_url argument to DashScopeProvider constructor
- Refactor audio mapping to fetch profile once at method top
- Use download_item with base64_uri format for AudioUrl
- Remove Qwen-specific mentions from docstrings
- Update documentation with product page link and base_url example
- Add comprehensive tests for DashScopeProvider

Addresses all maintainer feedback from PR pydantic#3596
Fixes pydantic#3530
- Add DashScope to provider lists in README.md and docs/index.md
- Add pydantic_ai.providers.dashscope to docs/api/providers.md
- Merge test_openai_audio.py into test_openai.py and remove redundant test
  - Rename DashScopeProvider → AlibabaProvider with prefix alibaba:
  - Keep DASHSCOPE_API_KEY env var (matches Alibaba official docs)
  - Update all documentation references
  - Add Alibaba Cloud to README.md, docs/index.md, docs/models/overview.md
  - Add ALIBABA_API_KEY as primary env var (easier to recognize)
  - Keep DASHSCOPE_API_KEY for compatibility with Alibaba's docs
  - ALIBABA_API_KEY takes precedence (like GOOGLE_API_KEY/GEMINI_API_KEY)
  - Update docs, tests, and test fixtures
@Pavanmanikanta98 force-pushed the fix/qwen-omni-audio-encoding branch from 9e6b391 to 1126ba5 on December 17, 2025 17:03
@Pavanmanikanta98 (Contributor, Author) commented

@DouweM Done : )

Ready for review

@DouweM merged commit 81c0d4a into pydantic:main on Dec 17, 2025
57 of 59 checks passed
@DouweM (Collaborator) commented Dec 17, 2025

@Pavanmanikanta98 Thanks Pavan!


Linked issue: The way OpenAIChatModel sends input audio is incompatible with Qwen Omni API
2 participants