Add Alibaba Cloud DashScopeProvider and support audio input for Qwen Omni
#3596
@DouweM, for the Qwen Omni integration specifically, I’d like to follow your suggestion and handle the Data URI requirement via a dedicated provider rather than changing the shared `qwen_model_profile`.
@Pavanmanikanta98 Thanks, makes sense. It should be just |
|
Hi @DouweM, |
|
Hi @DouweM, following up on this when you have time. Ready for review. |
docs/models/openai.md
Outdated
| ### Qwen
|
| To use Qwen models via the OpenAI-compatible API from [Alibaba Cloud DashScope](https://www.alibabacloud.com/help/doc-detail/2712576.html), you can set the `QWEN_API_KEY` (or `DASHSCOPE_API_KEY`) environment variable and use [`QwenProvider`][pydantic_ai.providers.qwen.QwenProvider] by name:
- Should the provider be named `AlibabaProvider` instead, and the prefix `alibaba:`, as that's the platform name, while Qwen is a model family? Or `DashScopeProvider`?
- Let's link to the product page rather than the docs: https://www.alibabacloud.com/en/product/modelstudio
- Let's support only 1 env var, likely `ALIBABA_API_KEY` or `DASHSCOPE_API_KEY`
docs/models/openai.md
Outdated
| The `QwenProvider` uses the international DashScope compatible endpoint `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` by default.
|
| When using **Qwen Omni** models (e.g. `qwen-omni-turbo`), this provider automatically handles audio input using the Data URI format required by the DashScope API.
We can drop this, the user will assume everything just works, we don't have to explain the specific things we did to make it so
| elif item.is_audio:
|     assert item.format in ('wav', 'mp3')
|     audio = InputAudio(data=base64.b64encode(item.data).decode('utf-8'), format=item.format)
|     profile = OpenAIModelProfile.from_profile(self.profile)
Let's do this just once at the top of the method
| profile = OpenAIModelProfile.from_profile(self.profile)
| if profile.openai_chat_audio_input_encoding == 'uri':
|     mime_type = item.media_type or f'audio/{downloaded_item["data_type"]}'
|     data_uri = f'data:{mime_type};base64,{downloaded_item["data"]}'
We shouldn't need to do this ourselves; we can call `download_item` with `data_format='base64_uri'`
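For readers following the thread, the difference between the two audio encodings under discussion can be sketched with the standard library alone. This is a minimal illustration, not the code in the PR: `encode_audio` is a hypothetical helper, and the mode names `'base64'` and `'uri'` are borrowed from the commit messages in this PR.

```python
import base64


def encode_audio(data: bytes, media_type: str, encoding: str = 'base64') -> str:
    """Return audio bytes as raw base64 or as a Data URI.

    Hypothetical helper for illustration only; it is not part of pydantic_ai.
    """
    b64 = base64.b64encode(data).decode('utf-8')
    if encoding == 'uri':
        # Data URI form: data:<media type>;base64,<payload>
        return f'data:{media_type};base64,{b64}'
    if encoding == 'base64':
        # Plain base64 payload with no prefix
        return b64
    raise ValueError(f'unknown encoding: {encoding!r}')


raw = encode_audio(b'abc', 'audio/wav')
uri = encode_audio(b'abc', 'audio/wav', encoding='uri')
```

The only difference between the two results is the `data:<media type>;base64,` prefix, which is why delegating the formatting to an existing download helper avoids duplicating the string building.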
| @property
| def base_url(self) -> str:
|     # Using the international endpoint by default as it's more standard for global users
|     # Users in China region can override this via passing `openai_client` or implementing logic to check region
Can we take a base_url argument? And mention this in the docs
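A rough sketch of what an overridable `base_url` could look like is shown here. `SketchProvider` is a hypothetical stand-in, not the provider class in this PR; the default URL is the international endpoint quoted in the docs diff above, and the override URL is a placeholder.

```python
from dataclasses import dataclass

# Default taken from the docs text quoted in this thread.
DEFAULT_BASE_URL = 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1'


@dataclass
class SketchProvider:
    """Hypothetical provider stand-in; only models the base_url override."""

    api_key: str
    base_url: str = DEFAULT_BASE_URL


# Uses the international endpoint by default.
intl = SketchProvider(api_key='sk-example')

# A user in another region passes their own endpoint (placeholder URL).
custom = SketchProvider(api_key='sk-example', base_url='https://example.invalid/v1')
```

Accepting the URL in the constructor keeps the regional choice explicit and avoids requiring users to build their own client just to change the endpoint.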
…r Omni models
- Rename QwenProvider to DashScopeProvider with dashscope: prefix
- Use single DASHSCOPE_API_KEY environment variable
- Add base_url argument to DashScopeProvider constructor
- Refactor audio mapping to fetch profile once at method top
- Use download_item with base64_uri format for AudioUrl
- Remove Qwen-specific mentions from docstrings
- Update documentation with product page link and base_url example
- Add comprehensive tests for DashScopeProvider

Addresses all maintainer feedback from PR pydantic#3596

Fixes pydantic#3530
docs/models/openai.md
Outdated
| ...
| ```
|
| ### DashScope
We should mention it on docs/models/overview.md, docs/index.md and README.md where we mention all the other providers
Suggested change:
- ### DashScope
+ ### Alibaba Cloud DashScope
Hi @DouweM, I've addressed the feedback. Ready for review.
docs/models/openai.md
Outdated
| ### DashScope
|
| To use Qwen models via [Alibaba Cloud DashScope](https://www.alibabacloud.com/en/product/modelstudio), you can set the `DASHSCOPE_API_KEY` environment variable and use [`DashScopeProvider`][pydantic_ai.providers.dashscope.DashScopeProvider] by name:
Is DashScopeProvider really the most appropriate/recognizable name? I see DashScope mentioned only once on https://www.alibabacloud.com/en/product/modelstudio, so maybe it should just be AlibabaProvider?
|
Hi @DouweM, I'm happy to rename the provider to AlibabaProvider if that's the preferred direction for discovery, but I wanted to share my reasoning for choosing DashScopeProvider initially, backed by the official SDK and documentation:
If you still prefer AlibabaProvider for better recognizability, I can definitely make the switch! I just wanted to clarify that DashScopeProvider was chosen to match the specific API service name. Let me know what you think! |
|
@Pavanmanikanta98 It's a bit confusing/inconsistent, but I interpret these 2 sentences on https://www.alibabacloud.com/help/en/model-studio/first-api-call-to-qwen to mean that they actually prefer the platform to be called "Alibaba Cloud Model Studio", and
(this seems to contradict your saying that "DashScope" is the specific name of the OpenAI-compatible API service we are connecting to, as they here say "OpenAI endpoint OR DashScope")
(not "DashScope API key")

But then again the URLs all include "dashscope" as well, so clearly it's not just the SDK. So maybe it's the name of their AI inference platform, but also they don't want to actually call it that in marketing material (anymore?) and say "Alibaba Cloud Model Studio" instead?

I think the fact that https://www.alibabacloud.com/help/en/model-studio/what-is-model-studio only mentions "dashscope" in the code example is the deciding factor here. Someone who reads that page and wants to look for Pydantic AI support seems far more likely to scan a list for "Alibaba Cloud Model Studio" or (in one word) "Alibaba" than "DashScope".

So yeah please make the rename, and where we can have the full name (in docs etc), we can say "Alibaba Cloud Model Studio (DashScope)". Of course Alibaba does a lot more, but it's similar to Azure and Bedrock are a little different because those are established brand names in their own right; if Alibaba still called it

All of that is to say, please make the change 😄
I've renamed `DashScopeProvider` to `AlibabaProvider` (using the `alibaba:` prefix) and updated the documentation to reference "Alibaba Cloud Model Studio (DashScope)" as discussed.
docs/models/openai.md
Outdated
| ### Alibaba Cloud Model Studio (DashScope)
|
| To use Qwen models via [Alibaba Cloud Model Studio (DashScope)](https://www.alibabacloud.com/en/product/modelstudio), you can set the `DASHSCOPE_API_KEY` environment variable and use [`AlibabaProvider`][pydantic_ai.providers.alibaba.AlibabaProvider] by name:
Can we additionally support `ALIBABA_API_KEY`, please? For Google, we support both `GOOGLE_` and `GEMINI_`. That'll be easier to recognize in a .env file for someone who's looking for the token used by `AlibabaProvider` and isn't familiar with the DashScope name.
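One way to read this request is a simple lookup order across the two variables. The sketch below is an assumption about the behaviour, not the code in the PR: `resolve_api_key` is a hypothetical helper, and giving `ALIBABA_API_KEY` precedence follows the later commit message in this PR.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick an API key: explicit argument, then ALIBABA_API_KEY, then DASHSCOPE_API_KEY.

    Hypothetical helper for illustration; the lookup order is an assumption.
    """
    env = os.environ if env is None else env
    key = explicit or env.get('ALIBABA_API_KEY') or env.get('DASHSCOPE_API_KEY')
    if not key:
        raise ValueError('Set ALIBABA_API_KEY or DASHSCOPE_API_KEY, or pass an API key explicitly.')
    return key


# Both variables set: the ALIBABA_ one wins under this assumed order.
both = resolve_api_key(env={'ALIBABA_API_KEY': 'a-key', 'DASHSCOPE_API_KEY': 'd-key'})

# Only the DashScope variable set: it is used as a fallback.
fallback = resolve_api_key(env={'DASHSCOPE_API_KEY': 'd-key'})
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`.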
This commit introduces `openai_audio_input_encoding` to `OpenAIModelProfile`, allowing users to choose between `'base64'` (default) and `'uri'` encoding for audio inputs. This addresses compatibility issues with providers like Qwen Omni that require Data URI format for audio data.
Key changes:
- Added `openai_audio_input_encoding` to `OpenAIModelProfile`.
- Updated `OpenAIChatModel._map_user_prompt` to respect the configured encoding for `BinaryContent` and `AudioUrl`.
- Added new tests in `tests/models/test_openai_audio.py` covering both encoding modes.

…i models
- Add QwenProvider for DashScope OpenAI-compatible API
- Rename openai_audio_input_encoding to openai_chat_audio_input_encoding
- Use item.media_type for Data URI MIME types instead of hardcoded mapping
- Automatically set Data URI audio encoding for Qwen Omni models
- Add comprehensive tests for QwenProvider and audio encoding
- Add Qwen documentation section to OpenAI-compatible models docs
Fixes pydantic#3530

- Include 'qwen' in the model inference options for compatibility with Qwen models.
- Set up environment variable for Qwen API key in test_examples.py to facilitate testing.
This enhances the integration of Qwen models within the existing framework.

- Add tests for initializing QwenProvider with `openai_client` and `http_client` to ensure full branch coverage.

…r Omni models
- Rename QwenProvider to DashScopeProvider with dashscope: prefix
- Use single DASHSCOPE_API_KEY environment variable
- Add base_url argument to DashScopeProvider constructor
- Refactor audio mapping to fetch profile once at method top
- Use download_item with base64_uri format for AudioUrl
- Remove Qwen-specific mentions from docstrings
- Update documentation with product page link and base_url example
- Add comprehensive tests for DashScopeProvider
Addresses all maintainer feedback from PR pydantic#3596
Fixes pydantic#3530

- Add DashScope to provider lists in README.md and docs/index.md
- Add pydantic_ai.providers.dashscope to docs/api/providers.md
- Merge test_openai_audio.py into test_openai.py and remove redundant test

- Rename DashScopeProvider → AlibabaProvider with prefix alibaba:
- Keep DASHSCOPE_API_KEY env var (matches Alibaba official docs)
- Update all documentation references
- Add Alibaba Cloud to README.md, docs/index.md, docs/models/overview.md

- Add ALIBABA_API_KEY as primary env var (easier to recognize)
- Keep DASHSCOPE_API_KEY for compatibility with Alibaba's docs
- ALIBABA_API_KEY takes precedence (like GOOGLE_API_KEY/GEMINI_API_KEY)
- Update docs, tests, and test fixtures
9e6b391 to 1126ba5
@DouweM Done :) Ready for review
@Pavanmanikanta98 Thanks Pavan!
Fixes #3530
Key Changes:
encoding is 'uri'.