AgentScope-Java is an open-source project. To involve a broader community, we recommend asking your questions in English.
**Is your feature request related to a problem? Please describe.**
Currently, AgentScope Java provides built-in support for text-to-speech (TTS) via DashScope and multi-modal content understanding (image/video/audio as input), but there is no built-in model or tool support for image generation and video generation. Many real-world AI agent scenarios require the ability to create visual content — for example, marketing agents generating promotional images, design assistants producing concept art, or social media agents creating short videos. The [Jimeng](https://jimeng.jianying.com/) (即梦) and [Seedance](https://www.doubao.com/seedance) APIs from ByteDance/Douyin offer high-quality text-to-image and text-to-video generation capabilities, but there is currently no native integration in the framework.
**Describe the solution you'd like**
I'd like to see native support for image and video generation models, similar to how the framework already supports multiple chat models. Specifically:
- **`ImageModel` interface** — a new model interface (similar to `Model`/`TTSModel`) with methods like `generate(prompt, options)` returning generated image data or URLs.
- **`VideoModel` interface** — a similar interface for video generation.
- **Built-in implementations:**
  - `JimengImageModel` — for the ByteDance Jimeng text-to-image API
  - `SeedanceVideoModel` — for the Seedance text-to-video API
  - (optionally) `OpenAIImageModel` for DALL-E as a reference
- **Tool auto-registration** — when an `ImageModel` or `VideoModel` is attached to a `ReActAgent`, the framework could optionally register built-in tools like `generate_image(prompt)` and `generate_video(prompt)` that the agent can call during reasoning.
- **`ContentBlock` extension** — optionally extend the `ContentBlock` system to natively represent generated images/videos in agent responses, making it easy to return and display generated media.
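To make the proposal concrete, here is a minimal sketch of what the two interfaces might look like. All names and signatures (`GeneratedMedia`, the `options` map, the `JimengImageModel` constructor) are illustrative assumptions, not part of the current AgentScope-Java API; the dummy implementation returns a placeholder URL instead of calling the real Jimeng endpoint.

```java
import java.util.List;
import java.util.Map;

public class ImageModelSketch {

    /** Hypothetical result type: remote URLs of the generated media. */
    record GeneratedMedia(List<String> urls) {}

    /** Proposed text-to-image interface, analogous to the existing TTSModel. */
    interface ImageModel {
        GeneratedMedia generate(String prompt, Map<String, Object> options);
    }

    /** Proposed text-to-video interface with the same shape. */
    interface VideoModel {
        GeneratedMedia generate(String prompt, Map<String, Object> options);
    }

    /** Dummy implementation showing how a JimengImageModel could be shaped. */
    static class JimengImageModel implements ImageModel {
        private final String apiKey;

        JimengImageModel(String apiKey) {
            this.apiKey = apiKey;
        }

        @Override
        public GeneratedMedia generate(String prompt, Map<String, Object> options) {
            // A real implementation would call the Jimeng REST endpoint here;
            // we return a fixed placeholder URL so the sketch is runnable.
            return new GeneratedMedia(List.of("https://example.com/generated.png"));
        }
    }

    public static void main(String[] args) {
        ImageModel model = new JimengImageModel("sk-...");
        GeneratedMedia media =
                model.generate("a red fox in watercolor", Map.of("size", "1024x1024"));
        System.out.println(media.urls().get(0));
    }
}
```

Keeping `generate(prompt, options)` identical across both interfaces would let the tool auto-registration layer treat image and video models uniformly.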
**Describe alternatives you've considered**
- **Custom `@Tool` approach** — I can wrap the Jimeng/Seedance REST APIs as `@Tool` methods myself. This works for immediate needs but lacks standardization and doesn't benefit other users.
- **Using MCP servers** — there may be third-party MCP servers that expose image/video generation, but this adds an external dependency and doesn't provide first-class framework integration.

Both alternatives require boilerplate for each project and don't leverage the framework's model abstraction layer (formatter, retry, tracing, etc.).
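The boilerplate of the custom-tool workaround can be sketched as follows. The endpoint URL, JSON body, and method name are placeholders (the real Jimeng API is not reproduced here), and the `@Tool` registration is shown only as a comment to keep the sketch self-contained and runnable; a real implementation would send the request with `HttpClient` and parse the JSON response.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JimengToolSketch {

    // In AgentScope-Java this method would carry a tool annotation so a
    // ReActAgent can invoke it; each project has to rewrite this wrapper,
    // which is the duplication the feature request wants to eliminate.
    static String generateImage(String prompt, String apiKey) {
        HttpRequest request = HttpRequest.newBuilder()
                // Placeholder endpoint, not the real Jimeng API.
                .uri(URI.create("https://example.com/jimeng/v1/images"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"prompt\":\"" + prompt + "\"}"))
                .build();
        // A real wrapper would do HttpClient.newHttpClient().send(request, ...)
        // and return the image URL from the response; we return a summary of
        // the built request so the sketch runs without network access.
        return request.method() + " " + request.uri();
    }

    public static void main(String[] args) {
        System.out.println(generateImage("sunset over mountains", "sk-test"));
    }
}
```

Note what the wrapper does not get for free: retries, tracing, and formatter integration, all of which a first-class `ImageModel` built on the framework's model abstraction would inherit.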
**Additional context**
The framework already has a clean model abstraction (`Model` → `ChatModelBase` → concrete models) with formatter, retry, tracing, and streaming support. Extending this pattern to image/video generation would make AgentScope a more complete multi-modal agent framework and align with industry trends (OpenAI's GPT-4o image editing, Google's Veo, etc.).