diff --git a/README.md b/README.md
index 87edd1d..a1773bd 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,12 @@

blog paper - Hugging Face Hugging Face Spaces version python mit +
+ MOSS-TTSD-v0.5
 # MOSS-TTSD 🪐
@@ -24,7 +25,7 @@
 MOSS-TTSD (text to spoken dialogue) is an open-source bilingual spoken dialogue synthesis model that supports both Chinese and English. It can transform dialogue scripts between two speakers into natural, expressive conversational speech. MOSS-TTSD supports voice cloning and long single-session speech generation, making it ideal for AI podcast production, interviews, and chats.
- For detailed information about the model and demos, please refer to our [Blog-en](https://www.open-moss.com/en/moss-ttsd/) and [中文博客](https://www.open-moss.com/cn/moss-ttsd/). You can also find the model on [Hugging Face](https://huggingface.co/fnlp/MOSS-TTSD-v0.5) and try it out in the [Spaces demo](https://huggingface.co/spaces/fnlp/MOSS-TTSD).
+ For detailed information about the model and demos, please refer to our [Blog-en](https://www.open-moss.com/en/moss-ttsd/) and [中文博客](https://www.open-moss.com/cn/moss-ttsd/). You can also find the model on [Hugging Face](https://huggingface.co/fnlp/MOSS-TTSD-v0.7) and try it out in the [Spaces demo](https://huggingface.co/spaces/fnlp/MOSS-TTSD).
 ## Highlights
@@ -36,7 +37,7 @@ MOSS-TTSD supports voice cloning and long single-session speech generation, maki
 ## News 🚀
- - **[2025-11-01]** MOSS-TTSD v0.7 is released! v0.7 significantly improves audio quality, voice cloning capability, and stability, adds support for 32 kHz high‑quality output, greatly extends single‑pass generation length (960s→1700s), and more reliably generates speech events following speaker tags. We recommend using the v0.7 model by default.
+ - **[2025-11-01]** MOSS-TTSD v0.7 is released! v0.7 significantly improves audio quality, voice cloning capability, and stability, adds support for 32 kHz high‑quality output, greatly extends single‑pass generation length (960s→1700s). We recommend using the v0.7 model by default. [MOSS-TTSD v0.7 Model Address](https://huggingface.co/fnlp/MOSS-TTSD-v0.7)
 - **[2025-09-09]** We supported SGLang inference engine to accelerate model inference by up to **16x**.
 - **[2025-08-25]** We released the 32khz version of XY-Tokenizer.
 - **[2025-08-12]** We add support for streaming inference in MOSS-TTSD v0.5.
@@ -59,7 +60,7 @@ pip install flash-attn
 ### Download XY-Tokenizer
-You also need to download the XY Tokenizer model weights. You can find the weights in the [XY_Tokenizer repository](https://huggingface.co/fnlp/XY_Tokenizer_TTSD_V0_32k).
+You also need to download the XY Tokenizer model weights. You can find the weights in the [XY-Tokenizer-TTSD version repository](https://huggingface.co/fnlp/MOSS_TTSD_tokenizer).
 ```bash
 mkdir -p XY_Tokenizer/weights
diff --git a/README_zh.md b/README_zh.md
index ecf5c79..d01b568 100644
--- a/README_zh.md
+++ b/README_zh.md
@@ -8,11 +8,12 @@

blog paper - Hugging Face Hugging Face Spaces version python mit +
+ MOSS-TTSD-v0.5
 # MOSS-TTSD 🪐
@@ -22,7 +23,7 @@
 ## 概述
 MOSS-TTSD(text to spoken dialogue)是一个开源的中英双语口语对话合成模型,可以将包含两位说话人的对话脚本转换为自然、富有表现力的对话语音。MOSS-TTSD支持双说话人零样本音色克隆与长时间单段语音生成,非常适合播客,访谈,聊天等对话场景。
-详细模型介绍与演示请见我们的[中文博客](https://www.open-moss.com/cn/moss-ttsd/)和[Blog-en](https://www.open-moss.com/en/moss-ttsd/)。模型权重在 [Hugging Face](https://huggingface.co/fnlp/MOSS-TTSD-v0.5) 提供,并可在 [Spaces 演示](https://huggingface.co/spaces/fnlp/MOSS-TTSD) 在线体验。
+详细模型介绍与演示请见我们的[中文博客](https://www.open-moss.com/cn/moss-ttsd/)和[Blog-en](https://www.open-moss.com/en/moss-ttsd/)。模型权重在 [Hugging Face](https://huggingface.co/fnlp/MOSS-TTSD-v0.7) 提供,并可在 [Spaces 演示](https://huggingface.co/spaces/fnlp/MOSS-TTSD) 在线体验。
 ## 亮点
@@ -34,7 +35,7 @@ MOSS-TTSD(text to spoken dialogue)是一个开源的中英双语口语对话
 ## 最新动态 🚀
-- **[2025-11-01]** 我们发布了 MOSS-TTSD v0.7:显著提升了音质、声音克隆能力与稳定性,支持32khz高音质输出,并大幅拓展了单次生成长度(960s->1700s),更够比较稳定地根据说话人标签生成语音事件。
+- **[2025-11-01]** 我们发布了 MOSS-TTSD v0.7:显著提升了音质、声音克隆能力与稳定性,支持32khz高音质输出,并大幅拓展了单次生成长度(960s->1700s)。我们推荐默认使用MOSS-TTSD v0.7版本。[MOSS-TTSD v0.7 模型地址](https://huggingface.co/fnlp/MOSS-TTSD-v0.7)
 - **[2025-09-09]** 我们支持了 SGLang 推理引擎加速模型推理,最高可加速**16倍**。
 - **[2025-08-25]** 我们发布了 32khz XY-Tokenizer。
 - **[2025-08-12]** 我们支持了 MOSS-TTSD v0.5 的流式推理。
@@ -57,7 +58,7 @@ pip install flash-attn
 ### 下载 XY-Tokenizer 权重
-首先需要下载 XY-Tokenizer 的Codec模型权重,见 [XY_Tokenizer仓库](https://huggingface.co/fnlp/XY_Tokenizer_TTSD_V0_32k)。
+首先需要下载 XY-Tokenizer 的Codec模型权重,见[XY-Tokenizer-TTSD版本仓库](https://huggingface.co/fnlp/MOSS_TTSD_tokenizer)。
 ```bash
 mkdir -p XY_Tokenizer/weights
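Both README hunks above repoint the XY-Tokenizer download at the `fnlp/MOSS_TTSD_tokenizer` repository and keep the `mkdir -p XY_Tokenizer/weights` step. The same download can be sketched in Python. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that mirroring the whole repository (rather than one named checkpoint file, which these hunks do not show) is acceptable. The helper `download_tokenizer_weights` and its `download_fn` parameter are hypothetical, added only for illustration and so the logic can be exercised offline.

```python
import os
from typing import Callable, Optional

# Repo ID taken from the updated README link in this diff.
TOKENIZER_REPO_ID = "fnlp/MOSS_TTSD_tokenizer"
# Target directory taken from the README's `mkdir -p XY_Tokenizer/weights` step.
DEFAULT_WEIGHTS_DIR = os.path.join("XY_Tokenizer", "weights")


def download_tokenizer_weights(
    local_dir: str = DEFAULT_WEIGHTS_DIR,
    repo_id: str = TOKENIZER_REPO_ID,
    download_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Hypothetical helper: fetch every file in `repo_id` into `local_dir`.

    `download_fn` defaults to huggingface_hub.snapshot_download; it is
    injectable only so the function can be tested without network access.
    Returns the local directory path reported by the download function.
    """
    # Python equivalent of `mkdir -p XY_Tokenizer/weights`.
    os.makedirs(local_dir, exist_ok=True)
    if download_fn is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download

        download_fn = snapshot_download
    # snapshot_download(repo_id=..., local_dir=...) returns the folder path.
    return download_fn(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_tokenizer_weights()` with no arguments would create `XY_Tokenizer/weights` and mirror the repository into it. The shell command that follows `mkdir` in the README is not part of these hunks, so it remains the authoritative instruction if it names a specific file.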