Some comments on CFG and flow matching #30

Hi.

Thanks for the great code. I wanted to comment a bit on the internals, though I'm not 100% certain about either point:

1. If you use a cosine scheduler for flow matching, you probably need to apply it in both training and inference; see CosyVoice:

https://github.com/KdaiP/StableTTS/blob/main/models/flow_matching.py#L89

https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/flow/flow_matching.py#L67
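
To make point 1 concrete, here is a rough sketch of what I mean. It is not code from either repo: the function names are made up, and the warp `1 - cos(pi * t / 2)` is what I believe CosyVoice uses, so treat that as an assumption. The idea is simply that the same time warp is applied when sampling `t` for the loss and when building the ODE time grid for sampling.

```python
import math
import random


def cosine_warp(t):
    """Map a time t in [0, 1] to a cosine-spaced time in [0, 1].

    Assumed form of the schedule: 1 - cos(pi * t / 2).
    """
    return 1.0 - math.cos(t * 0.5 * math.pi)


def sample_training_time(rng):
    # Training: draw t uniformly, then apply the same warp that
    # inference uses, so both sides see the same time distribution.
    return cosine_warp(rng.random())


def inference_time_grid(n_steps):
    # Inference: warp the uniform grid of ODE solver time points.
    return [cosine_warp(i / n_steps) for i in range(n_steps + 1)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_training_time(rng))
    print(inference_time_grid(4))
```

If the warp is applied only on the inference side, the solver spends its steps on a time distribution the model was not trained with, which is the mismatch I am worried about.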

2. CFG only applies to the diffusion part. It is not quite correct to add it to the encoder: there, CFG just ends up adding noise to the speaker conditioning, which degrades the loss a bit but doesn't really improve quality.

https://github.com/KdaiP/StableTTS/blob/main/models/model.py#L141
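
For point 2, a minimal sketch of the setup I have in mind, with the usual caveats: the names are hypothetical, using zeros as the "null" condition is an assumption, and the guidance formula below is one common way to write CFG rather than necessarily what this repo does. The speaker conditioning is dropped only on the decoder side during training, and the two velocity estimates are combined at inference; the encoder would always get the real speaker embedding.

```python
import random


def drop_for_decoder(speaker, p_uncond, rng):
    """Return the speaker conditioning passed to the decoder.

    With probability p_uncond it is replaced by zeros (assumed null
    condition). The encoder is not touched by this function.
    """
    if rng.random() < p_uncond:
        return [0.0] * len(speaker)
    return list(speaker)


def guided_velocity(v_cond, v_uncond, scale):
    """Combine conditional and unconditional velocity estimates.

    scale = 0 gives back the conditional estimate unchanged.
    """
    return [(1.0 + scale) * c - scale * u for c, u in zip(v_cond, v_uncond)]


if __name__ == "__main__":
    rng = random.Random(0)
    speaker = [0.3, -0.1, 0.7]
    print(drop_for_decoder(speaker, 0.2, rng))
    print(guided_velocity([1.0, 2.0], [0.0, 1.0], 1.0))
```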

