Some comments on CFG and flow matching #30

@nshmyrev

Hi.

Thanks for the great code. I wanted to comment a bit on the internals, though I'm not 100% certain about either point:

  1. If you use a cosine scheduler for flow matching, you probably need to use it in both training and inference, as CosyVoice does:

https://github.com/KdaiP/StableTTS/blob/main/models/flow_matching.py#L89

https://github.com/FunAudioLLM/CosyVoice/blob/main/cosyvoice/flow/flow_matching.py#L67

  2. CFG only applies to diffusion. It is not quite correct to add it to the encoder, because there CFG just adds noise to the speaker conditioning: it degrades the loss a bit but doesn't really improve quality.

https://github.com/KdaiP/StableTTS/blob/main/models/model.py#L141
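
To make both points concrete, here is a minimal sketch in plain Python. It is not code from StableTTS or CosyVoice. All function names are hypothetical, the warp `1 - cos(pi * t / 2)` is assumed to be the cosine schedule in question, and the guidance formula is just one common way to write CFG. The idea is that the same time warp is applied to the sampled training times and to the inference step grid, and that condition dropout and guidance live only in the flow matching decoder.

```python
import math
import random
from typing import List

Vector = List[float]


def cosine_warp(t: float) -> float:
    """Assumed cosine schedule: maps t in [0, 1] to 1 - cos(pi * t / 2)."""
    return 1.0 - math.cos(t * 0.5 * math.pi)


def sample_training_time(rng: random.Random) -> float:
    # Training: draw t uniformly, then warp it with the schedule.
    return cosine_warp(rng.random())


def inference_time_grid(n_steps: int) -> List[float]:
    # Inference: warp the uniform ODE step grid with the SAME schedule,
    # so the solver visits times distributed like those seen in training.
    return [cosine_warp(i / n_steps) for i in range(n_steps + 1)]


def drop_condition(cond: Vector, p_uncond: float, rng: random.Random) -> Vector:
    # Training (decoder only): with probability p_uncond, replace the
    # decoder condition with zeros so the decoder also learns an
    # unconditional velocity field. The encoder input is left untouched.
    if rng.random() < p_uncond:
        return [0.0] * len(cond)
    return list(cond)


def guided_velocity(v_cond: Vector, v_uncond: Vector, scale: float) -> Vector:
    # Inference (decoder only): one common CFG formulation,
    # v = (1 + scale) * v_cond - scale * v_uncond.
    return [(1.0 + scale) * c - scale * u for c, u in zip(v_cond, v_uncond)]
```

With `scale = 0.0` the guided velocity reduces to the conditional one, so guidance can be switched off without touching the rest of the sampler.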
