Question about the randomness of Gumbel softmax in Differentiable Reward Optimization (DiffRO) #8

@kayden-kim0702

I am currently reproducing the Cosy3 model and have a question regarding the DiffRO process described in your paper.

When I applied Gumbel-Softmax, the results changed from run to run because of the random sampling inside Gumbel-Softmax, even though the input features were identical. Could you share any tips on how to handle this randomness, for example by adjusting the Gumbel-Softmax temperature?
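
For context, here is a minimal sketch of the behavior in question, written in plain Python rather than taken from the DiffRO code. The names `gumbel_softmax`, `mean_peak`, and the example `logits` are hypothetical and only for illustration. It shows that identical inputs give different outputs under different random seeds, that fixing the seed makes a run repeatable, and that the temperature changes how sharp each sample is without removing the randomness itself.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed sample: softmax((logits + g) / tau), g ~ Gumbel(0, 1)."""
    noisy = []
    for logit in logits:
        # Clamp u strictly inside (0, 1) so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical logits, same input for every call

# Same input, different seeds: the samples differ.
a = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))
b = gumbel_softmax(logits, tau=1.0, rng=random.Random(1))

# Same input, same seed: the sample is reproduced exactly.
c = gumbel_softmax(logits, tau=1.0, rng=random.Random(0))


def mean_peak(tau, draws=500, seed=0):
    """Average of the largest probability per sample, as a rough sharpness measure."""
    rng = random.Random(seed)
    return sum(max(gumbel_softmax(logits, tau, rng)) for _ in range(draws)) / draws


sharp = mean_peak(tau=0.1)    # low temperature: samples are close to one-hot
smooth = mean_peak(tau=10.0)  # high temperature: samples are close to uniform
```

In this sketch a low temperature pushes each sample toward a one-hot vector whose position still depends on the noise, while a high temperature flattens every sample toward uniform, so the outputs vary less between draws but carry less information about the logits.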
