
SlateQ agent implementation #698

@rahul-zomato


Is the use of `next_state` in the `next_q_values` calculation of the SlateQ agent deliberate here? https://github.com/facebookresearch/ReAgent/blob/main/reagent/training/slate_q_trainer.py#L230

The SlateQ agent implemented by the SlateQ paper authors in RecSim uses `state` instead of `next_state` from the replay buffer to compute `next_q_values`: google-research/recsim#26
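To make the question concrete, here is a minimal sketch of the item-wise SlateQ TD target as we read it from the SlateQ paper: the slate value is the choice-probability-weighted sum of item-wise Q-values, and the target bootstraps from the next state s' and next slate A', i.e. y = r + γ · Σ_{j∈A'} P(j | s', A') · Q̄(s', j). The function names (`slate_q_value`, `slate_q_target`) and the plain-list inputs are hypothetical and chosen for illustration only. This is not the ReAgent or RecSim code. The point of contention in the issue is which state the Q-values and choice probabilities fed into the sum are evaluated at.

```python
from typing import Sequence


def slate_q_value(item_q: Sequence[float], choice_probs: Sequence[float]) -> float:
    """Slate value as a choice-weighted sum of item-wise Q-values:
    Q(s, A) = sum_{i in A} P(i | s, A) * Qbar(s, i).
    choice_probs may sum to less than 1 if a no-click option exists."""
    if len(item_q) != len(choice_probs):
        raise ValueError("item_q and choice_probs must have the same length")
    return sum(p * q for p, q in zip(choice_probs, item_q))


def slate_q_target(
    reward: float,
    next_item_q: Sequence[float],
    next_choice_probs: Sequence[float],
    gamma: float = 0.9,
    terminal: bool = False,
) -> float:
    """Hypothetical TD target for the clicked item's Q-value:
    y = r + gamma * sum_{j in A'} P(j | s', A') * Qbar(s', j).
    Both inputs are assumed to be computed at the NEXT state s'."""
    if terminal:
        # No bootstrapping past the end of an episode.
        return reward
    return reward + gamma * slate_q_value(next_item_q, next_choice_probs)


# Hypothetical numbers for a 3-item next slate observed at s'.
y = slate_q_target(1.0, [2.0, 1.0, 0.5], [0.5, 0.3, 0.2], gamma=0.9)
print(y)  # approximately 2.26
```

Passing Q-values computed from the current `state` instead of `next_state` into `slate_q_target` would bootstrap from s rather than s', which is the discrepancy being asked about.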
