You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This project trains a small language model (FunctionGemma-270M) to play text adventure games using a novel **Imitation Learning → GRPO** pipeline. The model learns to generate reasoning traces before taking actions, creating a transparent decision-making process.