BinaryPPO: An offline LLM reinforcement learning framework that reformulates binary classification as a reward maximization problem.
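To make the reformulation concrete, here is a minimal sketch of how a binary classifier can be read as a two-action policy scored by a reward. It is not BinaryPPO's actual implementation: the +1/-1 reward, the function names (`binary_reward`, `expected_reward`, `clipped_surrogate`), and the clip range `eps=0.2` are all assumptions chosen for illustration, and the surrogate shown is the standard PPO clipped objective rather than anything confirmed by the project.

```python
def binary_reward(predicted: int, label: int) -> float:
    """Hypothetical reward: +1 for a correct prediction, -1 for a wrong one."""
    return 1.0 if predicted == label else -1.0


def expected_reward(p_positive: float, label: int) -> float:
    """Expected reward of a policy that predicts class 1 with probability p_positive.

    With the +1/-1 reward above this equals 2 * P(correct) - 1, so maximizing
    expected reward is the same as maximizing the probability of the true label.
    """
    p_correct = p_positive if label == 1 else 1.0 - p_positive
    return p_correct * binary_reward(label, label) + (1.0 - p_correct) * binary_reward(1 - label, label)


def clipped_surrogate(new_prob: float, old_prob: float, advantage: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for a single logged (offline) sample.

    old_prob is the probability the logging policy gave the logged action;
    new_prob is the probability the current policy gives the same action.
    """
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A policy that says "positive" with probability 0.9 on a truly positive example.
    print(expected_reward(0.9, label=1))
    # The policy raised the probability of a rewarded action from 0.6 to 0.9;
    # the objective is capped by the clipped ratio (1 + eps).
    print(clipped_surrogate(new_prob=0.9, old_prob=0.6, advantage=1.0))
```

In this sketch the samples are fixed in advance (the offline setting), so the probability ratio compares the current policy against whatever policy produced the logged predictions; the clip keeps a single update from moving too far from that logged behavior.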