BIMU 1BIMU

🧑‍💻 About Me

🎓 Undergraduate student at Beijing University of Posts and Telecommunications (BUPT), School of Computer Science
🔬 Research interests: RLVR · RLHF · Optimization Algorithms
🌱 Currently exploring the intersection of reinforcement learning and large language model alignment
📍 Beijing, China

🔭 Research Interests

Area	Description
RLVR	Reinforcement Learning from Verifiable Rewards — scalable reward signals beyond human feedback
RLHF	Reinforcement Learning from Human Feedback — aligning LLMs with human preferences
Optimizer	Adaptive optimization methods (AdamW, Muon, Shampoo, etc.) for deep learning

📌 Pinned Repositories

APO_OFFICAL — The official repository for Anchored Policy Optimization: Mitigating Exploration Collapse via Support-Constrained Rectification ⭐ 12 🍴 0

SPPO — [ACL 2026 Main] The official repository for SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks ⭐ 2 🍴 2

⚡ Recent Activity

No recent public activity.

📝 Latest Blog Posts

🛠️ Tech Stack

📊 GitHub Stats

📫 Contact

"The pursuit of intelligence — from theory to practice." · Last updated: auto-refreshed every 3 hours
