1BIMU/README.md


๐Ÿง‘โ€๐Ÿ’ป About Me

  • 🎓 Undergraduate student at Beijing University of Posts and Telecommunications (BUPT), School of Computer Science
  • 🔬 Research interests: RLVR · RLHF · Optimization Algorithms
  • 🌱 Currently exploring the intersection of reinforcement learning and large language model alignment
  • 📍 Beijing, China

🔭 Research Interests

| Area | Description |
| --- | --- |
| RLVR | Reinforcement Learning from Verifiable Rewards – scalable reward signals beyond human feedback |
| RLHF | Reinforcement Learning from Human Feedback – aligning LLMs with human preferences |
| Optimizer | Adaptive optimization methods (AdamW, Muon, Shampoo, etc.) for deep learning |
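To make the optimizer row concrete, here is a toy, single-scalar sketch of the decoupled weight decay idea behind AdamW (illustrative only; `adamw_step` is a hypothetical helper written for this README, not code from any repository listed here):

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on a single scalar parameter.

    Unlike Adam with L2 regularization, the weight decay term is applied
    directly to the parameter (decoupled), not folded into the gradient.
    """
    m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (math.sqrt(v_hat) + eps)
                          + weight_decay * theta)
    return theta, m, v

# A single step from theta = 1.0 with gradient 0.5 nudges theta down by
# roughly lr * (1 + weight_decay).
theta, m, v = adamw_step(1.0, 0.5, m=0.0, v=0.0, t=1)
```

In practice one would use `torch.optim.AdamW` rather than hand-rolling the update; the sketch only shows where the decoupled decay enters the step.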

📌 Pinned Repositories

APO_OFFICAL – The official repository for Anchored Policy Optimization: Mitigating Exploration Collapse via Support-Constrained Rectification · Python · ⭐ 12 · 🍴 0

SPPO – [ACL 2026 Main] SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks, official repository · Python · ⭐ 2 · 🍴 2


⚡ Recent Activity

No recent public activity.


๐Ÿ“ Latest Blog Posts


๐Ÿ› ๏ธ Tech Stack

Python PyTorch C++ Linux Git LaTeX


📊 GitHub Stats

GitHub Streak


📫 Contact

GitHub Email Zhihu


"The pursuit of intelligence – from theory to practice." · Last updated: auto-refreshed every 3 hours
