  • Peking University
  • Beijing

asd765973346/README.md

粟泽力 / Zeli Su

About

I am a master's student in Electronic Information at Minzu University of China and an incoming PhD student at the Data-Centric AI (DCAI) group, Peking University. I am currently looking for research internship opportunities related to LLM post-training, RL for LLMs, and Agent capability construction.

My research focuses on how large language models acquire, improve, and preserve capabilities through post-training. I am interested in reinforcement learning for LLMs, instruction-following robustness, semantic-reward optimization, data and benchmark construction, and Agent capability evaluation. My recent work studies post-training methods such as GRPO-based semantic reward learning, robustness recovery under distracting instructions, automatic benchmark generation for data-analysis agents, and systematic monitoring of Agent potential in pretraining and midtraining models.

Earlier in my research, I worked extensively on multilingual and low-resource language modeling, including datasets, models, and benchmarks for minority languages in China. I now treat these settings as important testbeds for studying capability expansion, alignment preservation, and post-training reliability.

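I am a master's student in Electronic Information at Minzu University of China and will soon join the Data-Centric AI (DCAI) group at Peking University as a PhD student. I am currently seeking research internship opportunities related to LLM post-training, RL4LLM, and Agent capability construction.

My research examines how large language models acquire, improve, and retain complex capabilities through post-training, specifically RL for LLM, instruction-following robustness, semantic-reward optimization, data and benchmark construction, and Agent capability evaluation. Recent work centers on GRPO-based semantic-reward reinforcement learning, robustness recovery under noisy-instruction interference, automatic benchmark construction for data-analysis Agents, and monitoring the Agent potential of models at the pretrain/midtrain stages.

My early research focused mainly on multilingual and low-resource language modeling, including building data, models, and evaluation benchmarks for the languages of China's ethnic minorities. I now use these settings more as experimental vehicles for studying capability expansion, alignment preservation, and post-training reliability.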

Research Interests

  • LLM post-training, RLHF/RLVR, GRPO, and semantic-reward optimization
  • Instruction following, robustness, and reliability under noisy or conflicting contexts
  • Agentic RL, tool-use evaluation, and Agent capability monitoring
  • Data synthesis, trajectory construction, and automatic benchmark generation
  • Multilingual and low-resource capability expansion as post-training testbeds

Current Experience

  • Data-Centric AI (DCAI) Group, Peking University, Research Intern / Research Assistant, 2025.09 - Present
    Working on research and open-source systems related to Data-Centric AI, LLM data systems, Agents, benchmark construction, and AI4Science.

  • Foundation Model Group (Ling), Ant Group, Research Intern, 2025.10 - Present
    Working on instruction-following post-training optimization for Ling/Ring foundation models, including failure-mode analysis, training-feedback diagnosis, targeted data construction, and robustness-oriented alignment research.

Accepted Papers

  • Reinforcement Learning with Semantic Rewards Enables Low-Resource Language Expansion without Alignment Tax
    ACL 2026 Findings, first author. A GRPO-based semantic-reward post-training method for capability expansion while preserving general alignment. [arXiv]

  • InsightBenchMaker: Towards Generating Evolving and High-Fidelity Benchmarks for Data-Analysis Agents
    ACL 2026 Findings, co-first author. An automatic benchmark construction framework for data-analysis agents with heterogeneous data, executable verification, and high-fidelity insight-data alignment.

  • FTibSuite: A Comprehensive Resource Suite for Tibetan Vision-Language Modeling
    ACL 2026 Findings, co-first and corresponding author. A resource suite for Tibetan vision-language modeling, including data, benchmarks, and baseline models. [arXiv]

  • Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages
    ACL 2025 Main, first author. A shared-weight pretraining framework for adapting multilingual encoders to extremely low-resource generation tasks. [arXiv] [Code/Model]

  • CMHG: A Dataset and Benchmark for Headline Generation of Minority Languages in China
    EMNLP 2025 Main, co-first author (advisor listed as first author). A dataset and benchmark for headline generation in Tibetan, Uyghur, and Mongolian. [arXiv] [Dataset]

Preprints / Under Review

  • The Curse of Helpfulness: Inverse Scaling Law in Robustness to Distractor Instructions via DistractionIF
    A diagnostic benchmark and analysis of instruction-following robustness under distracting pseudo-instructions in reference text, with GRPO-based robustness recovery. [arXiv]

  • Base-model Agent-Potential Monitoring: Probing ReAct-like Interaction Ability in Pretrain and Midtrain Models
    A framework for monitoring Agent-like interaction ability in base and midtraining models through few-shot reasoning, structured action generation, and real environment interaction.

  • Source-Grounded Semantic Reinforcement Learning for Low-Resource Target-Language Generation
    A source-grounded semantic RL framework for reference-free low-resource target-language generation. [arXiv]

  • Expert Attention: Routing-Guided Static Pruning for Transformer Encoders
    A routing-guided structured pruning method for Transformer encoders.

Open Source

  • Paper2Any: An open-source system that turns papers, images, and text into editable research figures, technical route diagrams, and presentation materials. I contributed to DrawIO/PPT generation, PPTPolish, and knowledge-enhanced generation modules. [GitHub]

  • Open-NotebookLM: An open-source document-centered knowledge system supporting semantic retrieval, evidence-grounded QA, AI-assisted note generation, mind maps, slides, podcasts, quizzes, and deep research reports. I contributed to multi-source document ingestion and generation modules for slides, mind maps, DrawIO diagrams, podcasts, quizzes, and deep research reports. [GitHub]

Pinned

  1. xlm-swcm (Public)

    XLM-SWCM (Cross-lingual Language Model with Shared Weights Cross-lingual Modeling)

    Python · 6 stars · 1 fork

  2. OpenDCAI/Paper2Any (Public)

    Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides.

    Python · 2.6k stars · 183 forks

  3. OpenDCAI/ThinkFlow (Public)

    An Open Source implementation of Notebook LM.

    Python · 74 stars · 24 forks