🥋 LLM-Dojo

A lightweight playground for RLHF and SFT experiments, with support for RLVR, KD, and Guide-KD.

轻量级 RLHF/SFT 实验平台，支持 RLVR、KD 与 Guide-KD。

📋 Overview

模块	说明
`openrlhf-kd`	当前主线，基于 OpenRLHF 重构，实现 `RLVR` + `KD` + `Guide-KD`
`main_train.py`	简洁 `SFT` 训练入口

🎯 RLVR

openrlhf-kd 是这个仓库当前最核心的部分，基于 OpenRLHF 构建，具体训练使用可参见文档 openrlhf-kd/examples/README.md

主要改动：

精简框架，只保留 RLVR 部分，移除了 critic 等不需要的内容
增加 KD、Guide-KD 与 reward 的混合训练，支持按 datasource 路由

✏️ SFT

根目录的 SFT 部分保持了比较简洁的训练入口，适合快速微调实验。

特性：

支持 Deepspeed
支持 LoRA、QLoRA、全参微调
自动适配 chat template

示例文件可参见 data/sft_data.jsonl

Quick Start：

bash run_example.sh

或：

deepspeed --include localhost:0,1 main_train.py \
  --train_data_path /path/to/data.jsonl \
  --model_name_or_path /path/to/model \
  --task_type sft \
  --train_mode qlora \
  --output_dir /path/to/output

Name		Name	Last commit message	Last commit date
Latest commit History 268 Commits
data		data
llm_tricks		llm_tricks
openrlhf-kd		openrlhf-kd
train_args		train_args
utils		utils
.gitattributes		.gitattributes
README.md		README.md
main_train.py		main_train.py
multinode_run.sh		multinode_run.sh
requirements.txt		requirements.txt
run_example.sh		run_example.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🥋 LLM-Dojo

📋 Overview

🎯 RLVR

✏️ SFT

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🥋 LLM-Dojo

📋 Overview

🎯 RLVR

✏️ SFT

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages