Commit 133434b

update info (#8)
1 parent d38932c commit 133434b

1 file changed

Lines changed: 17 additions & 10 deletions

README.md

@@ -3,7 +3,7 @@
 
 [![license](https://img.shields.io/github/license/InternLM/opencompass.svg)](./LICENSE)
 [![arXiv](https://img.shields.io/badge/arXiv-2502.06781-b31b1b.svg)](https://arxiv.org/abs/2502.06781)
-[![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-OREAL-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/internlm/OREAL-32B)
+[![huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-OREAL-ffc107?color=ffc107&logoColor=white)](https://huggingface.co/collections/internlm/oreal-67aaccf5a8192c1ba3cff018)
 
 
 ## ✨ Introduction
@@ -30,12 +30,15 @@ With OREAL, for the first time, a 7B model can obtain 94.0 pass@1 accuracy on MA
 
 ![main_table](./figures/main_table.png)
 
-## 🤗 HuggingFace Model Zoo
+## 🤗 HuggingFace
+
+### Model
 
 Our OREAL models are available on Hugging Face 🤗:
 
 | Model | Huggingface Repo |
 |----------|------------------|
+| OREAL-DeepSeek-R1-Distill-Qwen-7B | [Model Link](https://huggingface.co/internlm/OREAL-DeepSeek-R1-Distill-Qwen-7B) |
 | OREAL-7B | [Model Link](https://huggingface.co/internlm/OREAL-7B) |
 | OREAL-32B | [Model Link](https://huggingface.co/internlm/OREAL-32B) |
 
@@ -46,6 +49,13 @@ We also release the models of SFT version. You can construct your own RL pipelin
 | OREAL-7B-SFT | [Model Link](https://huggingface.co/internlm/OREAL-7B-SFT) |
 | OREAL-32B-SFT | [Model Link](https://huggingface.co/internlm/OREAL-32B-SFT) |
 
+### Data
+
+We release the prompts utilized in our RL training phase.
+
+| Dataset | Huggingface Repo |
+|----------|------------------|
+| RL Prompts | [Model Link](https://huggingface.co/datasets/internlm/OREAL-RL-Prompts) |
 
 ## 🚄 Training Tutorial
 
@@ -117,14 +127,11 @@ More detailed training settings can be found in the [oreal/configs](./oreal/conf
 ## 🖊️ Citation
 
 ```
-@misc{lyu2025exploringlimitoutcomereward,
-title={Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning},
-author={Chengqi Lyu and Songyang Gao and Yuzhe Gu and Wenwei Zhang and Jianfei Gao and Kuikun Liu and Ziyi Wang and Shuaibin Li and Qian Zhao and Haian Huang and Weihan Cao and Jiangning Liu and Hongwei Liu and Junnan Liu and Songyang Zhang and Dahua Lin and Kai Chen},
-year={2025},
-eprint={2502.06781},
-archivePrefix={arXiv},
-primaryClass={cs.CL},
-url={https://arxiv.org/abs/2502.06781},
+@article{lyu2025exploring,
+title={Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning},
+author={Lyu, Chengqi and Gao, Songyang and Gu, Yuzhe and Zhang, Wenwei and Gao, Jianfei and Liu, Kuikun and Wang, Ziyi and Li, Shuaibin and Zhao, Qian and Huang, Haian and others},
+journal={arXiv preprint arXiv:2502.06781},
+year={2025}
 }
 ```
 