⚡️ This repository provides training recipes for AMD Athena-PRM, designed to enhance the performance of large reasoning models while reducing training costs.
Please use the docker image `sgl-llamafactory:latest` (container id: 0d2d0f06d29d). Start the docker and upgrade transformers to the latest version (>=4.49):

```bash
cd training
sh llamafactory_docker.sh
```
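For the transformers upgrade, a minimal sketch from inside the running container (assuming `pip` is on the path there):

```bash
# Upgrade transformers to a release that supports Qwen2.5-VL (>= 4.49)
pip install --upgrade "transformers>=4.49"

# Sanity-check the installed version
python -c "import transformers; print(transformers.__version__)"
```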
For training the reward model:

```bash
sh rm_exp.sh
```
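`rm_exp.sh` wraps the actual training invocation. Since the docker image ships LLaMA-Factory, which drives training from a YAML config, a hypothetical reward-model config might look like the sketch below; every value here is an illustrative placeholder, not the script's real setting:

```yaml
# Illustrative LLaMA-Factory reward-model config; all values are placeholders.
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct   # assumed base model
stage: rm                    # LLaMA-Factory's reward-model training stage
do_train: true
finetuning_type: full
dataset: prm_training_data   # placeholder dataset name
template: qwen2_vl
cutoff_len: 4096
output_dir: saves/athena-prm
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
bf16: true
```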
For fine-tuning Qwen2.5-VL-7B:

```bash
sh sft_qwen2.5vl7b.sh
```
If you want to export the model, please remember to change the model path:

```bash
sh export_model.sh
```
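The model path lives in the export config that the script passes to LLaMA-Factory. For orientation, a generic LLaMA-Factory merge/export config looks roughly like the sketch below (paths are placeholders, not the repository's actual file):

```yaml
### model -- point this at your trained checkpoint
model_name_or_path: saves/athena-prm      # placeholder path: change me
template: qwen2_vl

### export
export_dir: output/athena-prm-merged      # placeholder output path
export_size: 5
export_device: cpu
export_legacy_format: false
```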
For evaluation, please use the docker image `rocm_vllm_0.8.3:latest` (container id: 661a8d989c9a).
For MATH and GSM8K, you can evaluate the models with:

```bash
cd evaluation/Qwen2.5-Math/evaluation
pip install pebble latex2sympy2 word2number timeout_decorator
sh run_7b.sh & sh run_72b.sh
```
The results can be found under `outputs/`. If you want to get majority-vote or reward-guided results, you can run:

```bash
python rm_maj_eval.py
```
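For context, majority voting picks the most frequent final answer among the N sampled solutions, while reward-guided selection (best-of-N) picks the answer whose solution the reward model scores highest. A minimal sketch of both selection rules (an illustration of the idea, not the actual logic inside `rm_maj_eval.py`):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, reward_scores):
    """Return the answer whose solution received the highest reward score."""
    best = max(range(len(answers)), key=lambda i: reward_scores[i])
    return answers[best]

# Hypothetical example: five sampled answers with made-up reward scores.
answers = ["42", "42", "41", "42", "40"]
scores = [0.71, 0.88, 0.35, 0.80, 0.12]
print(majority_vote(answers))      # 42
print(best_of_n(answers, scores))  # 42
```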
If you want to evaluate on VisualProcessBench, you can run:

```bash
cd evaluation/vp_bench
sh scripts/download_data.sh
sh scripts/vllm_docker.sh
sh scripts/run.sh
```
For the other benchmarks such as MathVision, MathVista, and MMMU, you can run:

```bash
cd v2/
python convert_mathvision.py
python convert_mathvista.py
python convert_mmmu.py
```
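These scripts convert each benchmark into the format the evaluation pipeline expects. As a rough illustration of the pattern only (each `convert_*.py` handles its own benchmark and field names; GSM8K is used below purely because its schema is simple and public):

```python
import json
from datasets import load_dataset  # pip install datasets

# Illustrative sketch: flatten a benchmark split into JSONL records.
ds = load_dataset("openai/gsm8k", "main", split="test")

with open("gsm8k_test.jsonl", "w") as f:
    for idx, ex in enumerate(ds):
        record = {"id": idx, "question": ex["question"], "answer": ex["answer"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```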
The model fine-tuned with rejection sampling, the process reward model, and the outcome reward model can be found at `checkout/models`.
You are welcome to download and try this model on AMD platforms. For more details on training, inference, and insights into this model, please see our paper. AMD also offers a dedicated cloud infrastructure that includes the latest GPU instances; visit the AMD Developer Cloud for access requests and usage details. Furthermore, you can deploy advanced AI models on AMD Ryzen AI PCs and learn more here.
For any questions, you may reach out to the AMD team at amd_ai_mkt@amd.com.
If you find this work helpful, please cite:
```bibtex
@article{sh2025athena,
  title={Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models},
  author={Wang, Shuai and Liu, Zhenhua and Wei, Jiaheng and Yin, Xuanwu and Li, Dong and Barsoum, Emad},
  journal={arXiv preprint arXiv:2506.09532},
  year={2025}
}
```