⚡️ This repository provides training recipes for AMD Athena-PRM, designed to enhance the performance of large reasoning models while reducing training costs.
Please use the docker image `sgl-llamafactory:latest` (container id: 0d2d0f06d29d). Start the docker and upgrade transformers to the latest version (>=4.49):

```bash
cd training
sh llamafactory_docker.sh
```
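For the transformers upgrade, a minimal sketch from inside the running container (assuming `pip` is on the path there):

```bash
# Upgrade transformers to a release that supports Qwen2.5-VL (>= 4.49)
pip install --upgrade "transformers>=4.49"

# Sanity-check the installed version
python -c "import transformers; print(transformers.__version__)"
```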
For training the reward model:

```bash
sh rm_exp.sh
```
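`rm_exp.sh` wraps the actual training invocation. Since the docker image ships LLaMA-Factory, which drives training from a YAML config, a hypothetical reward-model config might look like the sketch below; every value here is an illustrative placeholder, not the script's real setting:

```yaml
# Illustrative LLaMA-Factory reward-model config; all values are placeholders.
model_name_or_path: Qwen/Qwen2.5-VL-7B-Instruct   # assumed base model
stage: rm                    # LLaMA-Factory's reward-model training stage
do_train: true
finetuning_type: full
dataset: prm_training_data   # placeholder dataset name
template: qwen2_vl
cutoff_len: 4096
output_dir: saves/athena-prm
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
bf16: true
```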
For fine-tuning Qwen2.5-VL-7B:

```bash
sh sft_qwen2.5vl7b.sh
```
If you want to export the model, please remember to change the model path:

```bash
sh export_model.sh
```
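The model path lives in the export config that the script passes to LLaMA-Factory. For orientation, a generic LLaMA-Factory merge/export config looks roughly like the sketch below (paths are placeholders, not the repository's actual file):

```yaml
### model -- point this at your trained checkpoint
model_name_or_path: saves/athena-prm      # placeholder path: change me
template: qwen2_vl

### export
export_dir: output/athena-prm-merged      # placeholder output path
export_size: 5
export_device: cpu
export_legacy_format: false
```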
For evaluation, please use the docker image `rocm_vllm_0.8.3:latest` (container id: 661a8d989c9a).
For MATH and GSM8K, you can evaluate the models with:

```bash
cd evaluation/Qwen2.5-Math/evaluation
pip install pebble latex2sympy2 word2number timeout_decorator
sh run_7b.sh & sh run_72b.sh
```
The results can be found under `outputs/`. If you want to get majority-vote or reward-guided results, you can run:

```bash
python rm_maj_eval.py
```
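For context, majority voting picks the most frequent final answer among the N sampled solutions, while reward-guided selection (best-of-N) picks the answer whose solution the reward model scores highest. A minimal sketch of both selection rules (an illustration of the idea, not the actual logic inside `rm_maj_eval.py`):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled solutions."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(answers, reward_scores):
    """Return the answer whose solution received the highest reward score."""
    best = max(range(len(answers)), key=lambda i: reward_scores[i])
    return answers[best]

# Hypothetical example: five sampled answers with made-up reward scores.
answers = ["42", "42", "41", "42", "40"]
scores = [0.71, 0.88, 0.35, 0.80, 0.12]
print(majority_vote(answers))      # 42
print(best_of_n(answers, scores))  # 42
```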
If you want to evaluate on VisualProcessBench, you can run:

```bash
cd evaluation/vp_bench
sh scripts/download_data.sh
sh scripts/vllm_docker.sh
sh scripts/run.sh
```
For the other benchmarks such as MathVision, MathVista, and MMMU, you can run:

```bash
cd v2/
python convert_mathvision.py
python convert_mathvista.py
python convert_mmmu.py
```
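These scripts convert each benchmark into the format the evaluation pipeline expects. As a rough illustration of the pattern only (each `convert_*.py` handles its own benchmark and field names; GSM8K is used below purely because its schema is simple and public):

```python
import json
from datasets import load_dataset  # pip install datasets

# Illustrative sketch: flatten a benchmark split into JSONL records.
ds = load_dataset("openai/gsm8k", "main", split="test")

with open("gsm8k_test.jsonl", "w") as f:
    for idx, ex in enumerate(ds):
        record = {"id": idx, "question": ex["question"], "answer": ex["answer"]}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```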
The model fine-tuned with rejection sampling, the process reward model, and the outcome reward model can be found at `checkout/models`.
You are welcome to download and try this model on AMD platforms. For more details on training, inference, and insights into this model, please see our paper. AMD also offers a dedicated cloud infrastructure that includes the latest GPU instances; visit the AMD Developer Cloud for access requests and usage details. Furthermore, you can deploy advanced AI models on AMD Ryzen AI PCs and learn more here.
For any questions, you may reach out to the AMD team at amd_ai_mkt@amd.com.
If you find this work helpful, please cite:
```bibtex
@article{sh2025athena,
  title={Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models},
  author={Wang, Shuai and Liu, Zhenhua and Wei, Jiaheng and Yin, Xuanwu and Li, Dong and Barsoum, Emad},
  journal={arXiv preprint arXiv:2506.09532},
  year={2025}
}
```