This document provides a ready-to-use math evaluation program for the VibeThinker-1.5B model.
First, clone the rllm project and check out its deepcoder branch:
git clone https://github.com/rllm-org/rllm.git
cd rllm
git checkout deepcoder
Move the provided main_evaluation.py file into the verl/verl/trainer/ directory of the rllm project.
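The copy step can be sketched as a small shell helper. This is only an illustration: the function name `install_eval_script` and the default source location `../main_evaluation.py` are assumptions, not part of the provided files, so adjust `SRC` to wherever main_evaluation.py actually lives.

```shell
# Run from inside the rllm directory.
# SRC is an assumed location; override it to match your checkout.
SRC="${SRC:-../main_evaluation.py}"
DEST_DIR="verl/verl/trainer"

install_eval_script() {
    # Stop early if the trainer directory is missing (wrong branch or wrong cwd).
    if [ ! -d "$DEST_DIR" ]; then
        echo "target directory not found: $DEST_DIR" >&2
        return 1
    fi
    # Stop early if the evaluation script cannot be found.
    if [ ! -f "$SRC" ]; then
        echo "source file not found: $SRC" >&2
        return 1
    fi
    # Copy rather than move, so the original file stays in place.
    cp "$SRC" "$DEST_DIR/"
}
```

Calling `install_eval_script` from the rllm directory copies the file, or prints which path is missing.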
Navigate to the rllm directory and install the required environment and dependencies according to its official guide.
# Example installation command. Please refer to the deepcoder project for specific requirements.
pip install -e ./verl
pip install -e .
Modify the following two variables in the eval_model.sh script:
- OUTPUT_DIR: Specify the output directory for the evaluation results.
- DATA_PATH: Specify the path to the evaluation dataset. We have provided the processed dataset in the data folder; please point this variable to its location.
For example:
# eval_model.sh
OUTPUT_DIR="./eval_output"
DATA_PATH="../data" # Assuming the data folder is in the parent directory of the rllm project
After completing the configuration, execute the evaluation script:
bash eval_model.sh
- Multi-GPU and Multi-Node Support: The evaluation code natively supports distributed evaluation across multiple GPUs and nodes to accelerate the process.
- Pre-processed Dataset: We provide a pre-processed evaluation dataset in the data folder, allowing you to start the evaluation directly.
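Before launching a long evaluation run, it can help to confirm that the two configured paths are usable. The following is a minimal sketch, not part of the provided scripts: the function name `check_eval_paths` is hypothetical, and the default values simply mirror the example configuration for eval_model.sh.

```shell
# Pre-flight check for the two variables configured in eval_model.sh.
# The defaults are assumptions copied from the example configuration.
OUTPUT_DIR="${OUTPUT_DIR:-./eval_output}"
DATA_PATH="${DATA_PATH:-../data}"

check_eval_paths() {
    # Fail early if the dataset directory is missing.
    if [ ! -d "$DATA_PATH" ]; then
        echo "DATA_PATH not found: $DATA_PATH" >&2
        return 1
    fi
    # Create the output directory if it does not exist yet.
    mkdir -p "$OUTPUT_DIR" || return 1
    echo "paths ok"
}
```

One way to use it is `check_eval_paths && bash eval_model.sh`, so the evaluation starts only when both paths are in place.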
This evaluation program is built upon the rllm/deepcoder project. Thanks to the original authors for their contributions.