|
1 | | -# GraphPRM |
| 1 | +# GraphPRM: Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners |
2 | 2 |
|
3 | | -Code and data for KDD 2025 Research Track Anonymous Submission: "Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners" |
| 3 | +<div align="left"> |
| 4 | + <p> |
| 5 | + <a href='https://arxiv.org/abs/2503.00845'><img src='https://img.shields.io/badge/arXiv-2503.00845-b31b1b'></a> |
| 6 | + <a href='https://huggingface.co/datasets/GraphPRM/GraphSilo'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphSilo-blue'></a> |
| 7 | + <a href='https://huggingface.co/GraphPRM/GraphPRM-7B'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-GraphPRM-purple'></a> |
| 8 | + <a href='https://github.com/GKNL/GraphPRM'><img src='https://img.shields.io/badge/GitHub-GraphPRM-green'></a> |
| 9 | + </p> |
| 10 | +</div> |
4 | 11 |
|
5 | | -## Dataset and Model Weight Link |
| 12 | +**GraphPRM** is the first Process Reward Model tailored for graph reasoning tasks, and it further enhances LLMs' reasoning capabilities in other domains, including mathematical problem-solving. We also developed **GraphSilo**, the largest dataset for graph reasoning with fine-grained CoT solutions, containing 118,189 samples and 394,165 step-wise labels. |
6 | 13 |
|
7 | | -**Full dataset can also be accessed at:** [GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo), [GraphSilo-Test](https://huggingface.co/datasets/GraphPRM/GraphSilo-Test) (Anonymous Repository) |
| 14 | +This repository contains the code and data for training and evaluating GraphPRM models, along with the full GraphSilo dataset. Please check our [paper](https://arxiv.org/abs/2503.00845) for more details. |
8 | 15 |
|
9 | | -**Full GraphPRM model weight can be accessed at:** [GraphPRM-1.5B](https://huggingface.co/GraphPRM/GraphPRM-1.5B), [GraphPRM-7B](https://huggingface.co/GraphPRM/GraphPRM-7B) (Anonymous Repository) |
| 16 | +<p align="center"> |
| 17 | + <img src="image/overview.jpg" width="800px"/> |
| 18 | +</p> |
10 | 19 |
|
11 | | -## Key File Descriptions |
| 20 | +## 💫 News |
12 | 21 |
|
13 | | -### `data/` |
| 22 | +- **[2025.05.15]** GraphPRM has been accepted to the **KDD 2025 Research Track**. 🔥🔥🔥 |
| 23 | +- **[2025.02.15]** Initial release of 🤗[GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo) dataset and 🤗[GraphPRM](https://huggingface.co/GraphPRM/GraphPRM-7B) models. 🚀🚀🚀 |
14 | 24 |
|
15 | | -- `GraphSilo/`: Training set for GraphPRM model (containing step-wise labels from "Task-oriented Trajectories" and "Monte Carlo Estimation"). |
| 25 | +## 📊 Dataset and Models |
16 | 26 |
|
17 | | -- `GraphSilo_test/`: Test set of 13 graph tasks in GraphSilo. |
18 | | - - `[graph_task].jsonl`: Test samples for corresponding graph tasks. |
19 | | - - `GraphSilo_test_in_domain.jsonl`: Test samples for 10 in-domain graph tasks (that used to train GraphPRM): Degree, Clustering Coefficient, Jaccard, Common Connectivity, Diameter, Page Rank, MST, Maximum Flow, Predecessor. |
20 | | - - `GraphSilo_test_out_domain.jsonl`: Test samples for 3 out-domain graph tasks (that not used to train GraphPRM): BFS, Neighbor, Cycle. |
21 | | - - `GraphSilo_test.jsonl`: All test samples including 13 graph tasks. |
| 27 | +The full GraphSilo dataset and GraphPRM models can be accessed at: |
22 | 28 |
|
23 | | -### `prm/` |
| 29 | +- **GraphSilo Dataset**: [GraphSilo](https://huggingface.co/datasets/GraphPRM/GraphSilo), [GraphSilo-Test](https://huggingface.co/datasets/GraphPRM/GraphSilo-Test) |
| 30 | +- **GraphPRM Models**: [GraphPRM-1.5B](https://huggingface.co/GraphPRM/GraphPRM-1.5B), [GraphPRM-7B](https://huggingface.co/GraphPRM/GraphPRM-7B) |
24 | 31 |
|
25 | | -- `code/finetune_qwen_SFT.py`: Codes for SFT training GraphPRM with step-wise labels from GraphSilo. |
26 | | -- `config/deepspeed_config_stage3.json`: Configuration for deepspeed stage3 training. |
| 32 | +## 📦 Installation |
27 | 33 |
|
28 | | -### `reason/` |
29 | | - |
30 | | -- `llm_service/create_service_graph.sh`: Script to start LM and RM services. |
31 | | - |
32 | | -### `scripts/` |
33 | | - |
34 | | -- `eval/best_of_N.sh`: Perform inference-time computation via Best-of-N strategy with GraphPRM. |
35 | | -- `eval/beam_search.sh`: Perform inference-time computation via Beam Search strategy with GraphPRM. |
36 | | - |
37 | | -## Usage Instructions |
38 | | - |
39 | | -### Installation |
40 | | - |
41 | | -``` |
| 34 | +```bash |
42 | 35 | conda create -n GraphPRM python=3.10 |
43 | 36 | conda activate GraphPRM |
44 | 37 | pip install -r requirements.txt |
45 | | -pip3 install "fschat[model_worker,webui]" |
| 38 | +pip3 install "fschat[model_worker,webui]" |
46 | 39 | pip install -U pydantic |
47 | 40 | cd envs/MATH/latex2sympy |
48 | 41 | pip install -e . |
49 | 42 | cd - |
50 | 43 | ``` |
51 | 44 |
|
| 45 | +## 🛠️ Usage |
| 46 | + |
52 | 47 | ### Download Models |
53 | 48 |
|
54 | 49 | Before running the project, please ensure that all required base models are downloaded to the `hugging_cache` directory. |
55 | 50 |
|
56 | | -1. Download base LLM models: `Qwen2.5-1.5B-Instruct, Qwen2.5-7B-Instruct, Qwen2.5-Math-7B-Instruct, LLaMA3.1-8B-Instruct, Gemma2-9B-Instruct` |
57 | | -2. Download GraphPRM models: `GraphPRM-7B` |
58 | | - |
59 | | -To download these models, please refer to the [Hugging Face model downloading tutorial](https://huggingface.co/docs/hub/models-downloading) for step-by-step guidance on downloading models from the Hugging Face Hub. |
60 | | - |
61 | 51 | ### Start LM & RM Services |
62 | 52 |
|
63 | | -Before running inference, please modify the following variables in the script at `reason/llm_service/create_service.sh` to set the appropriate base models: |
64 | | - |
65 | | -- `$MODEL_BASE`: Set this to the directory where the models are stored. |
66 | | -- `$POLICY_MODEL_NAME`: Set this to the name of the policy model. |
67 | | -- `$VALUE_MODEL_NAME`: Set this to the name of the graph reward model. |
68 | | -- `$NUM_LM_WORKER`: Set this to the number of language model (LM) workers to start. |
69 | | -- `$NUM_RM_WORKER`: Set this to the number of reward model (RM) workers to start. |
| 53 | +1. Modify the following variables in `reason/llm_service/create_service.sh`: |
| 54 | + - `$MODEL_BASE`: Directory where models are stored |
| 55 | + - `$POLICY_MODEL_NAME`: Name of the policy model |
| 56 | + - `$VALUE_MODEL_NAME`: Name of the graph reward model |
| 57 | + - `$NUM_LM_WORKER`: Number of language model workers |
| 58 | + - `$NUM_RM_WORKER`: Number of reward model workers |
70 | 59 |
|
71 | | -Then it prepares and runs inference using different techniques. |
72 | | - |
73 | | -For example, to start the LM and RM services for scaling inference-time computing with GraphPRM, run the following command: |
| 60 | +2. Start the services: |
74 | 61 | ```bash |
75 | 62 | sh reason/llm_service/create_service.sh |
76 | 63 | ``` |
77 | 64 |
|
78 | | -To kill the server processes, recommend using the following command: |
| 65 | +3. To stop the services: |
79 | 66 | ```bash |
80 | 67 | tmux kill-session -t {Your Session Name} # default is `GraphPRM` |
81 | 68 | ``` |
82 | 69 |
|
83 | | -### Run GraphPRM Self-supervised Finetuning |
| 70 | +### Training GraphPRM |
| 71 | + |
84 | 72 | ```bash |
85 | 73 | cd prm/code |
86 | 74 |
|
87 | | -CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune_qwen_SFT.py |
88 | | - --model_path $YOUR_MODEL_PATH \ |
89 | | - --data_path $YOUR_DATA_FOLDER_PATH |
| 75 | +CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 finetune_qwen_SFT.py \ |
| 76 | + --model_path $YOUR_MODEL_PATH \ |
| 77 | + --data_path $YOUR_DATA_FOLDER_PATH |
90 | 78 | ``` |
91 | 79 |
|
92 | | -### Perform Inference-time Computation with GraphPRM |
| 80 | +### Inference Methods |
93 | 81 |
|
94 | | -#### Best-of-N |
| 82 | +#### Best-of-N Strategy |
95 | 83 | ```bash |
96 | 84 | export PYTHONPATH=$(pwd) |
97 | | - |
98 | 85 | sh scripts/eval/cot_rerank.sh |
99 | 86 |
|
100 | 87 | # Key parameters: |
101 | | -# --LM Qwen2.5-7B-Instruct # The name of Policy Model |
102 | | -# --RM GraphPRM-7B # The name of Reward Model |
103 | | -# --temperature 0.7 # The temperature hyper-parameter during generation |
104 | | -# --num_sequence 8 # The number of generated samples during generation |
105 | | -# --max_new_tokens 2048 # Max new token number during generation |
106 | | -# --test_set_path dataset/GraphSilo_test.jsonl # The path to test data file |
107 | | - |
| 88 | +# --LM Qwen2.5-7B-Instruct # Policy Model name |
| 89 | +# --RM GraphPRM-7B # Reward Model name |
| 90 | +# --temperature 0.7 # Generation temperature |
| 91 | +# --num_sequence 8 # Number of generated samples |
| 92 | +# --max_new_tokens 2048 # Max new tokens |
| 93 | +# --test_set_path dataset/GraphSilo_test.jsonl # Test data path |
108 | 94 | ``` |
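The Best-of-N strategy above can be pictured as: sample N complete solutions from the policy model (N corresponds to `--num_sequence`), score every reasoning step with the reward model, collapse each solution's step scores into a single number, and keep the top-scoring solution. The sketch below is a minimal illustration under assumptions, not the logic of `scripts/eval/cot_rerank.sh`: step scores are given as plain floats, and the helper names `aggregate` and `best_of_n`, as well as the `min`/`last`/`mean` aggregation options, are hypothetical.

```python
from typing import List, Sequence


def aggregate(step_rewards: Sequence[float], how: str = "min") -> float:
    """Collapse per-step reward scores into one solution-level score (hypothetical helper)."""
    if not step_rewards:
        raise ValueError("a solution needs at least one scored step")
    if how == "min":
        return min(step_rewards)  # a solution is only as good as its weakest step
    if how == "last":
        return step_rewards[-1]  # use the score of the final step only
    if how == "mean":
        return sum(step_rewards) / len(step_rewards)  # average over all steps
    raise ValueError(f"unknown aggregation: {how}")


def best_of_n(candidates: List[str], step_scores: List[List[float]], how: str = "min") -> str:
    """Return the candidate whose aggregated step score is highest (first one wins ties)."""
    if not candidates or len(candidates) != len(step_scores):
        raise ValueError("need at least one candidate and one step-score list per candidate")
    totals = [aggregate(scores, how) for scores in step_scores]
    best_index = max(range(len(candidates)), key=lambda i: totals[i])
    return candidates[best_index]


if __name__ == "__main__":
    # Three sampled solutions (N = 3) with illustrative per-step reward scores.
    solutions = ["solution A", "solution B", "solution C"]
    scores = [[0.9, 0.2, 0.8], [0.7, 0.6, 0.65], [0.95, 0.9, 0.1]]
    print(best_of_n(solutions, scores, how="min"))   # solution B
    print(best_of_n(solutions, scores, how="last"))  # solution A
```

Taking the minimum treats a solution as only as reliable as its weakest step, while `last` trusts only the final step; which aggregation the repository's scripts actually apply is not stated in this README.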
109 | 95 |
|
110 | | -#### Beam Search |
| 96 | +#### Beam Search Strategy |
111 | 97 | ```bash |
112 | 98 | export PYTHONPATH=$(pwd) |
113 | | - |
114 | 99 | sh scripts/eval/beam_search.sh |
115 | 100 |
|
116 | 101 | # Key parameters: |
117 | | -# --LM Qwen2.5-7B-Instruct # The name of Policy Model |
118 | | -# --RM GraphPRM-7B # The name of Reward Model |
119 | | -# --temperature 0.7 # The temperature hyper-parameter during generation |
120 | | -# --num_sequence 2 # The number of samples to remain per step |
121 | | -# --tree_max_width 4 # The number of generated samples per step during generation |
122 | | -# --tree_max_depth 50 # Max step number |
123 | | -# --max_new_tokens 2048 # Max new token number during generation |
124 | | -# --test_set_path dataset/GraphSilo_test.jsonl # The path to test data file |
| 102 | +# --LM Qwen2.5-7B-Instruct # Policy Model name |
| 103 | +# --RM GraphPRM-7B # Reward Model name |
| 104 | +# --temperature 0.7 # Generation temperature |
| 105 | +# --num_sequence 2 # Samples kept per step |
| 106 | +# --tree_max_width 4 # Generated samples per step |
| 107 | +# --tree_max_depth 50 # Max steps |
| 108 | +# --max_new_tokens 2048 # Max new tokens |
| 109 | +# --test_set_path dataset/GraphSilo_test.jsonl # Test data path |
| 110 | +``` |
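The beam search parameters above can be read as a step-level search: at each step, every unfinished partial solution is extended with up to `--tree_max_width` candidate next steps, the reward model scores the extended paths, and only the `--num_sequence` best are kept, for at most `--tree_max_depth` steps. The sketch below illustrates that loop under assumptions: `expand`, `score`, and `is_finished` are hypothetical callables standing in for the policy model, GraphPRM, and a stop check, and the exact expansion semantics in `scripts/eval/beam_search.sh` may differ.

```python
import heapq
from typing import Callable, List, Tuple

Path = Tuple[str, ...]  # a partial solution as a sequence of reasoning steps


def prm_beam_search(
    expand: Callable[[Path, int], List[str]],
    score: Callable[[Path], float],
    is_finished: Callable[[Path], bool],
    beam_size: int = 2,   # plays the role of --num_sequence
    max_width: int = 4,   # plays the role of --tree_max_width
    max_depth: int = 50,  # plays the role of --tree_max_depth
) -> List[Path]:
    """Step-level beam search guided by a (hypothetical) process reward scorer."""
    beams: List[Path] = [()]
    for _ in range(max_depth):
        candidates: List[Path] = []
        for path in beams:
            if path and is_finished(path):
                candidates.append(path)  # carry completed solutions forward unchanged
                continue
            # ask the policy for up to max_width next steps and extend the path with each
            candidates.extend(path + (step,) for step in expand(path, max_width)[:max_width])
        # keep only the beam_size highest-scoring paths
        beams = heapq.nlargest(beam_size, candidates, key=score)
        if all(is_finished(p) for p in beams):
            break
    return beams


if __name__ == "__main__":
    # Toy stand-ins: the "policy" proposes digits 0..k-1, the "reward" is the digit sum,
    # and a solution is complete after three steps.
    best = prm_beam_search(
        expand=lambda path, k: [str(d) for d in range(k)],
        score=lambda path: sum(int(step) for step in path),
        is_finished=lambda path: len(path) == 3,
    )
    print(best[0])  # ('3', '3', '3')
```

The toy demo replaces the policy with digit generation and the reward model with a running sum, so it only shows the control flow, not GraphPRM's scoring.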
| 111 | + |
| 112 | +## 📁 Project Structure |
| 113 | + |
| 114 | +``` |
| 115 | +GraphPRM/ |
| 116 | +├── data/ |
| 117 | +│ ├── GraphSilo/ |
| 118 | +│ │ ├── train.jsonl |
| 119 | +│ │ └── step_wise_labels.jsonl |
| 120 | +│ └── GraphSilo_test/ |
| 121 | +│ ├── in_domain/ |
| 122 | +│ │ ├── degree.jsonl |
| 123 | +│ │ ├── clustering_coefficient.jsonl |
| 124 | +│ │ ├── jaccard.jsonl |
| 125 | +│ │ └── ... |
| 126 | +│ └── out_domain/ |
| 127 | +│ ├── bfs.jsonl |
| 128 | +│ ├── neighbor.jsonl |
| 129 | +│ └── cycle.jsonl |
| 130 | +├── prm/ |
| 131 | +│ ├── code/ |
| 132 | +│ │ └── finetune_qwen_SFT.py |
| 133 | +│ └── config/ |
| 134 | +│ └── deepspeed_config_stage3.json |
| 135 | +├── reason/ |
| 136 | +│ └── llm_service/ |
| 137 | +│ └── create_service_graph.sh |
| 138 | +└── scripts/ |
| 139 | + └── eval/ |
| 140 | + ├── best_of_N.sh |
| 141 | + └── beam_search.sh |
| 142 | +``` |
| 143 | + |
| 144 | +### Key Components |
| 145 | + |
| 146 | +- **data/**: Contains the GraphSilo dataset |
| 147 | + - `GraphSilo/`: Training set with step-wise reasoning trajectories |
| 148 | + - `GraphSilo_test/`: Test set for 13 graph tasks |
| 149 | + - In-domain tasks (10): Degree, Clustering Coefficient, Jaccard, etc. |
| 150 | + - Out-of-domain tasks (3): BFS, Neighbor, Cycle |
125 | 151 |
|
| 152 | +- **prm/**: Code for process reward modeling |
| 153 | + - `code/`: SFT training code |
| 154 | + - `config/`: DeepSpeed configuration files for training |
| 155 | + |
| 156 | +- **reason/**: Reasoning service implementation |
| 157 | + - `llm_service/`: Service startup and management scripts |
| 158 | + |
| 159 | +- **scripts/**: Evaluation and utility scripts |
| 160 | + - `eval/`: Inference scripts for different strategies |
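To make some of the in-domain task names above concrete, the sketch below computes degree, Jaccard similarity, and the local clustering coefficient on a small undirected toy graph in plain Python. It only illustrates what these quantities are; GraphSilo's actual graph encoding, prompts, and answer format are not shown, and the helper names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[int, Set[int]]


def build_graph(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set representation from an edge list."""
    adj: Graph = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree(adj: Graph, node: int) -> int:
    """Number of neighbors of `node` (0 if the node is absent)."""
    return len(adj.get(node, set()))


def jaccard(adj: Graph, u: int, v: int) -> float:
    """Shared neighbors divided by all neighbors of u and v; 0.0 if both have none."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def clustering_coefficient(adj: Graph, node: int) -> float:
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    neighbors = adj.get(node, set())
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


if __name__ == "__main__":
    g = build_graph([(0, 1), (0, 2), (1, 2), (2, 3)])
    print(degree(g, 2))                  # 3
    print(jaccard(g, 0, 1))              # 0.3333333333333333
    print(clustering_coefficient(g, 2))  # 0.3333333333333333
```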
| 161 | + |
| 162 | +## Acknowledgements |
| 163 | +Parts of the code are built upon the [OpenR](https://github.com/openreasoner/openr) repository. We sincerely thank the OpenR authors for their contributions. |
| 164 | + |
| 165 | +## 📜 Citation |
| 166 | + |
| 167 | +If you find GraphPRM useful for your research and applications, please cite it using the following BibTeX: |
| 168 | + |
| 169 | +```bibtex |
| 170 | +@misc{graphprm, |
| 171 | + title={Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners}, |
| 172 | + author={Miao Peng and Nuo Chen and Zongrui Suo and Jia Li}, |
| 173 | + year={2025}, |
| 174 | + eprint={2503.00845}, |
| 175 | + archivePrefix={arXiv}, |
| 176 | + primaryClass={cs.CL}, |
| 177 | + url={https://arxiv.org/abs/2503.00845}, |
| 178 | +} |
126 | 179 | ``` |