MATH RELATED

We provide chain-of-thought texts generated by ChatGPT/GPT-4 through in-context learning (ICL) on the MATH training set (8 per data item), saved in the data folder.
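As a rough illustration of how several chain-of-thought samples per problem might be flattened into fine-tuning pairs, here is a minimal sketch. The record layout (the keys `problem` and `cot_samples`), the prompt template, and the function name are assumptions made for illustration; they are not the actual format of the files in the data folder or the logic in dataset.py.

```python
from typing import Dict, List

# Hypothetical prompt template; the one used by finetune.py may differ.
PROMPT_TEMPLATE = "Problem: {problem}\nSolution:"


def build_training_pairs(records: List[dict]) -> List[Dict[str, str]]:
    """Flatten each problem and its chain-of-thought samples into prompt/completion pairs."""
    pairs = []
    for record in records:
        prompt = PROMPT_TEMPLATE.format(problem=record["problem"])
        for cot in record["cot_samples"]:
            cot = cot.strip()
            if cot:  # skip empty generations
                pairs.append({"prompt": prompt, "completion": " " + cot})
    return pairs


# Toy record with two samples instead of eight, to keep the example short.
example = [{
    "problem": "What is 2 + 3?",
    "cot_samples": [
        "2 + 3 = 5. The answer is \\boxed{5}.",
        "Adding gives \\boxed{5}.",
    ],
}]
pairs = build_training_pairs(example)
print(len(pairs))
```

Each problem contributes one training pair per non-empty sample, so a problem with eight samples would yield up to eight pairs sharing the same prompt.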

We also provide code for fine-tuning LLaMA on this data, as follows:

Step 1: Prepare the llama-7b checkpoint and store it in the main directory.

Step 2: Prepare the conda environment following requirements.txt.

Step 3: Activate the environment:

conda activate llm

Step 4: Fine-tune:

bash finetune.sh

Step 5: Run inference:

bash infer.sh

Step 6: Evaluate:

bash eval.sh
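To make the evaluation step more concrete: MATH reference solutions mark the final answer with `\boxed{...}`, so one common way to score a prediction is to extract the last boxed expression and compare it with the reference. The sketch below shows that idea with plain exact matching. The function names are hypothetical, no answer normalization is applied, and the actual logic in eval.py may differ.

```python
from typing import List, Optional


def last_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose boxed answer equals the reference's boxed answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for pred, ref in zip(predictions, references):
        p = last_boxed_answer(pred)
        r = last_boxed_answer(ref)
        if p is not None and p == r:
            correct += 1
    return correct / len(references)


preds = [
    "So x = \\boxed{5}.",
    "The result is \\boxed{\\frac{1}{3}}.",
    "I am not sure.",
]
refs = [
    "Thus the answer is \\boxed{5}.",
    "\\boxed{\\frac{1}{2}}",
    "\\boxed{7}",
]
accuracy = exact_match_accuracy(preds, refs)
print(f"accuracy = {accuracy:.3f}")
```

Exact string matching is strict: equivalent answers written differently (for example `0.5` and `\frac{1}{2}`) count as mismatches, so a real evaluation would usually normalize answers first.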

The following are related papers and projects we have collected:

I. Distillation on mathematical reasoning tasks

1. Teaching Small Language Models to Reason (Lucie Charlotte Magister, Jonathan Mallinson, Jakub Adamek, Eric Malmi, Aliaksei Severyn)

paper: https://arxiv.org/abs/2212.08410

2. Specializing Smaller Language Models towards Multi-Step Reasoning (Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot)

paper: https://arxiv.org/abs/2301.12726

code: https://github.com/FranxYao/FlanT5-CoT-Specialization

dataset: Google Drive

3. Large Language Models Are Reasoning Teachers (Namgyu Ho, Laura Schmid, Se-Young Yun)

paper: https://arxiv.org/abs/2212.10071

code: https://github.com/itsnamgyu/reasoning-teacher

dataset: Dropbox, Google Drive

4. PaD: Program-aided Distillation Specializes Large Models in Reasoning (Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xingwei Long, Bowen Zhou)

paper: https://arxiv.org/abs/2305.13888

II. Experiments on the MATH dataset

1. Measuring Mathematical Problem Solving With the MATH Dataset (original paper) (Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt)

paper: https://arxiv.org/abs/2103.03874

code: https://github.com/hendrycks/math

dataset: https://drive.google.com/file/d/1hQsua3TkpEmcJD_UWQx8dmNdEZPyxw23/view?usp=sharing

2. Sparks of Artificial General Intelligence: Early experiments with GPT-4 (Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang)

paper: https://arxiv.org/abs/2303.12712

code: https://github.com/guidance-ai/guidance

3. A Neural Network Solves, Explains, and Generates University Math Problems by Program Synthesis and Few-Shot Learning at Human Level (Iddo Drori, Sarah Zhang, Reece Shuttleworth, Leonard Tang, Albert Lu, Elizabeth Ke, Kevin Liu, Linda Chen, Sunny Tran, Newman Cheng, Roman Wang, Nikhil Singh, Taylor L. Patti, Jayson Lynch, Avi Shporer, Nakul Verma, Eugene Wu, Gilbert Strang)

paper: https://arxiv.org/abs/2112.15594

code: https://github.com/idrori/mathq

4. ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models (Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Wayne Xin Zhao, Ji-Rong Wen)

paper: https://arxiv.org/abs/2305.14323

dataset: MATH, HotPotQA

5. Deductive Verification of Chain-of-Thought Reasoning (Zhan Ling, Yunhao Fang, Xuanlin Li, Zhiao Huang, Mingu Lee, Roland Memisevic, Hao Su)

paper: https://arxiv.org/abs/2306.03872

dataset: MATH

6. CREATOR: Disentangling Abstract and Concrete Reasonings of Large Language Models through Tool Creation (Cheng Qian, Chi Han, Yi R. Fung, Yujia Qin, Zhiyuan Liu, Heng Ji)

paper: https://arxiv.org/abs/2305.14318

dataset: MATH, TabMWP, Creation Challenge

7. An Empirical Study on Challenging Math Problem Solving with GPT-4 (Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang)

paper: https://arxiv.org/abs/2306.01337

8. Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference (Chi Wang, Susan Xueqing Liu, Ahmed H. Awadallah)

paper: https://arxiv.org/abs/2303.04673

III. Research related to MATH

1. MiniF2F: A Cross-System Benchmark for Formal Olympiad-Level Mathematics (Kunhao Zheng, Jesse Michael Han, Stanislas Polu)

(Draws on the MATH dataset to propose miniF2F)

paper: https://arxiv.org/abs/2109.00110

code: https://github.com/openai/minif2f

2. Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs (Albert Q. Jiang, Sean Welleck, Jin Peng Zhou, Wenda Li, Jiacheng Liu, Mateja Jamnik, Timothée Lacroix, Yuhuai Wu, Guillaume Lample)

(MATH is used only as a source of informal data; the method maps informal proofs to formal proofs)

paper: https://arxiv.org/abs/2210.12283

code: https://github.com/facebookresearch/minif2f

https://github.com/albertqjiang/draft_sketch_prove

3. LAMBADA: Backward Chaining for Automated Reasoning in Natural Language (Mehran Kazemi, Najoung Kim, Deepti Bhatia, Xin Xu, Deepak Ramachandran)

(References the post-pretraining method from MATH; uses backward reasoning)

paper: https://arxiv.org/abs/2212.13894

4. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models (Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan)

(MATH is part of the benchmark)

paper: https://arxiv.org/abs/2304.06364

code: https://github.com/microsoft/agieval

dataset: data/v1
