Large Language Models (LLMs) often struggle with complex mathematical reasoning, in part because they handle mathematical operators poorly. This project explores modifying the attention mechanism to scale the attention paid to operator and number tokens, aiming to capture mathematical structure more faithfully. We evaluate the approach with Pass@1 and BERTScore on the MMIQC dataset.
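The core idea can be illustrated with a minimal sketch: identify which key positions correspond to numbers or operators, multiply the post-softmax attention toward those positions by a scaling factor (matching the `--num_scaling` / `--op_scaling` flags below), and renormalize. The helper names here (`math_token_mask`, `scale_attention`) are illustrative assumptions, not the actual code in `src/flan_t5_attention_mod.py`:

```python
import torch

OPERATORS = set("+-*/=^%<>")

def math_token_mask(tokens: list[str]) -> torch.Tensor:
    """Return a bool mask that is True where a token is a number or an operator."""
    def is_math(tok: str) -> bool:
        t = tok.lstrip("▁")  # drop the SentencePiece word-boundary marker
        return t.isdigit() or (t != "" and all(c in OPERATORS for c in t))
    return torch.tensor([is_math(t) for t in tokens])

def scale_attention(attn: torch.Tensor, mask: torch.Tensor, scale: float) -> torch.Tensor:
    """Multiply post-softmax attention toward masked key positions by `scale`,
    then renormalize so each attention row still sums to 1."""
    factors = torch.ones(mask.shape[-1])
    factors[mask] = scale
    scaled = attn * factors  # broadcasts over the key dimension
    return scaled / scaled.sum(dim=-1, keepdim=True)
```

With `scale=1.0` this is a no-op (the baseline); values below 1 down-weight attention to math tokens, values above 1 emphasize them.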
Baseline run (no attention scaling, answer-only prompt):

python src/flan_t5_attention_mod.py --model google/flan-t5-xl --output <OUTPUT_FILE_PATH> --num_scaling 1 --op_scaling 1 --modification "Please solve the following problem and only output the answer at the end with \"The answer is: \". " --baseline True
python src/flan_t5_attention_mod.py --model google/flan-t5-xl --output <OUTPUT_FILE_PATH> --num_scaling 1 --op_scaling 0.7 --modification "Please solve the following problem and only output the answer at the end with \"The answer is: \". " --model_part <ENCODER/DECODER/BOTH>
python src/flan_t5_attention_mod.py --model google/flan-t5-xl --output <OUTPUT_FILE_PATH> --num_scaling 1 --op_scaling 0.7 --modification "Please solve the following problem and give a concise explanation with the answer at the end with \"The answer is: \"." --model_part <ENCODER/DECODER/BOTH>
Score the generated outputs (e.g., for Pass@1) with the LLaMA-based grader:

python utils/evaluate_using_llama.py --file <EVAL_OUTPUT_PATH>
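For the BERTScore part of the evaluation mentioned above, a minimal sketch using the bert-score package could look like the following. This is an assumption about tooling, not the repository's evaluation script, and the example data is hypothetical; real runs would load generations and MMIQC reference answers from <EVAL_OUTPUT_PATH>:

```python
from bert_score import score  # pip install bert-score

# Hypothetical example pairs of model output and reference answer.
predictions = ["The answer is: 42"]
references = ["The answer is: 42"]

# Returns per-example precision, recall, and F1 tensors.
P, R, F1 = score(predictions, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")
```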