NanoGPT + Math: Direct Preference Optimization

Project Overview

This project aims to build a math training dataset and fine-tune a NanoGPT model to solve simple arithmetic problems with reasoning. We utilize Direct Preference Optimization (DPO) to align the model's outputs with a specific format and reasoning capability.

The goal is to teach the model to reply in the format:

    PROMPT The answer is A because B is C

Where:

  • PROMPT: The arithmetic or algebraic question (e.g., 98/x=14,x=?).
  • A: The Answer.
  • B: The Reasoning.
  • C: The Answer (reconfirmed).
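
For instance, with the example prompt above, a well-formed completion would read as follows (the exact reasoning wording is illustrative):

    98/x=14,x=? The answer is 7 because 98/14 is 7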

Methodology

The team settled on generating datasets built around familiar mathematical patterns (e.g., multiplying by 10, the commutative property), since this lets DPO pick up the patterns more reliably and improves the model's performance.
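
As an illustration, a preference pair for the multiply-by-10 pattern might be generated along these lines; the function and field names here are a hypothetical sketch, not the project's actual code:

    import random

    def make_pair():
        # Hypothetical generator for one multiply-by-10 preference pair.
        n = random.randint(2, 99)
        answer = n * 10
        prompt = f"{n}*10=?"
        # "chosen" follows the target format: The answer is A because B is C
        chosen = f"The answer is {answer} because {n}*10 is {answer}"
        # "rejected" gives a bare number with no reasoning and no format
        rejected = f"{answer}"
        return {"prompt": prompt, "chosen": chosen, "rejected": rejected}

    pairs = [make_pair() for _ in range(1000)]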

The model is fine-tuned with the AdamW optimizer and a CosineAnnealingLR learning-rate scheduler.
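
In PyTorch, that setup looks roughly like the following; the model stub and hyperparameter values are placeholders, not the ones used in the notebook:

    import torch

    model = torch.nn.Linear(8, 8)  # stand-in for the NanoGPT policy network
    max_steps = 1000               # placeholder schedule length

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_steps)

    for step in range(max_steps):
        optimizer.zero_grad()
        loss = model(torch.randn(4, 8)).pow(2).mean()  # dummy loss; DPO loss in practice
        loss.backward()
        optimizer.step()
        scheduler.step()  # anneal the learning rate along a cosine curve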

No additional fine-tuning method was introduced beyond the DPO method provided by the assignment authors.
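
For reference, the standard DPO objective from Rafailov et al. (2023), which such an implementation computes, can be sketched as below; this is the textbook formula, not necessarily the assignment's exact code:

    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument: per-sequence summed log-probs, shape (batch,).
        # beta controls how far the policy may drift from the frozen reference.
        policy_logratio = policy_chosen_logps - policy_rejected_logps
        ref_logratio = ref_chosen_logps - ref_rejected_logps
        # -log(sigmoid(x)), computed stably via logsigmoid
        return -F.logsigmoid(beta * (policy_logratio - ref_logratio)).mean()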

Installation and Usage

  1. Install Dependencies:

    pip install matplotlib torch numpy transformers datasets tiktoken wandb tqdm
  2. Pretrained Model: Download the pretrained NanoGPT model (trained on QA data) from the link and place it in the ./sft folder.

  3. Run the Project: The main training and evaluation logic is contained within the Jupyter Notebooks in the dpo/ directory (e.g., dpo/dpo.ipynb).
