GIFT: Gibbs Initialization with Finite Temperature

The official repository of "GIFT: Unlocking Global Optimality in Post-Training via Finite-Temperature Gibbs Initialization".

GIFT is a unified post-training framework designed to bridge the optimization gap in the prevailing SFT-to-RL paradigm. By replacing the traditional one-hot SFT target with a finite-temperature Gibbs distribution, GIFT establishes a distributional bridge that preserves the base model's priors while remaining consistent with the global post-training objective. We demonstrate, theoretically and empirically, that GIFT provides an optimal initialization for RL in mathematical reasoning, and that standard SFT is merely the degenerate zero-temperature limit of this ideal policy. Our results show that GIFT significantly outperforms robust SFT variants across diverse and out-of-distribution benchmarks. Furthermore, geometric and distributional analyses reveal that GIFT preserves the exploration landscape, enabling faster convergence and better asymptotic performance that unlock the model's full reasoning potential.
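
For intuition only, here is a minimal numerical sketch (Python, not GIFT's actual training code) of the zero-temperature limit mentioned above: a finite-temperature Gibbs (softmax) distribution over per-candidate scores collapses to a one-hot distribution as the temperature approaches zero, which is the degenerate target that standard SFT trains against. The scores vector below is purely hypothetical.

import numpy as np

def gibbs(scores, temperature):
    # Finite-temperature Gibbs (softmax) distribution over a vector of scores.
    z = np.asarray(scores, dtype=np.float64) / max(temperature, 1e-12)
    z -= z.max()                      # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = [2.0, 1.0, 0.5]              # hypothetical per-candidate scores
for t in (1.0, 0.1, 1e-3):
    print(t, gibbs(scores, t))        # as t -> 0 the distribution becomes one-hot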

Installation

cd verl
pip install -e .

Usage

Quick Start

bash exp_scripts/run_gift_training.sh

Configuration

Edit exp_scripts/run_gift_training.sh to set your own paths:

train_file="/path/to/your/train.parquet"
val_file="/path/to/your/val.parquet"
model_path="/path/to/your/base_model"

Data Format

Your parquet files should contain a prompt column and a response column; the column names are configured by the following keys (a minimal construction sketch follows this list):

  • prompt_key: name of the column holding input prompts (default: sft_prompt)
  • response_key: name of the column holding target responses (default: solution)
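
A minimal sketch, assuming the default column names above, of how a compatible parquet file could be produced with pandas; the example rows are placeholders, not part of the GIFT data.

import pandas as pd

# Hypothetical example rows; column names match the prompt_key / response_key
# defaults (sft_prompt, solution) described above.
# Writing parquet requires pyarrow or fastparquet to be installed.
df = pd.DataFrame({
    "sft_prompt": ["Compute 2 + 2.", "Solve x^2 = 9 for x."],
    "solution":   ["2 + 2 = 4, so the answer is 4.", "x = 3 or x = -3."],
})
df.to_parquet("train.parquet", index=False)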

Acknowledgement

GIFT builds upon veRL and deepmath, and uses vLLM for inference. We use Math-Verify for math reasoning evaluation. We thank the open-source community for the code, datasets, and backbones, including veRL, LUFFY, and ReLIFT.
