Learning a Zeroth-Order Optimizer for Fine-Tuning LLM

Official implementation of "Learning a Zeroth-Order Optimizer for Fine-Tuning LLM".

In the foundation-model era, most downstream models derive from a small set of base checkpoints, so it is efficient to learn an optimizer once per base model and reuse it across many tasks. Doing so in the zeroth-order setting keeps compute and memory costs close to those of inference, which makes model customization far more accessible. To this end, we propose ZO Fine-tuner, a learning-based zeroth-order optimizer that augments the standard two-point ZO update with learned, adaptive, and non-uniform perturbations. Guided by our theoretical analysis, ZO Fine-tuner keeps memory overhead minimal by sharing a single perturbation variance within each parameter group. Trained once under a learning-to-learn objective for a given base LLM, the same ZO Fine-tuner transfers across diverse tasks and derivative checkpoints, enabling a practical “train once, reuse widely” workflow. In experiments on 4 LLMs ranging from 1B to 30B parameters and 7 diverse datasets, ZO Fine-tuner outperforms prior zeroth-order baselines in final loss on 23 of the 28 task-model combinations and improves accuracy over MeZO by 2.5% on average, demonstrating strong performance and scalability for efficient LLM fine-tuning.
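To make the update rule described above concrete, here is a minimal pure-Python sketch of a two-point zeroth-order step in which every parameter group has its own perturbation scale. It is an illustration under stated assumptions, not the code in this repository: the function name `zo_two_point_step`, the dictionary layout, the toy loss, and the fixed `sigmas` values are all hypothetical, and in ZO Fine-tuner the per-group scales are learned rather than set by hand.

```python
import random


def zo_two_point_step(params, loss_fn, sigmas, lr=0.01, eps=1e-3, seed=0):
    """One two-point ZO step with one perturbation std per parameter group.

    params: dict mapping a group name to a list of floats (hypothetical layout).
    sigmas: dict mapping a group name to the std shared by that whole group.
    """
    rng = random.Random(seed)
    # Non-uniform perturbation: z_i ~ N(0, sigma_g^2), same sigma_g inside a group.
    z = {g: [rng.gauss(0.0, sigmas[g]) for _ in v] for g, v in params.items()}
    plus = {g: [p + eps * d for p, d in zip(v, z[g])] for g, v in params.items()}
    minus = {g: [p - eps * d for p, d in zip(v, z[g])] for g, v in params.items()}
    # Two forward passes, no backpropagation: scalar directional-derivative estimate.
    g_hat = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    # Step along -g_hat * z, which is a descent direction in expectation.
    return {g: [p - lr * g_hat * d for p, d in zip(v, z[g])] for g, v in params.items()}


def toy_loss(params):
    # Stand-in for an LLM forward pass: a simple quadratic bowl.
    return sum(x * x for v in params.values() for x in v)


params = {"attention": [1.0, -2.0], "mlp": [0.5, 3.0]}
sigmas = {"attention": 1.0, "mlp": 0.5}  # fixed here; learned in ZO Fine-tuner
initial_loss = toy_loss(params)
for step in range(200):
    params = zo_two_point_step(params, toy_loss, sigmas, seed=step)
final_loss = toy_loss(params)
```

Because only the per-group scales are stored, the extra state in this sketch is one float per group, regardless of how many parameters the group contains.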

Installation

conda create -n ZO_fine_tuner python==3.9.19
conda activate ZO_fine_tuner 
pip install -r requirements.txt

This environment supports OPT, LLaMA, Qwen, and other recent LLMs.

Usage

Use run_l2l.py to learn a ZO Fine-tuner, and run_zo_fine_tuner.py to perform downstream zeroth-order fine-tuning with the learned optimizer.

python run_l2l.py {ARGUMENTS}
python run_zo_fine_tuner.py {ARGUMENTS}

The example scripts below reproduce our experiments: the first learns the ZO Fine-tuner, and the second uses it to fine-tune on a downstream task.

# learning to learn: train the ZO Fine-tuner
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0 NEED_NORMALIZATION=True LOAD_FLOAT_16=False LR_MLP=0.1 EPOCH=15 LR_UPDATE=1e-6 TRAIN_MODE='l2l' EPOCHS_PER_RESTART=5 MODEL=meta-llama/Llama-3.2-1B TASK=Copa LR_LLM=0.01 SAVE_MLP_PATH='./learned_finetuner/llama1B_finetuner.pth' bash ./scripts/l2l.sh

# zeroth-order fine-tuning with the learned optimizer
WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0 LR=1e-7 LOAD_FLOAT_16=True NEED_NORMALIZATION=True TRAIN_MODE='zo_fine_tuner' STEPS=20000 MODEL=meta-llama/Llama-3.2-1B TASK=SST2 MODE=ft LOAD_MLP_PATH='./learned_finetuner/llama1B_finetuner.pth' bash ./scripts/zo_fine_tuner.sh

