
MLX LoRA Function Calling Fine-Tuning

Fine-tune Mistral 7B for function calling using LoRA/QLoRA on Apple Silicon with the xLAM dataset.

What This Does

This project enables you to:

  • Fine-tune Mistral 7B Instruct on function calling tasks
  • Use the xLAM dataset (Salesforce's 60K function calling examples)
  • Apply LoRA (Low-Rank Adaptation) for efficient fine-tuning
  • Use QLoRA (4-bit quantization) for memory-constrained devices
  • Run everything locally on Apple Silicon (M1/M2/M3/M4)

Hardware Requirements

Tested Configuration:

  • Apple M3 with 32GB unified memory
  • macOS Sonoma or later

Recommended:

  • M2 Ultra (64GB+): Best performance
  • M3 Max (36GB+): Very good
  • M1 Max (32GB): Works well

Minimum:

  • M1 with 16GB (use QLoRA with batch size 1)

Quick Start

Prerequisites

  • Apple Silicon Mac with 16GB+ RAM
  • Python 3.9+
  • Hugging Face account and access token (create one at huggingface.co/settings/tokens)

Installation

# Install dependencies
pip install -r requirements.txt

# Configure Hugging Face token
cp .env.example .env
# Edit .env and add: HF_TOKEN=your_token_here

Run Complete Pipeline (One Command)

./run_pipeline.sh

This will:

  1. Download and prepare 1000 xLAM samples (800 train, 100 valid, 100 test)
  2. Convert Mistral 7B to MLX format with 4-bit quantization
  3. Train with LoRA for 600 iterations
  4. Run test suite with example prompts

Expected time: ~30-40 minutes on M3 32GB

Manual Step-by-Step

1. Prepare Data (~2 minutes)

python prepare_xlam_data.py \
    --train-samples 800 \
    --valid-samples 100 \
    --test-samples 100

Creates data/xlam/ with train/valid/test splits.
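The splits are plain JSONL files, one JSON object per line. A minimal sketch of writing and reading such a split — note the `text` field name is an assumption based on common MLX LoRA data conventions, not taken from the script itself:

```python
import json
import tempfile
from pathlib import Path

# Sketch of the JSONL layout produced in data/xlam/: one JSON object
# per line. The "text" field name here is an assumption.
def write_split(path: Path, samples):
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        for sample in samples:
            f.write(json.dumps({"text": sample}) + "\n")

def read_split(path: Path):
    with path.open() as f:
        return [json.loads(line)["text"] for line in f]

with tempfile.TemporaryDirectory() as tmp:
    split = Path(tmp) / "train.jsonl"
    write_split(split, ["<user>example query</user>"])
    print(read_split(split))
```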

2. Convert Model (~5 minutes)

python convert.py \
    --hf-path mistralai/Mistral-7B-Instruct-v0.2 \
    --mlx-path mlx_model \
    -q

The -q flag enables 4-bit quantization (QLoRA), reducing memory from ~14GB to ~4GB.
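The savings follow directly from bytes per weight. A back-of-envelope estimate (7.24B is an approximation of Mistral-7B's parameter count; real usage runs a bit higher because quantization stores per-group scales):

```python
# Rough weight-memory estimate for Mistral 7B at different precisions.
params = 7.24e9                      # approximate parameter count
fp16_gb = params * 2 / 1024**3       # 2 bytes per float16 weight
q4_gb = params * 0.5 / 1024**3       # ~0.5 bytes per 4-bit weight
print(f"float16: {fp16_gb:.1f} GB, 4-bit: {q4_gb:.1f} GB")
```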

3. Train (~25 minutes)

python train_function_calling.py \
    --model mlx_model \
    --data data/xlam \
    --train \
    --batch-size 2 \
    --iters 600 \
    --learning-rate 1e-5 \
    --lora-layers 16

For M1 with 16GB:

python train_function_calling.py \
    --model mlx_model \
    --data data/xlam \
    --train \
    --batch-size 1 \
    --lora-layers 8 \
    --iters 400

4. Test (~1 minute)

# Run automated test suite
python test_function_calling.py \
    --model mlx_model \
    --adapter adapters.npz

# Interactive mode
python test_function_calling.py \
    --model mlx_model \
    --adapter adapters.npz \
    --interactive

# Test with specific prompt
python test_function_calling.py \
    --model mlx_model \
    --adapter adapters.npz \
    --prompt '<user>What is the weather in Tokyo?</user>\n\n<tools>'

Data Format

The xLAM dataset uses this format:

<user>[user query]</user>

<tools>[available tool definitions]</tools>

<calls>[expected function calls]</calls>

Example:

<user>Check if the numbers 8 and 1233 are powers of two.</user>

<tools>{'name': 'is_power_of_two', 'description': 'Checks if a number is a power of two.', 'parameters': {'num': {'description': 'The number to check.', 'type': 'int'}}}</tools>

<calls>{'name': 'is_power_of_two', 'arguments': {'num': 8}}
{'name': 'is_power_of_two', 'arguments': {'num': 1233}}</calls>
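The format above can be assembled programmatically. A minimal sketch (the helper name is hypothetical; this example serializes with standard JSON double quotes, whereas the raw dataset shows Python-style single quotes):

```python
import json

def format_example(query, tools, calls):
    """Assemble one xLAM-style training example (hypothetical helper)."""
    tools_str = "\n".join(json.dumps(t) for t in tools)
    calls_str = "\n".join(json.dumps(c) for c in calls)
    return (f"<user>{query}</user>\n\n"
            f"<tools>{tools_str}</tools>\n\n"
            f"<calls>{calls_str}</calls>")

example = format_example(
    "Check if the number 8 is a power of two.",
    [{"name": "is_power_of_two",
      "description": "Checks if a number is a power of two.",
      "parameters": {"num": {"description": "The number to check.",
                             "type": "int"}}}],
    [{"name": "is_power_of_two", "arguments": {"num": 8}}],
)
print(example)
```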

Training Configuration

LoRA Settings

  • LoRA Rank: 8 (dimension of the low-rank update matrices)
  • LoRA Layers: 16 (fine-tune the last 16 transformer layers)
  • Target Modules: Q and V projections in the attention layers
  • Trainable Parameters: a small fraction of the base model (a few million out of ~7B)
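Each adapted projection adds two small matrices (A: in_dim × rank, B: rank × out_dim). A rough count using approximate Mistral-7B shapes (hidden size 4096; the V projection maps to 1024 under grouped-query attention — these figures are for illustration):

```python
# Approximate trainable LoRA parameter count for Q and V projections.
d, d_v, rank, layers = 4096, 1024, 8, 16
q_lora = d * rank + rank * d      # A and B for the Q projection
v_lora = d * rank + rank * d_v    # A and B for the V projection
trainable = layers * (q_lora + v_lora)
print(f"{trainable:,} trainable parameters")  # on the order of a few million
```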

Hyperparameters

Parameter            Default  Description
--batch-size         2        Training batch size
--iters              600      Number of training iterations
--learning-rate      1e-5     Adam learning rate
--lora-layers        16       Number of layers to fine-tune
--lora-rank          8        LoRA rank
--steps-per-report   10       Log training loss every N steps
--steps-per-eval     100      Evaluate on the validation set every N steps
--save-every         100      Save a checkpoint every N iterations

Expected Performance

On M3 32GB with batch size 2:

  • ~150-200 tokens/second
  • ~600 iterations in 20-30 minutes

Training progress indicators:

Iter 1:   Val loss 1.512  ← Baseline
Iter 100: Val loss 1.200  ← Should drop ~15-25%
Iter 200: Val loss 1.050  ← Continuing to drop
Iter 300: Val loss 0.950  ← Good progress

Signs to stop early:

  • Validation loss plateaued or increasing (overfitting)
  • Model gives reasonable responses in tests
  • Training loss far below 0.5 while validation loss stalls (likely memorizing the training set)
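The plateau check above can be sketched as a simple patience rule (the thresholds here are illustrative, not values the project uses):

```python
# Minimal early-stopping heuristic: stop when the last `patience`
# validation evals failed to improve on the best earlier loss.
def should_stop(val_losses, patience=3, min_delta=0.01):
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

print(should_stop([1.51, 1.20, 1.05, 0.95]))              # still improving
print(should_stop([1.51, 1.20, 1.05, 1.05, 1.06, 1.05]))  # plateaued
```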

Memory Usage

QLoRA (4-bit quantized):

  • Base model: ~4GB
  • Training peak: ~12-16GB
  • Recommended for: 16GB+ systems

Regular LoRA (float16):

  • Base model: ~14GB
  • Training peak: ~22-28GB
  • Recommended for: 32GB+ systems

Advanced Usage

Resume Training

python train_function_calling.py \
    --model mlx_model \
    --train \
    --resume-adapter-file adapters.npz

Evaluate on Test Set

python train_function_calling.py \
    --model mlx_model \
    --adapter-file adapters.npz \
    --test

Custom Dataset Sizes

python prepare_xlam_data.py \
    --train-samples 4000 \
    --valid-samples 500 \
    --test-samples 500

Adjust Generation Parameters

python test_function_calling.py \
    --model mlx_model \
    --adapter adapters.npz \
    --max-tokens 300 \
    --temp 0.5
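What `--temp` does, in miniature: logits are divided by the temperature before the softmax, so lower values concentrate probability on the top tokens. A self-contained sketch of that scaling (not the project's actual sampler):

```python
import math

def softmax_with_temp(logits, temp):
    """Softmax over logits scaled by 1/temp (numerically stabilized)."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temp(logits, 1.0))
print(softmax_with_temp(logits, 0.5))  # sharper: more mass on the top token
```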

Project Structure

mlx_lora_fine_tuning/
├── README.md                      # This file
├── LICENSE                        # MIT License
├── requirements.txt               # Python dependencies
├── .env.example                   # Environment configuration template
├── .gitignore                     # Git ignore rules
│
├── prepare_xlam_data.py          # Data preprocessing script
├── convert.py                     # Model conversion to MLX
├── train_function_calling.py     # Training script
├── test_function_calling.py      # Testing/inference script
├── run_pipeline.sh                # Complete automation script
│
├── models.py                      # Model architecture and LoRA
├── utils.py                       # Utility functions
│
├── data/xlam/                     # Dataset directory (created by scripts)
│   ├── train.jsonl
│   ├── valid.jsonl
│   └── test.jsonl
│
├── mlx_model/                     # Converted model (created by convert.py)
└── adapters.npz                   # Trained LoRA weights

Troubleshooting

Out of Memory

  1. Use QLoRA: Add -q flag to convert.py
  2. Reduce batch size: Use --batch-size 1
  3. Reduce LoRA layers: Use --lora-layers 8 or --lora-layers 4
  4. Close other applications to free memory

Hugging Face Authentication

# Option 1: Environment variable
export HF_TOKEN=your_token

# Option 2: CLI login
huggingface-cli login

# Option 3: .env file (recommended)
echo "HF_TOKEN=your_token" > .env
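A sketch of how the three options can be resolved in order — environment variable first, then the `.env` file. The precedence shown here is an illustration, not necessarily what the project's scripts implement:

```python
import os
from pathlib import Path

def resolve_hf_token(env_path=".env"):
    """Return HF_TOKEN from the environment, else from a .env file."""
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    p = Path(env_path)
    if p.exists():
        for line in p.read_text().splitlines():
            if line.startswith("HF_TOKEN="):
                return line.split("=", 1)[1].strip()
    return None
```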

Slow Training

  • Close other applications to free memory
  • Ensure you're using QLoRA (-q flag in convert.py)
  • Try smaller batch size (--batch-size 1)
  • Monitor Activity Monitor for memory pressure

Model Not Learning

  • Increase iterations (--iters 1000)
  • Adjust learning rate (--learning-rate 5e-5)
  • Increase LoRA layers (--lora-layers 24)
  • Check that validation loss is decreasing

Mac Overheating

  • Ensure good ventilation
  • Use a cooling pad if available
  • Stop training (Ctrl+C), let cool for 10 minutes
  • Resume with: --resume-adapter-file adapters.npz.iter_XXX

Citations

xLAM Dataset:

@misc{xlam2024,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Salesforce Research},
  year={2024},
  publisher={Hugging Face}
}

LoRA:

@article{hu2021lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}

QLoRA:

@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}

Acknowledgments

This project is based on Apple's MLX Examples.

License

MIT License - See LICENSE file for details.

This project is adapted from Apple's MLX Examples and follows the same MIT License.
