Fine-tune Mistral 7B for function calling using LoRA/QLoRA on Apple Silicon with the xLAM dataset.
This project enables you to:
- Fine-tune Mistral 7B Instruct on function calling tasks
- Use the xLAM dataset (Salesforce's 60K function calling examples)
- Apply LoRA (Low-Rank Adaptation) for efficient fine-tuning
- Use QLoRA (4-bit quantization) for memory-constrained devices
- Run everything locally on Apple Silicon (M1/M2/M3/M4)
Tested Configuration:
- Apple M3 with 32GB unified memory
- macOS Sonoma or later
Recommended:
- M2 Ultra (64GB+): Best performance
- M3 Max (36GB+): Very good
- M1 Max (32GB): Works well
Minimum:
- M1 with 16GB (use QLoRA with batch size 1)
- Apple Silicon Mac with 16GB+ RAM
- Python 3.9+
- Hugging Face account and access token (create one at https://huggingface.co/settings/tokens)
```bash
# Install dependencies
pip install -r requirements.txt

# Configure Hugging Face token
cp .env.example .env
# Edit .env and add: HF_TOKEN=your_token_here

# Run the full pipeline
./run_pipeline.sh
```

This will:
- Download and prepare 1000 xLAM samples (800 train, 100 valid, 100 test)
- Convert Mistral 7B to MLX format with 4-bit quantization
- Train with LoRA for 600 iterations
- Run test suite with example prompts
Expected time: ~30-40 minutes on M3 32GB
```bash
python prepare_xlam_data.py \
  --train-samples 800 \
  --valid-samples 100 \
  --test-samples 100
```

Creates `data/xlam/` with train/valid/test splits.
```bash
python convert.py \
  --hf-path mistralai/Mistral-7B-Instruct-v0.2 \
  --mlx-path mlx_model \
  -q
```

The `-q` flag enables 4-bit quantization (QLoRA), reducing memory from ~14GB to ~4GB.
```bash
python train_function_calling.py \
  --model mlx_model \
  --data data/xlam \
  --train \
  --batch-size 2 \
  --iters 600 \
  --learning-rate 1e-5 \
  --lora-layers 16
```

For M1 with 16GB:
```bash
python train_function_calling.py \
  --model mlx_model \
  --data data/xlam \
  --train \
  --batch-size 1 \
  --lora-layers 8 \
  --iters 400
```

```bash
# Run automated test suite
python test_function_calling.py \
  --model mlx_model \
  --adapter adapters.npz

# Interactive mode
python test_function_calling.py \
  --model mlx_model \
  --adapter adapters.npz \
  --interactive

# Test with specific prompt
python test_function_calling.py \
  --model mlx_model \
  --adapter adapters.npz \
  --prompt '<user>What is the weather in Tokyo?</user>\n\n<tools>'
```

The xLAM dataset uses this format:
```
<user>[user query]</user>
<tools>[available tool definitions]</tools>
<calls>[expected function calls]</calls>
```

Example:

```
<user>Check if the numbers 8 and 1233 are powers of two.</user>
<tools>{'name': 'is_power_of_two', 'description': 'Checks if a number is a power of two.', 'parameters': {'num': {'description': 'The number to check.', 'type': 'int'}}}</tools>
<calls>{'name': 'is_power_of_two', 'arguments': {'num': 8}}
{'name': 'is_power_of_two', 'arguments': {'num': 1233}}</calls>
```
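A sketch of how one raw xLAM record could be rendered into this template. The field names `query`, `tools`, and `answers` follow the Hugging Face release of the dataset; treat them as assumptions if your copy differs (the snippet emits standard JSON, whereas the example above shows Python-dict-style quoting):

```python
import json

def format_example(record: dict) -> str:
    """Render one xLAM record into the <user>/<tools>/<calls> template.

    Assumes the record carries `query`, `tools`, and `answers` fields,
    with the latter two JSON-encoded as in the Hugging Face release.
    """
    tools = json.loads(record["tools"])
    calls = json.loads(record["answers"])
    tool_lines = "\n".join(json.dumps(t) for t in tools)
    call_lines = "\n".join(json.dumps(c) for c in calls)
    return (
        f"<user>{record['query']}</user>\n\n"
        f"<tools>{tool_lines}</tools>\n\n"
        f"<calls>{call_lines}</calls>"
    )

# The example record from above, in raw-dataset shape
example = {
    "query": "Check if the numbers 8 and 1233 are powers of two.",
    "tools": json.dumps([{
        "name": "is_power_of_two",
        "description": "Checks if a number is a power of two.",
        "parameters": {"num": {"description": "The number to check.",
                               "type": "int"}},
    }]),
    "answers": json.dumps([
        {"name": "is_power_of_two", "arguments": {"num": 8}},
        {"name": "is_power_of_two", "arguments": {"num": 1233}},
    ]),
}
print(format_example(example))
```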
- LoRA Rank: 8 (dimension of the low-rank update matrices)
- LoRA Layers: 16 (fine-tune last 16 transformer layers)
- Target Modules: Q and V projections in attention layers
- Trainable Parameters: ~1-2% of total model parameters
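The LoRA update can be sketched in NumPy: a frozen weight `W` plus a scaled low-rank correction `B @ A`, where only `A` and `B` train. The dimensions and the `alpha` scale here are illustrative, not the project's actual values; with a 1024-wide toy projection and rank 8, the trainable share works out to ~1.6% per adapted matrix, in line with the ~1-2% figure above:

```python
import numpy as np

# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
d, r, alpha = 1024, 8, 16            # alpha is illustrative only
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight (e.g. a Q projection)
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, zero-init so W' == W at start

x = rng.normal(size=d)
y = W @ x + (alpha / r) * (B @ (A @ x))

assert np.allclose(y, W @ x)         # B is zero, so output matches the base model initially

# Only A and B receive gradients
trainable = A.size + B.size
print(f"trainable params per layer: {trainable} of {W.size} "
      f"({100 * trainable / W.size:.2f}%)")
```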
| Parameter | Default | Description |
|---|---|---|
| `--batch-size` | 2 | Training batch size |
| `--iters` | 600 | Number of training iterations |
| `--learning-rate` | 1e-5 | Adam learning rate |
| `--lora-layers` | 16 | Number of layers to fine-tune |
| `--lora-rank` | 8 | LoRA rank parameter |
| `--steps-per-report` | 10 | Log training loss every N steps |
| `--steps-per-eval` | 100 | Evaluate on validation set every N steps |
| `--save-every` | 100 | Save checkpoint every N iterations |
On M3 32GB with batch size 2:
- ~150-200 tokens/second
- ~600 iterations in 20-30 minutes
Training progress indicators:
```
Iter 1:   Val loss 1.512  ← Baseline
Iter 100: Val loss 1.200  ← Should drop ~15-25%
Iter 200: Val loss 1.050  ← Continuing to drop
Iter 300: Val loss 0.950  ← Good progress
```
Signs to stop early:
- Validation loss plateaued or increasing (overfitting)
- Model gives reasonable responses in tests
- Training loss far below 0.5 (the model is likely memorizing the training set)
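The plateau criterion above can be turned into a small heuristic. This helper is illustrative and not part of the training scripts; the `patience` and `min_delta` thresholds are assumptions to tune for your run:

```python
def should_stop(val_losses, patience=3, min_delta=0.005):
    """Stop when validation loss hasn't improved by at least
    `min_delta` over the last `patience` evaluations."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])   # best loss before the recent window
    recent_best = min(val_losses[-patience:])   # best loss within the recent window
    return recent_best > best_before - min_delta

# Mirrors the progression above: steady drops, then a plateau
history = [1.512, 1.200, 1.050, 0.950, 0.948, 0.951, 0.950]
print(should_stop(history))  # plateau detected → True
```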
QLoRA (4-bit):
- Base model: ~4GB
- Training peak: ~12-16GB
- Recommended for: 16GB+ systems
Full precision (fp16):
- Base model: ~14GB
- Training peak: ~22-28GB
- Recommended for: 32GB+ systems
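The base-model figures follow from simple arithmetic on Mistral 7B's roughly 7.2B parameters; the 4.5 bits per parameter used for the quantized case (4-bit weights plus quantization scales) is an assumption, and activations, optimizer state, and KV caches account for the higher training peaks:

```python
# Back-of-envelope weight memory for a ~7.2B-parameter model.
# These are lower bounds on memory, not the full training peak.
PARAMS = 7.2e9

def weight_gb(bits_per_param: float) -> float:
    """Weight storage in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1024**3

fp16 = weight_gb(16)   # matches the "~14GB" figure above
q4 = weight_gb(4.5)    # 4-bit weights + scales, matches "~4GB"

print(f"fp16 weights: {fp16:.1f} GB, 4-bit weights: {q4:.1f} GB")
```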
Resume training from a checkpoint:

```bash
python train_function_calling.py \
  --model mlx_model \
  --train \
  --resume-adapter-file adapters.npz
```

Evaluate on the test split:

```bash
python train_function_calling.py \
  --model mlx_model \
  --adapter-file adapters.npz \
  --test
```

Prepare a larger dataset:

```bash
python prepare_xlam_data.py \
  --train-samples 4000 \
  --valid-samples 500 \
  --test-samples 500
```

Adjust generation settings:

```bash
python test_function_calling.py \
  --model mlx_model \
  --adapter adapters.npz \
  --max-tokens 300 \
  --temp 0.5
```

```
mlx_lora_fine_tuning/
├── README.md                  # This file
├── LICENSE                    # MIT License
├── requirements.txt           # Python dependencies
├── .env.example               # Environment configuration template
├── .gitignore                 # Git ignore rules
│
├── prepare_xlam_data.py       # Data preprocessing script
├── convert.py                 # Model conversion to MLX
├── train_function_calling.py  # Training script
├── test_function_calling.py   # Testing/inference script
├── run_pipeline.sh            # Complete automation script
│
├── models.py                  # Model architecture and LoRA
├── utils.py                   # Utility functions
│
├── data/xlam/                 # Dataset directory (created by scripts)
│   ├── train.jsonl
│   ├── valid.jsonl
│   └── test.jsonl
│
├── mlx_model/                 # Converted model (created by convert.py)
└── adapters.npz               # Trained LoRA weights
```
- Use QLoRA: add the `-q` flag to `convert.py`
- Reduce batch size: use `--batch-size 1`
- Reduce LoRA layers: use `--lora-layers 8` or `--lora-layers 4`
- Close other applications to free memory
```bash
# Option 1: Environment variable
export HF_TOKEN=your_token

# Option 2: CLI login
huggingface-cli login

# Option 3: .env file (recommended)
echo "HF_TOKEN=your_token" > .env
```

- Close other applications to free memory
- Ensure you're using QLoRA (`-q` flag in `convert.py`)
- Try a smaller batch size (`--batch-size 1`)
- Monitor Activity Monitor for memory pressure
- Increase iterations (`--iters 1000`)
- Adjust the learning rate (`--learning-rate 5e-5`)
- Increase LoRA layers (`--lora-layers 24`)
- Check that validation loss is decreasing
- Ensure good ventilation
- Use a cooling pad if available
- Stop training (Ctrl+C), let cool for 10 minutes
- Resume with `--resume-adapter-file adapters.npz.iter_XXX`
xLAM Dataset:

```
@misc{xlam2024,
  title={xLAM: A Family of Large Action Models to Empower AI Agent Systems},
  author={Salesforce Research},
  year={2024},
  publisher={Hugging Face}
}
```

LoRA:

```
@article{hu2021lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  journal={arXiv preprint arXiv:2106.09685},
  year={2021}
}
```

QLoRA:

```
@article{dettmers2023qlora,
  title={QLoRA: Efficient Finetuning of Quantized LLMs},
  author={Dettmers, Tim and Pagnoni, Artidoro and Holtzman, Ari and Zettlemoyer, Luke},
  journal={arXiv preprint arXiv:2305.14314},
  year={2023}
}
```

This project is based on:
- Apple MLX Examples - LoRA implementation
- xLAM Dataset - Salesforce Research
- Mistral 7B - Mistral AI
MIT License - See LICENSE file for details.
This project is adapted from Apple's MLX Examples and follows the same MIT License.