TabTune is a flexible Python library designed to simplify the training and fine-tuning of modern foundation models on tabular data.
It provides a high-level, scikit-learn-compatible API that abstracts away data preprocessing and model-specific training loops.
The library is built on four main components that work together seamlessly:
- DataProcessor -- A model-aware data preparation engine. It automatically handles imputation, scaling, and categorical encoding based on the requirements of the selected model (e.g., integer encoding for TabPFN, text embeddings for ContextTab).
- TuningManager -- The computational core of the library. It manages the model adaptation process, applying the appropriate training strategy: zero-shot inference, episodic fine-tuning for ICL models, or full fine-tuning with optional PEFT (Parameter-Efficient Fine-Tuning).
- TabularPipeline -- The main user-facing object. It exposes .fit(), .predict(), .evaluate(), .save(), and .load(), chaining all components into an end-to-end workflow.
- TabularLeaderboard -- A leaderboard utility for model comparison. It compares multiple models and strategies on the same dataset splits with automatic ranking and metric reporting.
Using diverse tabular foundation models often requires writing model-specific boilerplate for data preparation, training, and inference. TabTune solves this by providing:
- Unified API: A single, consistent interface (.fit(), .predict(), .evaluate()) for multiple models such as TabPFN, TabPFNv2.6, TabICL, TabICLv2, Mitra, ContextTab, TabDPT, OrionMSP, and OrionBix.
- Automated Preprocessing: The DataProcessor is model-aware, automatically applying the correct transformations without manual configuration.
- Flexible Fine-Tuning Strategies:
  - Inference mode for zero-shot predictions
  - Meta-learning mode for episodic fine-tuning (recommended for ICL models)
  - Supervised Fine-Tuning (SFT) for task-optimized learning
  - PEFT mode for parameter-efficient adaptation using LoRA adapters
- Easy Model Comparison: The TabularLeaderboard allows you to benchmark multiple models and strategies to quickly find the best performer.
- Checkpoint Management: Automatic saving and loading of fine-tuned model weights with support for resuming training.
- TabPFNv2.6 Integration -- Full support for the latest TabPFN release, covering classification and regression (inference + finetune), with a dedicated native fine-tuning mode (finetune_mode='native') that leverages FinetunedTabPFNClassifier / FinetunedTabPFNRegressor.
- TabICLv2 Integration -- Full support for TabICLv2 for both classification (inference + finetune) and regression (inference + finetune), using episodic turn-by-turn fine-tuning.
| Model | Family / Paradigm | Key Innovation | Supported Strategies |
|---|---|---|---|
| TabPFN-v2 | PFN / ICL | Approximates Bayesian inference on synthetic data | Inference, Meta-Learning FT, SFT, PEFT*, Regression, Regression FT |
| TabICL | Scalable ICL | Two-stage column-then-row attention | Inference, Meta-Learning FT, SFT, PEFT |
| OrionMSP v1.0 | Scalable ICL | Multi-Scale Sparse Attention | Inference, Meta-Learning FT, SFT, PEFT |
| OrionMSP v1.5 | Scalable ICL | Stabilized prototype refinement | Inference, Meta-Learning FT, SFT, PEFT |
| OrionBix | Scalable ICL | Tabular Bi-Axial In-Context Learning | Inference, Meta-Learning FT, SFT, PEFT |
| Mitra | Scalable ICL | 2D attention (row & column) | Inference, Meta-Learning FT, SFT, PEFT, Regression, Regression-FT |
| ContextTab | Semantics-Aware ICL | Modality-specific semantic embeddings | Inference, Full Fine-Tuning, PEFT*, Regression, Regression-FT |
| TabDPT | Denoising Transformer | Denoising pre-training | Inference, Meta-Learning FT, SFT, Regression, Regression-FT |
| LimiX | Probabilistic / ICL | Likelihood-based mixture modeling; uncertainty-aware | Inference, Regression, Regression-FT |
| TabPFN-v2.6 | PFN / ICL | Latest PriorLabs release with native finetuning API | Inference, Meta-Learning FT, SFT, Native FT, Regression, Regression FT |
| TabICLv2 | Scalable ICL | Improved column-then-row attention | Inference, FT, Regression, Regression FT |
*Note: PEFT for ContextTab and TabPFN is experimental; the inference strategy is fully supported for both.
git clone https://github.com/Lexsi-Labs/TabTune.git
cd TabTune
pip install -r requirements.txt
pip install -e .
Here is a complete example of loading a dataset, fitting a TabPFN pipeline, saving it, and making predictions.
import pandas as pd
from sklearn.model_selection import train_test_split
import openml
from tabtune.TabularPipeline.pipeline import TabularPipeline
# 1. Load a dataset from OpenML
dataset = openml.datasets.get_dataset(42178)
X, y, _, _ = dataset.get_data(target=dataset.default_target_attribute)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
# 2. Configure and Initialize the Pipeline
pipeline = TabularPipeline(
model_name="TabPFN",
task_type="classification",
tuning_strategy="inference", # or 'finetune'
tuning_params={"device": "cpu"}
)
# 3. Fit the pipeline on the raw training data
pipeline.fit(X_train, y_train)
# 4. Save the fitted pipeline
pipeline.save("fitted_pipeline.joblib")
# 5. Load the pipeline and make predictions on new data
loaded_pipeline = TabularPipeline.load("fitted_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)
# 6. Evaluate the pipeline
metrics = pipeline.evaluate(X_test, y_test)
print(metrics)
TabTune provides multiple fine-tuning strategies to suit different use cases:
Zero-shot predictions without any training. The model uses its pre-trained weights directly on your data.
pipeline = TabularPipeline(
model_name="TabPFN",
tuning_strategy="inference"
)
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
Full parameter fine-tuning. Updates all model weights using task data.
- Meta-Learning (default for ICL models): Episodic training that mimics the in-context learning paradigm
- SFT (Supervised Fine-Tuning): Standard supervised training on batches
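The episodic idea behind the meta-learning mode can be illustrated with a minimal sketch: each episode draws a disjoint support set (the in-context rows) and a query set (the rows the model is scored on). This is not TabTune's sampler; the function name `sample_episode` and the sizes used are assumptions chosen for illustration.

```python
import random


def sample_episode(n_rows, support_size, query_size, rng):
    """Return disjoint lists of row indices: (support, query)."""
    if support_size + query_size > n_rows:
        raise ValueError("support_size + query_size exceeds the number of rows")
    # Sampling without replacement guarantees the two sets never overlap.
    chosen = rng.sample(range(n_rows), support_size + query_size)
    return chosen[:support_size], chosen[support_size:]


rng = random.Random(0)
for _ in range(3):
    support_idx, query_idx = sample_episode(100, 32, 8, rng)
    # A model would condition on the support rows and be evaluated on the query rows.
    print(len(support_idx), len(query_idx))
```

SFT, by contrast, iterates over ordinary mini-batches without this support/query split.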
pipeline = TabularPipeline(
model_name="TabICL",
tuning_strategy="finetune", # Defaults to 'base-ft'
tuning_params={
"epochs": 5,
"learning_rate": 1e-5,
"finetune_mode": "meta-learning" # or "sft"
}
)
pipeline.fit(X_train, y_train)
TabPFNv2.6 exposes PriorLabs' FinetunedTabPFNClassifier / FinetunedTabPFNRegressor directly, offering their native advanced fine-tuning pipeline.
# Classification
pipeline = TabularPipeline(
model_name="TabPFNv26",
task_type="classification",
tuning_strategy="finetune",
finetune_mode="native", # uses FinetunedTabPFNClassifier
tuning_params={
"epochs": 30,
"learning_rate": 1e-5,
"early_stopping": True,
"early_stopping_patience": 8,
}
)
pipeline.fit(X_train, y_train)
# Regression
pipeline = TabularPipeline(
model_name="TabPFNv26",
task_type="regression",
tuning_strategy="finetune",
finetune_mode="native", # uses FinetunedTabPFNRegressor
tuning_params={
"epochs": 30,
"learning_rate": 1e-5,
"early_stopping": True,
}
)
pipeline.fit(X_train, y_train)
Applies LoRA (Low-Rank Adaptation) adapters to only a subset of parameters, reducing memory and computation.
pipeline = TabularPipeline(
model_name="TabICL",
tuning_strategy="peft",
tuning_params={
"epochs": 10,
"learning_rate": 5e-5,
"peft_config": {
"r": 8,
"lora_alpha": 16,
"lora_dropout": 0.05
}
}
)
pipeline.fit(X_train, y_train)
PEFT Support by Model:
- Full Support: TabICL, OrionMSP, OrionBix, TabDPT, Mitra
- Experimental: ContextTab and TabPFN (may cause prediction issues; use 'base-ft' instead)
When calling .evaluate(), TabTune computes the following metrics:
- Accuracy -- Fraction of correct predictions
- Weighted F1 Score -- Harmonic mean of precision and recall, weighted by class support
- ROC AUC Score -- Area under the Receiver Operating Characteristic curve (binary and multi-class supported)
- Matthews Correlation Coefficient (MCC) -- Correlation between predicted and actual values
- Precision & Recall -- Per-class performance metrics
- Brier Score -- Mean squared error of probabilistic predictions
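To make a few of these definitions concrete, here is a minimal pure-Python sketch for the binary case. It is not TabTune's implementation; the function name `binary_metrics`, the 0.5 decision threshold, and the dictionary keys are assumptions for illustration only.

```python
import math


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, MCC, and Brier score for binary labels and P(class=1)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # MCC is defined as 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Brier score: mean squared error between probability and outcome.
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {"accuracy": accuracy, "mcc": mcc, "brier_score": brier}


example = binary_metrics([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.4, 0.8, 0.6, 0.1])
print(example)
```

Lower Brier scores are better, while accuracy and MCC improve as they approach 1.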
metrics = pipeline.evaluate(X_test, y_test)
print(metrics)
# Output: {'accuracy': 0.92, 'f1_score': 0.89, 'roc_auc_score': 0.95, ...}
TabTune now fully supports regression tasks with standardized evaluation metrics.
from tabtune import TabularPipeline
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
pipeline = TabularPipeline(
model_name="OrionMSP",
task_type="regression",
tuning_strategy="inference",
tuning_params={
"epochs": 5,
"learning_rate": 2e-5
}
)
pipeline.fit(X_train, y_train)
metrics = pipeline.evaluate(X_test, y_test)
print(metrics)
- RMSE
- MAE
- R² Score
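For reference, the three regression metrics can be computed as in the following pure-Python sketch. The function name `regression_metrics` and the returned keys are illustrative assumptions, not the names TabTune necessarily reports.

```python
import math


def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R^2 for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    # R^2 is undefined for a constant target; report 0.0 in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return {"rmse": rmse, "mae": mae, "r2": r2}


print(regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.5, 7.0, 8.0]))
```

RMSE penalizes large errors more heavily than MAE, and R² compares the model against always predicting the mean of the targets.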
TabTune provides two complementary mechanisms for handling data imbalance and episodic construction:
- Dataset-Level Resampling (via DataProcessor)
- Context / Support-Query Sampling (for meta-learning models)
Both integrate seamlessly into TabularPipeline.
| Strategy | Description | Task Support |
|---|---|---|
| smote | Synthetic minority oversampling | Classification |
| random_over | Random oversampling | Classification |
| random_under | Random undersampling | Classification |
| tomek | Tomek links cleaning | Classification |
| kmeans | KMeans-SMOTE hybrid | Classification |
| knn | KNN-based synthetic sampling | Classification |
Resampling is primarily designed for imbalanced classification tasks.
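As an illustration of the simplest strategy above, random oversampling duplicates minority-class rows until every class matches the majority count. The sketch below shows the general technique in pure Python; it is not TabTune's code, and the name `random_oversample` is hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = [0, 0, 0, 0, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Resampling of this kind should be applied to the training split only, so that evaluation data keeps its original class distribution.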
Resampling is configured through processor_params and is applied before training. For example:
from tabtune import TabularPipeline
pipeline = TabularPipeline(
model_name="TabICL",
tuning_strategy="inference",
processor_params={
"resampling_strategy": "smote"
},
tuning_params={
"epochs": 5,
"learning_rate": 2e-5
}
)
pipeline.fit(X_train, y_train)
The TabularLeaderboard makes it easy to compare multiple models and strategies on the same dataset.
from tabtune.TabularLeaderboard.leaderboard import TabularLeaderboard
# 1. Initialize the leaderboard with your data splits
leaderboard = TabularLeaderboard(X_train, X_test, y_train, y_test)
# 2. Add model configurations to compare
leaderboard.add_model(
model_name='TabICL',
tuning_strategy='inference',
model_params={'n_estimators': 16}
)
leaderboard.add_model(
model_name='TabICL',
tuning_strategy='finetune',
model_params={'n_estimators': 16},
tuning_params={'epochs': 5, 'learning_rate': 1e-5, 'finetune_mode': 'meta-learning'}
)
leaderboard.add_model(
model_name='TabPFN',
tuning_strategy='inference'
)
# 3. Run the benchmark and display ranked results
leaderboard.run()
TabularPipeline(
model_name: str,
task_type: str = 'classification',
tuning_strategy: str = 'inference',
tuning_params: dict | None = None,
processor_params: dict | None = None,
model_params: dict | None = None,
model_checkpoint_path: str | None = None,
finetune_mode: str = 'meta-learning'
)
- model_name (str): The name of the model to use. Supported values: 'TabPFN', 'TabPFNv26', 'TabICL', 'TabICLv2', 'ContextTab', 'Mitra', 'TabDPT', 'OrionMSP', 'OrionMSPv1.5', 'OrionBix', 'Limix'.
- task_type (str): The type of task: 'classification' or 'regression'.
- tuning_strategy (str): The strategy for model adaptation: 'inference', 'finetune', or 'peft'.
- finetune_mode (str, optional): Controls the fine-tuning algorithm. If None, a default is chosen per task type ('turn_by_turn' for regression, 'meta-learning' for classification). Supported values per model:
  - 'meta-learning': episodic meta-learning (TabICL, TabICLv2, OrionMSP, OrionBix, TabDPT, Mitra, TabPFNv26)
  - 'sft': supervised fine-tuning (TabPFN, TabPFNv26, Mitra, TabDPT)
  - 'native': PriorLabs native finetuner with bar distribution loss, AMP, early stopping (TabPFNv2.6 only, classification and regression)
  - 'turn_by_turn' / 'tbt': episodic turn-by-turn (TabPFN regression, Mitra regression, TabDPT regression, ContextTab regression)
- tuning_params (dict, optional): Parameters for the TuningManager:
  - epochs (int): Number of training epochs
  - learning_rate (float): Learning rate for optimization
  - batch_size (int): Batch size for fine-tuning
  - device (str): 'cuda' or 'cpu'
  - save_checkpoint_path (str): Path to save fine-tuned weights
  - checkpoint_dir (str): Directory for automatic checkpoint saving
  - show_progress (bool): Whether to show progress bars
  - peft_config (dict): Configuration for LoRA adapters
  - early_stopping (bool): Enable early stopping (TabPFNv2.6 native mode only)
  - early_stopping_patience (int): Patience for early stopping (TabPFNv2.6 native mode only)
  - n_estimators_finetune (int): Ensemble size during fine-tuning (TabPFNv2.6 native mode only)
- processor_params (dict, optional): Parameters for the DataProcessor:
  - imputation_strategy (str): 'mean', 'median', 'iterative', 'knn'
  - categorical_encoding (str): 'onehot', 'ordinal', 'target', 'hashing', 'binary'
  - scaling_strategy (str): 'standard', 'minmax', 'robust', 'power_transform'
  - resampling_strategy (str): 'smote', 'random_over', 'random_under', 'tomek', 'kmeans', 'knn'
  - feature_selection_strategy (str): 'variance', 'select_k_best_anova', 'select_k_best_chi2'
- model_params (dict, optional): Model-specific parameters.
- model_checkpoint_path (str, optional): Path to a .pt file containing pre-trained model weights.
Fine-tuned models are automatically saved during training:
tuning_params = {
'save_checkpoint_path': './checkpoints/my_model.pt',
'checkpoint_dir': './checkpoints' # Used if save_checkpoint_path is None
}
# Load pre-trained weights when initializing
pipeline = TabularPipeline(
model_name="TabPFN",
model_checkpoint_path="./checkpoints/pretrained.pt"
)
# Save entire pipeline
pipeline.save("my_pipeline.joblib")
# Load and use
loaded_pipeline = TabularPipeline.load("my_pipeline.joblib")
predictions = loaded_pipeline.predict(X_test)
LoRA (Low-Rank Adaptation) adapters can significantly reduce memory usage during fine-tuning.
peft_config = {
'r': 8, # LoRA rank (lower = fewer parameters)
'lora_alpha': 16, # Scaling factor for LoRA updates
'lora_dropout': 0.05, # Dropout in LoRA modules
'target_modules': None # Auto-detect by model (optional override)
}
pipeline = TabularPipeline(
model_name="TabICL",
tuning_strategy="peft",
tuning_params={
'epochs': 10,
'learning_rate': 5e-5,
'peft_config': peft_config
}
)
Memory Savings: PEFT typically reduces memory usage by 60-80% compared to full fine-tuning.
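As a rough back-of-the-envelope illustration of why LoRA shrinks the number of trainable parameters, consider a single linear layer. A rank-r adapter adds two small matrices (d_in x r and r x d_out) and scales their product by lora_alpha / r. The 512 x 512 layer below is a hypothetical size, not taken from any TabTune model, and trainable-parameter count is only one contributor to memory use, so this does not reproduce the 60-80% figure.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in the two low-rank factors A (d_in x r) and B (r x d_out)."""
    return r * (d_in + d_out)


def full_trainable_params(d_in, d_out):
    """Parameters in the dense weight matrix updated by full fine-tuning."""
    return d_in * d_out


d_in = d_out = 512          # hypothetical layer width
r, lora_alpha = 8, 16       # values from the peft_config above

lora = lora_trainable_params(d_in, d_out, r)
full = full_trainable_params(d_in, d_out)
scaling = lora_alpha / r    # factor applied to the low-rank update

print(lora, full, lora / full)  # 8192 262144 0.03125
print(scaling)                  # 2.0
```

Halving r halves the adapter size, which is why a lower rank means fewer trainable parameters.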
Below are 13 example notebooks showcasing the library's features in depth.
Override default preprocessing for specific needs:
processor_params = {
'imputation_strategy': 'iterative',
'categorical_encoding': 'target',
'scaling_strategy': 'robust',
'resampling_strategy': 'smote'
}
pipeline = TabularPipeline(
model_name="TabICL",
processor_params=processor_params
)
Combine meta-learning with PEFT for optimal results:
pipeline = TabularPipeline(
model_name="TabICL",
tuning_strategy="peft",
tuning_params={
'epochs': 20,
'learning_rate': 1e-5,
'finetune_mode': 'meta-learning',
'peft_config': {
'r': 16,
'lora_alpha': 32,
'lora_dropout': 0.1
}
}
)
For detailed documentation, API reference, model configurations, and usage examples, please visit: Documentation
TabTune is built upon the excellent work of the following projects and research teams:
- OrionMSP1.0/1.5 - Multi-Scale Sparse Attention for Tabular In-Context Learning
- OrionBix - Tabular Bi-Axial In-Context Learning
- TabPFN - Prior-data Fitted Networks for tabular data
- TabICL - Tabular In-Context Learning with scalable attention
- Mitra (Tab2D) - 2D Attention mechanism (Tab2D) for tabular data, included within AutoGluon
- ContextTab - Semantics-Aware In-Context Learning for Tabular Data
- TabDPT - Denoising Pre-training Transformer for Tabular Data
- AutoGluon - AutoML framework that inspired our unified API design
- LimiX - Likelihood-based mixture modeling and probabilistic inference framework for structured tabular learning
- Reduce batch_size in tuning_params
- Use tuning_strategy='peft' for PEFT mode
- Decrease n_ensembles or context_size for inference
- Some models have experimental PEFT support; use 'base-ft' strategy instead
- Check logs for model-specific warnings
- Ensure the device parameter matches your hardware (cuda/cpu)
- Use torch.cuda.is_available() to check GPU availability
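One way to apply the last two tips is to choose the device programmatically before building tuning_params. The helper below is a small sketch; the name `pick_device` is hypothetical, and it falls back to 'cpu' when PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda' when a GPU is usable through PyTorch, otherwise 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no way to query CUDA, so stay on the CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


tuning_params = {"device": pick_device()}
print(tuning_params)
```

The resulting dictionary can be passed as tuning_params when constructing a TabularPipeline.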
This project is released under the MIT License.
Please cite appropriately if used in academic or production projects.
Citation:
@misc{tanna2025tabtuneunifiedlibraryinference,
title={TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models},
author={Aditya Tanna and Pratinav Seth and Mohamed Bouadi and Utsav Avaiya and Vinay Kumar Sankarapu},
year={2025},
eprint={2511.02802},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.02802},
}
- Issues and discussions are welcome on the GitHub issue tracker and Discord.
- Please see the Contributing section for contribution standards, code reviews, and documentation tips.