MLbat is a Python implementation of the ComBat algorithm that fits the batch adjustment parameters on training data and applies the resulting transformation to test data, avoiding data leakage during preprocessing.
MLbat is built on the InMoose implementation of the ComBat batch adjustment algorithm, in accordance with its license (GNU GPLv3).
Note that pycombat is another implementation of the ComBat algorithm that is suitable for machine learning pipelines.
You can install MLbat with pip using
pip install git+https://github.com/alafca/mlbat
Assuming we have split the data into train and test, and we have stored normalized counts as a dataframe, and batch information as a series:
from mlbat.mlbat import MLbat
mlbat = MLbat()
adj_train_counts_df = mlbat.fit_transform(counts=train_counts_df, batch=train_batch.to_list())
adj_test_df = mlbat.transform(counts=test_counts_df, batch=test_batch.to_list())
The MLbat class follows a scikit-learn-like class structure and has three available methods:
.fit(): fits the parameters without returning the adjusted data.
.fit_transform(): fits the parameters and returns the adjusted data.
.transform(): uses the fitted parameters to adjust data.
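To illustrate why separating fitting from transforming avoids leakage, here is a minimal sketch of the same three-method pattern. It is not ComBat and not MLbat's code: `ToyBatchCenterer` is a hypothetical stand-in that only removes per-batch means from a single list of values, and its method signatures are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


class ToyBatchCenterer:
    """Hypothetical toy adjuster: NOT ComBat and NOT MLbat's implementation."""

    def fit(self, values, batch):
        # Learn the grand mean and one mean per batch from the data passed in
        # (the training data); nothing is returned except the fitted object.
        groups = defaultdict(list)
        for value, label in zip(values, batch):
            groups[label].append(value)
        self.grand_mean_ = mean(values)
        self.batch_means_ = {label: mean(vals) for label, vals in groups.items()}
        return self

    def transform(self, values, batch):
        # Reuse the stored training means; no statistic is re-estimated here,
        # so test values never influence the adjustment parameters.
        return [v - self.batch_means_[b] + self.grand_mean_
                for v, b in zip(values, batch)]

    def fit_transform(self, values, batch):
        # Convenience: fit on the data, then adjust that same data.
        return self.fit(values, batch).transform(values, batch)


train_values = [1.0, 3.0, 10.0, 14.0]
train_batch = ["a", "a", "b", "b"]
test_values = [2.0, 12.0]
test_batch = ["a", "b"]

centerer = ToyBatchCenterer()
adj_train = centerer.fit_transform(train_values, train_batch)
adj_test = centerer.transform(test_values, test_batch)
```

In this toy version, `transform` raises a `KeyError` for a batch label that was not present during fitting. Whether MLbat has a similar restriction is not stated here, so check its behaviour before relying on unseen batches.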
No documentation is currently available; in the meantime, you can refer to InMoose's pycombat_norm documentation.
For the complete license terms, please see the LICENSE file in the repository.
This repository is based on code from InMoose, used under the GNU GPLv3.