# A performance comparison between feature reduction and feature selection algorithms preprocessing on wide data
This repository contains the R code for the feature reduction and feature selection algorithms used in the article *A performance comparison between feature reduction and feature selection algorithms preprocessing on wide data*. The stored algorithms are:
| Algorithm | Original package |
|---|---|
| **Feature reduction - Linear unsupervised** | |
| PCA (Principal Component Analysis) | Rdimtools |
| LPE (Locality Pursuit Embedding) | Rdimtools |
| PFLPP (Parameter-Free Locality Preserving Projection) | Rdimtools |
| RNDPROJ (Random Projection) | Rdimtools |
| **Feature reduction - Linear supervised** | |
| FSCORE (Fisher Score) | Rdimtools |
| LSLS (Least Squares Linear Discriminant Analysis) | Rdimtools |
| LFDA (Local Fisher Discriminant Analysis) | Rdimtools |
| MMC (Maximum Margin Criterion) | Rdimtools |
| SAVE (Sliced Average Variance Estimation) | Rdimtools |
| SLPE (Supervised Locality Pursuit Embedding) | Rdimtools |
| **Feature reduction - Non-linear** | |
| MDS (Multidimensional Scaling) | Rdimtools |
| MMDS (Metric Multidimensional Scaling) | Rdimtools |
| LLE (Locally Linear Embedding) | Rdimtools |
| NPE (Neighborhood Preserving Embedding) | Rdimtools |
| LEA (Laplacian Eigenmaps) | Rdimtools |
| SNE (Stochastic Neighbor Embedding) | Rdimtools |
| Autoencoder | h2o |
| **Feature selection** | |
| SVM-RFE (Support Vector Machine - Recursive Feature Elimination) | sigFeature |
An algorithm to estimate the transformation of new data under non-linear algorithms, proposed by Yang et al. (2010), is also included.
Run requeriments.R, which will install the necessary libraries:

```sh
Rscript requeriments.R
```
Import the preprocessing functions:

```r
source("featureReducers.R")
```

This line also imports the preprocessing_methods.R file, which contains the functions used to format the datasets into the structure required by the algorithms.

These algorithms receive as input a list with element `"d"` holding the dataset and `"tag"` holding its labels; the function `partitionDataTag` can be used to convert any dataframe into this format. Note that the tag must be placed in the last column.
```r
# Create data based on the iris dataset; only two classes are selected,
# since svm_rfe only works with binary problems
data <- iris[1:100, ] %>%
  partitionDataTag() %>%
  partitionTrainTest()
```
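The same list structure can also be assembled by hand. A minimal sketch (the variable names `X`, `y`, and `dataset` are illustrative; only the `"d"`/`"tag"` element names come from the repository):

```r
# Build the input list manually: "d" holds the predictors, "tag" the
# class labels; two classes only, matching the svm_rfe restriction
X <- iris[1:100, 1:4]
y <- iris[1:100, 5]
dataset <- list(d = X, tag = y)
str(dataset)
```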
Then, we can launch any of the feature reduction algorithms:

```r
# The dimensionality reduction functions return a list with the reduced
# data and the transformation matrix
ndim <- 2

# Linear unsupervised
fReduction_pca(data, ndim)
fReduction_lpe(data, ndim)
fReduction_pflpp(data, ndim)
fReduction_rndproj(data, ndim)

# Linear supervised
fReduction_fscore(data, ndim)
fReduction_lfda(data, ndim)
fReduction_lsls(data, ndim)
fReduction_mmc(data, ndim)
fReduction_save(data, ndim)
fReduction_slpe(data, ndim)

# Non-linear
fReduction_mds(data, ndim)
fReduction_mmds(data, ndim)
fReduction_lle(data, ndim)
fReduction_lea(data, ndim)
fReduction_npe(data, ndim)
fReduction_sne(data, ndim)
fReduction_autoencoder(data, ndim)
```
The functions can also be used without the test dataset:

```r
fReduction_pca(data$train, ndim)
```
To obtain the transformation matrix of a linear algorithm, call the corresponding function from the Rdimtools package directly:

```r
Rdimtools::do.pca(
  as.matrix(data$train$d),
  ndim
)
```
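For reference, applying such a transformation matrix to held-out data is a plain matrix product. The sketch below uses base R's `prcomp` as a stand-in so it runs without Rdimtools; `Rdimtools::do.pca` exposes its loading matrix analogously (as the `$projection` element, per the Rdimtools documentation):

```r
# Base-R illustration of applying a learned projection to unseen data;
# prcomp stands in for Rdimtools::do.pca so the snippet runs standalone
train <- as.matrix(iris[1:80, 1:4])
test  <- as.matrix(iris[81:100, 1:4])
pca <- prcomp(train, center = TRUE, scale. = FALSE)
loadings <- pca$rotation[, 1:2]   # p x ndim transformation matrix
# Center test data with the training means, then project
testReduced <- scale(test, center = pca$center, scale = FALSE) %*% loadings
dim(testReduced)  # 20 2
```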
To estimate the transformation of unseen data under a non-linear algorithm, the aproximate_nonlinear_transformation function is used:

```r
dataReduced <- Rdimtools::do.mds(
  as.matrix(data$train$d),
  ndim
)$Y

aproximate_nonlinear_transformation(
  as.matrix(data$train$d),
  dataReduced,
  as.matrix(data$test$d),
  k = 5
)
```
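The underlying idea of this k-nearest-neighbour approximation can be sketched in base R: each new point is mapped to the average embedding of its `k` nearest training neighbours. This is only a hedged sketch of the technique; the repository's `aproximate_nonlinear_transformation` may differ in details (e.g. distance-weighted averaging):

```r
# Out-of-sample k-NN approximation: embed each test point as the mean
# embedding of its k nearest neighbours in the original training space
knn_embed <- function(train, trainReduced, test, k = 5) {
  t(apply(test, 1, function(x) {
    d <- sqrt(colSums((t(train) - x)^2))       # Euclidean distances to train
    nn <- order(d)[seq_len(k)]                 # indices of k nearest points
    colMeans(trainReduced[nn, , drop = FALSE]) # average their embeddings
  }))
}
```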
The feature selector returns the list of features ordered from highest to lowest importance:

```r
fSelection_svm_rfe(data$train)
```
To cite this work:

```bibtex
@article{ramos2024extensive,
  title={An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data},
  author={Ramos-P{\'e}rez, Ismael and Barbero-Aparicio, Jos{\'e} Antonio and Canepa-Oneto, Antonio and Arnaiz-Gonz{\'a}lez, {\'A}lvar and Maudes-Raedo, Jes{\'u}s},
  journal={Information},
  volume={15},
  number={4},
  pages={223},
  year={2024},
  publisher={MDPI}
}
```