
Ismael-rp/feature_reduction_feature_selection_wide_data_comparison


A performance comparison between feature reduction and feature selection algorithms preprocessing on wide data

This repository contains the R code for the feature reduction and feature selection algorithms used in the article A performance comparison between feature reduction and feature selection algorithms preprocessing on wide data. The included algorithms are:

| Algorithm | Original package |
|---|---|
| **Feature reduction - Linear - Unsupervised** | |
| PCA (Principal Component Analysis) | Rdimtools |
| LPE (Locality Pursuit Embedding) | Rdimtools |
| PFLPP (Parameter-Free Locality Preserving Projection) | Rdimtools |
| RNDPROJ (Random Projection) | Rdimtools |
| **Feature reduction - Linear - Supervised** | |
| FSCORE (Fisher Score) | Rdimtools |
| LSLS (Least Squares Linear Discriminant Analysis) | Rdimtools |
| LFDA (Local Fisher Discriminant Analysis) | Rdimtools |
| MMC (Maximum Margin Criterion) | Rdimtools |
| SAVE (Sliced Average Variance Estimation) | Rdimtools |
| SLPE (Supervised Locality Pursuit Embedding) | Rdimtools |
| **Feature reduction - Non-linear** | |
| MDS (Multidimensional Scaling) | Rdimtools |
| MMDS (Metric Multidimensional Scaling) | Rdimtools |
| LLE (Locally Linear Embedding) | Rdimtools |
| NPE (Neighborhood Preserving Embedding) | Rdimtools |
| LEA (Laplacian Eigenmaps) | Rdimtools |
| SNE (Stochastic Neighbor Embedding) | Rdimtools |
| Autoencoder | h2o |
| **Feature selection** | |
| SVM-RFE (Support Vector Machine - Recursive Feature Elimination) | sigFeature |

The repository also includes an algorithm, proposed by Yang et al. (2010), to estimate the reduction of new samples in non-linear algorithms.

How to use

Run requeriments.R, which installs the necessary libraries:

Rscript requeriments.R

Load the preprocessing functions:

source("featureReducers.R")

This also sources the preprocessing_methods.R file, which contains the functions to convert and manage datasets in the format required by the algorithms.

These algorithms take as input a list whose element "d" contains the dataset and "tag" its class labels (tags). The function partitionDataTag can be used to convert any data frame into this format; note that it expects the tag in the last column.

library(magrittr)  # provides the %>% pipe

# Create data from the iris dataset. Only the first two classes (rows 1-100)
# are kept, because fSelection_svm_rfe only supports binary problems
data = iris[1:100,] %>%
  partitionDataTag() %>%
  partitionTrainTest()
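To make the expected input structure concrete, the following base R sketch builds the same kind of list without the repository helpers. It is a minimal sketch under assumptions: the helper names to_data_tag and split_train_test, the 70/30 split and the seed are hypothetical and are not the repository's partitionDataTag and partitionTrainTest; only the element names train, test, d and tag are taken from this README.

```r
# Hypothetical helper (not the repository's partitionDataTag): splits a data
# frame whose class label is in the last column into list(d, tag)
to_data_tag <- function(df) {
  last <- ncol(df)
  list(
    d   = df[, -last, drop = FALSE],         # every column except the last
    tag = droplevels(as.factor(df[[last]]))  # class labels from the last column
  )
}

# Hypothetical helper (not the repository's partitionTrainTest): random
# train/test split that keeps the list(d, tag) structure in both parts
split_train_test <- function(x, train_frac = 0.7, seed = 1) {
  set.seed(seed)                             # reproducible split
  n <- nrow(x$d)
  idx <- sample(n, size = round(train_frac * n))
  list(
    train = list(d = x$d[idx, , drop = FALSE],  tag = x$tag[idx]),
    test  = list(d = x$d[-idx, , drop = FALSE], tag = x$tag[-idx])
  )
}

data_sketch <- split_train_test(to_data_tag(iris[1:100, ]))
str(data_sketch$train)
```

droplevels removes the unused "virginica" level, so the resulting tag is a two-class factor.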

Then, any of the feature reduction algorithms can be run:

# The dimensionality reduction functions return a list with the reduced data and the transformation matrix

ndim=2

# Linear unsupervised
fReduction_pca(data, ndim)
fReduction_lpe(data, ndim)
fReduction_pflpp(data, ndim)
fReduction_rndproj(data, ndim)

# Linear supervised
fReduction_fscore(data, ndim)
fReduction_lfda(data, ndim)
fReduction_lsls(data, ndim)
fReduction_mmc(data, ndim)
fReduction_save(data, ndim)
fReduction_slpe(data, ndim)

# Non linear
fReduction_mds(data, ndim)
fReduction_mmds(data, ndim)
fReduction_lle(data, ndim)
fReduction_lea(data, ndim)
fReduction_npe(data, ndim)
fReduction_sne(data, ndim)
fReduction_autoencoder(data, ndim)
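As an illustration of what a linear reducer can return, here is a minimal Gaussian random projection in base R. It is a sketch, not fReduction_rndproj: the function name random_projection and the element names train, test and projection of the returned list are assumptions made for this example.

```r
# Hypothetical Gaussian random projection (not the repository's fReduction_rndproj)
random_projection <- function(data, ndim, seed = 1) {
  set.seed(seed)  # note: this also resets the global RNG state
  p <- ncol(data$train$d)
  # p x ndim matrix with i.i.d. N(0, 1/ndim) entries
  R <- matrix(rnorm(p * ndim, sd = 1 / sqrt(ndim)), nrow = p, ncol = ndim)
  list(
    train      = as.matrix(data$train$d) %*% R,  # reduced training data
    test       = as.matrix(data$test$d) %*% R,   # test data, same matrix
    projection = R                               # the transformation matrix
  )
}

X <- iris[1:100, 1:4]
data_toy <- list(train = list(d = X[1:70, ]), test = list(d = X[71:100, ]))
res <- random_projection(data_toy, ndim = 2)
dim(res$train)
```

Because the projection is a fixed matrix, the test set is reduced with exactly the same transformation as the training set.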

The functions can also be used without a test dataset, passing only the training data:

fReduction_pca(data$train, ndim)

To obtain the transformation matrix of a linear algorithm, call the corresponding function from the Rdimtools library directly:

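# For linear methods, the returned list also contains the transformation
# matrix (element $projection)
Rdimtools::do.pca(
  as.matrix(data$train$d),
  ndim
)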
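How a linear transformation matrix is applied to new data can be illustrated with base R's prcomp, used here instead of Rdimtools so that the sketch is self-contained: test samples are centred with the training means and multiplied by the p x ndim matrix. The 70/30 split and the seed are illustrative assumptions.

```r
X <- as.matrix(iris[1:100, 1:4])
set.seed(1)
idx <- sample(nrow(X), 70)
X_train <- X[idx, ]
X_test  <- X[-idx, ]
ndim <- 2

pca <- prcomp(X_train, center = TRUE, scale. = FALSE)
W <- pca$rotation[, seq_len(ndim)]       # p x ndim transformation matrix

# Project the test data manually: centre with the training means, then multiply
Z_test <- sweep(X_test, 2, pca$center) %*% W

# predict() performs the same computation
all.equal(unname(Z_test), unname(predict(pca, X_test)[, seq_len(ndim)]))
```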

Non-linear algorithms do not provide an explicit transformation matrix, so the reduction of new samples is approximated with the aproximate_nonlinear_transformation function:

dataReduced = Rdimtools::do.mds(
  as.matrix(data$train$d),
  ndim
)$Y

aproximate_nonlinear_transformation(
  as.matrix(data$train$d),
  dataReduced,
  as.matrix(data$test$d),
  k=5
)
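The general idea of mapping unseen samples into a non-linear embedding can be sketched with a k-nearest-neighbour rule: each new sample is placed at the inverse-distance-weighted average of the reduced coordinates of its k nearest training samples. This is a simple stand-in chosen for illustration; it is not necessarily the method of Yang et al. (2010) implemented by aproximate_nonlinear_transformation, and the function name knn_out_of_sample is hypothetical. Base R's cmdscale (classical MDS) replaces Rdimtools::do.mds to keep the example self-contained.

```r
# Hypothetical k-NN out-of-sample mapping (not aproximate_nonlinear_transformation)
knn_out_of_sample <- function(X_train, Y_train, X_new, k = 5, eps = 1e-12) {
  X_train <- as.matrix(X_train)
  Y_train <- as.matrix(Y_train)
  X_new   <- as.matrix(X_new)
  stopifnot(nrow(X_train) == nrow(Y_train), ncol(X_train) == ncol(X_new))
  out <- vapply(seq_len(nrow(X_new)), function(i) {
    x <- X_new[i, ]
    d <- sqrt(colSums((t(X_train) - x)^2))    # distances to all training samples
    nn <- order(d)[seq_len(k)]                # indices of the k nearest neighbours
    w <- 1 / (d[nn] + eps)                    # inverse-distance weights
    w <- w / sum(w)                           # normalise to sum to 1
    colSums(Y_train[nn, , drop = FALSE] * w)  # weighted mean of their embeddings
  }, numeric(ncol(Y_train)))
  matrix(out, nrow = nrow(X_new), byrow = TRUE)  # one row per new sample
}

X <- as.matrix(iris[1:100, 1:4])
set.seed(1)
idx <- sample(nrow(X), 70)
X_train <- X[idx, ]
X_test  <- X[-idx, ]
Y_train <- cmdscale(dist(X_train), k = 2)  # classical MDS on the training set
Y_test  <- knn_out_of_sample(X_train, Y_train, X_test, k = 5)
dim(Y_test)
```

A training sample mapped through this rule lands (almost exactly) on its own embedding, since its zero distance gives it a dominating weight.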

The feature selector returns the list of features ordered from highest to lowest importance:

fSelection_svm_rfe(data$train)
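The ranking logic of recursive feature elimination can be sketched in base R: fit a linear classifier, drop the feature with the smallest squared weight, and repeat until no features remain, so the last feature eliminated is the most important. In this sketch a least-squares linear classifier (lm.fit on labels coded as -1/+1) stands in for the linear SVM used by SVM-RFE, and the function name rfe_rank is hypothetical; it is not the sigFeature implementation.

```r
# Hypothetical RFE sketch; a least-squares linear classifier replaces the SVM
rfe_rank <- function(X, y) {
  y <- droplevels(as.factor(y))
  stopifnot(nlevels(y) == 2)                  # binary problems only
  y_num <- ifelse(y == levels(y)[1], -1, 1)   # code the classes as -1 / +1
  X <- scale(as.matrix(X))                    # standardise so weights are comparable
  remaining <- colnames(X)
  eliminated <- character(0)
  while (length(remaining) > 0) {
    # Fit the linear model (with intercept) on the surviving features
    fit <- lm.fit(cbind(1, X[, remaining, drop = FALSE]), y_num)
    w <- fit$coefficients[-1]
    w[is.na(w)] <- 0                          # aliased features get zero weight
    worst <- remaining[which.min(w^2)]        # smallest squared weight
    eliminated <- c(eliminated, worst)
    remaining <- setdiff(remaining, worst)
  }
  rev(eliminated)                             # most important feature first
}

ranking <- rfe_rank(iris[1:100, 1:4], iris$Species[1:100])
print(ranking)
```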

Cite this article

@article{ramos2024extensive,
  title={An Extensive Performance Comparison between Feature Reduction and Feature Selection Preprocessing Algorithms on Imbalanced Wide Data},
  author={Ramos-P{\'e}rez, Ismael and Barbero-Aparicio, Jos{\'e} Antonio and Canepa-Oneto, Antonio and Arnaiz-Gonz{\'a}lez, {\'A}lvar and Maudes-Raedo, Jes{\'u}s},
  journal={Information},
  volume={15},
  number={4},
  pages={223},
  year={2024},
  publisher={MDPI}
}
