🚀 Blind Data Classifier

This project provides a pipeline for classifying blind datasets: data preparation, feature selection, and the training and evaluation of several machine learning classifiers. It is designed to streamline the path from raw data to trained and evaluated models.

🌟 Features

Automated Data Selection: Automatically identifies the most suitable dataset for classification from a directory of CSV files, based on criteria such as the number of unique classes, the number of rows, and class balance.
Comprehensive Data Preprocessing: Handles missing values, removes duplicate rows, and normalizes numerical features to ensure data quality.
Recursive Feature Elimination (RFE): Employs RFE with Logistic Regression to select the most relevant features, reducing dimensionality and improving model performance.
Flexible Data Splitting: Splits data into training and testing sets for model development and evaluation.
Hyperparameter Tuning with GridSearchCV: Utilizes the train_and_tune_classifier function to optimize classifier performance by performing a grid search over specified hyperparameters, using StratifiedKFold cross-validation and weighted F1 score to handle class imbalance effectively.
Multiple Classifiers: Implements and evaluates popular classification algorithms:
- K-Nearest Neighbors (KNN) 🏘️
- Random Forest 🌳
- Multi-layer Perceptron (MLP) / Neural Network 🧠
Detailed Classifier Evaluation: Provides comprehensive evaluation metrics including Confusion Matrix, Classification Report, Accuracy, F1 Score, Precision, and Recall.
Insightful Visualizations: Generates plots for data distribution, feature selection, train/test data distribution, and decision boundaries (for 2D data).
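The preprocessing, RFE, splitting, tuning, and evaluation steps listed above can be sketched end to end with scikit-learn. This is a minimal illustration, not the repository's code: the synthetic data, the `label` column name, the number of selected features, and the KNN grid are all assumptions, and the repository's own train_and_tune_classifier function is not reproduced here. The sketch also splits before scaling to avoid leaking test data into the scaler, which may differ from the order used in the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one CSV from bases/ (assumption: a 'label' column holds the class).
X_raw, y_raw = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_classes=3, random_state=0
)
df = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(12)])
df["label"] = y_raw

# Preprocessing: remove duplicate rows and fill missing values with the column median.
df = df.drop_duplicates()
features = df.drop(columns="label")
features = features.fillna(features.median())
labels = df["label"]

# Stratified train/test split, then normalize using statistics from the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, stratify=labels, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Recursive Feature Elimination with Logistic Regression as the ranking estimator.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X_train_s, y_train)
X_train_sel = selector.transform(X_train_s)
X_test_sel = selector.transform(X_test_s)

# Grid search with StratifiedKFold and weighted F1 (hypothetical KNN grid).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="f1_weighted", cv=cv)
search.fit(X_train_sel, y_train)

# Evaluation on the held-out test set.
y_pred = search.predict(X_test_sel)
cm = confusion_matrix(y_test, y_pred)
test_f1 = f1_score(y_test, y_pred, average="weighted")
print("Best parameters:", search.best_params_)
print(cm)
print(classification_report(y_test, y_pred))
```

Swapping KNeighborsClassifier for RandomForestClassifier or MLPClassifier, with a matching parameter grid, follows the same pattern.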

Blind_data_classifier/
├── bases/ # Place your raw CSV datasets here
│   ├── 01.csv
│   ├── 02.csv
│   └── ...
├── processed_data.csv # Processed data will be saved here
├── main.py # Main script to run the classification pipeline
├── data_preparation.py # Functions for data import, preprocessing, splitting, and feature selection
├── data_selection.py # Logic for selecting the best database from the 'bases' folder
├── classifiers.py # Functions for training and evaluating different classifiers
└── plot.py # Utilities for generating various plots
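As an illustration of the kind of logic data_selection.py is described as holding, the sketch below ranks candidate datasets by class balance, row count, and number of classes. The function names, the `label` column, and the ranking order are hypothetical; the README does not state how the repository weighs these criteria.

```python
import pandas as pd


def score_dataset(df, label_col):
    """Return a sortable score (higher is better), or None if unusable.

    Hypothetical criteria: class balance first, then row count, then class count.
    """
    counts = df[label_col].value_counts()
    n_classes = len(counts)
    if n_classes < 2:
        return None  # a single class leaves nothing to classify
    balance = counts.min() / counts.max()  # 1.0 means perfectly balanced
    return (balance, len(df), n_classes)


def select_best(datasets, label_col="label"):
    """Pick the name of the best-scoring dataset from a {name: DataFrame} mapping."""
    scored = {name: score_dataset(df, label_col) for name, df in datasets.items()}
    scored = {name: s for name, s in scored.items() if s is not None}
    if not scored:
        raise ValueError("no dataset with at least two classes")
    return max(scored, key=scored.get)


# In-memory stand-ins for the CSV files in bases/.
datasets = {
    "01.csv": pd.DataFrame({"label": ["a"] * 50 + ["b"] * 50}),
    "02.csv": pd.DataFrame({"label": ["a"] * 90 + ["b"] * 10}),
    "03.csv": pd.DataFrame({"label": ["a"] * 30}),
}
best = select_best(datasets)
print(best)
```

In practice the mapping could be built by reading each file in bases/ with `pd.read_csv`; that step is left out so the example runs without any files on disk.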

🔍 Curiosity:

The datasets used here are the same ones used in the research article "Clustering cancer gene expression data: a comparative study", available at:

https://link.springer.com/article/10.1186/1471-2105-9-497

🛠️ Installation - 🌐 Setting up your environment:

0. Prerequisites:

On Linux 🐧:

sudo apt update
sudo apt install python3-venv python3-full

On Windows 🪟 (⚠️ in PowerShell ⚠️):

python --version
pip --version

If these commands fail, you may need to reinstall Python, ensuring you check the "Add Python to PATH" option.

1. Virtual environment creation (Linux/Windows):

# Create and enter a directory for your project
mkdir Classify
cd Classify

# Create the virtual environment named 'venv'
# (on Windows, use `python` instead of `python3` if `python3` is not found)
python3 -m venv venv

🌎 2. Activate the Virtual Environment:

On Linux 🐧:

source venv/bin/activate

On Windows 🪟:

.\venv\Scripts\activate

You will know it's active because (venv) will appear at the beginning of your terminal prompt.

3. Install Libraries:

pip install pandas numpy scikit-learn matplotlib seaborn
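To confirm that the install step worked inside the active environment, a short check like the one below can help. It is a sketch using only the standard library; the mapping of pip package names to import names is written out by hand and is not part of the project.

```python
from importlib.util import find_spec

# pip package name -> importable module name
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```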

🚀 Usage - 4. Clone the Repository:

⚠️ Make sure you are still in the correct directory (Classify) ⚠️

git clone https://github.com/JvFg92/Blind_data_classifier

5. Running the scripts: ▶️

On Linux 🐧:

cd Classify

source venv/bin/activate

cd Blind_data_classifier

python main.py

On Windows 🪟:

cd Classify

.\venv\Scripts\activate

cd Blind_data_classifier

# If `python` is not found, try the launcher instead: py main.py
python main.py

⚠️ 6. When you're finished, deactivate the environment, then optionally close the terminal with exit: ⚠️

deactivate

exit
