Forest Cover Type Classification Project

Overview

This project classifies forest cover types from cartographic variables (elevation, slope, soil type, and others) using machine learning. It uses the Forest Cover Type dataset and trains several models so their performance can be compared.

Dataset

Name: Forest Cover Type (Covtype)
Source: UCI Machine Learning Repository / scikit-learn
Target: 7 forest cover types
Features: 54 columns: 10 continuous (Elevation, Aspect, Slope, distances to hydrology, roadways, and fire points, Hillshade), 4 binary Wilderness Area indicators, and 40 binary Soil Type indicators
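
As a rough sketch of how the 54 columns break down, the snippet below builds an illustrative list of column names and wraps scikit-learn's `fetch_covtype` loader in a helper. The column names follow the UCI description and are assumptions, as are the helper names `expected_columns` and `load_covtype`; the notebook may load the data differently.

```python
# Continuous cartographic measurements (10 columns), named after the UCI description.
CONTINUOUS = [
    "Elevation", "Aspect", "Slope",
    "Horizontal_Distance_To_Hydrology", "Vertical_Distance_To_Hydrology",
    "Horizontal_Distance_To_Roadways",
    "Hillshade_9am", "Hillshade_Noon", "Hillshade_3pm",
    "Horizontal_Distance_To_Fire_Points",
]


def expected_columns():
    """Return illustrative names for the 54 feature columns (hypothetical helper)."""
    wilderness = [f"Wilderness_Area_{i}" for i in range(4)]  # 4 binary indicators
    soil = [f"Soil_Type_{i}" for i in range(40)]             # 40 binary indicators
    return CONTINUOUS + wilderness + soil


def load_covtype():
    """Download (on first call) and return the feature matrix X and labels y."""
    # Imported lazily so the rest of this sketch runs without scikit-learn or network access.
    from sklearn.datasets import fetch_covtype
    X, y = fetch_covtype(return_X_y=True)
    return X, y


print(len(expected_columns()))
```

Calling `load_covtype()` downloads the data on first use and caches it locally, so it is left out of the printed example.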

Models Implemented

The following models were trained and evaluated:

Multi-Layer Perceptron (MLP)
- Type: Neural Network (MLPClassifier)
- Optimization: Hyperparameter tuning using RandomizedSearchCV
- Best Params: (Found via tuning, e.g., hidden layers, activation, alpha)
- Performance: High Accuracy (~95%) and AUC (~0.99)
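
The tuning step could look roughly like the sketch below. It runs on a small synthetic dataset rather than Covtype, and the search space, the `n_iter` and `cv` values, and the use of a scaling pipeline are all assumptions for illustration, not the notebook's actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in for Covtype so the sketch runs in seconds.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Scaling inside the pipeline keeps each CV fold's statistics separate.
pipe = make_pipeline(StandardScaler(),
                     MLPClassifier(max_iter=300, random_state=42))

# Hypothetical search space; keys are prefixed with the pipeline step name.
param_distributions = {
    "mlpclassifier__hidden_layer_sizes": [(32,), (64,), (64, 32)],
    "mlpclassifier__activation": ["relu", "tanh"],
    "mlpclassifier__alpha": [1e-5, 1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(pipe, param_distributions, n_iter=4, cv=3,
                            scoring="accuracy", random_state=42, n_jobs=1)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # accuracy of the refit best model
```
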
Support Vector Machine (SVM)
- Type: Linear SVM (LinearSVC)
- Configuration: Wrapped in CalibratedClassifierCV for probability estimates.
- Parameters: dual=False, random_state=42
- Performance: Good baseline, efficient for large datasets.
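
`LinearSVC` has no `predict_proba`, which is why a calibration wrapper is needed before computing AUC from probabilities. A minimal sketch of that configuration, on synthetic data and with an assumed `cv=3`:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic three-class data standing in for Covtype.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)

# Parameters as listed above; the number of calibration folds is an assumption.
base = LinearSVC(dual=False, random_state=42)
svm = CalibratedClassifierCV(base, cv=3)
svm.fit(X, y)

proba = svm.predict_proba(X[:5])
print(proba.shape)                           # one column per class
print(np.allclose(proba.sum(axis=1), 1.0))   # each row is a probability distribution
```
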
Logistic Regression (LR)
- Type: Logistic Regression
- Parameters: solver='saga', max_iter=500, n_jobs=-1
- Performance: Comparable to SVM, serves as a linear baseline.
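
A sketch of the logistic regression baseline with the parameters listed above, again on synthetic data. The `StandardScaler` step is an assumption added here because the `saga` solver converges much faster on features with similar scale.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for Covtype.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)

# solver, max_iter, and n_jobs follow the README; scaling is an added assumption.
lr = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="saga", max_iter=500, n_jobs=-1),
)
lr.fit(X, y)

print(lr.score(X, y))  # training accuracy on the toy data
```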

Evaluation Metrics

Models are evaluated using:

Accuracy Score: Overall correctness of predictions.
AUC Score (Macro): Area Under the ROC Curve, computed per class with One-vs-Rest (OvR) and macro-averaged across classes.
Confusion Matrix: Visualizing true vs. predicted classes.
Classification Report: Precision, Recall, and F1-Score for each class.
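
The four metrics above can be computed with `sklearn.metrics` as in the sketch below. The data and the classifier are synthetic stand-ins chosen only to produce predictions; the notebook's own evaluation code may differ.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic three-class problem standing in for Covtype.
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_proba = clf.predict_proba(X_test)  # shape (n_samples, n_classes)

acc = accuracy_score(y_test, y_pred)
# Multi-class AUC needs class probabilities, not hard predictions.
auc = roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro")
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(f"accuracy={acc:.3f}  macro OvR AUC={auc:.3f}")
print(cm)
print(classification_report(y_test, y_pred))
```

Accuracy equals the trace of the confusion matrix divided by its sum, which is a quick consistency check between the two outputs.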

Visualizations

The notebook includes several visualizations to understand the data and model performance:

Class Distribution: Count plot of the target variable.
Feature Correlation: Heatmap showing relationships between features.
Boxplots: Distribution of continuous features by class.
ROC Curves: Multi-class ROC curves for the MLP model.
Confusion Matrices: Heatmaps for MLP, SVM, and Logistic Regression.
Model Comparison: Bar chart comparing Accuracy and AUC across all models.
Loss Curve: Training loss over iterations for the MLP.
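
Two of these plots, the class distribution and the MLP loss curve, can be sketched with matplotlib alone. The data is synthetic and the figure layout is an assumption; `loss_curve_` is the attribute `MLPClassifier` exposes after fitting with its default solver.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic three-class data standing in for Covtype.
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=3, random_state=1)
mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200,
                    random_state=1).fit(X, y)

classes, counts = np.unique(y, return_counts=True)

fig, (ax_dist, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

# Class distribution: one bar per target class.
ax_dist.bar(classes.astype(str), counts)
ax_dist.set(title="Class distribution", xlabel="Class", ylabel="Samples")

# Loss curve: training loss recorded once per iteration.
ax_loss.plot(mlp.loss_curve_)
ax_loss.set(title="MLP training loss", xlabel="Iteration", ylabel="Loss")

fig.tight_layout()
# In a notebook, the figure renders automatically or via plt.show().
```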

Requirements

To run this notebook, you need the following Python libraries:

numpy
pandas
scikit-learn
matplotlib
seaborn
joblib
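
A quick way to check that these libraries are importable before opening the notebook is sketched below. The helper name `missing_packages` is hypothetical; note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Maps each pip package name to its import name (they differ for scikit-learn).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "joblib": "joblib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All dependencies are available.")
```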

How to Run

Ensure all dependencies are installed (pip install -r requirements.txt if available, or install individually).
Open ML_Project.ipynb in Jupyter Notebook or VS Code.
Run all cells sequentially.
- Note: The SVM and Logistic Regression training steps might take a few minutes due to the dataset size.
The final cells will display the model comparison table and plots.

Files

ML_Project.ipynb: Main project notebook.
mlp_model_metrics.csv: Saved metrics for the MLP model.
mlp_covtype_tuned_final_model.joblib: Saved trained MLP model.
scaler_covtype.joblib: Saved data scaler.
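
The saved scaler and model can presumably be reused for inference along the lines below. This is a sketch under assumptions: that the scaler was fit on all input columns in the same order used at prediction time, and that the helper `predict_with_saved` (a hypothetical name) matches how the artifacts were produced. The round trip uses toy data and temporary files so it runs without the real artifacts.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def predict_with_saved(model_path, scaler_path, X):
    """Load a saved scaler and model, then predict labels for raw features X."""
    scaler = joblib.load(scaler_path)
    model = joblib.load(model_path)
    return model.predict(scaler.transform(X))


# Toy data: the label depends only on the sign of the first feature.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 5))
y_toy = (X_toy[:, 0] > 0).astype(int)

with tempfile.TemporaryDirectory() as tmp:
    scaler_path = os.path.join(tmp, "scaler_demo.joblib")
    model_path = os.path.join(tmp, "mlp_demo.joblib")

    # Fit and save a scaler and a model, mirroring the two artifact files.
    scaler = StandardScaler().fit(X_toy)
    model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300, random_state=0)
    model.fit(scaler.transform(X_toy), y_toy)
    joblib.dump(scaler, scaler_path)
    joblib.dump(model, model_path)

    preds = predict_with_saved(model_path, scaler_path, X_toy[:10])

print(preds.shape)
```

With the real files, the call would be `predict_with_saved("mlp_covtype_tuned_final_model.joblib", "scaler_covtype.joblib", X_new)`. Joblib files should only be loaded from trusted sources, ideally with the same scikit-learn version that wrote them.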
