Data Analysis and Machine Learning Streamlit Application

This interactive Streamlit application is a single platform for data analysis and machine learning. It lets users load, explore, preprocess, and group data, train several kinds of machine learning models, and save or load those models for later use.

Features

Data Loading: Easily upload and preview your datasets in Excel (.xlsx) or CSV (.csv) formats.
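
A minimal sketch of how such a loading step might look, assuming the upload arrives as raw bytes plus a file name. The helper name `load_table` is hypothetical, and using chardet to guess the CSV encoding is an assumption based on the dependency list, not a description of the application's actual code.

```python
import io

import pandas as pd

try:
    # chardet appears in the dependency list; fall back to UTF-8 if it is absent.
    import chardet
except ImportError:
    chardet = None


def load_table(raw: bytes, filename: str) -> pd.DataFrame:
    """Hypothetical helper: turn uploaded bytes into a DataFrame."""
    if filename.lower().endswith(".xlsx"):
        # Reading .xlsx files needs an Excel engine such as openpyxl.
        return pd.read_excel(io.BytesIO(raw))
    encoding = "utf-8"
    if chardet is not None:
        guess = chardet.detect(raw)
        encoding = guess["encoding"] or "utf-8"
    return pd.read_csv(io.BytesIO(raw), encoding=encoding)


# Toy CSV content, for illustration only.
sample = "name,age\nAlice,30\nBob,25\n".encode("utf-8")
df = load_table(sample, "people.csv")
print(df.head())
```

In a Streamlit page, the bytes and name would typically come from the object returned by `st.file_uploader`.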
Data Exploration: Gain insights into your data with descriptive statistics, data type displays, missing value summaries, correlation matrices, and a variety of interactive visualizations including histograms, bar charts, box plots, scatter plots, and pie charts.
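
The summaries listed above map onto a few pandas calls. The sketch below uses a small toy DataFrame (the column names and values are illustrative assumptions) and leaves plotting out.

```python
import numpy as np
import pandas as pd

# Toy data, for illustration only.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],
    "fare": [7.25, 71.28, 7.92, 53.10],
    "sex": ["male", "female", "female", "female"],
})

stats = df.describe()                               # descriptive statistics
dtypes = df.dtypes                                  # data type of each column
missing = df.isna().sum()                           # missing values per column
corr = df.select_dtypes(include="number").corr()    # correlation matrix (numeric columns)

print(stats, dtypes, missing, corr, sep="\n\n")
```

The resulting `corr` frame could be passed to a Seaborn heatmap, and individual columns to Matplotlib histograms or box plots.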
Data Preprocessing: Prepare your data for modeling with options to:
- Use the first row as column headers.
- Handle missing values by dropping rows or columns, or by imputing the mean, median, or most frequent value.
- Encode categorical variables using One-Hot Encoding or Label Encoding.
- Normalize numerical data using StandardScaler or MinMaxScaler.
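
As a rough illustration of these options, the sketch below applies one combination of them (median and most-frequent imputation, one-hot encoding, StandardScaler) to a toy DataFrame. The column names and values are assumptions, and the application may wire these steps differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data with one missing value per column, for illustration only.
df = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, 41.0],
    "city": ["Paris", "Rabat", np.nan, "Rabat"],
})

# Impute: median for the numeric column, most frequent value for the categorical one.
df["age"] = df["age"].fillna(df["age"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

# One-hot encode the categorical column.
encoded = pd.get_dummies(df, columns=["city"])

# Standardize the numeric column to zero mean and unit variance.
encoded[["age"]] = StandardScaler().fit_transform(encoded[["age"]])

print(encoded)
```

Swapping `StandardScaler` for `MinMaxScaler` rescales values to the [0, 1] range instead.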
Data Grouping: Aggregate your data by selected columns and apply various aggregation functions (mean, sum, count, min, max).
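
In pandas terms, this kind of grouping could be expressed as a `groupby` followed by `agg`. The sketch below uses toy data with assumed column names.

```python
import pandas as pd

# Toy data, for illustration only.
df = pd.DataFrame({
    "pclass": [1, 1, 3, 3, 3],
    "fare": [80.0, 60.0, 8.0, 7.0, 9.0],
})

# Group by one column and apply the five aggregation functions listed above.
summary = df.groupby("pclass")["fare"].agg(["mean", "sum", "count", "min", "max"])
print(summary)
```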
Model Training: Train machine learning models with a user-friendly interface:
- Select target and feature columns.
- Choose between classification (Logistic Regression, Decision Tree, Random Forest) and regression (Linear Regression, Decision Tree, Random Forest) tasks.
- Configure the test set size.
- Optionally perform hyperparameter optimization using GridSearch.
- Optionally perform cross-validation to assess model robustness.
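
The training options above could be sketched with scikit-learn as follows. This assumes the grid search is done with `GridSearchCV` and uses synthetic data from `make_classification`; the parameter grid, fold counts, and test size are illustrative choices, not the application's defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic classification data standing in for the selected feature/target columns.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hold out 20% of the rows as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Optional hyperparameter optimization over a small, illustrative grid.
param_grid = {"n_estimators": [10, 30], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

# Optional cross-validation of the best model on the training data.
cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=5)
test_accuracy = search.best_estimator_.score(X_test, y_test)

print(search.best_params_, cv_scores.mean(), test_accuracy)
```

For a regression task, the same structure applies with a regressor such as `RandomForestRegressor` in place of the classifier.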
Model Evaluation & Visualization: Evaluate trained models with key metrics and visualizations:
- Display accuracy (for classification) or Mean Squared Error (MSE) and R² score (for regression).
- Visualize classification results with ROC curves.
- Visualize regression results with residuals plots.
- Display feature importance for applicable models.
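
The metrics above can be computed with `sklearn.metrics`. The sketch below is an illustration on synthetic data, not the application's code. It produces the numbers and arrays behind each visualization (ROC points, residuals, feature importances) without drawing them.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score, auc, mean_squared_error, r2_score, roc_curve
from sklearn.model_selection import train_test_split

# Classification: accuracy, ROC curve points, and feature importances.
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xc_tr, yc_tr)
accuracy = accuracy_score(yc_te, clf.predict(Xc_te))
fpr, tpr, _ = roc_curve(yc_te, clf.predict_proba(Xc_te)[:, 1])
roc_auc = auc(fpr, tpr)
importances = clf.feature_importances_

# Regression: MSE, R² score, and residuals.
Xr, yr = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, test_size=0.25, random_state=0)
reg = LinearRegression().fit(Xr_tr, yr_tr)
pred = reg.predict(Xr_te)
mse = mean_squared_error(yr_te, pred)
r2 = r2_score(yr_te, pred)
residuals = yr_te - pred

print(accuracy, roc_auc, mse, r2)
```

Plotting `tpr` against `fpr` gives the ROC curve, and plotting `residuals` against `pred` gives a residuals plot.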
Model Saving & Loading: Save your trained models along with their feature and target column information, and load previously saved models for deployment or further analysis.
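
One plausible way to keep a model together with its column information is to bundle them in a dictionary and persist it with Joblib. The dictionary keys, file name, and toy data below are assumptions made for illustration; the application's on-disk format may differ.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Toy training data, for illustration only.
feature_columns = ["age", "fare"]
target_column = "survived"
X = [[22.0, 7.25], [38.0, 71.28], [26.0, 7.92], [35.0, 53.10]]
y = [0, 1, 1, 1]
model = LogisticRegression(max_iter=1000).fit(X, y)

# Bundle the model with the metadata needed to reuse it later (hypothetical layout).
bundle = {"model": model, "features": feature_columns, "target": target_column}
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(bundle, path)

# Load the bundle back and predict with the restored model.
restored = joblib.load(path)
print(restored["features"], restored["target"], restored["model"].predict(X))
```

Joblib files are pickle-based, so only load model files from sources you trust.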

Technologies Used

Python
Streamlit
Pandas
NumPy
Matplotlib
Seaborn
Scikit-learn
Joblib
Chardet

Installation

To set up and run this project locally, follow these steps:

Clone the repository:

```
git clone https://github.com/Ysen0603/DataScience_app
cd DataScience_app
```

Create a virtual environment (recommended):
```
python -m venv venv
```
Activate the virtual environment:
- Windows:
```
.\venv\Scripts\activate
```
- macOS/Linux:
```
source venv/bin/activate
```
Install dependencies:
```
pip install -r requirements.txt
```

Usage

To run the Streamlit application:

Activate your virtual environment (if not already active).
Navigate to the project root directory (where main.py is located).
Run the Streamlit application:
```
streamlit run main.py
```
This command will open the application in your default web browser.
Interact with the application:
- Use the sidebar navigation to switch between different sections: "Chargement des données" (Data Loading), "Exploration des données" (Data Exploration), "Prétraitement" (Preprocessing), "Groupement" (Grouping), "Entraînement du modèle" (Model Training), and "Sauvegarde du modèle" (Model Saving).
- Follow the on-screen instructions to upload data, perform analysis, train models, and save/load them.

Project Structure

.
├── main.py                     # Main Streamlit application entry point
├── data/                       # Directory for sample datasets
│   └── Titanic.csv             # Example dataset
├── PG/                         # Contains modules for different stages of the data pipeline
│   ├── __init__.py
│   ├── data_loading.py         # Handles data upload and initial display
│   ├── data_exploration.py     # Provides tools for data visualization and statistics
│   ├── data_preprocessing.py   # Manages data cleaning and transformation
│   ├── data_grouping.py        # Enables data aggregation and grouping
│   ├── model_training.py       # Facilitates machine learning model training
│   └── model_saving.py         # Manages saving and loading of trained models
├── utils/                      # Contains utility functions
│   ├── data_utils.py           # Helper functions for data handling
│   ├── model_utils.py          # Helper functions for model training and evaluation
│   └── preprocessing_utils.py  # Helper functions for data preprocessing
└── requirements.txt            # List of Python dependencies

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests.

Contact

Developed with ❤️ by Ennaya Yassine (yassineennaya@gmail.com)
