Automated Data Preprocessing Web App (Streamlit)

A production-style Streamlit application for uploading CSV datasets and performing automated preprocessing, cleaning, analysis, transformation, and interactive visualization, with AI-style rule-based recommendations.

Features

CSV upload (drag & drop supported) with dataset preview and metadata
Automated profiling: missing values, duplicates, types, unique counts, memory usage
Data quality score (0–100) + health breakdown
Cleaning tools:
- Missing value handling (mean/median/mode/ffill/bfill/drop rows/drop columns)
- Duplicate detection + removal
- Automatic datatype detection & intelligent conversions
- IQR-based outlier detection + optional removal
Transformations:
- Categorical encoding (Label / One-Hot)
- Feature scaling (Standard / MinMax / Robust)
- Reusable sklearn Pipeline + ColumnTransformer
Interactive dashboard (Plotly): histograms, box plots, scatter, correlations, missingness heatmaps, pie charts
Export:
- Download cleaned CSV
- Download transformed (encoded/scaled) dataset
- Download preprocessing report (JSON/Markdown, and PDF if enabled)
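
Two of the items above, IQR-based outlier removal and a reusable sklearn Pipeline + ColumnTransformer, can be sketched roughly as follows. This is a minimal illustration, not the code in preprocessing.py: the helper names `iqr_outlier_mask` and `build_preprocessor`, the 1.5 multiplier, the median imputation step, and the demo columns are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    # Missing values compare as False, so they are never flagged here.
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def build_preprocessor(numeric_cols, categorical_cols) -> ColumnTransformer:
    """Impute and scale numeric columns, one-hot encode categorical ones."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Hypothetical demo data; the column names are illustrative only.
df = pd.DataFrame({
    "salary": [50.0, 52.0, 51.0, 49.0, 500.0, None],
    "dept": ["hr", "it", "it", "ops", "hr", "it"],
})

mask = iqr_outlier_mask(df["salary"])
cleaned = df[~mask]  # drop the rows flagged as outliers

preprocessor = build_preprocessor(["salary"], ["dept"])
transformed = preprocessor.fit_transform(cleaned)
```

Because the fitted `preprocessor` is a single object, the same imputation, scaling, and encoding can be reapplied to new data with `preprocessor.transform(...)`.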

Project Structure

project/
├── app.py
├── preprocessing.py
├── visualization.py
├── utils.py
├── report_generator.py
├── requirements.txt
├── assets/
└── sample_data/

Run Locally

cd /path/to/Data_preprocessing_project
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

Try the Sample Dataset

Use the file in sample_data/sample_employee_data.csv.
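
The profiling figures the app reports (missing values, duplicates, types, unique counts, memory usage) can also be reproduced outside the app with pandas. The sketch below is an illustration under assumptions: the `profile` helper and the small demo frame are hypothetical and do not reflect the app's actual implementation or the columns of the sample file.

```python
import pandas as pd


def profile(df: pd.DataFrame) -> dict:
    """Collect basic profile figures for a DataFrame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "unique_per_column": df.nunique().to_dict(),  # NaN is not counted
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Hypothetical stand-in data. To profile the bundled sample instead, load it
# from the project root:
#   demo = pd.read_csv("sample_data/sample_employee_data.csv")
demo = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "age": [30, 41, 41, None],
})

report = profile(demo)
for key, value in report.items():
    print(f"{key}: {value}")
```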

Deploy (Streamlit Community Cloud)

1. Push this repo to GitHub.
2. On Streamlit Community Cloud, create a new app.
3. Select the repository and set the main file path to app.py.
4. Ensure requirements.txt is in the repo root.
