This project is an end-to-end pipeline for data cleaning, transformation, exploratory data analysis (EDA), and interactive visualization, built on a minimal set of dependencies.

Data Cleaning Enhancements:
- Missing-value handling (drop rows or columns, or impute numeric values).
- Duplicate removal.
- Outlier detection (flagging using IQR or Z-score).
- Automatic data type correction with basic label encoding.
- (Optional) Normalization using StandardScaler or MinMaxScaler.
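A minimal sketch of three of the cleaning steps listed above (numeric imputation, duplicate removal, and IQR-based outlier flagging), assuming pandas is available. The helper name `clean_frame`, the median imputation strategy, the `iqr_factor` parameter, and the `_outlier` column suffix are illustrative assumptions, not the project's actual API.

```python
import numpy as np
import pandas as pd


def clean_frame(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: deduplicate, impute, and flag IQR outliers."""
    out = df.drop_duplicates().copy()
    numeric_cols = out.select_dtypes(include="number").columns

    # Impute missing numeric values with the column median (an assumed strategy).
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Flag (rather than drop) values outside [Q1 - k*IQR, Q3 + k*IQR].
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        inside = out[col].between(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)
        out[f"{col}_outlier"] = ~inside
    return out


# Toy data: one duplicate row, one missing value, one extreme value.
raw = pd.DataFrame(
    {
        "value": [1.0, 2.0, 2.0, np.nan, 3.0, 100.0],
        "label": ["a", "b", "b", "c", "d", "e"],
    }
)
cleaned = clean_frame(raw)
```

Flagging outliers in a separate boolean column keeps the original rows intact, so the decision to drop or cap them can be made later.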

Data Transformation & Preprocessing:
- Feature engineering with date/time handling (extract month, day, weekday).
- Creation of polynomial (squared) features for numeric columns.
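The two transformations above could look roughly like this with pandas. The function name `add_features`, the derived column names, and the sample `order_date`/`amount` columns are assumptions made for illustration.

```python
import pandas as pd


def add_features(df: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Hypothetical helper: add date parts and squared numeric features."""
    out = df.copy()

    # Parse the date column and extract month, day, and weekday (Monday = 0).
    dates = pd.to_datetime(out[date_col])
    out[f"{date_col}_month"] = dates.dt.month
    out[f"{date_col}_day"] = dates.dt.day
    out[f"{date_col}_weekday"] = dates.dt.weekday

    # Square only the numeric columns present in the original frame,
    # so the newly derived date parts are not squared as well.
    for col in df.select_dtypes(include="number").columns:
        out[f"{col}_squared"] = out[col] ** 2
    return out


sales = pd.DataFrame(
    {"order_date": ["2024-01-15", "2024-02-29"], "amount": [10.0, 3.0]}
)
features = add_features(sales, "order_date")
```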

Exploratory Data Analysis (EDA):
- Summary statistics and correlation matrices.
- Advanced analysis: KMeans clustering and PCA.
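As a sketch of the EDA steps above, the following computes summary statistics and a correlation matrix with pandas, then runs KMeans and PCA from scikit-learn on standardized data. The synthetic two-blob dataset, the column names, and the choice of two clusters and two components are assumptions for the example only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: two well-separated groups in three dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 3))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 3))
data = pd.DataFrame(np.vstack([blob_a, blob_b]), columns=["x", "y", "z"])

# Summary statistics and pairwise Pearson correlations.
summary = data.describe()
corr = data.corr()

# Standardize first so every feature contributes on the same scale.
scaled = StandardScaler().fit_transform(data)

# Cluster assignments and a 2-D projection for plotting.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)
components = PCA(n_components=2).fit_transform(scaled)
```

Scaling before KMeans and PCA matters because both are sensitive to feature magnitude; whether the project applies it at this stage is not stated in the source.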

Interactive Visualizations:
- Interactive histogram and heatmap using Plotly.