Diamond Dynamics Streamlit App
- Data Cleaning & Preprocessing
- EDA & Data Visualization
- Feature Engineering
- Outlier & Skewness Handling
- Regression using ML algorithms, including ANN
- Dimensionality Reduction with PCA
- Clustering & Cluster Labeling
- Streamlit UI Design
- Data Insertion and Preparation: data_exploration.ipynb
- Data Preprocessing
- Feature Engineering
- Handled x, y, and z values that were 0
- Derived new columns such as Volume, Dimension Ratio, etc.
- Encoding
- Applied ordinal encoding to the Cut, Clarity, and Color columns
- Dumped the fitted encoders to a pickle file
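The preparation steps above could look roughly like the following. This is a minimal sketch, not the notebook's actual code: the lowercase column names, the median imputation for zero dimensions, the `dimension_ratio` definition, and the worst-to-best category orders are all assumptions made for illustration.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Tiny illustrative frame; the real notebook works on the full diamonds dataset.
df = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70, 1.01],
    "cut": ["Ideal", "Good", "Premium", "Fair"],
    "color": ["E", "J", "G", "D"],
    "clarity": ["SI2", "VS1", "VVS2", "IF"],
    "x": [3.95, 4.34, 0.0, 6.40],
    "y": [3.98, 4.35, 5.70, 6.35],
    "z": [2.43, 2.75, 3.50, 0.0],
})

dims = ["x", "y", "z"]
# Assumption: a zero dimension is a missing measurement, imputed with the column median.
df[dims] = df[dims].replace(0, np.nan)
df[dims] = df[dims].fillna(df[dims].median())

# Derived columns (the ratio definition is an assumption).
df["volume"] = df["x"] * df["y"] * df["z"]
df["dimension_ratio"] = df["x"] / df["y"]

# Assumed category orders, worst to best.
cat_cols = ["cut", "color", "clarity"]
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
color_order = ["J", "I", "H", "G", "F", "E", "D"]
clarity_order = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

encoder = OrdinalEncoder(categories=[cut_order, color_order, clarity_order])
encoded = encoder.fit_transform(df[cat_cols])
for i, col in enumerate(cat_cols):
    df[col] = encoded[:, i]

# Serialize the fitted encoder; writing these bytes to a .pkl file is the "dump" step.
encoder_bytes = pickle.dumps(encoder)
```

Passing explicit `categories` keeps the integer codes aligned with quality order, which is the point of choosing ordinal encoding over one-hot encoding for these three columns.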
- Exploratory Data Analysis: EDA.ipynb
- Regression Model: Reg_model_training.ipynb
- Trained 9 regression algorithms and chose the best model for deployment to predict diamond prices
- Linear Regression
- KNN Regressor
- Support Vector Regressor
- Decision Tree Regressor
- Random Forest Regressor
- AdaBoost Regressor
- Gradient Boosting Regressor
- XGBoost Regressor
- ANN
- Mean Squared Error Loss Function
- Huber Loss Function
- Saved the Best Performing Model in a Pickle File
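The model comparison and the two ANN loss functions can be illustrated with a short sketch. It uses synthetic data and only three of the nine algorithms. Selecting by R² on a held-out split is an assumption, since the notebook's actual selection metric is not stated here. `huber_loss` is a hand-written helper for illustration, not a library function.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for |error| <= delta, linear beyond it."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    quadratic = 0.5 * error**2
    linear = delta * (error - 0.5 * delta)
    return float(np.mean(np.where(error <= delta, quadratic, linear)))


# Synthetic stand-in for the prepared diamond features and prices.
rng = np.random.default_rng(42)
X = rng.uniform(0.2, 3.0, size=(300, 3))
y = 4000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 100, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Fit every candidate and score it on the held-out split (assumed metric: R^2).
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
best_model_bytes = pickle.dumps(candidates[best_name])  # bytes to write to a .pkl file

# One large residual dominates MSE but contributes only linearly to Huber.
demo_true = np.zeros(4)
demo_pred = np.array([0.5, -0.5, 0.5, 10.0])
mse_demo = float(np.mean((demo_true - demo_pred) ** 2))
huber_demo = huber_loss(demo_true, demo_pred, delta=1.0)
```

On the four demo residuals the mean squared error is 25.1875 while the Huber loss is 2.46875. This is why Huber is a common alternative when a few very expensive stones would otherwise dominate training.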
- Clustering Model: cluster_model.ipynb
- Used PCA for dimensionality reduction to 2 components
- Trained 5 clustering models and chose the best one for deployment to cluster diamonds, then analyzed average price, carat, and cut distribution per cluster
- K-Means Clustering
- Used the elbow method to choose the value of K
- Mini-Batch K-Means Clustering
- Hierarchical Clustering
- Density-Based Clustering
- Gaussian Mixture
- Saved the model with the best silhouette score in a pickle file
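The clustering workflow above can be sketched as follows. It is an illustration under assumptions: synthetic blobs stand in for the diamond features, `k = 3` is an assumed reading of the elbow curve, and density-based clustering is left out because it does not take a fixed cluster count and can label points as noise.

```python
import pickle

from sklearn.cluster import AgglomerativeClustering, KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the engineered diamond features.
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=42)

# Scale, then project onto 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2, random_state=42).fit_transform(X_scaled)

# Elbow method: record K-Means inertia for a range of k and look for the bend.
inertias = {}
for k in range(2, 7):
    inertias[k] = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_2d).inertia_

k = 3  # assumed value read from the elbow curve
models = {
    "kmeans": KMeans(n_clusters=k, n_init=10, random_state=42),
    "mini_batch_kmeans": MiniBatchKMeans(n_clusters=k, n_init=10, random_state=42),
    "hierarchical": AgglomerativeClustering(n_clusters=k),
    "gaussian_mixture": GaussianMixture(n_components=k, random_state=42),
}

# Compare the candidates by silhouette score (higher means better-separated clusters).
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X_2d)
    scores[name] = silhouette_score(X_2d, labels)

best_name = max(scores, key=scores.get)
model_bytes = pickle.dumps(models[best_name])  # bytes to write to a .pkl file
```

Once labels are attached to the original frame, the per-cluster analysis is a `groupby` on the cluster column with the mean of price and carat and the value counts of cut.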
S3 Bucket
- Saved all the encoders and models in the S3 Bucket.
Streamlit UI (Application): streamlit.py
- Retrieved all the files from S3 bucket for final predictions.
- Created Input Features with sliders and number input boxes
- Generated predictions from the input values
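A rough sketch of how the app's three steps might fit together is shown below. The bucket name, object key, feature list, and widget ranges are hypothetical placeholders, and the category lists stand in for the pickled encoders that the real app retrieves from S3. `boto3` and `streamlit` are imported inside the functions that need them so the pure helper can be exercised without either installed.

```python
import pickle

import pandas as pd

# Hypothetical names; the real bucket, key, and feature order are not given here.
BUCKET = "diamond-dynamics-artifacts"
MODEL_KEY = "models/best_regressor.pkl"
FEATURES = ["carat", "cut", "color", "clarity", "x", "y", "z"]
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def load_pickle_from_s3(bucket, key):
    """Download one pickled object from S3 and deserialize it."""
    import boto3  # lazy import: only needed when the app actually runs

    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return pickle.loads(body)


def build_feature_row(values):
    """Turn the widget values into a single-row frame in the assumed training order."""
    missing = [name for name in FEATURES if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    return pd.DataFrame([[values[name] for name in FEATURES]], columns=FEATURES)


def main():
    import streamlit as st  # lazy import, see note above

    st.title("Diamond Dynamics")
    model = load_pickle_from_s3(BUCKET, MODEL_KEY)

    # List position stands in for the ordinal encoders loaded from S3 in the real app.
    values = {
        "carat": st.slider("Carat", min_value=0.2, max_value=5.0, value=1.0, step=0.01),
        "cut": CUT_ORDER.index(st.selectbox("Cut", CUT_ORDER)),
        "color": COLOR_ORDER.index(st.selectbox("Color", COLOR_ORDER)),
        "clarity": CLARITY_ORDER.index(st.selectbox("Clarity", CLARITY_ORDER)),
        "x": st.number_input("Length x (mm)", min_value=0.1, max_value=15.0, value=5.0, step=0.01),
        "y": st.number_input("Width y (mm)", min_value=0.1, max_value=15.0, value=5.0, step=0.01),
        "z": st.number_input("Depth z (mm)", min_value=0.1, max_value=15.0, value=3.0, step=0.01),
    }

    if st.button("Predict price"):
        price = model.predict(build_feature_row(values))[0]
        st.write(f"Predicted price: {price:,.2f}")


# In streamlit.py, end the script with a call to main() and launch it with
# `streamlit run streamlit.py`.
```

Keeping `build_feature_row` separate from the widgets means the column order the model expects is defined in one place and can be checked without starting the app.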