kevinxaviour/Diamond_Dynamics

Diamond Dynamics Streamlit App

Project Takeaways:

  • Data Cleaning & Preprocessing
  • EDA & Data Visualization
  • Feature Engineering
  • Outlier & Skewness Handling
  • Regression using ML algorithms, including ANN
  • Dimensionality Reduction with PCA
  • Clustering & Cluster Labeling
  • Streamlit UI Design

Workflow

  • Data Insertion and Preparation: data_exploration.ipynb
    • Data Preprocessing
    • Feature Engineering
      • Handled x, y, and z values that were 0
      • Derived new columns such as Volume and Dimension Ratio
    • Encoding
      • Applied ordinal encoding to the Cut, Clarity, and Color columns
      • Saved the fitted encoders to a pickle file
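
The preparation steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the lowercase column names and category orderings are assumed from the classic diamonds dataset, median imputation is only one possible way to handle zero dimensions, and `dimension_ratio` is a hypothetical definition.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Assumed quality orderings (worst to best), as in the classic diamonds dataset.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def prepare(df: pd.DataFrame):
    """Return a cleaned, feature-engineered copy of df and the fitted encoder."""
    df = df.copy()
    dims = ["x", "y", "z"]
    # A zero dimension is physically impossible, so treat it as missing
    # and fill it with the column median (an assumed strategy).
    df[dims] = df[dims].replace(0, np.nan)
    df[dims] = df[dims].fillna(df[dims].median())

    # Derived features; "dimension_ratio" is a hypothetical definition.
    df["volume"] = df["x"] * df["y"] * df["z"]
    df["dimension_ratio"] = (df["x"] + df["y"]) / (2 * df["z"])

    # Ordinal encoding that preserves the quality ordering of each column.
    cols = ["cut", "color", "clarity"]
    encoder = OrdinalEncoder(categories=[CUT_ORDER, COLOR_ORDER, CLARITY_ORDER])
    encoded = encoder.fit_transform(df[cols])
    for i, col in enumerate(cols):
        df[col] = encoded[:, i]
    return df, encoder


# Tiny made-up sample; the last row has an invalid x of 0.
sample = pd.DataFrame({
    "carat": [0.23, 0.31, 1.01],
    "cut": ["Ideal", "Good", "Premium"],
    "color": ["E", "J", "D"],
    "clarity": ["SI2", "VS1", "IF"],
    "x": [3.95, 4.34, 0.0],
    "y": [3.98, 4.35, 6.40],
    "z": [2.43, 2.75, 3.90],
})
prepared, encoder = prepare(sample)

# Serialize the fitted encoder so the app can reuse the same mapping;
# pickle.dump(encoder, file_handle) would write it to disk instead.
encoder_bytes = pickle.dumps(encoder)
```

Passing explicit category lists to `OrdinalEncoder` keeps the encoded integers aligned with quality rank instead of alphabetical order.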
  • Exploratory Data Analysis: EDA.ipynb
  • Regression Model: Reg_model_training.ipynb
    • Trained 9 regression algorithms and chose the best-performing model for deployment to predict diamond prices
      • Linear Regression
      • KNN Regressor
      • Support Vector Regressor
      • Decision Tree Regressor
      • Random Forest Regressor
      • AdaBoost Regressor
      • Gradient Boosting Regressor
      • XGBoost Regressor
      • ANN
        • Mean Squared Error Loss Function
        • Huber Loss Function
    • Saved the best-performing model in a pickle file
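
A model comparison of this kind might look like the sketch below. It covers only a subset of the listed algorithms, uses synthetic data in place of the prepared diamond features, and assumes the best model is picked by R² on a held-out split, since the selection metric is not stated here.

```python
import pickle

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the prepared features and prices (not real data).
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 3.0, size=(400, 3))
y = 4000 * X[:, 0] ** 2 + 300 * X[:, 1] + rng.normal(0, 200, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

candidates = {
    "Linear Regression": LinearRegression(),
    "KNN Regressor": KNeighborsRegressor(n_neighbors=5),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting Regressor": GradientBoostingRegressor(random_state=42),
}

# Fit every candidate and score it on the held-out split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# Keep the candidate with the highest R^2 (assumed selection criterion).
best_name = max(scores, key=scores.get)
best_model = candidates[best_name]

# Serialize the winner; pickle.dump(best_model, file_handle) writes it to disk.
model_bytes = pickle.dumps(best_model)
```

Scoring every candidate on the same held-out split keeps the comparison fair; cross-validation would give a more stable ranking at higher cost.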
  • Clustering Model: cluster_model.ipynb
    • Used PCA to reduce the features to 2 components
    • Trained 5 clustering models and chose the best one for deployment to cluster diamonds, then analyzed the average price, carat, and cut distribution per cluster
      • K-Means Clustering
        • Used the elbow method to choose the value of K
      • Mini-Batch K-Means Clustering
      • Hierarchical Clustering
      • Density-Based Clustering
      • Gaussian Mixture
    • Saved the model with the best silhouette score in a pickle file
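
The clustering workflow above can be illustrated with a short sketch. It uses synthetic data and made-up feature columns, compares only two of the five listed models, and fixes `k = 3` for the toy data; in practice that value would be read from the elbow in `inertias`.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diamond data (three loose size groups).
rng = np.random.default_rng(1)
carat = np.concatenate([
    rng.normal(0.4, 0.05, 100),
    rng.normal(1.0, 0.10, 100),
    rng.normal(2.0, 0.15, 100),
])
df = pd.DataFrame({
    "carat": carat,
    "volume": carat * 160 + rng.normal(0, 5, 300),
    "depth": rng.normal(61.7, 1.0, 300),
    "price": 4000 * carat ** 2 + rng.normal(0, 150, 300),
})

# Scale the clustering features, then project them onto 2 principal components.
features = df[["carat", "volume", "depth"]]
scaled = StandardScaler().fit_transform(features)
points = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Elbow method: record the K-Means inertia for a range of k values.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(points).inertia_
    for k in range(2, 8)
}

k = 3  # assumed here; normally chosen from the elbow in `inertias`
models = {
    "K-Means": KMeans(n_clusters=k, n_init=10, random_state=0),
    "Gaussian Mixture": GaussianMixture(n_components=k, random_state=0),
}

# Fit each model and compare silhouette scores (higher is better).
labels, scores = {}, {}
for name, model in models.items():
    labels[name] = model.fit_predict(points)
    scores[name] = silhouette_score(points, labels[name])

best = max(scores, key=scores.get)
df["cluster"] = labels[best]

# Per-cluster averages, as used for cluster labeling.
summary = df.groupby("cluster")[["price", "carat"]].mean()
model_bytes = pickle.dumps(models[best])
```

A cut distribution per cluster could be added in the same way with `pd.crosstab` once a cut column is present.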
  • S3 Bucket

    • Uploaded all the encoders and models to the S3 bucket.
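
Storing and retrieving the pickle files with boto3 could look like the sketch below. The function names, the `artifacts/` key prefix, and the bucket and file names in the usage note are hypothetical; only `boto3.client("s3")`, `upload_file`, and `download_file` are real boto3 calls.

```python
from pathlib import Path


def make_s3_client():
    """Create a boto3 S3 client; credentials are read from the environment."""
    import boto3  # imported lazily so the helpers below work with any client

    return boto3.client("s3")


def upload_artifacts(s3_client, bucket, paths, prefix="artifacts/"):
    """Upload local files to the bucket and return the object keys used."""
    keys = []
    for path in paths:
        key = prefix + Path(path).name
        s3_client.upload_file(Filename=str(path), Bucket=bucket, Key=key)
        keys.append(key)
    return keys


def download_artifacts(s3_client, bucket, keys, target_dir="."):
    """Download objects into target_dir and return the local file paths."""
    local_paths = []
    for key in keys:
        local = Path(target_dir) / Path(key).name
        s3_client.download_file(Bucket=bucket, Key=key, Filename=str(local))
        local_paths.append(str(local))
    return local_paths
```

For example, `upload_artifacts(make_s3_client(), "my-bucket", ["best_model.pkl", "encoder.pkl"])` would push both files, and the Streamlit app could later call `download_artifacts` with the returned keys.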
  • Streamlit UI (Application): streamlit.py

    • Retrieved the encoders and models from the S3 bucket for final predictions.
    • Built the input features from sliders and number input boxes
    • Generated predictions from the input values
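
A minimal sketch of such a Streamlit page is shown below. The widget labels, value ranges, file names, and feature columns are assumptions for illustration, and the model is assumed to expect exactly the columns produced by the hypothetical `build_feature_row` helper.

```python
import pickle

import pandas as pd

CUTS = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
COLORS = ["J", "I", "H", "G", "F", "E", "D"]
CLARITIES = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def build_feature_row(carat, cut, color, clarity, x, y, z):
    """Collect the widget values into a single-row frame for the model."""
    return pd.DataFrame([{
        "carat": carat, "cut": cut, "color": color, "clarity": clarity,
        "x": x, "y": y, "z": z,
        "volume": x * y * z,  # same derived feature as in preprocessing
    }])


def render_app(model_path="best_model.pkl", encoder_path="encoder.pkl"):
    """Draw the input widgets and show a price prediction on demand."""
    import streamlit as st  # imported lazily; only needed when the UI runs

    st.title("Diamond Dynamics")
    carat = st.slider("Carat", min_value=0.2, max_value=5.0, value=1.0, step=0.01)
    cut = st.selectbox("Cut", CUTS)
    color = st.selectbox("Color", COLORS)
    clarity = st.selectbox("Clarity", CLARITIES)
    x = st.number_input("Length x (mm)", min_value=0.1, value=5.0, step=0.01)
    y = st.number_input("Width y (mm)", min_value=0.1, value=5.0, step=0.01)
    z = st.number_input("Depth z (mm)", min_value=0.1, value=3.0, step=0.01)

    if st.button("Predict price"):
        # Load the artifacts previously downloaded from S3 (assumed local paths).
        with open(model_path, "rb") as f:
            model = pickle.load(f)
        with open(encoder_path, "rb") as f:
            encoder = pickle.load(f)

        row = build_feature_row(carat, cut, color, clarity, x, y, z)
        cols = ["cut", "color", "clarity"]
        encoded = encoder.transform(row[cols])
        for i, col in enumerate(cols):
            row[col] = encoded[:, i]

        price = model.predict(row)[0]
        st.success(f"Estimated price: {price:,.2f}")


# Call render_app() at the end of the script when launching it with:
#   streamlit run streamlit.py
```

Keeping `build_feature_row` separate from the widgets makes the feature construction testable without starting a Streamlit session.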