Clickstream Analysis on Databricks Using Spark (PySpark)

Analyzing clickstream data from Germany.

  • Data Exploration
  • Data Transformation
  • Feature Selection
  • Machine Learning Model
    • Random Forest Classifier
    • Gradient Boosted Tree Classifier
  • Model Prediction
  • Model Evaluation
    • Accuracy
    • Confusion Matrix
  • SQL Queries
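
The two evaluation metrics listed above can be sketched in plain Python, independent of Spark. This is only an illustration of what the metrics compute, not the project's notebook code; the function names and the sample labels (1 = clicked, 0 = not clicked) are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return a matrix where rows are actual labels and columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Accuracy alone can look good on clickstream data when one class dominates, so reading it next to the confusion matrix shows which kind of error the classifier makes.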

Recommender System

We use collaborative filtering for latent feature discovery. The goal is to find the impressions a user is likely to click. We use the Alternating Least Squares (ALS) technique to perform matrix factorization and rank the offers for each user.
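
To make the matrix factorization idea concrete, here is a minimal pure-Python sketch of Alternating Least Squares with a single latent factor. It is not the project's implementation, which runs on Spark: the function names, the regularization value, and the toy interaction data are all assumptions made for illustration. Each step holds one side of the factorization fixed and solves a regularized least-squares problem for the other side, using only the observed entries.

```python
import random


def als_rank1(ratings, n_users, n_items, reg=0.01, iters=50, seed=0):
    """Rank-1 ALS sketch.

    ratings: dict mapping (user, item) -> observed value.
    Returns one latent factor per user and one per item.
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.1, 1.0) for _ in range(n_users)]
    v = [rng.uniform(0.1, 1.0) for _ in range(n_items)]
    for _ in range(iters):
        # Hold item factors fixed; closed-form ridge solution for each user.
        for i in range(n_users):
            num = sum(r * v[j] for (a, j), r in ratings.items() if a == i)
            den = reg + sum(v[j] ** 2 for (a, j), _ in ratings.items() if a == i)
            u[i] = num / den
        # Hold user factors fixed; closed-form ridge solution for each item.
        for j in range(n_items):
            num = sum(r * u[i] for (i, b), r in ratings.items() if b == j)
            den = reg + sum(u[i] ** 2 for (i, b), _ in ratings.items() if b == j)
            v[j] = num / den
    return u, v


def rank_offers(u, v, user):
    """Order item indices by predicted score for one user, best first."""
    return sorted(range(len(v)), key=lambda j: u[user] * v[j], reverse=True)


# Hypothetical interaction strengths; the pair (user 0, item 2) is unobserved.
ratings = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 6.0, (1, 1): 2.0, (1, 2): 4.0}
u, v = als_rank1(ratings, n_users=2, n_items=3)
print(rank_offers(u, v, user=0))
print(u[0] * v[2])  # predicted score for the unobserved pair
```

A real recommender would use more than one latent factor, which turns each scalar division into a small linear solve, but the alternating structure stays the same.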

**Note: this project has not been tuned for the best results.**