
Introduction

This repository was created as part of the Machine Learning Methods in Health Science (CHEP 898) course at the Department of Community Health & Epidemiology (CHEP), University of Saskatchewan (USask).

1: Data Wrangling and Visualization

This section focuses on preparing, exploring, and visualizing data from the CANPATH dataset to ensure data quality and generate preliminary insights. The process includes data cleaning, dataset integration, descriptive statistical analysis, and visualization of key variables.
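The cleaning, integration, and descriptive-statistics steps can be sketched in pandas as below. This is a minimal illustration, not the repository's actual code: the column names (`ID`, `AGE`, `BMI`), the `-7`/`-9` missing-value codes, and the `clean_and_merge` helper are all hypothetical and are not taken from CANPATH documentation.

```python
import numpy as np
import pandas as pd


def clean_and_merge(demo: pd.DataFrame, health: pd.DataFrame,
                    missing_codes=(-7, -9)) -> pd.DataFrame:
    """Clean two hypothetical CANPATH-style tables and join them on a shared ID."""
    # Assumption: sentinel codes such as -7/-9 stand for "missing"
    demo = demo.replace(list(missing_codes), np.nan)
    health = health.replace(list(missing_codes), np.nan)
    # Drop exact duplicate rows (e.g. a participant entered twice)
    demo = demo.drop_duplicates()
    health = health.drop_duplicates()
    # Keep participants present in both tables; fail loudly if IDs repeat
    return demo.merge(health, on="ID", how="inner", validate="one_to_one")


# Hypothetical toy data for illustration only
demo = pd.DataFrame({"ID": [1, 2, 3, 3], "AGE": [45, -7, 60, 60]})
health = pd.DataFrame({"ID": [1, 2, 3], "BMI": [24.1, 31.5, -9.0]})

merged = clean_and_merge(demo, health)
# Descriptive statistics (count, mean, std, quartiles) for key variables
summary = merged[["AGE", "BMI"]].describe()
print(summary)
```

Using `validate="one_to_one"` makes the merge raise an error instead of silently multiplying rows when an ID appears more than once after cleaning.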

2: Unsupervised Learning - PCA, Cluster Tuning, and Linear Regression Analysis

Using the CanPath Student Dataset (not imputed), this section implements a PCA model with default parameters and evaluates it, then applies tuning procedures to select the optimal number of clusters. It next compares PCA regression models with BMI as the outcome variable under the baseline PCA configuration and the tuned cluster configuration to assess differences in performance. Finally, a feature-importance analysis identifies, visualizes, and interprets the most influential variables in the PCA regression model.
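The workflow above can be sketched with scikit-learn as follows. This is a hedged illustration under several assumptions: "cluster tuning" is taken to mean choosing the number of k-means clusters on the PCA scores by silhouette score, PCA regression is modelled as standardize, then PCA, then linear regression, and feature importance is obtained by projecting the regression coefficients back onto the original (standardized) variables. The helper names and the synthetic data are hypothetical and do not come from the repository.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def best_k(scores, k_values=range(2, 7), seed=0):
    """Pick the k-means cluster count with the highest silhouette score (assumed criterion)."""
    results = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scores)
        results[k] = silhouette_score(scores, labels)
    return max(results, key=results.get), results


def pca_regression_r2(X, y, n_components=None, cv=5):
    """Mean cross-validated R^2 of a standardize -> PCA -> linear regression pipeline."""
    model = make_pipeline(StandardScaler(), PCA(n_components=n_components), LinearRegression())
    return cross_val_score(model, X, y, cv=cv, scoring="r2").mean()


def feature_importance(X, y, feature_names, n_components=None):
    """Rank original features by PCA-regression coefficients mapped back to feature space."""
    model = make_pipeline(StandardScaler(), PCA(n_components=n_components), LinearRegression()).fit(X, y)
    pca = model.named_steps["pca"]
    coef = model.named_steps["linearregression"].coef_
    # components_ has shape (n_components, n_features), so coef @ components_
    # gives one weight per standardized input feature
    weights = coef @ pca.components_
    order = np.argsort(-np.abs(weights))
    return [(feature_names[i], float(weights[i])) for i in order]


# Hypothetical synthetic data standing in for the real predictors and BMI outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
names = [f"x{i}" for i in range(5)]

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
k, silhouettes = best_k(scores)
r2_base = pca_regression_r2(X, y)                    # default PCA (all components)
r2_reduced = pca_regression_r2(X, y, n_components=2)  # a reduced configuration
ranking = feature_importance(X, y, names)
print(k, r2_base, r2_reduced, ranking[:2])
```

Wrapping the scaler, PCA, and regression in one pipeline means each cross-validation fold fits the scaling and PCA only on its training split, which keeps the baseline-versus-tuned comparison free of leakage.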