Tesfa3028/machine_learning_training

Introduction

This repository was created as part of the Machine Learning Methods in Health Science (CHEP 898) course at the Department of Community Health & Epidemiology (CHEP), University of Saskatchewan (USask).

1: Data Wrangling and Visualization

This section focuses on preparing, exploring, and visualizing data from the CanPath dataset to ensure data quality and generate preliminary insights. The workflow includes data cleaning, dataset integration, descriptive statistical analysis, and visualization of key variables.
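The cleaning, integration, and descriptive steps above can be sketched in Python with pandas. This is a minimal illustration on a small made-up table, not the course's actual code: the column names (`participant_id`, `age`, `sex`, `bmi`), the use of negative ages as a missing-value code, and the inner join are all assumptions, and the real CanPath variable names and coding will differ. Plotting is left out; the summaries produced here are what one would typically visualize next.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables standing in for two CanPath extracts; the real
# variable names and missing-value codes in the CanPath data will differ.
demographics = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 5],
    "age": [45, 52, -7, 61, 38],  # -7 mimics an assumed missing-value code
    "sex": ["F", "M", "F", "M", None],
})
measurements = pd.DataFrame({
    "participant_id": [1, 2, 3, 4, 6],
    "bmi": [24.1, 31.5, 27.8, np.nan, 22.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and recode assumed sentinel ages to NaN."""
    out = df.drop_duplicates().copy()
    # Assumption: negative ages are missing-value codes, not real ages.
    if "age" in out:
        out["age"] = out["age"].where(out["age"] >= 0)
    return out


# Integrate the two sources on the shared identifier; an inner join keeps
# only participants present in both tables, and validate= guards against
# accidental duplicate IDs.
merged = clean(demographics).merge(
    clean(measurements), on="participant_id", how="inner", validate="one_to_one"
)

# Descriptive statistics: numeric summaries, category counts, missingness.
numeric_summary = merged[["age", "bmi"]].describe()
sex_counts = merged["sex"].value_counts(dropna=False)
missing_share = merged.isna().mean()

print(numeric_summary)
print(sex_counts)
print(missing_share)
```

An inner join is only one choice; a left join on the primary table would keep participants with no matching measurements, at the cost of more missing values downstream.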

2: Unsupervised Learning - PCA, Cluster Tuning, and Linear Regression Analysis

Using the CanPath Student Dataset (Not Imputed), this section implements and evaluates a PCA model with default parameters, then applies tuning procedures to select the optimal number of clusters. It compares PCA regression models with BMI as the outcome under the baseline PCA configuration and the tuned cluster configuration to assess differences in performance, and concludes with a feature-importance analysis that identifies, visualizes, and interprets the most influential variables in the PCA regression model.
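The steps above can be sketched in Python with scikit-learn, here on synthetic data rather than the CanPath Student Dataset. Several pieces are assumptions, not the course's documented method: the feature names are hypothetical; "cluster tuning" is interpreted as k-means on the PCA scores with the number of clusters chosen by silhouette score; the "tuned cluster configuration" is modeled by adding one-hot cluster memberships to the PCA regression; and feature importance is taken as the PCA regression coefficients mapped back to the standardized input variables.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 300
# Synthetic stand-in for complete-case predictors; names are hypothetical.
features = ["age", "waist_cm", "hip_cm", "weight_kg", "sleep_hours", "activity_min"]
X = pd.DataFrame(rng.normal(size=(n, len(features))), columns=features)
X["weight_kg"] += 0.8 * X["waist_cm"]  # add correlation so PCA has structure
bmi = (25 + 2.0 * X["weight_kg"] + 1.0 * X["waist_cm"]
       - 0.5 * X["activity_min"] + rng.normal(scale=1.0, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, bmi, test_size=0.25, random_state=0)

# 1) Standardize, then fit PCA with default parameters (all components kept).
scaler = StandardScaler().fit(X_train)
pca = PCA().fit(scaler.transform(X_train))
Z_train = pca.transform(scaler.transform(X_train))
Z_test = pca.transform(scaler.transform(X_test))
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

# 2) Tune the number of k-means clusters on the PCA scores via silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z_train)
    scores[k] = silhouette_score(Z_train, labels)
best_k = max(scores, key=scores.get)
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(Z_train)


def with_clusters(Z, model):
    """Append one-hot cluster memberships (first level dropped) to PCA scores."""
    onehot = np.eye(model.n_clusters)[model.predict(Z)][:, 1:]
    return np.hstack([Z, onehot])


# 3) Compare PCA regression: baseline (scores only) vs scores + tuned clusters.
baseline = LinearRegression().fit(Z_train, y_train)
tuned = LinearRegression().fit(with_clusters(Z_train, kmeans), y_train)
r2_baseline = r2_score(y_test, baseline.predict(Z_test))
r2_tuned = r2_score(y_test, tuned.predict(with_clusters(Z_test, kmeans)))
print(f"best k={best_k}  R2 baseline={r2_baseline:.3f}  R2 with clusters={r2_tuned:.3f}")

# 4) Feature importance: map PC coefficients back to the standardized inputs.
#    beta_features = components_.T @ beta_pcs (exact here because all PCs are kept).
importance = pd.Series(pca.components_.T @ baseline.coef_, index=features)
print(importance.reindex(importance.abs().sort_values(ascending=False).index))
```

Because the dataset is not imputed, rows with missing values would need to be dropped (for example with `DataFrame.dropna()`) before this step, since scikit-learn's `PCA` does not accept NaN inputs. If only the leading components were retained, the back-projected coefficients in step 4 would become an approximation rather than an exact mapping.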
