Cardiovascular Diseases Integrated Risk Model

Project Overview

This project integrates genomic, clinical, lifestyle, and social risk factors to predict coronary artery disease (CAD), atrial fibrillation (AFIB), and heart failure (HF).
Genomic data are aggregated into a polygenic score (PGS), clinical risk factors into clinical risk scores (CRS), and lifestyle and social risk factors into a polyexposure score (PXS).
It extends a PSB paper published in 2026 that focused only on HF in All of Us (https://doi.org/10.1142/9789819824755_0046).
Analyses were conducted in both UK Biobank and All of Us.

Extract variants that are in PGS score files from PLINK files
Run pgsc_calc (the PGS Catalog Calculator) to compute the PGS in the study population
Conduct phenotyping for outcomes, covariates, and predictors for CRS and PXS
Split the dataset into feature selection and integrated risk model (IRM) computation groups, and compare age, sex, and case/control distributions between the groups
Compute CRS
Normalize and scale PXS variables
Conduct feature selection on PXS variables
Select most important features for PXS
Run the model evaluation script to get performance metrics (AUROC, AUPRC, F1 score, balanced accuracy)
Calculate case proportions in low, medium, and high risk groups defined by the risk scores
Make ROC/PRC curves
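
The dataset split in the steps above could be sketched as follows. This is a minimal illustration, not the project's script: the 50/50 split fraction, the stratification by case status and sex, and the record fields (`case`, `sex`, `age`) are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(records, strata_key, frac=0.5, seed=0):
    """Split records into two groups while keeping each stratum's share similar."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[strata_key(rec)].append(rec)

    group_a, group_b = [], []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)                  # random assignment within the stratum
        cut = round(len(members) * frac)
        group_a.extend(members[:cut])
        group_b.extend(members[cut:])
    return group_a, group_b


def summarize(records):
    """Case fraction, female fraction, and mean age, for comparing the groups."""
    n = len(records)
    return {
        "n": n,
        "case_fraction": sum(r["case"] for r in records) / n,
        "female_fraction": sum(r["sex"] == "F" for r in records) / n,
        "mean_age": sum(r["age"] for r in records) / n,
    }


# Synthetic participants (hypothetical fields, not real cohort data).
participants = [
    {"id": i, "case": int(i % 5 == 0), "sex": "F" if i % 2 else "M", "age": 40 + i % 30}
    for i in range(200)
]

feature_selection, irm_computation = stratified_split(
    participants, strata_key=lambda r: (r["case"], r["sex"])
)
print(summarize(feature_selection))
print(summarize(irm_computation))
```

Stratifying on case status and sex keeps those two distributions matched by construction, so only age still needs to be compared between the groups afterwards.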
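
The four performance metrics named in the evaluation step can be illustrated with the standard library alone. This is a sketch for clarity, not the project's evaluation script: the 0.5 classification threshold is an assumption, AUPRC is computed as average precision, and the labels and scores are toy values. Libraries such as scikit-learn provide equivalent, faster functions.

```python
def auroc(y_true, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC as the mean precision at each case's rank (assumes no tied scores)."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    true_pos, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_pos += 1
            total += true_pos / rank
    return total / sum(y_true)


def f1_and_balanced_accuracy(y_true, scores, threshold=0.5):
    """F1 score and balanced accuracy after thresholding the scores."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, y_true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, (recall + specificity) / 2


# Toy labels (1 = case) and predicted risks, for illustration only.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

f1, balanced_accuracy = f1_and_balanced_accuracy(y_true, scores)
print(auroc(y_true, scores), average_precision(y_true, scores), f1, balanced_accuracy)
```

AUROC and AUPRC depend only on the ranking of the scores, while F1 and balanced accuracy change with the chosen threshold, so the threshold should be reported alongside them.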
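
The case proportions per risk group could be computed along these lines. The cut points that define low, medium, and high risk are not stated above, so this sketch assumes equal-sized groups (tertiles) of the risk score; the scores and case labels are toy values.

```python
def case_proportion_by_risk_group(scores, cases, labels=("low", "medium", "high")):
    """Rank participants by risk score, cut into equal-sized groups, and
    return the fraction of cases in each group.

    Assumes at least as many participants as groups.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n, k = len(order), len(labels)
    proportions = {}
    for g, label in enumerate(labels):
        members = order[g * n // k:(g + 1) * n // k]
        proportions[label] = sum(cases[i] for i in members) / len(members)
    return proportions


# Toy risk scores and case labels (1 = case), for illustration only.
risk_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.90]
is_case = [0, 0, 0, 0, 1, 0, 1, 0, 1]

proportions = case_proportion_by_risk_group(risk_scores, is_case)
print(proportions)
```

A score that stratifies risk well shows a rising case proportion from the low to the high group, which is the pattern to look for when comparing the PGS, CRS, PXS, and the integrated model.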