- This project integrates genomic, clinical, lifestyle, and social risk factors to predict coronary artery disease (CAD), atrial fibrillation (AFIB), and heart failure (HF)
- Genomic data is aggregated into a polygenic score (PGS), clinical risk factors are aggregated into clinical risk scores (CRS), and lifestyle and social risk factors are aggregated into a polyexposure score (PXS)
- This work extends a PSB paper published in 2026 that focused only on HF in All of Us (https://doi.org/10.1142/9789819824755_0046)
- Analyses were conducted in both UK Biobank and All of Us
- Extract the variants listed in the PGS score files from the PLINK files
- Run the PGS Catalog Calculator (pgsc_calc) to compute PGS in the study population
- Conduct phenotyping for outcomes, covariates, and predictors for CRS and PXS
- Split the dataset into feature selection and IRM computation groups, then compare age, sex, and case/control distributions between the groups
- Compute CRS
- Normalize and scale PXS variables
- Conduct feature selection on PXS variables
- Select the most important features for PXS
- Run the model evaluation script to compute performance metrics (AUROC, AUPRC, F1 score, balanced accuracy)
- Calculate case proportions in low, medium, and high risk groups defined by the risk scores
- Make ROC/PRC curves
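
The last evaluation steps (AUROC and case proportions per risk group) can be illustrated with a minimal sketch. This is not the project's evaluation script: the function names `auroc` and `case_proportions` are hypothetical, the toy data is made up, and splitting the low, medium, and high risk groups at score tertiles is an assumption, since the cut points used in the project are not stated here. AUROC is computed with the rank-sum (Mann-Whitney U) identity using only the Python standard library.

```python
from statistics import quantiles


def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def case_proportions(scores, labels):
    """Proportion of cases in low/medium/high groups.

    Assumption: groups are split at the score tertiles.
    """
    t1, t2 = quantiles(scores, n=3)
    groups = {"low": [], "medium": [], "high": []}
    for s, y in zip(scores, labels):
        name = "low" if s <= t1 else "medium" if s <= t2 else "high"
        groups[name].append(y)
    return {
        name: (sum(ys) / len(ys) if ys else float("nan"))
        for name, ys in groups.items()
    }


# Toy example (made-up data): 1 = case, 0 = control.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
print(auroc(toy_scores, toy_labels))
print(case_proportions(toy_scores, toy_labels))
```

A well-performing risk score should show the case proportion rising from the low to the high group. In practice, a library such as scikit-learn would likely be used for AUROC, AUPRC, F1 score, and balanced accuracy.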