PRS-work

This pipeline is structured to generate polygenic risk scores with the PRS-CSx tool in the Pan-UK Biobank dataset and validate and test them in the All of Us Research Program dataset. The phenotype I worked with is Standing Height.

Dependencies

R and Python Programming languages
Python packages scipy and h5py

Getting Started

Download the GWAS summary statistics from the Pan-UK Biobank Phenotypes page
Run the script to add rsids to the sumstats, 00_add_rsids.sh
The GWAS sumstats must be in the following format to run PRScsx:

    SNP          A1   A2   BETA      SE
    rs4970383    C    A    -0.0064   0.0090
    rs4475691    C    T    -0.0145   0.0094
    rs13302982   A    G    -0.0232   0.0199
    ...

If not already in the above format, run the script to reformat the PUKBB sumstats, 01_reformat_sumstats.R
Clone the PRS-CSx GitHub page

git clone https://github.com/getian107/PRScsx.git

Download and extract LD Reference Panel files using the following commands:

tar -zxvf ldblk_ukbb_afr.tar.gz

tar -zxvf ldblk_ukbb_amr.tar.gz

tar -zxvf ldblk_ukbb_eas.tar.gz

tar -zxvf ldblk_ukbb_eur.tar.gz

tar -zxvf ldblk_ukbb_sas.tar.gz

Download the SNP information file and put it in the same folder containing the reference panels

Using PRS-CSx

Run the script 02_run_prscsx.sh with the correct population sample sizes, seed, and chromosome numbers
You will need to edit the first part of the script that looks like this:

pop1=afr
n1=6556 # pop1 sample size
pop2=amr
n2=972 # pop2 sample size
pop3=sas
n3=8657 # pop3 sample size
pop4=eas
n4=2397 # pop4 sample size
pop5=eur
n5=419596 # pop5 sample size
phi=1e-04 # phi value
seed= 60556 # seed value

I ran the following populations: AFR, AMR, EAS, EUR, SAS
The following seeds: 60556, 2001, 4928
And the following phi values: 1e-02, 1e-04, 1e-06, 1e-08
I split the runs into separate chromosomes in the following way to be more efficient:

1, 8, 9, 16, 17

2, 7, 10, 15, 18

3, 6, 11, 14, 19, 22

4, 5, 12, 13, 20, 21

PRScsx output

Files with the following format: Standing_height_POP_pst_eff_a1_b0.5_phiX_chr#.txt are generated
- POP is replaced with each population that you ran
- X is replaced with the phi value that you used
- The chromosome numbers correspond to the chromosome each file was generated from
These files include the following unlabelled columns: rsid, base position, A1, A2, posterior effect size estimate
This data is what is used to generate individual polygenic risk scores
They will be used as input for the All of Us Validation step

All of Us Validation and Testing Steps

Upload the PRScsx output files to your Google bucket by running the following commands on the command line:
- gcloud auth login
- Follow the steps, copy the url and then paste the verification code
- Then run this command with the prefix of your PRS-CSx files in place of Standing and the file path that you want to use in your Google Bucket:
gsutil cp Standing* gs://fc-secure-0b5d7336-c242-426a-8854-548d4ed254d8/data/PRScsx_Standing_Height_output
This section of the pipeline is designed to be run on the All of Us Research Program server
03_retrieve_height_weight.html retrieves the height and weight data for the All of Us research subjects
- This file is specific to the height phenotype, and will need to be edited to work with other phenotypes
04_calc_PRS_in_AoU.ipynb calculates polygenic risk scores with the All of Us dataset using the weights generated with PRS-CSx
- You will need to change a few lines of this script to fit the phenotype and populations you are working with
  1. Change the Standing* prefix to the prefix of your PRS-CSx output in line 11
  2. Change the populations in this script to the populations that you are using in the bash portion of the script:
```
for pop in AFR AMR EAS EUR SAS
```
05_part1_compare_PRS_to_observed_join_data.html retrieves the phenotypes and ancestry PCs of the All of Us dataset and joins this data with the calculated PRS's to create joint phenotype and PRS file: Pan-UKB_Standing_height_PRSCSx_phiX_in_AoU_w_pheno.txt where X is the phi value
06_Validation.txt to output Validation weights
- Use this command with the pheno file created in the previous step, the name of your desired validation weights file, and the populuations you wish to run the script with
```
 Rscript 06_Validation.txt -f Pan-UKB_Standing_height_PRSCSx_phi1e-02_in_AoU_w_pheno.txt -v Validation_output_weights.txt -p afr,amr,eas,eur,sas
```
07_Testing.txt to output Testing Adjusted R Squared Values for each pop
- Use this command with the validation weights file created int eh previous step and the list of populations that you wish to test in
```
 Rscript 07_Testing.txt -v Validation_output_weights.txt -p afr,amr,eas,eur,sas -o Adjusted_R_squared_phiX_SEED.txt
```
- Replace X with the correct phi value and SEED with the correct seed value

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PRS-work

Dependencies

Getting Started

Using PRS-CSx

PRScsx output

All of Us Validation and Testing Steps

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
00_add_rsids.sh		00_add_rsids.sh
01_reformat_sumstats.R		01_reformat_sumstats.R
02_run_prscsx.sh		02_run_prscsx.sh
03_retrieve_height_weight.html		03_retrieve_height_weight.html
04_calc_PRS_in_AoU.ipynb		04_calc_PRS_in_AoU.ipynb
05_part1_compare_PRS_to_observed_join_data.R		05_part1_compare_PRS_to_observed_join_data.R
06_Validation.txt		06_Validation.txt
07_Testing.txt		07_Testing.txt
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

PRS-work

Dependencies

Getting Started

Using PRS-CSx

PRScsx output

All of Us Validation and Testing Steps

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages