Example
This document provides instructions for a first-time user to set up their environment and test it by running an example analysis.
Depending on your current environment setup, you may need to make the following changes before proceeding. Start by logging in to us1salx00635.corpnet2.com with the terminal client of your choice (e.g. PuTTY or Exceed).
You need to use the bash shell. To determine your current shell, run ps -p $$ and the shell will be listed under CMD.
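As an alternative to reading the ps output by eye, a script can test for the BASH_VERSION variable, which bash sets and other shells such as sh, dash or ksh do not. The following is a minimal sketch; the helper name is_bash is hypothetical and not part of the pipeline.

```shell
# is_bash is an illustrative helper name, not part of the pipeline.
# bash sets BASH_VERSION; other shells such as sh, dash or ksh do not.
is_bash() {
    [ -n "${BASH_VERSION:-}" ]
}

if is_bash; then
    echo "OK: running bash ${BASH_VERSION}"
else
    # Fall back to the manual check described above.
    echo "Not bash: run 'ps -p \$\$' and check the CMD column"
fi
```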
Ensure you have the Slurm commands in your path by trying squeue. If you get an error indicating the command is not found, add this line:
export PATH=$PATH:/usr/local/slurm/bin
to the end of the my.bashrc file in your home directory (if you don't have this file, create a text file and name it my.bashrc) and then start a new session.
Ensure you have tabix in your path by trying tabix. If you get an error indicating the command is not found then add this line:
export PATH=$PATH:/GWD/bioinfo/apps/bin
to the end of the my.bashrc file in your home directory.
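The two path checks above follow the same pattern, so they can be sketched as one small function. This is a sketch under assumptions, not part of the pipeline: the helper name ensure_in_path is hypothetical, and it assumes that appending the export line to my.bashrc is all that is needed. It appends the line only when the command is missing and the line is not already in the file, so running it twice does not duplicate the entry.

```shell
# ensure_in_path is an illustrative helper name, not part of the pipeline.
# Usage: ensure_in_path COMMAND DIRECTORY RCFILE
# If COMMAND is not found on PATH, append an export line for DIRECTORY to
# RCFILE (creating the file if needed), unless that exact line is already there.
ensure_in_path() {
    local cmd="$1" dir="$2" rcfile="$3"
    local line="export PATH=\$PATH:${dir}"

    if command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd found on PATH"
        return 0
    fi
    touch "$rcfile"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$rcfile"; then
        echo "$line" >> "$rcfile"
        echo "added $dir to $rcfile; start a new session to pick it up"
    fi
}

# Calls matching the steps above (left commented out so that pasting the
# block does not modify any file):
# ensure_in_path squeue /usr/local/slurm/bin "$HOME/my.bashrc"
# ensure_in_path tabix  /GWD/bioinfo/apps/bin "$HOME/my.bashrc"
```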
At the us1salx00635.corpnet2.com terminal, run the following command once, then log out and log in again.
/GWD/appbase/common/adm/set-me-up.sh
Note that this is a toy dataset derived from HapMap subjects, so there are no concerns over the security of personally identifiable information. Further, GSK indicated in its research use applications to Coriell that the samples would be used to generate whole-genome data and that this data would be used for system testing, so there is no expiration associated with this use of the data. All of the clinical data is simulated.
Check to confirm the directory structure looks like this:
> tree /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/ABC123456 /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/input
/GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/ABC123456
|-- Analysis
|   |-- input -> /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/input
|-- AnalysisReadyData
    |-- genotypes
    |   |-- ABC123456-pca.eigenvec
    |   |-- ABC123456-pca.eigenvec.all10
    |-- imputed-20120314
        |-- Imputed_HLAalleles_AllSubjects_Additive.dose.gz
        |-- Imputed_HLAalleles_AllSubjects_Additive.info.gz
        |-- chr21chunk3.dose.gz
        |-- chr21chunk3.info.gz
        |-- chr22chunk3.dose.gz
        |-- chr22chunk3.info.gz
        |-- chr22chunk4.dose.gz
        |-- chr22chunk4.info.gz
/GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/input
|-- Pheno.txt
|-- config.txt
|-- cvlist.txt
|-- demo.txt
|-- groups.txt
|-- models.txt
|-- pop.txt
|-- variables.txt

5 directories, 18 files
If there are extra files, either another user is currently testing, or a user completed testing but did not reset the workspace. Check the owner of the files and follow up with that user.
If files are missing, ask an administrator to restore them from the GitHub repository.
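The check for missing files can also be scripted rather than done by comparing the tree output by eye. The following is a minimal sketch; the helper name report_missing is hypothetical and not part of the pipeline. It prints every expected path that does not exist and returns non-zero if any is missing.

```shell
# report_missing is an illustrative helper name, not part of the pipeline.
# Usage: report_missing BASE_DIR RELATIVE_PATH...
# Prints each expected path that does not exist under BASE_DIR and returns
# non-zero if any are missing.
report_missing() {
    local base="$1" missing=0 rel
    shift
    for rel in "$@"; do
        if [ ! -e "$base/$rel" ]; then
            echo "missing: $base/$rel"
            missing=1
        fi
    done
    return "$missing"
}

# Example with the input files from the listing above (commented out so the
# block can be pasted on any machine):
# report_missing /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/input \
#     Pheno.txt config.txt cvlist.txt demo.txt groups.txt models.txt pop.txt variables.txt
```

For extra files, ls -l shows the owner of each file in the third column.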
Submit this command to run the test:
/GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/qsub_slurm.sh Example Email /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/ABC123456/Analysis
Replace Email with your e-mail address. This example is small, so the analysis should finish within a few minutes and there is no need to monitor progress.
You will know the analysis is complete when you receive an e-mail from root [root@gsk.com] with the subject Job NNNNNN (Example) Complete. The exit status in this e-mail should be 0. Also run the following command to check that the results are as expected; diff prints nothing when the two files are identical:
diff /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/ABC123456/Analysis/outputs/report-short.html /GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/Expected_report-short.html
If either of these conditions is not met, please consult an administrator for support.
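The report comparison can be wrapped so that the outcome is stated explicitly instead of being inferred from empty output. This is a minimal sketch; the helper name reports_match is hypothetical and not part of the pipeline. It relies on the standard diff exit status: 0 when the files are identical, 1 when they differ, and greater than 1 on trouble such as a missing file.

```shell
# reports_match is an illustrative helper name, not part of the pipeline.
# Usage: reports_match ACTUAL EXPECTED
# Prints PASS, FAIL or ERROR and returns diff's exit status.
reports_match() {
    local actual="$1" expected="$2" rc=0
    # Discard diff's own output; only its exit status is used here.
    diff "$actual" "$expected" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0) echo "PASS: report matches expected output" ;;
        1) echo "FAIL: report differs from expected output" ;;
        *) echo "ERROR: could not compare (is a file missing?)" ;;
    esac
    return "$rc"
}

# Example with the paths from the diff command above (commented out so the
# block can be pasted on any machine):
# base=/GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx
# reports_match "$base/ABC123456/Analysis/outputs/report-short.html" \
#     "$base/Expected_report-short.html"
```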
Once complete, please restore the workspace by deleting your output as follows:
/GWD/appbase/projects/statgen/GXapp/G-P_assoc_pipeline/GDCgtx/reset_workspace.sh