Changes to `example/README.md`: 4 additions, 51 deletions.

Here we assume that
1. the data to be analyzed have already been uploaded to the S3 bucket by [the Data Admin](https://wanggroup.org/productivity_tips/memverge-aws#notes-for-data-admin).
2. the analysis script is also available on S3; in this example, the [xqtl-pipeline repo](https://github.com/cumc/xqtl-pipeline) is cloned to the bucket.
3. the FSx or EFS volume already has all the necessary software installed.

We use the command below to submit the commands in `commands_to_submit.txt`.

…
- `-g g-sidlpgb7oi9p48kxycpmn` and `-sg sg-02867677e76635b25` are the gateway ID and the security group ID, respectively. You can ask your admin for these IDs. They configure the VM's networking.
- `-efs 10.1.10.210` specifies the IP address of the EFS volume that holds the installed packages.
- `--job-script ./example/commands_to_submit.txt` provides the commands to run on the VM. Providing this option selects batch mode.
- `--oem-packages` and `--mount-packages` are two modes that control which packages are available. The former allows the user to use shared packages, and the latter allows the user to use user-installed packages. One and only one of them can be used.
- `-c 2` and `-m 16` specify that the VM should have 2 CPU threads and 16 GB of memory.
- `--job-size 100` splits the commands in `commands_to_submit.txt` (one command per line) into batches of at most 100 commands each.
- `--mount` lists three folders, separated by commas:
  - The S3 folder `s3://statfungen/ftp_fgc_xqtl` is mounted on the VM as `~/data`.
  - The S3 folder `s3://statfungen/ftp_fgc_xqtl/sos_cache/aw3600` is mounted as `~/.sos`.
  - The S3 folder `statfungen/ftp_fgc_xqtl/analysis_result/finemapping_twas` is mounted as `~/output`.
- `--mountOpts` gives the mount options for these folders in the same order, also separated by commas. `mode=r` mounts the first folder read-only, so the analysis command cannot change or add anything in the `~/data` folder on the VM. The second folder is mounted with `mode=rw`, so the analysis command can write into `~/.sos`. The third folder is also mounted with `mode=rw`, so outputs can be written directly to it as they are generated.
- `--download` specifies a folder inside the S3 bucket to download to the VM at the beginning of the analysis. If you download data this way, update the file paths in `commands_to_submit.txt` accordingly. Also **add `/` after the local folder in `--download`** (because we want to download into a folder). For instance, suppose we download genotype data from `statfungen/ftp_fgc_xqtl/ROSMAP/genotype/analysis_ready/geno_by_chrom/` to `/home/$username/input/` on the VM. The genotype data path in `commands_to_submit.txt` should then be given as `../input`.
- `--download-include` specifies the prefix or suffix of the files to download from the S3 bucket.
- `--ebs-mount` mounts a dedicated local EBS volume on the VM instance. When you download data from an S3 bucket instead of mounting it directly, mount a dedicated EBS volume at the destination path so that it has enough storage. This path must differ from the paths in `--mount`, which mount folders from the S3 bucket.
- `-jn` sets the name of the batch job. If a job name is specified, a numeric suffix is appended to it. For example, if this command submits 10 jobs, the job names run from `example_job_1` to `example_job_10`.
- `--no-fail-fast`: when this switch is turned on, all commands in a batch are executed, whether the previous ones failed or succeeded.

To test this yourself without submitting the job, add `--dryrun` to the end of the command (e.g., right after `--no-fail-fast`) and run it on your computer. You should then find a file called `commands_to_submit_1.mmjob.sh`. Look at it to see the actual script that will be executed on the VM.
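The sketch below makes the `--job-size` batching arithmetic and the `-jn` numbering concrete, in bash, on your own computer. It assumes that every non-blank line of the commands file is one command and that job numbers start at 1, as in the `example_job_1` … `example_job_10` example. The function names `count_jobs` and `job_names` are hypothetical and are not part of the submission tool.

```bash
#!/usr/bin/env bash
# Hypothetical local sketch (not the submission tool's code): estimate how
# many jobs --job-size would create and what -jn would name them.

# Number of jobs = ceil(number of non-blank lines / job size).
count_jobs() {
  local commands_file=$1 job_size=$2 n_commands
  # grep -c prints 0 and exits 1 when nothing matches; '|| true' keeps going.
  n_commands=$(grep -cv '^[[:space:]]*$' "$commands_file" || true)
  echo $(( (n_commands + job_size - 1) / job_size ))
}

# Job names: the -jn value followed by a numeric suffix starting at 1.
job_names() {
  local name=$1 n_jobs=$2 i
  for (( i = 1; i <= n_jobs; i++ )); do
    echo "${name}_${i}"
  done
}
```

For example, `count_jobs example/commands_to_submit.txt 100` prints how many jobs `--job-size 100` would create under these assumptions. To preview the batches themselves, `split -l 100 example/commands_to_submit.txt batch_` writes at most 100 lines into each of `batch_aa`, `batch_ab`, and so on. The tool's own batch scripts may be laid out differently.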
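The `--mount` and `--mountOpts` lists are matched by position, so a missing or extra entry shifts every option onto the wrong folder. The hypothetical bash helper `pair_mounts` pairs the two comma-separated lists the same way and stops when their lengths differ. It uses only the S3 paths from the text, because the exact syntax for the VM-side mount point is not shown here. It also assumes that no single entry contains a comma of its own.

```bash
#!/usr/bin/env bash
# Hypothetical check (illustration only; the tool's own parsing may differ):
# pair each comma-separated --mount entry with the --mountOpts entry at the
# same position, and fail if the two lists have different lengths.
pair_mounts() {
  local mounts=$1 opts=$2 i
  local -a m o
  # Split both arguments on commas into arrays.
  IFS=',' read -r -a m <<< "$mounts"
  IFS=',' read -r -a o <<< "$opts"
  if (( ${#m[@]} != ${#o[@]} )); then
    echo "error: ${#m[@]} mount(s) but ${#o[@]} mount option(s)" >&2
    return 1
  fi
  for (( i = 0; i < ${#m[@]}; i++ )); do
    printf '%s -> %s\n' "${m[i]}" "${o[i]}"
  done
}
```

As a usage example, consider the folders and options described above:

```
pair_mounts "s3://statfungen/ftp_fgc_xqtl,s3://statfungen/ftp_fgc_xqtl/sos_cache/aw3600,statfungen/ftp_fgc_xqtl/analysis_result/finemapping_twas" "mode=r,mode=rw,mode=rw"
```

This prints three lines, the first being `s3://statfungen/ftp_fgc_xqtl -> mode=r`.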
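When a dataset is downloaded with `--download` instead of being mounted, the paths in `commands_to_submit.txt` have to be rewritten. The sketch below does this with `sed`. It assumes the commands currently reach the genotype folder through the `~/data` mount, i.e. via `~/data/ROSMAP/genotype/analysis_ready/geno_by_chrom`. That prefix, the `my_analysis` command line in the test, and the helper name `rewrite_paths` are all made up for illustration. `../input` is the replacement path from the `--download` example in the text.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace every occurrence of an old path prefix with a
# new one in a commands file. The old prefix is used as a sed regular
# expression, so it should not contain characters such as '.', '*' or '[',
# and neither argument may contain '|' (the delimiter) or, in the new
# prefix, '&'.
rewrite_paths() {
  local commands_file=$1 old_prefix=$2 new_prefix=$3
  # '|' is the sed delimiter because the paths themselves contain '/'.
  # Writing to a temporary file avoids the GNU/BSD difference in 'sed -i'.
  sed "s|${old_prefix}|${new_prefix}|g" "$commands_file" > "${commands_file}.tmp" &&
    mv "${commands_file}.tmp" "$commands_file"
}
```

Keep a copy of the original commands file before running the helper, since it overwrites the file.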
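Downloaded data land on the volume given by `--ebs-mount`. A quick check of the free space at the destination, before the analysis starts, can therefore catch an undersized volume early. The minimal sketch below is meant to be run on the VM and relies on the POSIX `df -P` output format. `free_gb`, `check_space`, and the sizes you pass to them are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: report the free space (whole GB) at a path
# and fail if it is smaller than what you expect to download there.
free_gb() {
  # -P: POSIX output, one line per filesystem; -k: 1024-byte blocks.
  # Column 4 of the second line is the available space.
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

check_space() {
  local path=$1 needed_gb=$2 have
  have=$(free_gb "$path")
  [[ -n $have ]] || { echo "could not read free space at ${path}" >&2; return 1; }
  if (( have < needed_gb )); then
    echo "only ${have} GB free at ${path}, need ${needed_gb} GB" >&2
    return 1
  fi
}
```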
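The effect of `--no-fail-fast` can be sketched as a loop over one batch of commands. The default shown here, stopping at the first failing command, is inferred from the flag's name rather than stated in the text. `run_batch` is a hypothetical function, not the script the tool generates; the `--dryrun` output is the place to see the real one.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one batch. no_fail_fast=0: stop at the first
# failing command. no_fail_fast=1: run every command, then report failures.
run_batch() {
  local no_fail_fast=$1 commands_file=$2 cmd failures=0
  while IFS= read -r cmd || [[ -n $cmd ]]; do
    [[ -z $cmd ]] && continue                 # skip blank lines
    # </dev/null stops a command from reading the rest of the commands file.
    if ! bash -c "$cmd" < /dev/null; then
      failures=$((failures + 1))
      if (( ! no_fail_fast )); then
        echo "stopping after failed command: $cmd" >&2
        return 1
      fi
    fi
  done < "$commands_file"
  echo "${failures} command(s) failed"
  (( failures == 0 ))                         # non-zero status if any failed
}
```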
```
…
-g g-sidlpgb7oi9p48kxycpmn \
-sg sg-02867677e76635b25 \
-efs 10.1.10.210 \
--mount-packages \
-jn TEST_ROCKEFELLER_oem_mount_packages \
-ide jupyter
```
Some of these parameters are shared with the batch job above and are skipped in the explanation below:
- `--oem-packages` and `--mount-packages` are two modes that control which packages are available. The former allows the user to use shared packages, and the latter allows the user to use user-installed packages. One and only one of them can be used.
- `-jn` is the name of the interactive job. By default, the interactive job is named `<user>_<ide>_<port>`.
- `-ide jupyter` specifies the IDE for the interactive job, and providing this option selects interactive mode. By default, the shell session `tmate` is used, but `jupyter`, `vscode`, and `rstudio` are also available.

… check the contents of `commands_to_submit_0.mmjob.sh` to understand what it does. You can then use this setup on the cluster to run, for example, 1300 commands. With `--job-size 20`, you submit 1300 / 20 = 65 jobs. Each of these jobs uses 3 CPUs and 32 GB of memory, which you can change. If you request multiple CPUs, the commands run in parallel, in batches whose size is set by `--parallel-commands` (default: the value of `-c`).

Once you are comfortable with the outcome of the `--dryrun`, remove `--dryrun` and run the command on the HPC, which will submit all the jobs.
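One reading of "parallel by batches" for `--parallel-commands` is that a job's commands run in waves. Up to that many commands start at once, the job waits for all of them to finish, and then the next wave starts. The minimal bash sketch below implements that reading with a hypothetical `run_in_waves` function. The tool's actual scheduling may differ; for example, it could start a new command as soon as any running one finishes.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run the commands in a file in waves of `parallel`
# concurrent commands (the text says the default wave size is the -c value).
run_in_waves() {
  local parallel=$1 commands_file=$2 cmd running=0
  while IFS= read -r cmd || [[ -n $cmd ]]; do
    [[ -z $cmd ]] && continue                 # skip blank lines
    bash -c "$cmd" < /dev/null &              # start the command in the background
    running=$((running + 1))
    if (( running >= parallel )); then
      wait                                    # let the whole wave finish
      running=0
    fi
  done < "$commands_file"
  wait                                        # finish the last, possibly smaller, wave
}
```

For example, `run_in_waves 3 batch_aa` would run the lines of a batch file three at a time under this reading.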