diff --git a/docs/devops.md b/docs/devops.md
index 2957f21..d594936 100644
--- a/docs/devops.md
+++ b/docs/devops.md
@@ -16,25 +16,6 @@ Content
 ![](https://img.shields.io/badge/status-WorkInProgress-yellow)
 - Each time we merge a Pull request, we need to make a release
 - Publish on Posit connect with a new name that matches the version
-## Data management
-
-### AWS
-
-- Samplesheet input files for pipelines
-  - `pipelineName_PI_hbcNNNNNN`
-  - Have a copy in project folder in O2
-  - Manually removing weekly during platform meeting
-- Raw data is under `input` folder
-  - Alex and Lorena and Emma can move data from O2/FAS to S3
-  - `pipelineName_PI_hbcNNNNNN`
-  - lifecycle 14 days
-- Pipeline outputs are under `results`:
-  - `pipelineName_PI_hbcNNNNNN`
-  - lifecycle 14 days for bigger than 1gb
-  - Move output pipeline to project folder under `final` folder
-- Data cleaning every platform meeting
-- Quarterly Evaluation: RNAseq, CHIPseq
-
 ## Configure to use posit package manager
 
 [source](https://packagemanager.posit.co/client/#/repos/bioconductor/setup?bioconductor_version=3.18)
@@ -107,4 +88,18 @@ BiocManager::install("BiocNeighbors")
 install.packages('NMF')
 install.packages("circlize")
 devtools::install_github("jinworks/CellChat")
+```
+
+## Build environments
+
+### scGPT in FAS
+
+```
+conda create -p ./scgpt-2 python=3.9 pip ipykernel -c conda-forge
+conda install ipywidgets -c conda-forge
+pip install torch==2.1.2
+conda install numpy
+pip install scgpt
+conda install wandb -c conda-forge
+python -m ipykernel install --prefix=/n/holylfs05/LABS/hsph_bioinfo/Lab/shared_resources/scgpt-2 --name 'dcgpt2' --display-name 'scgpt-2'
 ```
\ No newline at end of file
diff --git a/docs/environments.md b/docs/environments.md
new file mode 100644
index 0000000..b5b06e7
--- /dev/null
+++ b/docs/environments.md
@@ -0,0 +1,28 @@
+# Available environments
+
+## scGPT
+
+Only available at FAS computing resources.
+
+Please reach out to the platform team before accessing this env for the first time:
+
+- Add the environment first:
+  - Only the first time, add the env to the notebook kernels:
+  - `echo "/n/holylfs05/LABS/hsph_bioinfo/Lab/shared_resources/scgpt-2" >> ~/.conda/environments.txt`
+
+- Start a python notebook on the [ondemand web-page](https://rcood.rc.fas.harvard.edu/pun/sys/dashboard/batch_connect/sessions).
+  - You need to be connected to the VPN.
+  - Use the `gpu` partition, including when you need more than one GPU, or `gpu_test` if you need something that won't take a long time
+  - Add `--gres=gpu:n` to the sbatch options in the advanced options, further down the page.
+  - Add `gcc/13.2.0-fasrc01` to the list of modules to load
+- Once connected, if you get asked for a password, just close the window and open it again.
+  - If you didn't add module `gcc/13.2.0-fasrc01`, do this now:
+  - Initiate a terminal session
+  - `module load gcc/13.2.0-fasrc01`
+  - Then, start a notebook choosing as kernel `scgpt-2`
+- Validate that everything works with these commands in the python notebook
+  - `import torch`
+  - `torch.cuda.is_available()` -> Should return `True`
+  - `torch.cuda.device_count()` -> Should return `
+  - `import scgpt`
+
diff --git a/docs/index.md b/docs/index.md
index 3e3a92a..375c9dc 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -8,6 +8,9 @@ Most analyses will follow the similar trajectory for set-up. We will note where
 
 ## Set up the package
 
+
+ O2 instructions- Click to expand!
+
 Log onto O2 via the command line and check two things (first-time only):
 
 * Remove `bcbio` from you `PATH` by commenting the line in your `.bashrc` if you have it
@@ -33,7 +36,9 @@ When the session is started, set your library path by typing this command in you
 ```
 .libPaths("/n/app/bcbio/R4.3.1")
 ```
+
+

 Next, load `bcbioR` with:
 
 ```
@@ -88,7 +93,20 @@ usethis::proj_activate(project_path)
 
 > Note: This will restart the session in the project directory. This restart will clear the `.libPaths("/n/app/bcbio/R4.3.1")` and `library(bcbioR)` that we used earlier, so we will need to re-do them in the following steps.
 
-### Setting up your workspace
+## Using the template reports
+
+Many analyses have template reports that you can use. You can use these by using the appropriate `bcbioR::bcbio_templates()` command from the table below:
+
+| Type of Analysis | `bcbioR::bcbio_templates()` command |
+|:---:|:---|
+| Bulk RNA-seq ![](https://img.shields.io/badge/status-stable-blue)| `bcbioR::bcbio_templates(type="rnaseq", outpath="reports")` |
+| Single-cell RNA-seq ![](https://img.shields.io/badge/status-beta-yellow) | `bcbioR::bcbio_templates(type="singlecell", outpath="reports")` |
+| ChIP-Seq ![](https://img.shields.io/badge/status-beta-yellow) | `bcbioR::bcbio_templates(type="chipseq", outpath="reports")` |
+| CellChat ![](https://img.shields.io/badge/status-draft-grey)| **Under development:** `bcbioR::bcbio_templates(type="singlecell_delux", outpath="reports")` |
+| COSMX ![](https://img.shields.io/badge/status-draft-grey)| `bcbioR::bcbio_templates(type="spatial", outpath="reports")` |
+| DNA Methylation | **Under development** |
+
+## Setting up your workspace in O2
 
 We will now add the `.libPath()` that is appropriate for our type of analysis. You can use the table below to determine which `.libPath()` is appropriate for your analysis:
 
@@ -129,7 +147,7 @@ Now, we will use `bcbioR` to set-up the directory structure that we will be usin
 bcbioR::bcbio_templates(type="base", outpath=".", org="hcbc")
 ```
 
-### Setting up GitHub and RStudio
+## Setting up GitHub and RStudio
 
 Now, we will connect O2 with GitHub. First, check in your Home directory if a `.gitconfig` file exists.
***You should only need to do this once.*** The contents should look like:
@@ -169,7 +187,7 @@ Note: In order to see hidden file in your file browser on the O2 Portal, you wil
-#### Getting the Git tab
+### Getting the Git tab
 
 Now, we would like to get the Git tab into our Workspace Browser (where `Environment`, `History`, `Connections` and `Tutorial` tabs are located). We show this transition below:
 
@@ -219,7 +237,7 @@ Restart now?
 
 We will need to restart R in order to get the Git tab in our R Studio, so select `For sure`, `Yeah` or some other option for answering in the affirmative.
 
-#### Creating the first commit
+### Creating the first commit
 
 Now, we are going to create our first commit. In order to do this, we need to:
 
@@ -234,7 +252,7 @@ These steps are summarized in the GIF below:

-#### Pushing our initial commit
+### Pushing our initial commit
 
 Now we will use the function to push these changes to GitHub with the following command:
 
@@ -260,7 +278,7 @@ If the push is successful, then it will look like this GIF below:
 > Note: You might get a GitHub 404 error page (see image below) when you do your first push to GitHub. Just refresh the page in your browser and it should be resolve itself.
 >

-##### Expired or non-existent GitHub token
+#### Expired or non-existent GitHub token
 
 However, if your token is expired or this is your first time using GitHub from O2, then you will get this message:
 
@@ -356,18 +374,6 @@ You should now see the HBC code as the header to the `README.md` on GitHub. Thes


-## Using the template reports
-
-Many analyses have template reports that you can use. You can use these by using the approriate `bcbioR::bcbio_templates()` command from the table below:
-
-| Type of Analysis | `bcbioR::bcbio_templates()` command |
-|:---:|:---|
-| Bulk RNA-seq| `bcbioR::bcbio_templates(type="rnaseq", outpath="reports")` |
-| Single-cell RNA-seq | `bcbioR::bcbio_templates(type="singlecell", outpath="reports")` |
-| ChIP-Seq | `bcbioR::bcbio_templates(type="chipseq", outpath="reports")` |
-| CellChat | **Under development:** `bcbioR::bcbio_templates(type="singlecell_delux", outpath="reports")` |
-| DNA Methylation | **Under development** |
-
 # Tips for Moving Forward
 
 Now that we've gotten set-up for our project, here are a few last tips to try to make your experience smooth:
@@ -376,19 +382,6 @@ Now that we've gotten set-up for our project, here are a few last tips to try to
 - Try to avoid editing files directly on GitHub. If you do, it will be important that you `Pull` the repository onto O2 before continuing on with your work on O2. If you forget to do this pull and make commits on O2, you can fix it, but it is beyond the scope of this guide.
 - Use the checklist in the `README.md` to help keep track of your progress.
 
-# bcbioR supported templates
-
-We used `bcbioR` to deploy folders and code to our project directories to improve robustness in our analysis.
-
-You can install `bcbioR` as indicated here: `https://github.com/bcbio/bcbioR/tree/main`
-
-- RNAseq ![](https://img.shields.io/badge/status-stable-blue)
-- ChipSeq ![](https://img.shields.io/badge/status-beta-yellow)
-- scRNAseq ![](https://img.shields.io/badge/status-beta-yellow)
-- CELLCHAT ![](https://img.shields.io/badge/status-draft-grey)
-- TEASeq ![](https://img.shields.io/badge/status-draft-grey)
-- COSMX ![](https://img.shields.io/badge/status-draft-grey)
-
 # Note
 
 >These materials have been developed by members of the teaching and platform team at the Harvard Chan Bioinformatics Core (HBC) RRID:SCR_025373.
diff --git a/docs/pipelines.md b/docs/pipelines.md
index cc6d39a..fbf77cd 100644
--- a/docs/pipelines.md
+++ b/docs/pipelines.md
@@ -2,6 +2,39 @@ Content
 
 - ![](https://img.shields.io/badge/status-WorkInProgress-yellow)
 
+## Data Management
+
+### AWS
+
+- Samplesheet input files for pipelines
+  - `pipelineName_PI_hbcNNNNNN`
+  - Have a copy in project folder in O2
+  - Manually removing weekly during platform meeting
+- Raw data is under `input` folder
+  - See instructions below to move data in/out
+  - `pipelineName_PI_hbcNNNNNN`
+  - lifecycle 14 days
+- Pipeline outputs are under `results`:
+  - `pipelineName_PI_hbcNNNNNN`
+  - lifecycle 14 days for bigger than 1gb
+  - Move output pipeline to project folder under `final` folder
+
+### Move data in/out of AWS
+
+Follow this to copy data in and out of our AWS space:
+
+- Log into the transfer node in O2
+- Type `sudo -su bcbio` to become the bcbio user
+- Use this command to copy data to AWS:
+```
+/usr/local/bin/aws s3 sync $FOLDER_WITH_FASTQ s3://hcbc-seqera/input/rnaseq_piname_hbcNNNN
+```
+- Use this command to copy data from AWS:
+```
+/usr/local/bin/aws s3 sync s3://hcbc-seqera/results/rnaseq_piname_hbcNNNN $FOLDER_PROJECT
+```
+**Make sure the bcbio group has read/write access to the folders; otherwise the `aws` command won't work, but won't error either.**
+
 ## Parameters
 
 ### RNAseq
diff --git a/mkdocs.yml b/mkdocs.yml
index fd01b22..5f727ce 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,6 +2,7 @@ site_name: HCBC Platform
 nav:
   - Home: index.md
   - Pipelines: pipelines.md
+  - Tools environments: environments.md
   - Platform members: devops.md
 
 theme:
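
Review note: the validation steps added in `docs/environments.md` (`import torch`, `torch.cuda.is_available()`, `torch.cuda.device_count()`, `import scgpt`) could be collected into a single notebook cell. The following is a minimal sketch under that assumption; the helper name `check_scgpt_env` and the keys of the returned dictionary are hypothetical and not part of this change.

```python
import importlib


def check_scgpt_env(modules=("torch", "scgpt")):
    """Report which modules import and whether CUDA devices are visible.

    Hypothetical helper for illustration only; it is not part of the
    repository touched by this diff.
    """
    report = {}
    for name in modules:
        try:
            importlib.import_module(name)
            report[name] = True
        except Exception:
            # Covers ImportError as well as failures while loading
            # compiled extensions (for example a missing shared library).
            report[name] = False

    # Default to "no GPU" and only query CUDA when torch imported cleanly.
    report["cuda_available"] = False
    report["gpu_count"] = 0
    if report.get("torch"):
        import torch

        report["cuda_available"] = torch.cuda.is_available()
        report["gpu_count"] = torch.cuda.device_count()
    return report
```

Calling `check_scgpt_env()` in a notebook started with the `scgpt-2` kernel returns a dictionary with one entry per module plus `cuda_available` and `gpu_count`, so a failed import shows up as `False` instead of raising.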