Genome Analysis Project
Project Plan
Aim: "Dead zones", areas of ocean with seasonal to long-term low dissolved oxygen (DO) concentrations, are becoming increasingly frequent. This study focuses on the northern Gulf of Mexico (nGOM) dead zone, one of the largest coastal dead zones. The low DO levels result from eutrophication-enhanced bacterioplankton respiration combined with strong seasonal stratification. I am going to assemble genomes of bacterioplankton collected from the nGOM dead zone by Thrash et al., as described in "Metabolic roles of uncultivated bacterioplankton lineages in the Northern Gulf of Mexico ‘dead zone’". Analysis will include metagenome assembly, binning to recover individual genomes, expression analysis, and phylogenetic assignment. Knowing which species or genera of bacterioplankton are able to live in the nGOM, in both oxic and hypoxic conditions, is imperative to understanding this environment. Learning more about these organisms, both previously described and novel, will help us better understand the biogeochemical cycling they actively mediate. Combining expression analysis with functional annotation gives further insight, because it shows which metabolic pathways are present and how they contribute to the environment.
Workflow
| Analysis | Program | Output | Expected Run Time | Expected Deadline | Completed |
|---|---|---|---|---|---|
| Reads Quality Control | FastQC | .html | ~15 min | April 7th | Yes |
| RNA Trimming | Trimmomatic | .fastq | ~30 min | April 7th | Yes |
| DNA Assembly | MEGAHIT | | ~6 h | April 7th | Yes |
| Assembly Evaluation | QUAST | | ~45 min | April 14th | Yes |
| DNA Alignment | BWA | .bam | ~4-6 h | April 21st | Yes |
| Binning | MetaBAT | .fasta for each bin | <30 min | April 21st | Yes |
| Binning Evaluation | CheckM | output report | ~2 h | April 21st | Yes |
| RNA Alignment | BWA | .sam | ~4-6 h | April 21st | Yes |
| Annotation | Prokka | | ~1 h | April 28th | Yes |
| Phylogenetic Placement | PhyloPhlAn | | ~6 h | April 28th | Yes |
| Expression Analysis | HTSeq | .txt with read count table | ~6 h | April 28th | Yes |
| Extra Analysis: Abundance of Organisms | BWA | .bam | ~4-6 h | May 12th | Yes |
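The Assembly Evaluation step relies on contiguity statistics such as N50, which QUAST reports. Below is a minimal Python sketch of how N50 is computed from contig lengths. It is an illustration of the metric, not QUAST's implementation. The helper names `fasta_lengths` and `n50` are hypothetical. The `min_length` cutoff exists because QUAST ignores contigs shorter than 500 bp by default, so this sketch only approximates the QUAST report when the same threshold is used.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths, min_length=0):
    """Return N50: the length L such that contigs >= L hold at least half of all bases.

    Contigs shorter than min_length are ignored (QUAST's default cutoff is 500 bp).
    """
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_length")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:  # always reached, since sum(kept) >= half
            return length
```

For example, `n50([100, 200, 300, 400])` is 300: the two longest contigs (400 + 300 bp) already cover half of the 1,000 assembled bases. Suppose the assembly were at MEGAHIT's default output name, `final.contigs.fa`, inside `03_DNA_assembly/DNA_assembly_results` (this path is an assumption). Then `n50(fasta_lengths(open(path)), min_length=500)` should give a value comparable to QUAST's N50.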
Data Management System
genome_analyses
├── 01_reads_quality_control
│   ├── fastqc_trimmed_RNA_script
│   ├── slurm-7528571.out
│   ├── fastqc_DNA_results
│   ├── fastqc_RNA_post_trim_results
│   └── fastqc_RNA_pre_trim_results
├── 02_RNA_trimming
│   ├── adapter_sequences
│   ├── slurm-7528563.out
│   ├── trimmomatic_script
│   └── trimmed_RNA
├── 03_DNA_assembly
│   ├── megahit_script
│   ├── slurm-7723214.out
│   └── DNA_assembly_results
├── 04_assembly_evaluation
│   ├── quast_script
│   ├── slurm-7731177.out
│   └── quast_results
├── 05_alignment
│   ├── bwa_results_SRR4342129.bam
│   ├── bwa_results_SRR4342133.bam
│   ├── bwa_script
│   └── slurm-7795240.out
├── 06_binning
│   ├── change_bin_names_script
│   ├── depth.txt
│   ├── metabat_script
│   ├── slurm-7797909.out
│   ├── slurm-7799427.out
│   └── bins
├── 07_binning_evaluation
│   ├── checkm_qa
│   ├── checkm_qa_script
│   ├── checkm_script
│   ├── slurm-7799916.out
│   ├── slurm-7827270.out
│   ├── CheckM_data
│   └── checkm_results
├── 08_RNA_mapping
│   ├── bwa_script_for_loop
│   ├── bwa_script_for_loop_sample_2
│   ├── slurm-7855951.out
│   └── slurm-7855966.out
├── 09_annotation
│   ├── prokka_script
│   ├── rename_annotation
│   ├── slurm-7852786.out
│   ├── 1_prokka_results
│   ├── ...
│   ├── 49_prokka_results
│   └── renamed_annotations
├── 10_phylo_placement
│   ├── make_annotation_folder
│   ├── phylophlan_conda_script
│   ├── slurm-7856958.out
│   └── phylogeny_results
├── 11_expression_analysis
│   ├── 1_SRR4342137_expression_results.txt
│   ├── ...
│   ├── 49_SRR4342139_expression_results.txt
│   ├── htseq_script
│   └── slurm-7856869.out
├── 12_abundance_of_bins
│   ├── bwa_script
│   └── mapping to bins
├── DNA_trimmed
└── RNA_untrimmed
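Each expression results file in 11_expression_analysis holds an HTSeq read count table. This sketch assumes the files are standard `htseq-count` output: one tab-separated line per feature with its ID first and its count last, followed by summary counters whose names start with `__`, such as `__no_feature` and `__ambiguous`. Under that assumption, a minimal Python sketch can load one table and check what fraction of reads were assigned to annotated features. The function names and the gene IDs in the example are hypothetical, and none of this is part of HTSeq itself.

```python
import io


def read_htseq_counts(handle):
    """Parse htseq-count output into (feature_counts, summary_counters).

    Lines whose ID starts with '__' (e.g. '__no_feature') are htseq-count's
    summary counters, not genes, so they are kept separately.
    """
    features, summary = {}, {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        parts = line.split("\t")
        feature_id, count = parts[0], parts[-1]  # count is the last column
        target = summary if feature_id.startswith("__") else features
        target[feature_id] = int(count)
    return features, summary


def assigned_fraction(features, summary):
    """Fraction of all reads in the table that were assigned to a feature."""
    assigned = sum(features.values())
    total = assigned + sum(summary.values())
    return assigned / total if total else 0.0


# Small in-memory example with hypothetical gene IDs; real files would be
# opened with open(path) instead.
example = io.StringIO(
    "gene_0001\t10\n"
    "gene_0002\t0\n"
    "__no_feature\t6\n"
    "__ambiguous\t4\n"
)
features, summary = read_htseq_counts(example)
print(features)
print(assigned_fraction(features, summary))
```

The same two calls apply to a real file such as `11_expression_analysis/1_SRR4342137_expression_results.txt`. A large `__no_feature` count can indicate that the feature type or ID attribute given to htseq-count does not match the annotation file.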
