Sequencing data analysis for the "Co-evolution of four bacterial species reduces facilitative interactions" paper
This repository hosts the code for the sequencing data analysis. The experiment described in the paper was followed using DNA short-read, DNA long-read and RNA short-read sequencing data.
The conda environment containing most of the software used for data analysis is exported in environment.yml.
The genomic short-read and long-read data was processed using Snakemake workflows, which can be found in the following directories:
- Illumina: scripts/workflows/illumina
- PacBio: scripts/workflows/pacbio
The RNA sequencing data was analyzed using the RASflow workflow.
Variants and other information relevant for the analysis were parsed and stored as dataframes in the variants directory.
Below is an outline of the most important dataframes:
- variants/variants_comp_mapping.csv: contains all filtered variants for the genomic Illumina data
- variants/{at,ct}_variants_annotations.csv: same as above, but additionally with genomic annotations for Ct and At
- variants/snps_freebayes_comp_mapping.csv: contains all fixed variants for the genomic Illumina data
- variants/snps_pacbio.csv: contains all SNPs in the genomic PacBio data
- variants/assembly_length.csv: assembly lengths for PacBio assemblies
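As a quick way to inspect the dataframes listed above, the following minimal sketch reads whichever of them are present and reports their size and column names. It is not part of the repository: the names `TABLES` and `load_tables` are hypothetical, and it assumes the files are comma-separated with a header row and that the script is run from the repository root.

```python
import csv
from pathlib import Path

# File names taken from the outline above; the {at,ct} pattern is expanded by hand.
TABLES = [
    "variants_comp_mapping.csv",
    "at_variants_annotations.csv",
    "ct_variants_annotations.csv",
    "snps_freebayes_comp_mapping.csv",
    "snps_pacbio.csv",
    "assembly_length.csv",
]


def load_tables(variants_dir="variants"):
    """Return {file name: list of row dicts} for every listed table that exists.

    Assumption: each file is comma-separated and starts with a header row.
    """
    tables = {}
    for name in TABLES:
        path = Path(variants_dir) / name
        if not path.is_file():
            continue  # skip tables that are missing from this checkout
        with path.open(newline="") as handle:
            tables[name] = list(csv.DictReader(handle))
    return tables


if __name__ == "__main__":
    for name, rows in load_tables().items():
        columns = list(rows[0].keys()) if rows else []
        print(f"{name}: {len(rows)} rows, columns: {columns}")
```

The sketch uses only the standard library so that it runs without the conda environment. If pandas is available, `pandas.read_csv(path)` on the same paths should give an actual dataframe for each file.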
The code for the figures can be found in scripts/diversity.py for variant analysis and scripts/deletions.py for PacBio data.