PGRR: Pangenome Gene Reference Resource

The Pangenome Gene Reference Resource (PGRR) is a comprehensive pangenome of the Mycobacterium tuberculosis complex (MTBC). It spans all seven M. tuberculosis lineages and was produced using a virtually error-free sequencing and de novo assembly approach. It captures the genetic variation across all genes present in M. tuberculosis, providing an efficient tool for performing isolated gene alignments and for accessing gene-level information from the pangenome reference strains.

Description of Files

pangenome.fasta.gz: contains the biological sequences of all genes and their highly similar paralogs collected from 50 strains of M. tuberculosis across all seven lineages

pangenome_BCCM_unique_only.fasta.gz: subsetted version of pangenome.fasta.gz that contains only genes present in BCCM but not in H37Rv

pangenome_H37Rv_only.fasta.gz: subsetted version of pangenome.fasta.gz that contains only sequences belonging to the H37Rv reference genome

pangenome_no_BCCM.fasta.gz: subsetted version of pangenome.fasta.gz that includes all sequences EXCEPT for those found in the BCCM strains
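A quick way to check a downloaded file is to count its FASTA records. The sketch below is illustrative only: the helper name `count_fasta_records` and the toy file are assumptions made for the example and are not part of the PGRR distribution. It counts header lines (those beginning with `>`) without decompressing the file on disk.

```shell
# Count FASTA records (header lines beginning with '>') in a gzipped file.
# count_fasta_records is a hypothetical helper, not something PGRR ships.
count_fasta_records() {
    gzip -dc "$1" | grep -c '^>'
}

# Toy stand-in so the sketch runs anywhere; it holds two records.
demo=$(mktemp -d)
printf '>geneA\nACGT\n>geneB\nGGCC\nTTAA\n' | gzip > "$demo/toy.fasta.gz"

count_fasta_records "$demo/toy.fasta.gz"
```

To use it on the real data, pass one of the files listed above, for example `count_fasta_records pangenome.fasta.gz`.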

Example Workflow

To use the PGRR, align your query sequences to the PGRR FASTA file with any alignment tool, such as Bowtie2:

# Unzip and index PGRR
gunzip pangenome.fasta.gz
bowtie2-build -f pangenome.fasta pgrr_reference

# Align paired-end sample reads to PGRR
bowtie2 --no-unal --very-sensitive-local -x pgrr_reference -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -S output.sam

# Convert SAM to BAM and sort by read name (-n) for downstream analysis
samtools view -bS -u output.sam | samtools sort -n -o output.bam
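For gene-level summaries, one option is to tally alignment records per reference sequence directly from the SAM file. This is a minimal sketch, not the authors' downstream method: the helper name `count_alignments_per_gene` and the toy SAM file are assumptions made for the example. It relies only on the SAM layout, in which header lines start with `@` and the third tab-separated column holds the reference sequence name.

```shell
# Tally alignment records per reference sequence (SAM column 3, RNAME).
# Header lines (starting with '@') and unmapped records (RNAME '*') are skipped.
# count_alignments_per_gene is a hypothetical helper for illustration.
count_alignments_per_gene() {
    awk -F'\t' '!/^@/ && $3 != "*" { n[$3]++ }
                END { for (g in n) print g "\t" n[g] }' "$1" | sort
}

# Toy SAM file so the sketch runs without an aligner installed.
demo=$(mktemp -d)
printf '@SQ\tSN:geneA\tLN:100\n@SQ\tSN:geneB\tLN:100\n' > "$demo/toy.sam"
printf 'r1\t0\tgeneA\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo/toy.sam"
printf 'r2\t0\tgeneA\t5\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo/toy.sam"
printf 'r3\t0\tgeneB\t1\t42\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$demo/toy.sam"

count_alignments_per_gene "$demo/toy.sam"
```

Run against the workflow output with `count_alignments_per_gene output.sam`. Each mate of a pair and any secondary alignment is counted as its own record, so the totals are alignment counts rather than fragment counts.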

Citation

Poonam Chitale, Elissa Ocke, Aubrey R. Odom, Howard Fan, Alexander Henoch, Karla Vasco, Emily C. Fogarty, Courtney Grady, Alexander D. Lemenze, Pradeep Kumar, Shannon Manning, A. Murat Eren, W. Evan Johnson, and David Alland. Defining the Mycobacterium tuberculosis Pangenome and Suggestions for a New Composite Reference Sequence.
