Here is my code for running a co-occurrence analysis with data generated by abricate.
- Screen one or many
.fastafile(s) against a database with abricate. - Read the data into R.
- Wrangle the data by pre-processing and creating a count table.
- Use the cooccur package to create a cooccur object and visualise it as a heatmap.
The code is organised to be used for any table produced by abricate to screen against a database of genes.
Short read sequencing makes de novo genome assembly difficult at the site of repeated sequences, which cannot be bridged if the reads are shorter than the repeat length.
We used co-occurrence analysis to overcome these limitations of short-read sequencing. Co-occurrence analysis allows for the comparison of genetic interactions across genomes. It can show which genes are carried together hinting at carriage on the same or different replicons. We ran a probabilistic analysis of gene co-occurrence to understand the genetic location of AMR genes relative to each other in short-read genomic data.
We used SPAdes v3.11.1 to assemble 772 genomes of Escherichia coli and Klebsiella pneumoniae previously sequenced and isolated from blood and stool samples from Blantyre, Malawi. Abricate v0.0.9 was used to screen for AMR genes against the Resfinder database (60% minimum length and 90% percentage identity).
We then used the cooccur package v1.3 in R v4.3.1 to employ a probabilistic model to determine whether each pair of genes had an observed co-occurrence which was significantly different from the expected co-occurrence, revealing genetic interactions among AMR genes. If a pair of genes were observed to co-occur significantly less than expected by chance their relationship was termed negative, if they were observed to co-occur significantly more it was termed positive and any pair of genes which were observed to co-occur with no significant difference from their expected value were termed random. A significant difference was defined as having a p-value ≤ 0.05.
The final figure will look like this:
This analysis was part of a wider study which has been published in Nature Communications:
You will find the complete code including instructions about how to run it here.
This analysis takes a binary table of AMR genes present in a defined set of genomes from abricate and creates a co-occurrence matrix and subsequent analysis of the co-occurrence interactions between AMR genes across the genomes. The code can be used for any table produced by abricate to screen against a database of genes.
- Screen one or many
.fastafile(s) against a database with abricate. You can screen against AMR genes (CARD, Resfinder), virulence factors (VFDB) and metal resistance genes (MEGARES). - Read the data into R.
- Wrangle the data by pre-processing which includes the removal of duplicates and creating a count table which allows us to display it in a way in which we can make it into a co-occurrence table.
- Use the cooccur package to create a cooccur object. This can then be visualised with the cooccur package itself or as a heatmap with the pheatmap function. You can either visualise the heatmap to contain all the genes in the matrix or subset to pick a selection of interesting genes.

