Tools for identifying plasmid IDs and deconvoluting PCR1 barcodes in the Plasmid ID sequencing protocol.
This project is under active development. For the version of Plasmid ID associated with the manuscript "Adaptation of Plasmid-ID Technology for Evaluation of N2-Fixing Effectiveness and Competitiveness for Root Nodulation in the Sinorhizobium-Medicago System", see the Geddes lab Plasmid ID 2025 release.
The easiest way to install the necessary dependencies for this workflow is to use Miniforge3. Download and installation instructions are available on the Miniforge GitHub page (conda-forge/miniforge).
Assuming you have installed and configured miniforge3 correctly, you can create an environment with the necessary dependencies using a single command:
conda create -n plasmid -c conda-forge -c bioconda vsearch numpy pandas biopython
Alternatively, create the environment from the environment.yml file included in this repository (this requires a clone of the repository, described next):
conda env create -n plasmid -f environment.yml
To get the scripts, clone the repository, change into it, and activate the environment:
git clone git@github.com:NDSU-Geddes-Lab/plasmid-id.git
cd plasmid-id/
conda activate plasmid
The workflow assumes you have paired-end reads for each sequencing sample that you wish to deconvolute. Paired-end reads are merged with vsearch by the merge.sh script. Running the script without any arguments prints its usage message:
./merge.sh
USAGE: ./merge.sh R1.fastq.gz R2.fastq.gz
Run the script once for each pair of read files (i.e., once per sample). Each run writes a single <sample_name>.merged.fastq.gz file, ready for processing with the Python scripts described below.
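Because merge.sh handles one sample per call, a small driver can pair up R1/R2 files and call it for each sample. The sketch below is hypothetical: it assumes mates differ only by `_R1` vs `_R2` in their file names (e.g. `S216_R1.fastq.gz` and `S216_R2.fastq.gz`), and the helper names `find_read_pairs` and `merge_all` are illustrative, not part of this repository. By default it only prints the commands it would run.

```python
import subprocess
from pathlib import Path


def find_read_pairs(read_dir):
    """Pair R1/R2 files by name.

    Assumes (hypothetically) that mates differ only by '_R1' vs '_R2',
    e.g. S216_R1.fastq.gz and S216_R2.fastq.gz.
    """
    pairs = []
    for r1 in sorted(Path(read_dir).glob("*_R1*.fastq.gz")):
        # Replace only the first '_R1' to derive the expected mate name.
        r2 = r1.with_name(r1.name.replace("_R1", "_R2", 1))
        if r2.exists():
            pairs.append((r1, r2))
        else:
            print(f"warning: no R2 mate found for {r1.name}")
    return pairs


def merge_all(read_dir, script="./merge.sh", run=False):
    """Build (and, if run=True, execute) one merge.sh call per sample."""
    commands = [[script, str(r1), str(r2)] for r1, r2 in find_read_pairs(read_dir)]
    for cmd in commands:
        if run:
            subprocess.run(cmd, check=True)  # stop on the first failing sample
        else:
            print(" ".join(cmd))  # dry run: show what would be executed
    return commands
```

Calling `merge_all("reads/")` lists the planned calls; `merge_all("reads/", run=True)` executes them from the repository root, where `./merge.sh` lives.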
To create a barcode dictionary for a new plasmid ID library, use the plasmid_make_db.py script. Running the script with the -h flag will show the help menu.
./plasmid_make_db.py -h
usage: plasmid_make_db.py [-h] [-f FW_PRIMERS] [-r RV_PRIMERS] [-5 LEFT] [-3 RIGHT] [-m MIN_COUNT] [-p MIN_PURITY]
                          seqfile
Identify random plasmid ID barcodes in sequence reads and create a dictionary.
positional arguments:
  seqfile               reads.fastq.gz
options:
  -h, --help            show this help message and exit
  -f FW_PRIMERS, --fw-primers FW_PRIMERS
                        FASTA file with forward primers (default: FW_primers.fa)
  -r RV_PRIMERS, --rv-primers RV_PRIMERS
                        FASTA file with reverse primers (default: RV_primers.fa)
  -5 LEFT, --left LEFT  5-prime (left) flanking sequence (default: TGAACTGTACAAATGAAGGT)
  -3 RIGHT, --right RIGHT
                        3-prime (right) flanking sequence (GCTT + N12 experiment tag) (default: GCTTTGTATCTTCACC)
  -m MIN_COUNT, --min-count MIN_COUNT
                        minimum read count per well (default: 0)
  -p MIN_PURITY, --min-purity MIN_PURITY
                        minimum relative abundance for a barcode in a well (default: 0.5)
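The -5/--left and -3/--right options give the constant sequences that flank the random barcode. As a rough illustration of what matching the expected read architecture might involve, the sketch below returns the sequence between the first exact match of the left flank and the next exact match of the right flank, retrying on the reverse complement if the read as given does not match. Exact matching, the strand handling, and the name `extract_barcode` are assumptions; this is not the scripts' actual matching logic, and it does not model how the FW/RV primers assign reads to wells.

```python
# Default flanks from the help text above.
LEFT = "TGAACTGTACAAATGAAGGT"   # -5/--left
RIGHT = "GCTTTGTATCTTCACC"      # -3/--right (GCTT + N12 experiment tag)

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_barcode(read, left=LEFT, right=RIGHT):
    """Return the sequence between the two flanks, or None.

    Hypothetical sketch: exact matches only, no mismatches or indels;
    tries the read as given, then its reverse complement.
    """
    for seq in (read, revcomp(read)):
        start = seq.find(left)
        if start == -1:
            continue
        start += len(left)
        end = seq.find(right, start)  # right flank must follow the left one
        if end != -1:
            return seq[start:end]
    return None
```

Real reads carry sequencing errors, so a production matcher would usually tolerate a few mismatches in the flanks; exact `str.find` keeps the sketch short.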
If successful, the script writes a table of all ASVs (amplicon sequence variants) identified in each well (*_asv_table.csv) and a barcode dictionary (*_db.csv) produced by filtering and naming those ASVs, and prints a short summary. For example:
./plasmid_make_db.py S216.merged.fastq.gz
Processed 385483 reads from S216.merged.fastq.gz
310394 reads (80.52%) matched expected read architecture
Wrote counts for 1584 unique barcodes to S216_asv_table.csv
Wrote 67 barcodes to S216_db.csv
Barcodes in the final dictionary are named after the well with the highest count of that barcode.
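The filtering and naming step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in plasmid_make_db.py: the per-well table is assumed to be a mapping `{well: {barcode: reads}}`, `min_count` is interpreted as the minimum total reads in a well, a barcode is kept if its relative abundance reaches `min_purity` in at least one such well, and the naming well is chosen among the wells where it passes. The function name `build_dictionary` is hypothetical.

```python
def build_dictionary(asv_counts, min_count=0, min_purity=0.5):
    """Filter ASVs by per-well purity and name each kept barcode by its top well.

    asv_counts: {well: {barcode: read_count}}  (hypothetical layout)
    Returns {barcode: well_name}.
    """
    best = {}  # barcode -> (count, well) over wells where it passes both filters
    for well, counts in asv_counts.items():
        total = sum(counts.values())
        if total == 0 or total < min_count:
            continue  # skip empty or under-sequenced wells
        for barcode, n in counts.items():
            if n / total < min_purity:
                continue  # barcode is not dominant in this well
            # Keep the well with the highest count; ties keep the first seen.
            if barcode not in best or n > best[barcode][0]:
                best[barcode] = (n, well)
    return {barcode: well for barcode, (n, well) in best.items()}
```

With the default `min_purity` of 0.5 and the `>=` comparison used here, a well can contribute at most two barcodes, and only when they split its reads exactly 50/50.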
The plasmid_ID.py script takes two positional arguments, the merged reads to analyze and a barcode dictionary (such as one produced by plasmid_make_db.py), and produces a CSV file with the count of each dictionary barcode identified. Running the script with the -h flag will show the help menu.
./plasmid_ID.py -h
usage: plasmid_ID.py [-h] [-f FW_PRIMERS] [-r RV_PRIMERS] [-5 LEFT] [-3 RIGHT] seqfile dictionary
Identify plasmid ID barcodes in sequence reads and search against barcode dictionary
positional arguments:
  seqfile               reads.fastq.gz
  dictionary            barcode_dict.csv
options:
  -h, --help            show this help message and exit
  -f FW_PRIMERS, --fw-primers FW_PRIMERS
                        FASTA file with forward primers (default: FW_primers.fa)
  -r RV_PRIMERS, --rv-primers RV_PRIMERS
                        FASTA file with reverse primers (default: RV_primers.fa)
  -5 LEFT, --left LEFT  5-prime (left) flanking sequence (default: TGAACTGTACAAATGAAGGT)
  -3 RIGHT, --right RIGHT
                        3-prime (right) flanking sequence (GCTT + N12 experiment tag) (default: GCTTTGTATCTTCACC)
If successful, the script writes a results file (*_results.csv) containing, for each barcode matched in the dictionary, its read count in each well, and prints a short summary. For example, using the dictionary created in the previous step:
./plasmid_ID.py S216.merged.fastq.gz S216_db.csv
Processed 385483 reads from S216.merged.fastq.gz
310394 reads (80.52%) matched expected read architecture
Wrote counts for 68 matched barcodes to S216_results.csv
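Conceptually, the counting step is a dictionary lookup followed by a per-well tally. The sketch below assumes each read has already been reduced to a `(well, barcode)` pair and that the dictionary has been loaded as a `{barcode: name}` mapping; `count_matches`, `write_results`, and the name-by-well CSV layout are hypothetical and may differ from what plasmid_ID.py actually writes.

```python
import csv
from collections import Counter, defaultdict


def count_matches(assignments, dictionary):
    """Tally reads per well for barcodes present in the dictionary.

    assignments: iterable of (well, barcode) pairs, one per read that
                 matched the expected architecture (hypothetical input).
    dictionary:  {barcode: name}, e.g. loaded from a *_db.csv file.
    Returns {name: Counter({well: reads})}; unmatched barcodes are dropped.
    """
    results = defaultdict(Counter)
    for well, barcode in assignments:
        name = dictionary.get(barcode)
        if name is not None:
            results[name][well] += 1
    return results


def write_results(results, path):
    """Write a name x well count table (assumed layout, one row per name)."""
    wells = sorted({w for counts in results.values() for w in counts})
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["barcode"] + wells)
        for name in sorted(results):
            # Counter returns 0 for wells where this barcode was not seen.
            writer.writerow([name] + [results[name][w] for w in wells])
```

Using `Counter` means wells with no reads for a given barcode are written as 0 rather than left blank.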
This repository contains code from marcelamendoza/Plasmid-ID in accordance with the MIT license, Copyright (c) 2018 marcelamendoza.