
03. Binsanity binning (draft)

rprops edited this page May 3, 2017 · 1 revision

Binsanity is a recently published tool that bins contigs based on coverage and sequence signatures. The main difference from CONCOCT is that it implements a sequential binning strategy: it first bins on coverage alone, and only then re-bins the low-quality bins (as evaluated by CheckM) by k-mer composition and GC%.

We opted for the Binsanity-wf function here, since it automates the refinement using CheckM.

In order to run Binsanity, we need to rescale the coverage file and parse it slightly differently.

IMPORTANT: Binsanity does not handle large data sets well (fewer than 100,000 contigs is recommended), so we first need to reduce the assembly fasta to contigs larger than 3 kb and trim the coverage file to only those contigs, before rescaling.
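
The contig-count recommendation can be turned into a small guard that is run on any fasta before binning. This is only a sketch: the function name check_contig_count, the toy file and the wording of the messages are made up for illustration, and the 100,000 default simply mirrors the recommendation above.

```shell
# Count FASTA records (header lines starting with ">") and warn when the
# count reaches the limit. The limit defaults to 100,000 contigs.
check_contig_count() {
    fasta=$1
    limit=${2:-100000}
    n=$(grep -c '^>' "$fasta")
    if [ "$n" -ge "$limit" ]; then
        echo "WARNING: $n contigs (limit $limit); raise the length cutoff"
    else
        echo "OK: $n contigs"
    fi
}

# Toy FASTA with made-up sequences, only to show the behaviour.
printf '>c1\nACGT\n>c2\nGGCC\n>c3\nTTAA\n' > toy_contigs.fa
check_contig_count toy_contigs.fa        # prints "OK: 3 contigs"
check_contig_count toy_contigs.fa 3      # prints the warning
```

On real data the function would be pointed at the length-filtered assembly instead of the toy file.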

# Remove header
tail -n +2 Coverage.tsv > Coverage_noheader.tsv

# Transform coverage profile
transform-coverage-profile -c Coverage_noheader.tsv -t scale

# Trim assembly fasta to > 3kb (bbmap tool)
reformat.sh in=final_contigs_c10K.fa out=final_contigs_c10K_3000_filtered.fa minlength=3000

# Quickly count number of contigs, to make sure you're below 100,000
grep -c ">" final_contigs_c10K_3000_filtered.fa

# Generate contig list
grep ">" final_contigs_c10K_3000_filtered.fa | awk '{print $1}' | sed "s/>//g" > contigs_id_3000_filtered.txt


# Select contigs of interest (>3kb) from the generated coverage file
grep -Ff contigs_id_3000_filtered.txt Coverage_noheader.x100.lognorm > test
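
One caveat with a grep -Ff selection is that fixed-string patterns also match as substrings, so an ID such as contig_1 would also pull in the row for contig_10. Below is a minimal sketch of an exact-match alternative with awk. It assumes the contig ID is the first tab-separated column of the coverage file. The file names and values are toy examples, not the real data.

```shell
# Toy inputs (hypothetical names and values), standing in for the real
# contig list and coverage profile.
printf 'contig_1\ncontig_3\n' > ids_example.txt
printf 'contig_1\t10.5\ncontig_10\t3.2\ncontig_3\t7.0\n' > cov_example.tsv

# First pass (NR==FNR) loads the IDs into a lookup table; second pass
# prints only coverage rows whose first column is exactly one of those IDs.
awk -F'\t' 'NR==FNR { keep[$1]; next } $1 in keep' \
    ids_example.txt cov_example.tsv > cov_example_filtered.tsv

# Sanity check: the number of selected rows should equal the number of IDs.
wc -l < ids_example.txt
wc -l < cov_example_filtered.tsv
```

If the two counts differ, some IDs in the list are missing from the coverage file (or are duplicated in it), which is worth resolving before binning.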

NOTICE: the sample names in the BAM files should be delimited by anything other than ".". Also make sure that the BAM files are sorted and indexed:

bedtools multicov depends upon indexed BAM files in order to count the number of overlaps in each BAM file. As such, each BAM file should be position sorted (samtools sort aln.bam aln.sort) and indexed (samtools index aln.sort.bam) with either samtools or bamtools.
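
A quick way to catch a missing index before starting the profile step is to look for the index file next to each BAM. The sketch below is pure shell and only inspects file names; it does not verify that a BAM is actually position sorted. The function name check_bams is made up for illustration, and it assumes indexes are named either file.bam.bai (the samtools index default) or file.bai.

```shell
# Report BAM files in a directory that have no index file next to them.
check_bams() {
    dir=$1
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue      # glob matched nothing: no BAM files
        # Accept both <name>.bam.bai and <name>.bai as the index.
        if [ ! -e "$bam.bai" ] && [ ! -e "${bam%.bam}.bai" ]; then
            echo "missing index: $bam"
        fi
    done
}

# Check the mapping directory used on this page; prints nothing when every
# BAM has an index (or when the directory holds no BAM files).
check_bams ../Map
```

Any file that is reported can then be sorted and indexed as described above before running Binsanity-profile.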

Binsanity-profile -o idba-assembly --contigs ../contigs/contigs.id.txt -i ../contigs/final_contigs_c10K.fa -s ../Map --transform Scale
Binsanity-wf -f ../contigs/ -l final_contigs_c10K.fa -c idba-assembly.cov.x100.lognorm
