Quick Start Tutorial
genomewalker edited this page Feb 27, 2026 · 3 revisions
This tutorial walks through binning a metagenomic assembly with AMBER, from a BAM file to validated high-quality (HQ) bins.

Prerequisites:

- AMBER built with LibTorch (CPU or GPU); see the installation section of the README
- CheckM2 for bin validation
- HMMER3 (`hmmsearch` in `PATH`), required for marker detection
- A sorted, indexed BAM file of reads mapped to the assembled contigs
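Before starting, it can save time to confirm that each tool resolves on `PATH`. The minimal POSIX shell sketch below does that; `check_tools` is a hypothetical helper, not part of AMBER. Because CheckM2 usually lives in its own conda environment (see the `conda activate checkm2` step later), check `checkm2` from inside that environment.

```bash
# Report which of the given commands are missing from PATH.
# Returns 0 if every command is found, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (tool names as used in this tutorial):
#   check_tools amber samtools hmmsearch
```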
If the BAM is not yet sorted and indexed:

```bash
# Sort and index the BAM if not already done
samtools sort -@ 16 mapped.bam -o mapped.sorted.bam
samtools index mapped.sorted.bam
# Verify mapping statistics
samtools flagstat mapped.sorted.bam
```

**Minimum contig length:** AMBER uses contigs ≥ 1,001 bp by default. Shorter contigs are excluded from binning but remain in the assembly.
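To see how much of the assembly will actually enter binning, the sketch below counts the contigs at or above the 1,001 bp default and their total length. `count_long_contigs` is a hypothetical awk-based helper; it expects an uncompressed FASTA and handles sequences wrapped over several lines.

```bash
# Print "<number of contigs>\t<total bp>" for contigs >= min_len
# (default 1001 bp). Works on single-line and multi-line FASTA.
count_long_contigs() {
  fasta=$1
  min_len=${2:-1001}
  awk -v min="$min_len" '
    /^>/ { if (seqlen >= min) { n++; bp += seqlen } seqlen = 0; next }
    { seqlen += length($0) }
    END { if (seqlen >= min) { n++; bp += seqlen } printf "%d\t%d\n", n, bp }
  ' "$fasta"
}
```

For example, `count_long_contigs assembly/final.contigs.fa` prints the contig count and base total that pass the default cutoff.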
Run a single binning pass:

```bash
amber bin \
  --contigs assembly/final.contigs.fa \
  --bam mapped.sorted.bam \
  --encoder-seed 42 \
  --random-seed 1006 \
  --resolution 5.0 \
  --bandwidth 0.2 \
  --partgraph-ratio 50 \
  --encoder-restarts 3 \
  --leiden-restarts 25 \
  --threads 16 \
  --output run1/

# Expected output:
#   run1/bins/               one FASTA per bin
#   run1/run.abin            binary archive used by amber resolve
#   run1/amber_summary.tsv
#   run1/damage_per_bin.tsv
```

**Runtime:** ~30 min on a GPU (NVIDIA A100) for a 1 Gbp metagenome with 10M reads.
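Once the run finishes, a small check can confirm that the output directory holds the files listed above and report how many bins were written. `check_amber_run` is a hypothetical helper, and it assumes bin FASTA files use the `.fa` extension, matching the `-x fa` option passed to CheckM2 later in this tutorial.

```bash
# Verify that an `amber bin` output directory contains the expected files
# and print the number of bin FASTA files (*.fa) in its bins/ directory.
# Returns 1 if any expected file is missing.
check_amber_run() {
  dir=${1%/}
  status=0
  for f in bins run.abin amber_summary.tsv damage_per_bin.tsv; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      status=1
    fi
  done
  if [ -d "$dir/bins" ]; then
    echo "bins: $(find "$dir/bins" -maxdepth 1 -type f -name '*.fa' | wc -l | tr -d ' ')"
  fi
  return "$status"
}
```

For example, `check_amber_run run1/`; the same check can be applied to each seed's output directory before aggregating runs.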
Three independent runs with different encoder seeds capture the stochastic variation in encoder training and Leiden clustering; `amber resolve` then aggregates their `run.abin` archives into consensus bins:

```bash
for seed in 42 7 123; do
  amber bin \
    --contigs assembly/final.contigs.fa \
    --bam mapped.sorted.bam \
    --encoder-seed "$seed" \
    --random-seed 1006 \
    --resolution 5.0 --bandwidth 0.2 --partgraph-ratio 50 \
    --encoder-restarts 3 --leiden-restarts 25 \
    --threads 16 \
    --output "run_seed${seed}/"
done

# Aggregate the three runs into consensus bins
amber resolve \
  --runs run_seed42/run.abin run_seed7/run.abin run_seed123/run.abin \
  --output consensus_bins/ \
  --threads 16

# Switch to the CheckM2 environment for validation
conda activate checkm2
```
Validate the consensus bins with CheckM2 and count HQ bins:

```bash
checkm2 predict \
  -i consensus_bins/bins/ \
  -o consensus_bins/checkm2 \
  -x fa \
  --threads 16

# Count HQ bins (≥90% complete, <5% contamination);
# columns 2 and 3 of quality_report.tsv are Completeness and Contamination
awk -F'\t' 'NR>1 && $2>=90 && $3<5' \
  consensus_bins/checkm2/quality_report.tsv | wc -l

# View the top 20 bins sorted by completeness
awk -F'\t' 'NR>1' consensus_bins/checkm2/quality_report.tsv \
  | sort -t$'\t' -k2,2nr | column -t -s$'\t' | head -20
```

Inspect the per-bin ancient DNA damage statistics:

```bash
# Already generated by amber bin (one table per run directory):
column -t run_seed42/damage_per_bin.tsv | head

# Or re-compute post-hoc on the consensus bins:
amber damage \
  --bam mapped.sorted.bam \
  --bins consensus_bins/bins/ \
  --output consensus_bins/damage_stats.tsv \
  --threads 16
```

Bins with `damage_class = ancient` should show C→T rates > 5% at the 5′ terminal position and G→A rates > 5% at the 3′ terminal position, decaying exponentially toward the interior of the read.
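To shortlist bins that are both high quality and ancient, the CheckM2 report can be joined with the damage table. This awk sketch is hypothetical and rests on unverified assumptions about the damage table's layout: tab-separated, a header row containing a `damage_class` column, and bin IDs in the first column that match CheckM2's `Name` column (a trailing `.fa` is stripped before matching).

```bash
# Print bins that are HQ in CheckM2 (>= 90% completeness, < 5% contamination)
# AND labelled "ancient" in the damage table.
# ASSUMPTION: damage table has a header with a `damage_class` column and
# bin IDs in column 1; this layout is not confirmed by the AMBER docs.
hq_ancient_bins() {
  quality=$1
  damage=$2
  awk -F'\t' '
    NR == FNR && FNR == 1 { next }                        # skip CheckM2 header
    NR == FNR { if ($2 >= 90 && $3 < 5) hq[$1] = 1; next }
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "damage_class") col = i; next }
    {
      id = $1
      sub(/\.fa$/, "", id)                                # match CheckM2 Name
      if (col && $col == "ancient" && (id in hq)) print id
    }
  ' "$quality" "$damage"
}
```

For example: `hq_ancient_bins consensus_bins/checkm2/quality_report.tsv consensus_bins/damage_stats.tsv`.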
If your assembly contains both ancient (damaged) and modern (undamaged) reads from the same organism, separate the two populations within a bin:

```bash
# Deconvolve the reads mapped to an ancient consensus bin
amber deconvolve \
  --contigs consensus_bins/bins/bin.001.fa \
  --bam mapped.sorted.bam \
  --output deconvolve_bin001/ \
  --write-stats \
  --threads 16

# Output: two FASTA files
#   deconvolve_bin001/ancient_consensus.fa   aDNA population
#   deconvolve_bin001/modern_consensus.fa    modern population
```

Troubleshooting:

Too few bins:
- Increase `--resolution` (try 10.0 or 20.0)
- Check that the BAM is sorted and indexed

Too many small, fragmented bins:
- Decrease `--resolution` (try 2.0)
- Increase `--partgraph-ratio` (try 75 or 100)
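When tuning `--resolution`, it can help to run several values side by side and compare the results with CheckM2. The sketch below reuses the flags from the single-run command and writes each run to its own directory. `run_or_echo`, `sweep_resolution`, the `DRY_RUN` variable, and the `sweep_res*/` directory names are hypothetical, chosen for illustration.

```bash
# Print the command when DRY_RUN=1, otherwise execute it.
run_or_echo() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run `amber bin` once per --resolution value passed as an argument.
# All other flags match the single-run example in this tutorial.
sweep_resolution() {
  for res in "$@"; do
    run_or_echo amber bin \
      --contigs assembly/final.contigs.fa \
      --bam mapped.sorted.bam \
      --encoder-seed 42 \
      --random-seed 1006 \
      --resolution "$res" \
      --bandwidth 0.2 \
      --partgraph-ratio 50 \
      --encoder-restarts 3 \
      --leiden-restarts 25 \
      --threads 16 \
      --output "sweep_res${res}/" || return 1
  done
}
```

`DRY_RUN=1 sweep_resolution 2.0 5.0 10.0 20.0` previews the four commands; dropping `DRY_RUN=1` runs them.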

No HQ bins:
- Check `amber_summary.tsv` for estimated completeness: are there MQ bins?
- Verify that `hmmsearch` is in `PATH` (HMMER3 is required for marker detection)
- Check that contigs are ≥ 1,001 bp and that the assembly has adequate depth

AMBER built without LibTorch:
- Rebuild with `-DAMBER_USE_TORCH=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX`
- See the README