Quick Start Tutorial

genomewalker edited this page Feb 27, 2026 · 3 revisions

This tutorial walks through binning a metagenomic assembly with AMBER, from a mapped BAM file to validated high-quality (HQ) bins.

Prerequisites

  • AMBER built with LibTorch (CPU or GPU) — see the README installation section
  • CheckM2 for bin validation
  • A sorted, indexed BAM file of reads mapped to assembled contigs

Step 1: Prepare inputs

# Sort and index BAM if not already done
samtools sort -@ 16 mapped.bam -o mapped.sorted.bam
samtools index mapped.sorted.bam

# Verify
samtools flagstat mapped.sorted.bam

Minimum contig length: AMBER uses contigs ≥ 1,001 bp by default. Shorter contigs are excluded from binning but remain in the assembly.
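
To preview which contigs fall under that cutoff before binning, a small awk pass over the assembly is enough. This is a sketch, assuming the assembly path used in Step 2 (assembly/final.contigs.fa):

```shell
# List contigs shorter than 1,001 bp (these are excluded from binning)
awk '/^>/ { if (name != "" && len < 1001) print name, len
            name = $1; len = 0; next }
     { len += length($0) }
     END { if (name != "" && len < 1001) print name, len }' assembly/final.contigs.fa
```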


Step 2: Single run

amber bin \
    --contigs assembly/final.contigs.fa \
    --bam mapped.sorted.bam \
    --encoder-seed 42 \
    --random-seed 1006 \
    --resolution 5.0 \
    --bandwidth 0.2 \
    --partgraph-ratio 50 \
    --encoder-restarts 3 \
    --leiden-restarts 25 \
    --threads 16 \
    --output run1/

# Expected output:
# run1/bins/        — one FASTA per bin
# run1/run.abin     — binary archive for resolve
# run1/amber_summary.tsv
# run1/damage_per_bin.tsv

Runtime: ~30 min on GPU (NVIDIA A100) for a 1 Gbp metagenome with 10M reads.


Step 3: Multiple runs and consensus (recommended)

Three independent runs with different seeds capture the stochastic variation in Leiden clustering and encoder training:

for seed in 42 7 123; do
    amber bin \
        --contigs assembly/final.contigs.fa \
        --bam mapped.sorted.bam \
        --encoder-seed $seed \
        --random-seed 1006 \
        --resolution 5.0 --bandwidth 0.2 --partgraph-ratio 50 \
        --encoder-restarts 3 --leiden-restarts 25 \
        --threads 16 \
        --output run_seed${seed}/
done

# Aggregate
amber resolve \
    --runs run_seed42/run.abin run_seed7/run.abin run_seed123/run.abin \
    --output consensus_bins/ \
    --threads 16

Step 4: Validate with CheckM2

conda activate checkm2

checkm2 predict \
    -i consensus_bins/bins/ \
    -o consensus_bins/checkm2 \
    -x fa \
    --threads 16

# Count HQ bins (≥90% complete, <5% contamination)
awk -F'\t' 'NR>1 && $2>=90 && $3<5' \
    consensus_bins/checkm2/quality_report.tsv | wc -l

# View all bins sorted by completeness
awk -F'\t' 'NR>1' consensus_bins/checkm2/quality_report.tsv \
    | sort -t$'\t' -k2,2 -nr | column -t | head -20

Step 5: Check damage profiles

# Already generated per run by amber bin:
column -t run_seed42/damage_per_bin.tsv | head

# Or re-compute post-hoc on consensus bins:
amber damage \
    --bam mapped.sorted.bam \
    --bins consensus_bins/bins/ \
    --output consensus_bins/damage_stats.tsv \
    --threads 16

Bins with damage_class = ancient should show C→T rates > 5% at 5′ position 1 and G→A rates > 5% at 3′ position 1, with roughly exponential decay toward the read interior.
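
As a rough illustration of that shape (not AMBER's internal model; the starting rate r1 and decay factor d below are made-up parameters), an exponential decay r(i) = r1 · d^(i−1) from the 5′ end looks like:

```shell
# Illustrative only: model C->T rate at 5' positions 1-10 with r1 = 0.20, d = 0.6
awk 'BEGIN { r1 = 0.20; d = 0.6
             for (i = 1; i <= 10; i++) printf "pos %d\tC->T %.4f\n", i, r1 * d^(i - 1) }'
```

Under these parameters, position 1 is well above the 5% threshold and the rate falls below it by position 4.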


Step 6 (optional): Deconvolve ancient/modern populations

If your assembly contains both ancient (damaged) and modern (undamaged) reads from the same organism:

# Deconvolve reads mapped to the ancient bin consensus
amber deconvolve \
    --contigs consensus_bins/bins/bin.001.fa \
    --bam mapped.sorted.bam \
    --output deconvolve_bin001/ \
    --write-stats \
    --threads 16

# Two FASTA files:
# deconvolve_bin001/ancient_consensus.fa — aDNA population
# deconvolve_bin001/modern_consensus.fa  — modern population
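
A quick sanity check that both populations were actually recovered (sequence counts via grep, paths from the command above):

```shell
# Each consensus FASTA should contain at least one sequence
for f in deconvolve_bin001/ancient_consensus.fa deconvolve_bin001/modern_consensus.fa; do
    printf '%s\t%d sequences\n' "$f" "$(grep -c '^>' "$f")"
done
```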

Troubleshooting

Too few bins / all contigs in one bin

  • Increase --resolution (try 10.0 or 20.0)
  • Check that the BAM is sorted and indexed

Too many small bins / fragmented genomes

  • Decrease --resolution (try 2.0)
  • Increase --partgraph-ratio (try 75 or 100)

0 HQ bins

  • Check amber_summary.tsv for estimated completeness: are there at least medium-quality (MQ) bins?
  • Verify that hmmsearch is in PATH (HMMER3 required for marker detection)
  • Check that contigs are ≥ 1,001 bp and that the assembly has adequate sequencing depth
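
The last two checks can be scripted; a minimal sketch, assuming the assembly path from Step 2:

```shell
# Is hmmsearch (HMMER3) on PATH?
command -v hmmsearch >/dev/null || echo "hmmsearch not found: install HMMER3"

# How many contigs clear the 1,001 bp cutoff?
awk '/^>/ { if (len >= 1001) n++; len = 0; next }
     { len += length($0) }
     END { if (len >= 1001) n++; print n+0, "contigs >= 1001 bp" }' assembly/final.contigs.fa
```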

amber bin requires LibTorch support

  • Rebuild with -DAMBER_USE_TORCH=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
  • See the README