Quickstart

hextraza edited this page Mar 20, 2025 · 23 revisions

Overview

The nimble workflow consists of three main steps:

  1. Generate a reference library from .fasta or .csv reference sequence data.
  2. Align reads to the reference library using .fastq.gz or .bam input files.
  3. Generate a summarized output suitable for downstream analysis, such as integration with Seurat.

This guide walks through a basic example using default settings. The commands assume you installed nimble with Python and pip. If you're using Docker, replace python -m nimble at the beginning of each command with docker run ghcr.io/bimberlab/nimble:latest.
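As a sketch of that substitution, the wrapper below lets the rest of the commands be typed the same way for either install. The function name nimble and the USE_DOCKER switch are hypothetical conveniences for this example, and the bind mount (-v) and working directory (-w /work) are assumptions about how you might expose local files to the container; they are not taken from the nimble documentation.

```shell
# Hypothetical helper: forwards its arguments to whichever install is in use.
nimble() {
  if [ "${USE_DOCKER:-0}" = "1" ]; then
    # Assumption: mount the current directory at /work so the container
    # can read the inputs and write the outputs next to them.
    docker run --rm -v "$PWD":/work -w /work ghcr.io/bimberlab/nimble:latest "$@"
  else
    python -m nimble "$@"
  fi
}
```

With the helper defined, nimble generate --file reference.fasta --output_path library.json runs through pip by default, and prefixing the call with USE_DOCKER=1 routes it through the container instead.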

Usage

1. Generate a Reference Library

Before aligning reads, you must generate a nimble library. This library is the reference against which sequencing reads are aligned.

To create a library, use the following command:

python -m nimble generate --file reference.fasta --output_path library.json

  • --file: Input reference sequences in FASTA format.
  • --output_path: Output file storing the reference library in nimble's format.

Alternatively, if you have a CSV file with "name" and "sequence" columns, use:

python -m nimble generate --file reference.csv --output_path library.json
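For illustration, a CSV of this shape could look like the file written below. The file name example_reference.csv, the two records, and the column order are invented placeholders; the only details taken from this guide are the "name" and "sequence" column headers.

```shell
# Write a tiny example CSV; both records are made-up placeholder sequences.
cat > example_reference.csv <<'EOF'
name,sequence
example_feature_1,ACGTACGTACGTACGTACGT
example_feature_2,TTGACCGATGCATGCATGCA
EOF

# Show the header row to confirm the two expected column names are present.
head -n 1 example_reference.csv
```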

Or, if your CSV file lacks the "sequence" column but contains metadata for the FASTA file, with each row corresponding to the FASTA record in the same position, you can provide both:

python -m nimble generate --file reference.fasta --opt-file reference.csv --output_path library.json
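Because the metadata is matched to the FASTA records by position, it can be worth confirming that the two files have the same number of entries before generating the library. The check below is a minimal sketch, assuming the CSV has a single header row; the file names, records, and the "locus" metadata column are invented for the example.

```shell
# Create small example inputs; names, sequences, and metadata are placeholders.
cat > example.fasta <<'EOF'
>example_feature_1
ACGTACGTACGTACGTACGT
>example_feature_2
TTGACCGATGCATGCATGCA
EOF

cat > example_metadata.csv <<'EOF'
name,locus
example_feature_1,locusA
example_feature_2,locusB
EOF

# Count FASTA records: each record starts with a '>' header line.
n_fasta=$(grep -c '^>' example.fasta)
# Count CSV data rows, skipping the assumed single header line.
n_csv=$(tail -n +2 example_metadata.csv | wc -l | tr -d ' ')

if [ "$n_fasta" -eq "$n_csv" ]; then
  echo "OK: $n_fasta entries in both files"
else
  echo "Mismatch: $n_fasta FASTA records vs $n_csv CSV rows" >&2
fi
```

This only compares counts; it does not verify that the rows are in the same order as the FASTA records.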

The output is a .json file containing your sequence data, any metadata supplied via the optional .csv file, and a set of configuration parameters that control how nimble aligns reads against the library. The default parameters produce a fairly lenient alignment. For more information about the configuration options, see the library file format documentation.

2. Align Reads to the Reference Library

Once you have a nimble library, you can align your sequencing data.

python -m nimble align -r library.json -i input.fastq.gz -o results.tsv

For paired-end data:

python -m nimble align -r library.json -i input_R1.fastq.gz input_R2.fastq.gz -o results.tsv

For 10x BAM files (either R1-only or R1 and R2):

python -m nimble align -r library.json -i input.bam -o results.tsv

  • -r: The reference library file generated in the previous step.
  • -i: The input sequencing reads (can be single or paired FASTQ, or a BAM file).
  • -o: The output alignment results.

Visit the output file format documentation for more information about the files produced by nimble align.

3. Generate a Feature-by-Cell Matrix

Many scRNA-seq packages, such as Seurat, expect counts in a feature-by-cell matrix, which the report command generates:

python -m nimble report -i results.tsv -o summary.tsv

  • -i: The output from the alignment step.
  • -o: The processed output, formatted as a feature-by-cell matrix.

Visit the CLI parameters documentation for more detail about each command, and the library file format documentation for information about how to configure alignment behavior. See the separate R package nimbler for integrating nimble data into Seurat objects.
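The three steps can also be chained in a small shell function, sketched below using only the commands and flags shown in this guide. The function name run_nimble_quickstart and the prefix-based output names are hypothetical choices for this example; it assumes the pip install and a single FASTQ or BAM input.

```shell
# Hypothetical helper: runs generate, align, and report in order.
# Arguments: reference FASTA, input reads (FASTQ or BAM), output file prefix.
run_nimble_quickstart() {
  local fasta=$1 reads=$2 prefix=$3

  # '&&' ensures each step runs only if the previous one succeeded.
  python -m nimble generate --file "$fasta" --output_path "$prefix.library.json" &&
  python -m nimble align -r "$prefix.library.json" -i "$reads" -o "$prefix.results.tsv" &&
  python -m nimble report -i "$prefix.results.tsv" -o "$prefix.summary.tsv"
}
```

For example, run_nimble_quickstart reference.fasta input.fastq.gz sample1 would write sample1.library.json, sample1.results.tsv, and sample1.summary.tsv, stopping at the first step that fails.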