Quickstart
The nimble workflow consists of three main steps:
- Generate a reference library from .fasta or .csv reference sequence data.
- Align reads to the reference library using .fastq.gz or .bam input files.
- Generate a summarized output suitable for downstream analysis, such as integration with Seurat.
This guide walks through a basic example using default settings. The commands assume that you installed nimble with Python and pip. If you're using Docker, replace python -m nimble at the beginning of each command with docker run ghcr.io/bimberlab/nimble:latest.
Before aligning reads, you must first generate a nimble library. This library serves as the reference against which sequencing reads will be aligned.
To create a library, use the following command:
python -m nimble generate --file reference.fasta --output_path library.json
- --file: Input reference sequences in FASTA format.
- --output_path: Output file storing the reference library in nimble's format.
Alternatively, if you have a CSV file with "name" and "sequence" columns, use:
python -m nimble generate --file reference.csv --output_path library.json
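If your reference sequences live in a script or notebook rather than a file, a CSV with the two required columns can be written with Python's standard csv module. This is a minimal sketch: the record names and sequences are placeholders, and write_reference_csv is a hypothetical helper, not part of nimble.

```python
import csv

# Hypothetical example records; the names and sequences are placeholders.
records = [
    {"name": "feature_A", "sequence": "ACGTACGTACGT"},
    {"name": "feature_B", "sequence": "TTGACCGGTAAC"},
]


def write_reference_csv(path, rows):
    """Write rows to a CSV file with "name" and "sequence" columns."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "sequence"])
        writer.writeheader()
        writer.writerows(rows)


write_reference_csv("reference.csv", records)
```

The resulting reference.csv can then be passed to the generate command shown above.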
Or, if your CSV file lacks the "sequence" column but contains metadata for the FASTA file, corresponding row-by-row, you can provide both:
python -m nimble generate --file reference.fasta --opt-file reference.csv --output_path library.json
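Because the metadata rows are matched to FASTA records by position, it can help to confirm that both files contain the same number of entries before generating the library. The sketch below is a hypothetical pre-flight check, not part of nimble. It assumes the CSV has a header row and that each FASTA record starts with a ">" line.

```python
import csv


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_csv_rows(path):
    """Count data rows in a CSV, assuming the first row is a header."""
    with open(path, newline="") as handle:
        return sum(1 for _ in csv.DictReader(handle))


def check_row_correspondence(fasta_path, csv_path):
    """Raise ValueError if the FASTA and CSV entry counts differ."""
    n_fasta = count_fasta_records(fasta_path)
    n_csv = count_csv_rows(csv_path)
    if n_fasta != n_csv:
        raise ValueError(
            f"{fasta_path} has {n_fasta} records but {csv_path} has {n_csv} rows"
        )
    return n_fasta
```

Calling check_row_correspondence("reference.fasta", "reference.csv") returns the shared count, or raises if the files are out of step. It only checks counts, not that the rows are in the same order as the sequences.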
The output is a .json file containing your sequence data, any additional metadata provided via the optional .csv file, and a set of configuration parameters that control how nimble aligns reads against the library. By default, these parameters are set for a fairly lenient alignment. For more information about the configuration options, see the library file format documentation.
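Since the library is plain JSON, its top-level layout can be inspected with Python's json module before editing any configuration. The sketch below makes no assumption about nimble's schema: describe_library is a hypothetical helper that only reports the type of each top-level entry. The library file format documentation remains the authority on what the fields mean.

```python
import json


def describe_library(path):
    """Return the type name of each top-level entry in a JSON file."""
    with open(path) as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        return [type(item).__name__ for item in data]
    return type(data).__name__
```

For example, describe_library("library.json") shows at a glance which parts of the file are lists or nested objects.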
Once you have a nimble library, you can align your sequencing data against it. For a single FASTQ file:
python -m nimble align -r library.json -i input.fastq.gz -o results.tsv
For paired-end data:
python -m nimble align -r library.json -i input_R1.fastq.gz input_R2.fastq.gz -o results.tsv
For 10x BAM files (either R1-only or R1 and R2):
python -m nimble align -r library.json -i input.bam -o results.tsv
- -r: The reference library file generated in the previous step.
- -i: The input sequencing reads (can be single or paired FASTQ, or a BAM file).
- -o: The output alignment results.
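When alignment is scripted over many samples, a small check on the file list can catch a mismatched input set before nimble is invoked. The sketch below is a hypothetical helper, not part of nimble. It recognizes only the .fastq.gz and .bam extensions that this guide mentions, and nimble itself may accept other names.

```python
def classify_inputs(paths):
    """Label an input list as single-end FASTQ, paired-end FASTQ, or BAM.

    Raises ValueError for any other combination of files.
    """
    paths = list(paths)
    is_fastq = [p.endswith(".fastq.gz") for p in paths]
    is_bam = [p.endswith(".bam") for p in paths]
    if len(paths) == 1 and is_bam[0]:
        return "bam"
    if len(paths) == 1 and is_fastq[0]:
        return "single-end fastq"
    if len(paths) == 2 and all(is_fastq):
        return "paired-end fastq"
    raise ValueError(f"unsupported input combination: {paths}")
```

The three layouts correspond to the three align commands shown above.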
Visit the output file format documentation for more information about the files produced by nimble align.
Many scRNA-seq packages, such as Seurat, expect counts as a feature-by-cell matrix. The report command generates one from the alignment output:
python -m nimble report -i results.tsv -o summary.tsv
- -i: The output from the alignment step.
- -o: The processed output, formatted as a feature-by-cell matrix.
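The three steps above can be chained from a Python script. The following is a minimal sketch that uses only the commands and flags shown in this guide. build_commands and run_pipeline are hypothetical helper names, and the default file names simply mirror the examples. Under Docker the prefix could be swapped as noted in the comment, but the input files would also need to be visible inside the container, which this sketch does not handle.

```python
import subprocess

# Command prefix for a pip installation. For Docker, this guide's replacement
# would be ["docker", "run", "ghcr.io/bimberlab/nimble:latest"].
NIMBLE = ["python", "-m", "nimble"]


def build_commands(reference, reads, library="library.json",
                   results="results.tsv", summary="summary.tsv"):
    """Return the generate, align, and report commands as argument lists."""
    return [
        NIMBLE + ["generate", "--file", reference, "--output_path", library],
        NIMBLE + ["align", "-r", library, "-i", *reads, "-o", results],
        NIMBLE + ["report", "-i", results, "-o", summary],
    ]


def run_pipeline(reference, reads, **kwargs):
    """Run each step in order, stopping at the first non-zero exit code."""
    for command in build_commands(reference, reads, **kwargs):
        subprocess.run(command, check=True)
```

For paired-end data, run_pipeline("reference.fasta", ["input_R1.fastq.gz", "input_R2.fastq.gz"]) would run all three steps. Passing arguments as lists rather than a single shell string avoids quoting problems with unusual file names.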
Visit the CLI parameters documentation for more detail about each command, and the library file format documentation for information about how to configure alignment behavior. See the separate R package nimbler for integrating nimble data into Seurat objects.