This repository contains the full analysis pipeline and associated scripts for the paper:
Optimizing Single-Cell Long-Read Sequencing for Enhanced Isoform Detection in Pancreatic Islets
Maria S. Hansen, Christopher J. Hill, Lori Sussel, Kristen L. Wells
bioRxiv, 2025.04.30.651101
https://doi.org/10.1101/2025.04.30.651101
This study uses Nanopore single-cell long-read RNA sequencing of mouse pancreatic islets, with cells captured using 10x Genomics technology. The goal was to evaluate whether 5′ capture, targeted insulin depletion, and extended reverse transcription can improve read length and isoform detection in single-cell long-read RNA sequencing of pancreatic islets. The study uses the DTUrtle R package to assess differential transcript usage in the resulting data.
Please contact kristen.wells-wrasman@cuanschutz.edu with any questions.
This repository provides a complete pipeline for analyzing Nanopore long-read single-cell RNA-seq data. Specifically, it:
- Processes Nanopore single-cell long-read sequencing data.
- Performs quality control, alignment, and cell barcode quantification.
- Conducts dimensionality reduction and read length analysis.
- Analyzes differential transcript usage (DTU) to assess isoform expression.
- Processes sequencing data from 10x Genomics 3′ and 5′ single-cell platforms to analyze read start site bias.
- Processes sequencing data from bulk RNA-seq experiments for insulin depletion analysis.
- Generates all plots featured in the manuscript.
A three-stage pipeline for processing single-cell long-read data:
- 01_wf_single_cell/
  Contains configuration files and scripts for running EPI2ME's wf-single-cell pipeline using Nextflow. This workflow includes steps for quality control, alignment, cell quantification, and UMAP plotting.
- 02_postprocessing/
  Contains a Snakemake pipeline for post-processing the single-cell RNA-seq data. It takes the output of the wf-single-cell pipeline run in step 1 and generates barcodes and read length analysis data.
- 03_plotting/
  Contains scripts used to generate all figures in the manuscript, including UMAPs, gene and transcript identification plots, read length distributions, isoform usage plots, and insulin-depletion plots for single-cell long-read libraries.
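To make the read length analysis mentioned above concrete, the following is a minimal Python sketch of how per-read lengths and a few summary statistics could be computed from a FASTQ stream. It is not the code used in 02_postprocessing/; the function names (`read_lengths`, `n50`, `summarize`) and the toy records are assumptions introduced purely for illustration.

```python
import io
import statistics
from typing import Iterable, Iterator, List, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line FASTQ stream."""
    for line_number, line in enumerate(handle):
        # In 4-line FASTQ, the sequence is the second line of every record.
        if line_number % 4 == 1:
            yield len(line.strip())


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that reads >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # no reads supplied


def summarize(lengths: List[int]) -> dict:
    """Collect basic read length statistics for one library."""
    return {
        "reads": len(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "n50": n50(lengths),
    }


# Toy in-memory FASTQ (invented records, not data from the study).
# A real file could be opened with gzip.open(path, "rt") instead.
toy_fastq = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGT\n+\nIIII\n"
    "@read3\nACGTACGTACGT\n+\nIIIIIIIIIIII\n"
)
summary = summarize(list(read_lengths(toy_fastq)))
print(summary)
```

Running the same summary on each library would give comparable numbers for conditions such as 3′ versus 5′ capture, although the manuscript figures are produced by the scripts in 03_plotting/.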
This directory contains a Snakemake workflow and R scripts for the bulk RNA-seq analysis of the insulin depletion experiment.
Scripts for generating plots comparing read start site distributions between 3′ and 5′ single-cell RNA-seq datasets using short-read next-generation sequencing (NGS) data, aimed at investigating differences in internal priming.
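One way to compare read start sites between libraries is to express each start as a fraction of the transcript length and then bin those fractions. The sketch below illustrates that idea in Python; it is not the code in this directory. It assumes a single-exon transcript with 0-based, half-open genomic coordinates, and the function names (`relative_start`, `bin_fractions`) and all coordinates are hypothetical.

```python
from typing import Iterable, List


def relative_start(read_start: int, tx_start: int, tx_end: int, strand: str) -> float:
    """Fractional position of a read start along a transcript.

    0.0 is the transcript's 5' end and 1.0 is its 3' end. `read_start` is the
    genomic coordinate of the first sequenced base of the read.
    """
    length = tx_end - tx_start
    if length <= 0:
        raise ValueError("transcript must have positive length")
    if not tx_start <= read_start < tx_end:
        raise ValueError("read start lies outside the transcript")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    offset = read_start - tx_start
    if strand == "-":
        # On the minus strand, the 5' end is at the highest genomic coordinate.
        offset = (length - 1) - offset
    return offset / (length - 1) if length > 1 else 0.0


def bin_fractions(fractions: Iterable[float], n_bins: int = 5) -> List[int]:
    """Count fractional positions in equal-width bins spanning [0, 1]."""
    counts = [0] * n_bins
    for fraction in fractions:
        # Clamp so that a fraction of exactly 1.0 falls in the last bin.
        index = min(int(fraction * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Invented read starts on a hypothetical transcript spanning [100, 201).
library_a = [(190, "+"), (195, "+"), (150, "+")]
library_b = [(100, "+"), (105, "+"), (200, "-")]

bins_a = bin_fractions(relative_start(pos, 100, 201, s) for pos, s in library_a)
bins_b = bin_fractions(relative_start(pos, 100, 201, s) for pos, s in library_b)
print(bins_a, bins_b)
```

Normalizing by transcript length lets transcripts of different sizes share one axis. Spliced transcripts would need exon-aware coordinates, which this sketch leaves out.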
- 01_wf_single_cell/
  Output: A matrix that can be loaded into Seurat.
- 02_postprocessing/
  Output: Doublet identification plots, PCA plots, UMAPs, a clustering tree, and histograms and percent-tagged plots based on read length data, along with Seurat objects.
- 03_plotting/
  Output: The majority of figures in the manuscript, including UMAPs, read length distributions, isoform usage plots, and insulin-depletion plots for single-cell long-read libraries.
Output: Volcano plot displaying differentially expressed genes between insulin-depleted and non-depleted pancreatic islet samples from bulk RNA-seq data.
Output: NGS plot showing read start site bias between 3′ and 5′ libraries.
Each subdirectory includes its own README with setup instructions, environment configuration, and execution steps.
- Differential Transcript Usage (DTU) between alpha and beta cells
- Read length distributions between published datasets
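Differential transcript usage compares, for each gene, the fraction of that gene's counts contributed by each transcript across groups of cells. The Python sketch below computes only those proportions for two groups. It does not reproduce the statistical testing performed by DTUrtle, and the gene name, transcript names, counts, and the `isoform_proportions` function are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (gene, transcript) -> {cell group: summed transcript count}
Counts = Dict[Tuple[str, str], Dict[str, float]]


def isoform_proportions(counts: Counts) -> Counts:
    """Divide each transcript's count by its gene's total count within each group."""
    gene_totals: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (gene, _transcript), per_group in counts.items():
        for group, value in per_group.items():
            gene_totals[gene][group] += value

    proportions: Counts = {}
    for (gene, transcript), per_group in counts.items():
        proportions[(gene, transcript)] = {
            # A gene with no counts in a group has undefined usage there.
            group: value / gene_totals[gene][group]
            if gene_totals[gene][group] > 0
            else float("nan")
            for group, value in per_group.items()
        }
    return proportions


# Invented counts for one hypothetical gene with two transcripts.
toy_counts: Counts = {
    ("GeneA", "GeneA-201"): {"alpha": 30.0, "beta": 10.0},
    ("GeneA", "GeneA-202"): {"alpha": 10.0, "beta": 30.0},
}
usage = isoform_proportions(toy_counts)

# Difference in usage of GeneA-201 between the two groups.
delta = usage[("GeneA", "GeneA-201")]["alpha"] - usage[("GeneA", "GeneA-201")]["beta"]
print(usage, delta)
```

A large difference in proportion between groups is what a DTU test evaluates for significance; the proportions alone do not establish it.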
Sequencing data available at: [link]
Docker files for building Singularity images are located within each subdirectory.