Comprehensive Small RNA-seq Analysis Platform
Features • Installation • Quick Start • Documentation • Citation
sRNAtlas is an open-source bioinformatics platform designed to democratize small RNA sequencing analysis. It provides researchers—regardless of computational expertise—with a complete, reproducible pipeline for analyzing miRNAs, siRNAs, piRNAs, and other small non-coding RNAs.
The platform integrates established bioinformatics tools (Bowtie, Samtools, Cutadapt, pyDESeq2) into a cohesive workflow, accessible through an intuitive web interface built with Streamlit. From raw FASTQ files to differential expression results and pathway enrichment, sRNAtlas handles the entire analysis pipeline while maintaining full transparency and reproducibility.
Key principles:
- Accessibility: No command-line expertise required
- Transparency: All parameters visible and adjustable
- Reproducibility: Consistent, version-controlled analysis
- Open source: Free to use, modify, and distribute under AGPL-3.0-or-later
sRNAtlas is a powerful, user-friendly application for comprehensive small RNA sequencing (sRNA-seq) data analysis. Built with Streamlit, it provides an intuitive interface for researchers to process raw sequencing data through quality control, alignment, quantification, differential expression analysis, and functional enrichment—all without requiring command-line expertise.
- 🎯 Purpose-built for small RNA: Optimized parameters for miRNA, siRNA, piRNA analysis
- 🖥️ No coding required: Intuitive web interface for all analysis steps
- 🔬 Complete pipeline: From raw FASTQ to publication-ready figures
- 📊 Interactive visualizations: Explore your data with dynamic plots
- 💾 Project management: Save, load, and share analysis sessions
- 🧪 Reproducible: Consistent results with version-controlled parameters
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│     Raw     │───▶│   Quality   │───▶│   Adapter   │───▶│  Reference  │
│    FASTQ    │    │   Control   │    │  Trimming   │    │  Database   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                                                │
┌─────────────┐    ┌─────────────┐    ┌─────────────┐           │
│ Functional  │◀───│     DE      │◀───│    Read     │◀──────────┘
│ Enrichment  │    │  Analysis   │    │  Counting   │
└─────────────┘    └─────────────┘    └─────────────┘
| Module | Description | Key Features |
|---|---|---|
| 📁 Project | Organize your analysis | Sample management, metadata, save/load projects |
| 📊 Quality Control | Assess read quality | Size distribution, quality scores, contamination check |
| ✂️ Trimming | Remove adapters | Cutadapt integration, preset adapters, length filtering |
| 🗄️ Databases | Reference management | miRBase, RNAcentral, custom FASTA, index building |
| 🔗 Alignment | Map reads | Bowtie optimization for small RNA, multi-mapper handling |
| 🔬 Post-Align QC | Alignment quality | Mapping stats, strand bias, 5' nucleotide analysis |
| 📈 Counting | Quantification | Feature counting, count matrix generation |
| 🧬 DE Analysis | Differential expression | pyDESeq2, volcano plots, heatmaps, PCA |
| 🔍 Novel miRNA | Discovery | Identify unannotated small RNAs |
| 🧫 isomiR | Variant analysis | Detect isoforms, differential usage, arm switching |
| 🎯 Targets | Target prediction | psRNATarget (plants), miRanda (animals) |
| 🧬 GO/Pathway | Enrichment | Gene Ontology, KEGG pathway analysis |
| ⚡ Batch | Automation | Full pipeline batch processing |
| 📋 Reports | Export | HTML reports, figure export |
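The Quality Control module lists size distribution among its checks. As a rough illustration of what that metric is, the sketch below counts read lengths in a FASTQ stream using only the standard library. The function name `read_length_distribution` and the example reads are hypothetical and are not taken from the sRNAtlas code base; the sketch also assumes the simple four-lines-per-record FASTQ layout.

```python
from collections import Counter
import io


def read_length_distribution(handle):
    """Count read lengths in a FASTQ stream with four lines per record."""
    lengths = Counter()
    for line_number, line in enumerate(handle):
        if line_number % 4 == 1:  # second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Illustrative reads only: two 22-nt sequences and one 28-nt sequence.
reads = ["TAGCTTATCAGACTGATGTTGA", "TGAGGTAGTAGGTTGTATAGTT", "ACGT" * 7]
fastq_text = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads, 1)
)

dist = read_length_distribution(io.StringIO(fastq_text))
for length in sorted(dist):
    print(f"{length} nt: {dist[length]} read(s)")
```

For a miRNA-enriched library one would typically expect the distribution to peak in the low 20s; a peak elsewhere can hint at degradation or a different dominant RNA class.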
| RNA Type | Size Range | Characteristics |
|---|---|---|
| miRNA | 18-25 nt | Gene expression regulators, 5' U bias |
| siRNA | 20-24 nt | RNAi pathway, perfect complementarity |
| piRNA | 24-32 nt | Transposon silencing, germline |
| tRF/tsRNA | 14-40 nt | tRNA-derived fragments, stress response |
| rsRF | 15-40 nt | rRNA-derived fragments |
| snoRNA | 60-300 nt | Small nucleolar RNA, rRNA modification |
| snRNA | 100-300 nt | Small nuclear RNA, splicing |
| Y RNA | 80-120 nt | DNA replication, quality control |
Plant-specific: tasiRNA, phasiRNA, natsiRNA, hc-siRNA
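The size ranges in the table overlap, so read length alone narrows down the candidate classes but cannot assign one. The sketch below encodes the table as a lookup to make that point concrete. `SIZE_RANGES` and `candidate_classes` are hypothetical names for illustration, not part of sRNAtlas.

```python
# Size ranges (nt, inclusive) copied from the table above.
SIZE_RANGES = {
    "miRNA": (18, 25),
    "siRNA": (20, 24),
    "piRNA": (24, 32),
    "tRF/tsRNA": (14, 40),
    "rsRF": (15, 40),
    "snoRNA": (60, 300),
    "snRNA": (100, 300),
    "Y RNA": (80, 120),
}


def candidate_classes(read_length):
    """Return every RNA class whose size range contains read_length."""
    return [
        name
        for name, (shortest, longest) in SIZE_RANGES.items()
        if shortest <= read_length <= longest
    ]


print(candidate_classes(22))  # several classes share this length
print(candidate_classes(28))
print(candidate_classes(50))  # falls between the listed ranges
```

A 22-nt read is compatible with four of the listed classes, which is why annotation relies on alignment to reference databases rather than on length.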
- 🔬 Novel miRNA Discovery: Identify unannotated small RNAs from unaligned reads
- 🧫 isomiR Analysis: Detect 5'/3' variants, SNPs, and non-templated additions
- ↔️ Arm Switching Detection: Identify 5p/3p dominance changes between conditions
- 🔄 Differential isomiR Usage: Compare isomiR ratios across experimental groups
- 📊 Multi-group Comparison: ANOVA for >2 conditions with pairwise comparisons
- 🔥 Cluster Analysis: Hierarchical clustering with interactive heatmaps
- 📈 Interactive Plots: Zoom, pan, and export publication-ready figures
- ⚡ Performance Caching: Streamlit caching for faster repeat analyses
- 🎯 QC Scorecard: Traffic-light quality assessment with outlier detection
- 🔢 Multi-mapper Modes: Unique, fractional, and primary alignment counting
- 🐍 miRanda Integration: Animal miRNA target prediction
- 📋 Provenance Tracking: YAML/JSON export for full reproducibility
- 🔍 Multi-sample QC Overlays: PCA-based sample clustering and outlier detection
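The three multi-mapper modes named in the list above can be read as three ways of crediting a read that aligns to more than one feature. The sketch below shows one plausible interpretation: "unique" keeps only single-hit reads, "fractional" splits each read evenly across its hits, and "primary" credits the first reported hit. This is an assumption about the semantics, not the sRNAtlas implementation, and `count_reads` and the toy alignments are hypothetical.

```python
from collections import defaultdict


def count_reads(alignments, mode="fractional"):
    """Count reads per feature.

    alignments maps read ID -> list of features hit, primary hit first.
    """
    counts = defaultdict(float)
    for features in alignments.values():
        if not features:
            continue  # unaligned read
        if mode == "unique":
            if len(features) == 1:
                counts[features[0]] += 1.0
        elif mode == "fractional":
            weight = 1.0 / len(features)
            for feature in features:
                counts[feature] += weight
        elif mode == "primary":
            counts[features[0]] += 1.0
        else:
            raise ValueError(f"unknown mode: {mode}")
    return dict(counts)


# Toy input: r1 maps uniquely, r2 and r3 each map to two features.
alignments = {
    "r1": ["miR-A"],
    "r2": ["miR-A", "miR-B"],
    "r3": ["miR-B", "miR-C"],
}

for mode in ("unique", "fractional", "primary"):
    print(mode, count_reads(alignments, mode))
```

Under this reading, fractional counting preserves the total number of aligned reads, while unique counting discards multi-mappers entirely, which matters for miRNA families with near-identical members.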
- Python 3.9 or higher
- 8+ GB RAM recommended
- External tools: Bowtie, Samtools, Cutadapt
# Clone the repository
git clone https://github.com/osman12345/sRNAtlas.git
cd sRNAtlas
# Create conda environment
conda create -n srnatlas python=3.10 -y
conda activate srnatlas
# Install bioinformatics tools
conda install -c bioconda bowtie samtools cutadapt -y
# Install Python dependencies
pip install -r requirements.txt

# Clone and setup
git clone https://github.com/osman12345/sRNAtlas.git
cd sRNAtlas
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/macOS
# or: venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Install external tools separately
# See docs/INSTALLATION.md for platform-specific instructions

# Build the image
docker build -t srnatlas .
# Run the container
docker run -p 8501:8501 -v $(pwd)/data:/app/data srnatlas
# Or use docker-compose
docker-compose up -d

# Check Python packages
python -c "import streamlit; import pandas; import pysam; print('OK')"
# Check external tools
bowtie --version
samtools --version
cutadapt --version
# Run tests
python -m pytest

cd sRNAtlas
streamlit run app/main.py

Open your browser to http://localhost:8501
- Click Project in the sidebar
- Enter project name and select organism
- Upload your FASTQ files
| Step | Module | Action |
|---|---|---|
| 1 | Quality Control | Assess raw read quality |
| 2 | Trimming | Remove adapters (select preset) |
| 3 | Databases | Download miRBase + build index |
| 4 | Alignment | Map reads to reference |
| 5 | Post-Align QC | Verify alignment quality |
| 6 | Counting | Generate count matrix |
| 7 | DE Analysis | Compare conditions |
| 8 | GO/Pathway | Functional enrichment |
- Download count matrices, DE results, and figures
- Generate HTML reports
- Save project for future analysis
📖 See the Quick Start Guide for a detailed walkthrough.
| Document | Description |
|---|---|
| Quick Start Guide | Get running in 10 minutes |
| User Guide | Complete documentation |
| Installation Guide | Platform-specific setup |
The application includes comprehensive documentation accessible via the Help module:
- Quick start tutorials
- Module reference
- File format specifications
- FAQ and troubleshooting
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16+ GB |
| Storage | 20 GB | 100+ GB |
| CPU | 4 cores | 8+ cores |
| Dependency | Version | Purpose |
|---|---|---|
| Python | 3.9+ | Runtime |
| Bowtie | 1.3+ | Alignment (not Bowtie2) |
| Samtools | 1.17+ | BAM processing |
| Cutadapt | 4.0+ | Adapter trimming |
streamlit>=1.28.0
streamlit-option-menu>=0.3.6
pandas>=2.0.0
numpy>=1.24.0
plotly>=5.18.0
pysam>=0.22.0
biopython>=1.81
scipy>=1.11.0
scikit-learn>=1.3.0
pydeseq2>=0.4.0
FASTQ (raw reads):
@SEQ_ID
TAGCTTATCAGACTGATGTTGA
+
IIIIIIIIIIIIIIIIIIIIII
Count Matrix (CSV):
,Sample1,Sample2,Sample3,Sample4
hsa-miR-21-5p,1500,1450,2800,2750
hsa-miR-155-5p,200,180,650,620

Sample Metadata (CSV):
sample,condition,batch
Sample1,control,1
Sample2,control,1
Sample3,treatment,2
Sample4,treatment,2

| Output | Format | Description |
|---|---|---|
| Trimmed reads | FASTQ.gz | Adapter-free, filtered reads |
| Alignments | BAM/BAI | Sorted, indexed alignments |
| Count matrix | CSV | Read counts per feature |
| DE results | CSV | log2FC, p-value, FDR |
| Figures | PNG/HTML | Interactive visualizations |
| Reports | HTML | Comprehensive analysis summary |
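Before running differential expression, it helps to confirm that the count matrix and the sample metadata describe the same samples. The sketch below parses the two CSV examples shown above with the standard library and checks that every count-matrix column has a metadata row. The helper names (`load_count_matrix`, `load_metadata`, `check_samples`) are hypothetical; sRNAtlas may validate its inputs differently.

```python
import csv
import io

COUNTS_CSV = """,Sample1,Sample2,Sample3,Sample4
hsa-miR-21-5p,1500,1450,2800,2750
hsa-miR-155-5p,200,180,650,620
"""

METADATA_CSV = """sample,condition,batch
Sample1,control,1
Sample2,control,1
Sample3,treatment,2
Sample4,treatment,2
"""


def load_count_matrix(handle):
    """Return (sample names, {feature: [counts per sample]})."""
    reader = csv.reader(handle)
    samples = next(reader)[1:]  # first header cell is empty
    counts = {row[0]: [int(v) for v in row[1:]] for row in reader if row}
    return samples, counts


def load_metadata(handle):
    """Return {sample name: metadata row as a dict}."""
    return {row["sample"]: row for row in csv.DictReader(handle)}


def check_samples(samples, metadata):
    """Raise if any count-matrix sample has no metadata row."""
    missing = [s for s in samples if s not in metadata]
    if missing:
        raise ValueError(f"samples missing from metadata: {missing}")


samples, counts = load_count_matrix(io.StringIO(COUNTS_CSV))
metadata = load_metadata(io.StringIO(METADATA_CSV))
check_samples(samples, metadata)
print({s: metadata[s]["condition"] for s in samples})
```

Note that in this example condition and batch coincide exactly (both controls are batch 1, both treatments batch 2), so a batch effect could not be separated from the treatment effect in a real design like this.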
| Parameter | Default | Description |
|---|---|---|
| -v | 1 | Mismatches allowed (0-3) |
| -k | 10 | Report up to k alignments |
| --best | On | Report best alignments first |
| --strata | On | Only report best stratum |
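To show how the defaults above translate into a Bowtie 1 invocation, the sketch below assembles an argument list without running anything. The function `bowtie_command`, the file names, the thread count, and the choice of SAM output (`-S`) are assumptions made for illustration; the exact command sRNAtlas builds is not shown in this README.

```python
import shlex


def bowtie_command(index_prefix, reads_fastq, sam_out,
                   mismatches=1, max_alignments=10, threads=4):
    """Build a Bowtie 1 argument list using the defaults from the table."""
    if not 0 <= mismatches <= 3:
        raise ValueError("-v accepts 0 to 3 mismatches")
    return [
        "bowtie",
        "-v", str(mismatches),      # end-to-end mismatches allowed
        "-k", str(max_alignments),  # report up to k alignments per read
        "--best", "--strata",       # best alignments, best stratum only
        "-S",                       # SAM output (assumed here)
        "-p", str(threads),         # worker threads (assumed default)
        "-x", index_prefix,
        reads_fastq,
        sam_out,
    ]


# Hypothetical paths, for illustration only.
cmd = bowtie_command("index/mirbase", "sample.trimmed.fastq", "sample.sam")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.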
| Parameter | Default | Description |
|---|---|---|
| Min length | 18 nt | Discard shorter reads |
| Max length | 35 nt | Discard longer reads |
| Quality cutoff | 20 | Trim low-quality bases |
| Error rate | 0.1 | Adapter matching tolerance |
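The trimming defaults map onto standard Cutadapt options: `-m` and `-M` for the length window, `-q` for the quality cutoff, and `-e` for the adapter error rate. The sketch below builds such a command as an argument list. The function name, the file names, and the adapter sequence (a commonly used Illumina small RNA 3' adapter) are illustrative assumptions; which presets sRNAtlas ships is not stated here.

```python
import shlex


def cutadapt_command(adapter, reads_in, reads_out,
                     min_len=18, max_len=35, quality=20, error_rate=0.1):
    """Build a Cutadapt argument list using the defaults from the table."""
    return [
        "cutadapt",
        "-a", adapter,          # 3' adapter sequence to remove
        "-m", str(min_len),     # discard reads shorter than this
        "-M", str(max_len),     # discard reads longer than this
        "-q", str(quality),     # trim low-quality 3' bases
        "-e", str(error_rate),  # adapter matching tolerance
        "-o", reads_out,
        reads_in,
    ]


# Example adapter and hypothetical paths, for illustration only.
trim_cmd = cutadapt_command(
    "TGGAATTCTCGGGTGCCAAGG", "sample.fastq.gz", "sample.trimmed.fastq.gz"
)
print(shlex.join(trim_cmd))
```

If every read is discarded, the length window combined with a wrong adapter is a likely cause: untrimmed reads keep their full sequenced length and fall above the 35-nt maximum.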
| Parameter | Default | Description |
|---|---|---|
| FDR threshold | 0.05 | Significance cutoff |
| log2FC threshold | 0.585 | ~1.5-fold change |
| Min count | 10 | Filter low-count features |
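The log2FC default of 0.585 corresponds to a 1.5-fold change because log2(1.5) ≈ 0.585. The sketch below verifies that arithmetic and applies the FDR and fold-change cutoffs to a few rows. The rows are made-up toy values, not real results, and the comparison operators (`<` for FDR, `>=` for the absolute log2FC) are an assumption about how the thresholds are applied. The column names `log2FoldChange` and `padj` follow the pyDESeq2 results table.

```python
import math

FDR_THRESHOLD = 0.05
LOG2FC_THRESHOLD = 0.585

# log2(1.5) rounds to the default threshold.
print(round(math.log2(1.5), 3))


def is_significant(row):
    """Apply both cutoffs; direction of change is ignored."""
    return (
        row["padj"] < FDR_THRESHOLD
        and abs(row["log2FoldChange"]) >= LOG2FC_THRESHOLD
    )


# Toy rows for illustration only; these are not real results.
rows = [
    {"feature": "feature_A", "log2FoldChange": 1.20, "padj": 0.001},
    {"feature": "feature_B", "log2FoldChange": 0.30, "padj": 0.010},
    {"feature": "feature_C", "log2FoldChange": -0.90, "padj": 0.200},
    {"feature": "feature_D", "log2FoldChange": -0.75, "padj": 0.030},
]

significant = [row["feature"] for row in rows if is_significant(row)]
print(significant)
```

Here feature_B fails the fold-change cutoff and feature_C fails the FDR cutoff, so only feature_A and feature_D pass both.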
sRNAtlas includes a comprehensive test suite:
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run specific test file
python -m pytest tests/test_file_handlers.py
# Run with coverage
python -m pytest --cov=utils --cov=modules

| Module | Tests | Status |
|---|---|---|
| File handlers | 12 | ✅ Passing |
| Progress tracker | 5 | ✅ Passing |
| Cluster analysis | 6 | ✅ Passing |
| Novel miRNA | 8 | ✅ Passing |
| isomiR | 19 | ✅ Passing |
| Total | 51 | All Passing |
sRNAtlas/
├── app/
│ └── main.py # Application entry point
├── modules/
│ ├── qc_module.py # Quality control
│ ├── trimming_module.py # Adapter trimming
│ ├── alignment_module.py # Read alignment
│ ├── counting_module.py # Read counting
│ ├── de_module.py # Differential expression
│ ├── novel_mirna_module.py # Novel miRNA discovery
│ ├── isomir_module.py # isomiR analysis
│ └── ...
├── utils/
│ ├── file_handlers.py # File I/O utilities
│ ├── plotting.py # Visualization functions
│ ├── cluster_analysis.py # Clustering utilities
│ ├── caching.py # Streamlit caching helpers
│ ├── error_handling.py # Error classification & diagnostics
│ ├── qc_scorecard.py # QC scoring with thresholds
│ ├── provenance.py # Reproducibility tracking
│ ├── miranda.py # miRanda target prediction
│ └── ...
├── config/
│ └── settings.py # Configuration
├── tests/
│ ├── conftest.py # Test fixtures
│ └── test_*.py # Test files
├── docs/
│ ├── QUICK_START.md
│ ├── USER_GUIDE.md
│ └── INSTALLATION.md
├── assets/
│ └── logo.svg # Application logo
├── requirements.txt
└── README.md
| Problem | Cause | Solution |
|---|---|---|
| 0 reads after trimming | Wrong adapter | Try different preset or check library kit |
| 0% alignment rate | Wrong reference | Verify organism, rebuild index |
| Memory error | Too many samples | Process in smaller batches |
| bowtie: not found | Not in PATH | Install via conda or add to PATH |
| pysam import error | Missing htslib | conda install -c bioconda pysam |
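A quick way to diagnose the "not found" case from the table above is to check whether each external tool resolves on PATH. The sketch below does this with `shutil.which` from the standard library. `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, not part of sRNAtlas, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil

# External tools listed in the requirements section.
REQUIRED_TOOLS = ("bowtie", "samtools", "cutadapt")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All external tools found.")
```

If a tool is reported missing inside a conda setup, the usual cause is that the environment was not activated in the shell that launched the app.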
- Check the in-app Help module
- Review User Guide
- Search existing GitHub Issues
- Open a new issue with:
- Error message
- Steps to reproduce
- System information
Contributions are welcome! Please read our contributing guidelines before submitting pull requests.
# Clone and setup
git clone https://github.com/osman12345/sRNAtlas.git
cd sRNAtlas
# Create dev environment
conda create -n srnatlas-dev python=3.10 -y
conda activate srnatlas-dev
# Install dev dependencies
pip install -r requirements.txt
pip install pytest pytest-cov black flake8
# Run tests
python -m pytest -v

- Follow PEP 8 guidelines
- Use type hints where appropriate
- Write docstrings for functions and classes
- Add tests for new features
- Core analysis pipeline
- Novel miRNA discovery
- isomiR analysis
- Multi-group comparison
- Cluster analysis
- Unit test framework
- Docker containerization
- Performance caching (@st.cache_data)
- QC Scorecard with traffic-light flags
- Multi-mapper counting modes
- miRanda target prediction (animals)
- Provenance tracking (YAML/JSON)
- isomiR differential usage & arm switching
- Multi-sample QC overlays with outlier detection
- Windows standalone installer
- Cloud deployment
- API endpoints
- Batch job scheduler
If you use sRNAtlas in your research, please cite:
@software{sRNAtlas,
author = {Ayman Osman},
title = {sRNAtlas: A Comprehensive Platform for Small RNA-seq Analysis},
year = {2026},
url = {https://github.com/osman12345/sRNAtlas},
version = {v0.1.0}
}

This project is licensed under the GNU Affero General Public License v3.0 or later (AGPL-3.0-or-later).
See the LICENSE file for details, or read the full license at: https://www.gnu.org/licenses/agpl-3.0.html
- Streamlit - Web application framework
- Bowtie - Short read aligner
- miRBase - miRNA database
- RNAcentral - ncRNA database
- pyDESeq2 - Differential expression
Built with ❤️ for the small RNA research community