An interactive bioinformatics/statistical modeling project that explores RNA-seq count data and compares Poisson vs Negative Binomial (NB2) assumptions. The Streamlit app visualizes overdispersion (mean–variance behavior), fits per-gene GLMs, and exports model comparison results.
Live Demo: LIVE_DEMO_URL_HERE
GitHub Repo: REPO_URL_HERE
RNA-seq data are counts, and Poisson models assume:
Var(Y) = E(Y)
In practice, RNA-seq counts are often overdispersed (variance > mean), so a Poisson model can fit poorly. Negative Binomial (NB2) models handle this by adding a dispersion parameter α ≥ 0 to the variance of a count with mean μ = E(Y):
Var(Y) = μ + α μ²
Setting α = 0 recovers the Poisson case.
This project makes that distinction easy to see and test interactively.
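
The two variance formulas above can be checked numerically. The following is a minimal sketch, not code from the app: it assumes NumPy, and the helper name `sample_nb2` and the values of `mu`, `alpha`, and `size` are illustrative choices. NumPy's negative binomial takes `(n, p)`, so the NB2 mean and dispersion are mapped to `n = 1/α` and `p = n/(n + μ)`.

```python
import numpy as np


def sample_nb2(rng, mu, alpha, size):
    """Draw NB2 counts with mean mu and variance mu + alpha * mu**2."""
    # NumPy parameterizes the negative binomial by (n, p).
    # For NB2: n = 1 / alpha and p = n / (n + mu); alpha must be > 0.
    n = 1.0 / alpha
    p = n / (n + mu)
    return rng.negative_binomial(n, p, size=size)


rng = np.random.default_rng(0)
mu, alpha, size = 50.0, 0.2, 200_000  # illustrative values only

poisson_counts = rng.poisson(mu, size=size)
nb2_counts = sample_nb2(rng, mu, alpha, size)

# Poisson: sample variance should sit close to the mean.
print("Poisson mean, var:", poisson_counts.mean(), poisson_counts.var(ddof=1))
# NB2: sample variance should sit close to mu + alpha * mu**2.
print("NB2 mean, var:    ", nb2_counts.mean(), nb2_counts.var(ddof=1))
print("NB2 expected var: ", mu + alpha * mu**2)
```

With these settings the NB2 sample variance should land far above the mean, while the Poisson sample variance stays near it.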
Three data modes:
- Sample dataset (runs instantly)
- Simulated RNA-seq dataset (interactive controls)
- Upload your own CSV in tidy format
Simulation controls:
- Number of genes / samples
- Dispersion α (controls overdispersion)
- Condition effect strength
- Random seed
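
As a rough illustration of how these controls could drive a simulator, here is a hedged sketch. It is not the app's implementation: the function name `simulate_tidy_counts`, the Control/Treated split, the log-normal baseline means, and treating the effect strength as a log fold change applied to every gene are all assumptions made for this example.

```python
import numpy as np


def simulate_tidy_counts(n_genes=100, n_samples=6, alpha=0.3, effect=1.0, seed=42):
    """Return tidy records (gene, sample, condition, count) drawn from NB2."""
    if alpha <= 0:
        raise ValueError("alpha must be positive for NB2 sampling")
    rng = np.random.default_rng(seed)
    # Assumed design: first half of the samples are Control, the rest Treated.
    conditions = ["Control" if j < n_samples // 2 else "Treated"
                  for j in range(n_samples)]
    # Assumed baseline: gene means vary on the log scale.
    base_mu = np.exp(rng.normal(loc=3.0, scale=1.0, size=n_genes))
    n = 1.0 / alpha  # NB2 size parameter
    records = []
    for g in range(n_genes):
        for j, cond in enumerate(conditions):
            # Assumption: effect is a log fold change for treated samples.
            mu = base_mu[g] * (np.exp(effect) if cond == "Treated" else 1.0)
            count = rng.negative_binomial(n, n / (n + mu))
            records.append({
                "gene": f"GENE{g + 1}",
                "sample": f"S{j + 1}",
                "condition": cond,
                "count": int(count),
            })
    return records


records = simulate_tidy_counts(n_genes=5, n_samples=4, alpha=0.5, effect=1.0, seed=1)
print(len(records), records[0])
```

The records use the same column names as the upload format, so `pandas.DataFrame(records)` would give a tidy table if pandas is available. Fixing the seed makes repeated runs return identical counts.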
Model comparison:
- Gene-wise Mean vs Variance visualization
- Per-gene intercept-only GLMs:
  - Poisson GLM (baseline)
  - NB2 GLM (gene-wise α estimated via method-of-moments)
- AIC comparison plot and per-gene results table
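
The per-gene comparison can be sketched without a GLM library, because an intercept-only fit has a closed form: for both Poisson and NB2 with fixed α, the fitted mean is the sample mean. The code below is an illustrative sketch under that simplification, not the app's code. The function name `compare_poisson_nb2`, the `min_alpha` floor, and counting α as an extra parameter in the NB2 AIC are assumptions; a GLM library may count parameters differently.

```python
import math

import numpy as np


def compare_poisson_nb2(counts, min_alpha=1e-8):
    """Compare intercept-only Poisson and NB2 fits for one gene's counts."""
    y = np.asarray(counts, dtype=float)
    if y.size < 2 or y.mean() <= 0:
        raise ValueError("need at least two samples and a positive mean")
    mu = float(y.mean())
    var = float(y.var(ddof=1))

    # Method of moments: solve var = mu + alpha * mu**2 for alpha,
    # floored at min_alpha when the gene is not overdispersed.
    alpha = max((var - mu) / mu**2, min_alpha)

    # Poisson log-likelihood at the fitted mean.
    ll_pois = sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)

    # NB2 log-likelihood with size r = 1 / alpha and the same mean.
    r = 1.0 / alpha
    ll_nb2 = sum(
        math.lgamma(yi + r) - math.lgamma(r) - math.lgamma(yi + 1)
        + r * math.log(r / (r + mu)) + yi * math.log(mu / (r + mu))
        for yi in y
    )

    return {
        "mean": mu,
        "variance": var,
        "alpha": alpha,
        "aic_poisson": 2 * 1 - 2 * ll_pois,  # one parameter: the mean
        "aic_nb2": 2 * 2 - 2 * ll_nb2,       # assumption: alpha counted too
    }


result = compare_poisson_nb2([12, 15, 40, 3, 60, 8])
print(result)
```

A lower AIC indicates the preferred model. For a gene whose variance is well above its mean, the NB2 AIC should be clearly lower; for a gene with variance at or below its mean, α hits the floor and the Poisson model is favored by the parameter penalty.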
Export:
- Download per-gene model comparison results as CSV
An uploaded CSV should be in tidy (long) format, with one row per gene and sample.
Required columns: gene, sample, count
Optional column: condition
Example:
gene,sample,condition,count
GENE1,S1,Control,12
GENE1,S2,Control,15
GENE1,S3,Treated,40
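
A file in this format can be validated before modeling. The sketch below uses only the Python standard library and is an assumption about how such a check might look, not the app's upload code. The function name `read_tidy_counts` and the rule that counts must be non-negative integers are choices made for this example.

```python
import csv
import io

REQUIRED_COLUMNS = {"gene", "sample", "count"}


def read_tidy_counts(text):
    """Parse tidy CSV text and check the required columns and count values."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            count = int(row["count"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: count is not an integer") from None
        if count < 0:
            raise ValueError(f"line {line_no}: count is negative")
        rows.append({
            "gene": row["gene"],
            "sample": row["sample"],
            # condition is optional, so it may be absent or empty.
            "condition": row.get("condition") or None,
            "count": count,
        })
    return rows


example = """gene,sample,condition,count
GENE1,S1,Control,12
GENE1,S2,Control,15
GENE1,S3,Treated,40
"""
rows = read_tidy_counts(example)
print(rows)
```

Files without a condition column still parse, with `condition` set to `None` on every row.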