nadine-ramirez/rna-seq-count-model-explorer

RNA-Seq Count Model Explorer — Poisson vs Negative Binomial (NB2) + Streamlit Live Demo

An interactive bioinformatics/statistical modeling project that explores RNA-seq count data and compares Poisson vs Negative Binomial (NB2) assumptions. The Streamlit app visualizes overdispersion (mean–variance behavior), fits per-gene GLMs, and exports model comparison results.

Live Demo: LIVE_DEMO_URL_HERE
GitHub Repo: REPO_URL_HERE


Why this matters

RNA-seq data are counts, and Poisson models assume:

Var(Y) = E(Y)

In practice, RNA-seq often shows overdispersion (variance > mean), which can make Poisson a poor fit. Negative Binomial (NB2) models handle this by introducing a dispersion term:

Var(Y) = μ + α μ²

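Here μ = E(Y) is the mean and α ≥ 0 is the dispersion parameter; setting α = 0 recovers the Poisson variance. This project makes that distinction easy to see and test interactively.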
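To make the two variance formulas concrete, here is a minimal simulation sketch using only the Python standard library. It is not taken from the app's code: the helper names `sample_poisson` and `sample_nb2` and the values μ = 20, α = 0.5 are illustrative assumptions. NB2 counts are drawn as a gamma-Poisson mixture, which yields variance μ + α μ².

```python
import math
import random
import statistics


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb2(mu, alpha, rng):
    """Draw one NB2 variate with mean mu and variance mu + alpha * mu**2."""
    # A gamma rate with shape 1/alpha and scale alpha*mu has mean mu
    # and variance alpha * mu**2; the Poisson layer adds another mu.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return sample_poisson(lam, rng)


# Hypothetical settings chosen only for illustration.
rng = random.Random(0)
mu, alpha, n = 20.0, 0.5, 20000

pois = [sample_poisson(mu, rng) for _ in range(n)]
nb = [sample_nb2(mu, alpha, rng) for _ in range(n)]

pois_mean, pois_var = statistics.mean(pois), statistics.variance(pois)
nb_mean, nb_var = statistics.mean(nb), statistics.variance(nb)

print(f"Poisson: mean={pois_mean:.1f} var={pois_var:.1f} (theory var={mu:.1f})")
print(f"NB2:     mean={nb_mean:.1f} var={nb_var:.1f} (theory var={mu + alpha * mu**2:.1f})")
```

Both samples share roughly the same mean, but the NB2 variance should land near 220 rather than 20, which is the overdispersion pattern the mean-variance plot is meant to reveal.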


Features

Streamlit App

  • Three data modes

    • Sample dataset (runs instantly)
    • Simulated RNA-seq dataset (interactive controls)
    • Upload your own CSV in tidy format
  • Simulation controls

    • Number of genes / samples
    • Dispersion α (controls overdispersion)
    • Condition effect strength
    • Random seed
  • Model comparison

    • Gene-wise Mean vs Variance visualization
    • Per-gene intercept-only GLMs:
      • Poisson GLM (baseline)
      • NB2 GLM (gene-wise α estimated via method-of-moments)
    • AIC comparison plot and per-gene results table
  • Export

    • Download per-gene model comparison results as CSV
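The per-gene comparison listed above can be sketched by hand for a single gene. This is a standard-library illustration under stated assumptions, not the app's implementation: the function names and the example counts are hypothetical, and the NB2 AIC here counts α as a second parameter, which may differ from how the app or a GLM library counts it. For an intercept-only model the fitted mean equals the sample mean under both likelihoods, so no iterative fitting is needed.

```python
import math
import statistics


def moment_alpha(counts):
    """Method-of-moments dispersion: solve var = mu + alpha * mu**2, floored at 0."""
    mu = statistics.mean(counts)
    var = statistics.variance(counts)
    return max((var - mu) / mu**2, 0.0)


def poisson_loglik(counts, mu):
    """Poisson log-likelihood at mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)


def nb2_loglik(counts, mu, alpha):
    """NB2 log-likelihood at mean mu and dispersion alpha."""
    if alpha == 0.0:
        return poisson_loglik(counts, mu)  # NB2 reduces to Poisson when alpha is 0
    r = 1.0 / alpha
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )


def compare_gene(counts):
    """Return alpha and the AIC of intercept-only Poisson and NB2 fits for one gene."""
    mu = statistics.mean(counts)  # intercept-only MLE of the mean for both models
    alpha = moment_alpha(counts)
    aic_poisson = 2 * 1 - 2 * poisson_loglik(counts, mu)
    # Assumption: alpha is counted as an estimated parameter (k = 2).
    aic_nb2 = 2 * 2 - 2 * nb2_loglik(counts, mu, alpha)
    return {"alpha": alpha, "aic_poisson": aic_poisson, "aic_nb2": aic_nb2}


# Hypothetical counts for one overdispersed gene.
result = compare_gene([12, 15, 40, 8, 55, 22])
print(result)
```

A lower AIC indicates the preferred model; for counts this overdispersed, the NB2 AIC comes out well below the Poisson AIC.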

Expected CSV format (Upload mode)

Your uploaded CSV should be in tidy (long) format, with one row per gene and sample:

Required columns:

  • gene
  • sample
  • count

Optional column:

  • condition

Example:

gene,sample,condition,count
GENE1,S1,Control,12
GENE1,S2,Control,15
GENE1,S3,Treated,40
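As a rough sketch of how a file in this format could be checked and summarized before upload, the snippet below parses the example rows with the standard `csv` module and computes the per-gene mean and variance. The function name `load_tidy_counts` is hypothetical and the validation rules are assumptions; the app's own loader may behave differently.

```python
import csv
import io
import statistics
from collections import defaultdict

REQUIRED_COLUMNS = {"gene", "sample", "count"}  # "condition" is optional


def load_tidy_counts(text):
    """Parse tidy CSV text into {gene: [count, ...]}, checking required columns."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    counts_by_gene = defaultdict(list)
    for row in reader:
        counts_by_gene[row["gene"]].append(int(row["count"]))
    return dict(counts_by_gene)


example = """gene,sample,condition,count
GENE1,S1,Control,12
GENE1,S2,Control,15
GENE1,S3,Treated,40
"""

counts_by_gene = load_tidy_counts(example)
for gene, counts in counts_by_gene.items():
    mean = statistics.mean(counts)
    var = statistics.variance(counts)  # sample variance, needs at least 2 counts
    print(f"{gene}: n={len(counts)} mean={mean:.2f} var={var:.2f}")
```

A variance far above the mean for a gene is the signal that a Poisson model is likely to fit it poorly.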