MutClust: Efficient and Scalable Mutual Rank-Based Coexpression Clustering

MutClust is a Python tool for efficient and scalable mutual rank-based gene coexpression analyses. The clustering analysis is conducted using ClusterONE, as described in Wisecaver et al. 2017. MutClust is still under development.


Features

  • Mutual Rank Analysis: Compute mutual rank (MR) from Pearson correlations on your gene expression matrix.
  • ClusterONE Clustering: Identify gene coexpression clusters from filtered/weighted MR networks.
  • Fast: Multi-threaded, sparse matrix operations for speed on large datasets.
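
To make the mutual rank idea concrete, here is a minimal pure-Python sketch. It is not MutClust's implementation (which is multi-threaded and sparse). It assumes the commonly used definition in which the MR of genes a and b is the geometric mean of b's rank in a's correlation list and a's rank in b's list, with rank 1 being the most positively correlated partner. The function names `pearson` and `mutual_ranks` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mutual_ranks(expr):
    """expr maps geneID -> list of expression values (one per sample).

    Returns {(gene_a, gene_b): MR} for every unordered gene pair.
    Assumption: ranks come from signed correlation, highest first.
    """
    genes = sorted(expr)
    # rank[a][b] = position of b when a's partners are sorted by
    # descending correlation (1 = most strongly correlated).
    rank = {}
    for a in genes:
        others = [g for g in genes if g != a]
        others.sort(key=lambda g: pearson(expr[a], expr[g]), reverse=True)
        rank[a] = {g: i + 1 for i, g in enumerate(others)}
    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for i, a in enumerate(genes)
        for b in genes[i + 1:]
    }

# Toy data: GeneA and GeneB are each other's top partner, so their MR is 1.0.
expr = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [1.1, 2.1, 2.9, 4.2],
    "GeneC": [4.0, 3.0, 2.5, 1.0],
    "GeneD": [1.0, 3.0, 2.0, 4.0],
}
mr = mutual_ranks(expr)
```

A low MR means two genes rank each other highly in both directions, which is why a threshold such as 100 keeps only the strongest mutual associations.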

Installation

Recommended

Create and activate the conda environment defined in environment.yml:

conda env create -f environment.yml
conda activate mutclust

Alternative

Step 1: Make sure that ClusterONE is available from the command line:

conda install bioconda::clusterone

Step 2a: Install MutClust from PyPI:

pip install mutclust

Step 2b: Or clone the repository from GitHub:

git clone https://github.com/eporetsky/mutclust.git
cd mutclust
pip install .

Usage

1. Calculate Mutual Rank (MR)

mutclust mr -i expr.tsv -o results.mrs.tsv.gz --mr-threshold 100 --threads 4 [--log2]

Argument        Short  Description                                   Default
--input         -i     Path to the RNA-seq dataset (.tsv/.tsv.gz)    Required
--output        -o     Output file for mutual rank pairs             Required
--mr-threshold  -m     MR threshold for reporting gene pairs         100
--threads       -t     Number of CPU threads (correlation)           4
--log2                 If set, applies log2(x+1) before calculation  Off

  • Input: Genes as rows, samples as columns (TSV, row index 'geneID').
  • Output: Gzipped tab-separated file containing Gene1, Gene2, MR.

2. Cluster Genes (with ClusterONE)

mutclust cls -i results.mrs.tsv.gz -o results.cls.tsv --e_value 10

Argument   Short  Description                                   Default
--input    -i     Path to Mutual Rank (MR) pairs (.tsv/.tsv.gz)  Required
--output   -o     Output file for clusters (.tsv)                Required
--e_value  -e     Exponential decay constant for edge weighting  10

  • The tool filters/weights MR pairs and calls ClusterONE for clustering.
  • Output: Tab-separated file (e.g. clusters.tsv) containing clusterID, geneID, pval. Only clusters with p-value < 0.1 are listed.
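
The --e_value option is described only as an exponential decay constant. One plausible reading, following the transformation used in Wisecaver et al. 2017, is weight = exp(-(MR - 1) / e). The sketch below assumes that formula and assumes the MR file has a header row with the columns Gene1, Gene2, and MR. It is not taken from MutClust's source, and the names `mr_to_weight` and `read_weighted_edges` are hypothetical.

```python
import csv
import gzip
import math
import os
import tempfile

def mr_to_weight(mr, e_value=10.0):
    """Assumed decay: weight is 1.0 at MR = 1 and shrinks as MR grows."""
    return math.exp(-(mr - 1.0) / e_value)

def read_weighted_edges(path, e_value=10.0):
    """Yield (gene1, gene2, weight) from a gzipped MR pairs file.

    Assumes a header row with the columns Gene1, Gene2 and MR.
    """
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield row["Gene1"], row["Gene2"], mr_to_weight(float(row["MR"]), e_value)

# Demo on a tiny, made-up MR file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.mrs.tsv.gz")
    with gzip.open(demo, "wt", newline="") as handle:
        handle.write("Gene1\tGene2\tMR\nGeneA\tGeneB\t1.0\nGeneA\tGeneC\t11.0\n")
    edges = list(read_weighted_edges(demo, e_value=10.0))
```

Under this assumed formula, a smaller decay constant penalizes high MR values more sharply, so fewer weak edges contribute to the clusters.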

Example Workflow

mutclust mr -i data/myexpr.tsv -o out.mrs.tsv.gz --mr-threshold 100 --threads 72 --log2
mutclust cls -i out.mrs.tsv.gz -o out.clusters.tsv --e_value 10
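
The same two steps can be chained from Python. This is a small sketch that only assembles the flags documented above and assumes `mutclust` is on the PATH. The helper names `build_commands` and `run_workflow` and the `prefix` naming scheme are hypothetical, not part of MutClust.

```python
import subprocess

def build_commands(expr_path, prefix, mr_threshold=100, threads=4,
                   e_value=10, log2=False):
    """Return the argv lists for the `mr` and `cls` steps, in order."""
    mrs = f"{prefix}.mrs.tsv.gz"
    clusters = f"{prefix}.clusters.tsv"
    mr_cmd = ["mutclust", "mr", "-i", expr_path, "-o", mrs,
              "--mr-threshold", str(mr_threshold), "--threads", str(threads)]
    if log2:
        mr_cmd.append("--log2")
    cls_cmd = ["mutclust", "cls", "-i", mrs, "-o", clusters,
               "--e_value", str(e_value)]
    return [mr_cmd, cls_cmd]

def run_workflow(expr_path, prefix, **options):
    """Run both steps; check=True stops at the first failing command."""
    for cmd in build_commands(expr_path, prefix, **options):
        subprocess.run(cmd, check=True)

# Build (but do not run) the commands from the example workflow.
commands = build_commands("data/myexpr.tsv", "out", threads=72, log2=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with paths that contain spaces.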

Input Format

Expression file:

geneID\tSample1\tSample2\n...
GeneA \t1.1    \t2.2
GeneB \t4.2    \t3.7
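
A file in this layout can be produced with Python's standard csv module. The sketch below writes a toy matrix with a 'geneID' index column and checks that each gene has one value per sample. The helper name `write_expression_tsv` is hypothetical.

```python
import csv
import io

def write_expression_tsv(handle, samples, rows):
    """Write genes-as-rows expression data with a 'geneID' index column."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["geneID", *samples])
    for gene, values in rows.items():
        if len(values) != len(samples):
            raise ValueError(
                f"{gene}: expected {len(samples)} values, got {len(values)}"
            )
        writer.writerow([gene, *values])

# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_expression_tsv(
    buffer,
    ["Sample1", "Sample2"],
    {"GeneA": [1.1, 2.2], "GeneB": [4.2, 3.7]},
)
text = buffer.getvalue()
```

To write to disk instead, open the target with `open("expr.tsv", "w", newline="")` and pass that handle in place of the buffer.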

Note: MutClust may be limited to Linux because it depends on pynetcor.


Coming Soon

  • Generate cluster gene annotation
  • Calculate cluster GO term enrichment
  • Calculate cluster eigengene data
  • Add a MutClust Dockerfile
  • Add unit testing

License

MIT License. See LICENSE file for details.


Contributing

Suggestions, pull requests, and issues welcome!
