Merged
6 changes: 5 additions & 1 deletion .github/workflows/publish.yml
@@ -4,6 +4,8 @@ on:
push:
tags:
- "v*.*.*"
release:
types: [published]

jobs:
build-and-publish:
@@ -12,6 +14,8 @@ jobs:
steps:
- name: Check out code
uses: actions/checkout@v3
with:
ref: ${{ github.event_name == 'release' && github.event.release.tag_name || github.ref }}

- name: Set up Python
uses: actions/setup-python@v4
@@ -32,7 +36,7 @@ jobs:
TWINE_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
run: |
twine upload \
--repository-url https://upload.pypi.github.io/UCD-BDLab/BioNeuralNet \
--repository-url https://api.github.com/orgs/UCD-BDLab/packages/pypi/upload \
dist/*

- name: Publish to PyPI
46 changes: 42 additions & 4 deletions CHANGELOG.md
@@ -66,9 +66,47 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
- **Updated Tutorials and Documentation**: New end-to-end Jupyter notebook example.
- **Updated Tests**: All tests have been updated and new ones have been added.

## [1.0.1] to [1.0.9] - 2025-04-24
## [1.1.0] - 2025-07-12

- **BUG**: A bug related to missing rdata files.
- **Updated License**: BioNeuralNet is now distributed under the [Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)](https://creativecommons.org/licenses/by-nc-nd/4.0/).
### **Added**
- **New Embedding Integration Utility**
- `_integrate_embeddings(reduced, method="multiply", alpha=2.0, beta=0.5)`:
- Integrates reduced embeddings with raw omics features via a multiplicative scheme:
- `enhanced = beta * raw + (1 - beta) * (alpha * normalized_weight * raw)`
- (default ensures ≥ 50 % of each feature’s final value is influenced by the learned weights).

- **Graph-Generation Algorithms**
- `gen_similarity_graph`: k-NN Cosine / Gaussian RBF similarity graph
- `gen_correlation_graph`: Pearson / Spearman co-expression graph
- `gen_threshold_graph`: soft-threshold (WGCNA-style) correlation graph
- `gen_gaussian_knn_graph`: Gaussian kernel k-NN graph
- `gen_mutual_info_graph`: mutual-information graph

- **Preprocessing Utilities**
- Clinical data pipeline `preprocess_clinical`
- Inf/NaN cleaning: `clean_inf_nan`
- Variance selection: `select_top_k_variance`
- Correlation selection (supervised / unsupervised): `select_top_k_correlation`
- RandomForest importance: `select_top_randomforest`
- ANOVA F-test selection: `top_anova_f_features`
- Network-pruning helpers:
- `prune_network`, `prune_network_by_quantile`,
- `network_remove_low_variance`, `network_remove_high_zero_fraction`

- **Continuous-Deployment Workflow**
Added `.github/workflows/publish.yml` to auto-publish releases to PyPI when a Git tag is pushed.

- **Updated Homepage Image**
Replaced the index-page illustration to depict the full BioNeuralNet workflow.

- **New release**: A new release will include documentation for the other updates. (1.1.0)
### **Changed**
- **Comprehensive Documentation Update**
- Rebuilt ReadTheDocs site with a new workflow diagram on the landing page.
- Synced API reference to include all new graph-generation, preprocessing, and embedding-integration functions.
- Added quick-start guide, expanded tutorials, and refreshed examples/notebooks.
- Updated narrative docs, docstrings, and licensing info for consistency.

- **License**: Project is now distributed under the [Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0)](https://creativecommons.org/licenses/by-nc-nd/4.0/).

### **Fixed**
- **Packaging Bug**: Missing `.csv` datasets and `.R` scripts in source distribution; `MANIFEST.in` updated to include all requisite data files.
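
As a rough illustration of the multiplicative scheme listed for `_integrate_embeddings` in the 1.1.0 entry, the sketch below applies the formula `enhanced = beta * raw + (1 - beta) * (alpha * normalized_weight * raw)` to plain Python lists. The function name `integrate_multiplicative`, its argument layout, and the assumption that the per-feature weights arrive already normalized are illustrative choices, not taken from the BioNeuralNet source.

```python
from typing import List, Sequence


def integrate_multiplicative(
    raw: Sequence[Sequence[float]],
    normalized_weights: Sequence[float],
    alpha: float = 2.0,
    beta: float = 0.5,
) -> List[List[float]]:
    """Blend raw features with weight-scaled copies of themselves.

    raw: rows are samples, columns are features (hypothetical layout).
    normalized_weights: one learned weight per feature, assumed to be
        normalized already (for example to the range [0, 1]).
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    enhanced = []
    for row in raw:
        if len(row) != len(normalized_weights):
            raise ValueError("each row needs one value per weight")
        # enhanced = beta * raw + (1 - beta) * (alpha * weight * raw)
        enhanced.append(
            [beta * x + (1.0 - beta) * (alpha * w * x)
             for x, w in zip(row, normalized_weights)]
        )
    return enhanced
```

With the defaults (`alpha=2.0`, `beta=0.5`) the per-feature scale factor works out to `0.5 + w`, so a normalized weight of 0.5 leaves a feature unchanged, while weights of 0 and 1 scale it by 0.5 and 1.5 respectively.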
2 changes: 1 addition & 1 deletion README.md
@@ -8,7 +8,7 @@
[![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue.svg)](https://bioneuralnet.readthedocs.io/en/latest/)


## Welcome to BioNeuralNet 1.0.9
## Welcome to BioNeuralNet 1.1.0

![BioNeuralNet Logo](assets/LOGO_WB.png)

2 changes: 1 addition & 1 deletion bioneuralnet/__init__.py
@@ -29,7 +29,7 @@
- `datasets`: Contains example (synthetic) datasets for testing and demonstration purposes.
"""

__version__ = "1.0.9"
__version__ = "1.1.0"

from .network_embedding import GNNEmbedding
from .downstream_task import SubjectRepresentation
2 changes: 1 addition & 1 deletion docs/jupyter_execute/Quick_Start.ipynb
@@ -913,7 +913,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"BioNeuralNet version: 1.0.9\n"
"BioNeuralNet version: 1.1.0\n"
]
}
],
21 changes: 0 additions & 21 deletions docs/jupyter_execute/TCGA-BRCA_Dataset.ipynb
@@ -60,27 +60,6 @@
"- [Direct Download BRCA](http://firebrowse.org/?cohort=BRCA&download_dialog=true)\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "60a6b53c",
"metadata": {},
"outputs": [],
"source": [
"# adjusting global pandas options for better display on web documentation\n",
"import pandas as pd\n",
"import warnings\n",
"import logging\n",
"\n",
"pd.set_option(\"display.max_columns\", 5)\n",
"pd.set_option(\"display.expand_frame_repr\", False)\n",
"warnings.filterwarnings(\"ignore\", category=UserWarning)\n",
"warnings.filterwarnings(\"ignore\", category=DeprecationWarning)\n",
"logging.getLogger(\"ray\").setLevel(logging.ERROR)\n",
"logging.getLogger(\"ray.tune\").setLevel(logging.ERROR)\n",
"logging.getLogger(\"torch_geometric\").setLevel(logging.ERROR)"
]
},
{
"cell_type": "markdown",
"id": "c9698b74",
2 changes: 1 addition & 1 deletion docs/source/Quick_Start.ipynb
@@ -913,7 +913,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"BioNeuralNet version: 1.0.9\n"
"BioNeuralNet version: 1.1.0\n"
]
}
],
1 change: 1 addition & 0 deletions docs/source/_autosummary/bioneuralnet.utils.graph.rst
@@ -15,6 +15,7 @@ bioneuralnet.utils.graph
gen_similarity_graph
gen_snn_graph
gen_threshold_graph
get_logger

.. rubric:: Classes

2 changes: 0 additions & 2 deletions docs/source/_autosummary/bioneuralnet.utils.preprocess.rst
@@ -15,7 +15,6 @@ bioneuralnet.utils.preprocess
multipletests
network_remove_high_zero_fraction
network_remove_low_variance
overload
preprocess_clinical
prune_network
prune_network_by_quantile
@@ -28,7 +27,6 @@ bioneuralnet.utils.preprocess

.. autosummary::

OrdinalEncoder
RandomForestClassifier
RandomForestRegressor
RobustScaler
Binary file modified docs/source/_static/BioNeuralNet.png
Binary file added docs/source/_static/BioNeuralNet_old1.png
183 changes: 60 additions & 123 deletions docs/source/clustering.rst
@@ -1,158 +1,95 @@
Correlated Clustering
=====================

BioNeuralNet includes internal modules for performing **correlated clustering** on complex networks.
These methods extend traditional community detection by integrating **phenotype correlation**, allowing users to extract **biologically relevant, phenotype-associated modules** from any network.
BioNeuralNet provides **correlated clustering methods** for identifying biologically relevant communities in multi-omics networks. By integrating **phenotype correlations** into traditional community detection, these methods find network modules that are strongly associated with clinical or phenotypic outcomes.

Overview
--------
Key Features
------------
- **Phenotype-Aware Clustering**: Incorporates external phenotype information directly into clustering algorithms, resulting in communities that are both structurally cohesive and biologically meaningful.
- **Flexible Application**: Methods are applicable to any network data represented as adjacency matrices, facilitating diverse research scenarios including biomarker discovery and functional module identification.
- **Integration with Downstream Analysis**: Clusters obtained can directly feed into downstream tasks such as disease prediction, feature selection, and biomarker identification.

Our framework supports three key **correlated clustering** approaches:
Supported Clustering Methods
----------------------------

- **Correlated PageRank**:
Correlated PageRank
-------------------
A variant of PageRank that biases node rankings toward phenotype-relevant nodes, prioritizing features with strong phenotype associations:

- A **modified PageRank algorithm** that prioritizes nodes based on their correlation with an external phenotype.

- The **personalization vector** is computed using phenotype correlation, ensuring that **biologically significant nodes receive more influence**.

- This method is ideal for **identifying high-impact nodes** within a given network.
.. math::

- **Correlated Louvain**:
\mathbf{r} = \alpha \cdot \mathbf{M} \mathbf{r} + (1 - \alpha) \mathbf{p}

- An adaptation of the **Louvain community detection algorithm**, modified to optimize for **both network modularity and phenotype correlation**.
- The objective function for community detection is given by:
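- :math:`\mathbf{M}`: Normalized adjacency (transition probability matrix).
- :math:`\mathbf{p}`: Phenotype-informed personalization vector (based on correlation).
- :math:`\alpha`: Damping factor that weights the random walk against teleportation to :math:`\mathbf{p}`.
- Ideal for ranking biologically impactful nodes.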
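
The fixed point of the ranking equation above can be approximated by power iteration. The following is a minimal pure-Python sketch under stated assumptions: the function name `correlated_pagerank_sketch` is hypothetical, the personalization vector is taken to be the absolute phenotype correlations normalized to sum to one, isolated nodes redistribute their rank according to that vector, and `alpha=0.85` is used as the default. It is not the BioNeuralNet implementation.

```python
from typing import List, Sequence


def correlated_pagerank_sketch(
    adjacency: Sequence[Sequence[float]],
    phenotype_corr: Sequence[float],
    alpha: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> List[float]:
    """Iterate r = alpha * M r + (1 - alpha) * p until r stops changing."""
    n = len(adjacency)
    # Assumed personalization: |correlation| normalized to sum to 1.
    total = sum(abs(c) for c in phenotype_corr)
    p = [abs(c) / total for c in phenotype_corr] if total > 0 else [1.0 / n] * n
    # Column sums turn the adjacency matrix into a column-stochastic M.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    r = list(p)
    for _ in range(max_iter):
        # Rank held by nodes without edges is redistributed according to p.
        dangling = sum(r[j] for j in range(n) if col_sums[j] == 0)
        new_r = []
        for i in range(n):
            walk = sum(
                adjacency[i][j] / col_sums[j] * r[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            new_r.append(alpha * (walk + dangling * p[i]) + (1.0 - alpha) * p[i])
        delta = sum(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if delta < tol:
            break
    return r
```

On a three-node path graph whose two end nodes are structurally identical, the end node with the larger phenotype correlation receives the higher score, which is the bias the personalization vector is meant to introduce.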

.. math::
Correlated Louvain
------------------
Modifies Louvain community detection to balance structural modularity and phenotype correlation, optimizing:

Q^* = k_L \cdot Q + (1 - k_L) \cdot \overline{\lvert \rho \rvert},
.. math::

where:
Q^* = k_L \cdot Q + (1 - k_L) \cdot \overline{\lvert \rho \rvert}

- :math:`Q` is the standard **Newman-Girvan modularity**, defined as:
- :math:`Q`: Newman-Girvan modularity, measuring network structural cohesiveness.
- :math:`\overline{\lvert \rho \rvert}`: Mean absolute Pearson correlation between cluster features and phenotype.
- :math:`k_L`: User-defined parameter balancing structure and phenotype relevance.
- Efficient for identifying phenotype-enriched communities.
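
To make the objective concrete, the sketch below evaluates :math:`Q^*` for a fixed partition using only the standard library. All function names are hypothetical, :math:`\overline{\lvert \rho \rvert}` is computed as the plain mean of absolute Pearson correlations over the supplied feature columns, and `k_l=0.2` is only an example value. How BioNeuralNet aggregates correlations per cluster may differ, so treat this as an illustration of the formula rather than of the library.

```python
from math import sqrt
from typing import Sequence


def modularity(adjacency: Sequence[Sequence[float]], communities: Sequence[int]) -> float:
    """Newman-Girvan modularity Q of an undirected (weighted) graph."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mean_abs_correlation(feature_columns, phenotype) -> float:
    """Mean |rho| between each feature column and the phenotype."""
    return sum(abs(pearson(col, phenotype)) for col in feature_columns) / len(feature_columns)


def correlated_objective(q: float, rho_bar: float, k_l: float = 0.2) -> float:
    """Q* = k_L * Q + (1 - k_L) * mean |rho|."""
    return k_l * q + (1.0 - k_l) * rho_bar
```

A small `k_l` lets phenotype correlation dominate the score, while `k_l = 1` reduces the objective to ordinary modularity.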

.. math::
Hybrid Louvain (Iterative Refinement)
-------------------------------------
Combines Correlated Louvain with Correlated PageRank iteratively to refine community assignments:

Q = \frac{1}{2m} \sum_{i,j} \bigl(A_{ij} - \frac{k_i k_j}{2m} \bigr) \delta(c_i, c_j),

where :math:`A_{ij}` represents the adjacency matrix, :math:`k_i` and :math:`k_j` are node degrees, and :math:`\delta(c_i, c_j)` indicates whether nodes belong to the same community.
- :math:`\overline{\lvert \rho \rvert}` is the **mean absolute Pearson correlation** between the **first principal component (PC1) of the subgraph's features** and the phenotype.
- :math:`k_L` is a user-defined weight (e.g., :math:`k_L = 0.2`), balancing **network modularity and phenotype correlation**.

- This method **detects communities** that are both **structurally cohesive and strongly associated with phenotype**.

- **Hybrid Louvain**:

- A **refinement approach** that combines **Correlated Louvain** and **Correlated PageRank** in an iterative process.

- The key steps are:

1. **Initial Community Detection**:

- The **input network (adjacency matrix)** is clustered using **Correlated Louvain**.
- This identifies **initial phenotype-associated modules**.

2. **Iterative Refinement with Correlated PageRank**:

- In each iteration:

- The **most correlated module** is **expanded** based on Correlated PageRank.
- The refined network is **re-clustered using Correlated Louvain**.
- This process continues **until convergence**.

3. **Final Cluster Extraction**:

- The final **phenotype-optimized modules** are extracted and returned.
- The quality of the clustering is measured using **both modularity and phenotype correlation metrics**.
1. Initial clustering with Correlated Louvain identifies phenotype-associated modules.
2. Clusters are iteratively refined by expanding highly correlated modules with Correlated PageRank.
3. The process repeats until convergence, producing optimized phenotype-associated communities.
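
The control flow of the steps above can be sketched as a loop over three caller-supplied callables. Everything here is hypothetical: `cluster_fn` stands in for Correlated Louvain, `score_fn` for a module's phenotype correlation, `refine_fn` for the Correlated PageRank step, and "convergence" is assumed to mean that the working node set stops changing. The sketch shows only the iteration structure, not BioNeuralNet's actual implementation.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Tuple

NodeSet = FrozenSet[Hashable]


def hybrid_refinement_sketch(
    nodes: Iterable[Hashable],
    cluster_fn: Callable[[NodeSet], List[NodeSet]],
    score_fn: Callable[[NodeSet], float],
    refine_fn: Callable[[NodeSet, NodeSet], Iterable[Hashable]],
    max_iter: int = 10,
) -> Tuple[NodeSet, List[NodeSet]]:
    """Alternate clustering and refinement until the node set is stable."""
    working: NodeSet = frozenset(nodes)
    history: List[NodeSet] = []
    for _ in range(max_iter):
        # Step 1: cluster the current node set (Correlated Louvain stand-in).
        modules = cluster_fn(working)
        # Pick the module most strongly associated with the phenotype.
        best = max(modules, key=score_fn)
        # Step 2: refine that module (Correlated PageRank stand-in).
        refined = frozenset(refine_fn(best, working))
        history.append(refined)
        # Step 3: stop once refinement no longer changes the node set.
        if refined == working:
            break
        working = refined
    return working, history
```

Passing the steps in as callables keeps the loop independent of any particular clustering or ranking routine, which makes the stopping rule easy to test in isolation.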

.. figure:: _static/hybrid_clustering.png
:align: center
:alt: Overview hybrid clustering workflow

**Hybrid Clustering**: Procedure and steps for the hybrid clustering method.
:alt: Hybrid Clustering Workflow

Workflow: Hybrid Louvain iteratively integrates Correlated PageRank and Correlated Louvain to produce refined phenotype-associated clusters.

Mathematical Approach
Comparison of Methods
---------------------

**Correlated PageRank:**

- Correlated PageRank extends the traditional PageRank formulation by **biasing the random walk towards phenotype-associated nodes**.

- The **ranking function** is defined as:

.. math::

\mathbf{r} = \alpha \cdot \mathbf{M} \mathbf{r} + (1 - \alpha) \mathbf{p},

where:

- :math:`\mathbf{M}` is the transition probability matrix, derived from the **normalized adjacency matrix**.
- :math:`\mathbf{p}` is the **personalization vector**, computed using **phenotype correlation**.
- :math:`\alpha` is the **teleportation factor** (default: :math:`\alpha = 0.85`).

- Unlike standard PageRank, which assumes a **uniform teleportation distribution**, **Correlated PageRank prioritizes phenotype-relevant nodes**.

Graphical Comparison
--------------------

Below is an illustration of **different clustering approaches** on a sample network:
The figure below illustrates the difference between standard and correlated clustering methods, highlighting BioNeuralNet's ability to extract biologically meaningful modules.

.. figure:: _static/clustercorrelation.png
:align: center
:alt: Comparison of Correlated Clustering Methods

**Figure 2:** Comparison between SmCCNet generated clusters and Correlated Louvain clusters

Integration with BioNeuralNet
------------------------------
:alt: Clustering Method Comparison

Our **correlated clustering methods** seamlessly integrate into **BioNeuralNet** and can be applied to **any network represented as an adjacency matrix**.
Comparison: Standard (SmCCNet) versus Correlated Louvain clusters.

Use cases include:
Applications and Use Cases
--------------------------
BioNeuralNet correlated clustering is versatile and suitable for diverse network analyses:

- **Multi-Omics Networks**: Extracting **biologically relevant subgraphs** from gene expression, proteomics, or metabolomics data.
- **Brain Connectivity Graphs**: Identifying **functional modules associated with neurological disorders**.
- **Social & Disease Networks**: Detecting **community structures in epidemiology and patient networks**.
- **Multi-Omics Networks**: Extract biologically relevant gene/protein modules associated with clinical phenotypes.
- **Neuroimaging Networks**: Identify functional brain modules linked to neurological diseases.
- **Disease Networks**: Reveal patient or epidemiological network communities strongly linked to clinical outcomes.

Our framework supports:
Integration into BioNeuralNet Workflow
--------------------------------------
Clustering outputs seamlessly feed into downstream BioNeuralNet modules:

- **Graph Neural Network Embedding**: Training GNNs on **phenotype-optimized clusters**.

- **Predictive Biomarker Discovery**: Identifying key **features associated with disease outcomes**.

- **Customizable Modularity Optimization**: Allowing users to **adjust the trade-off between structure and phenotype correlation**.
- **GNN Embedding Generation**: Train Graph Neural Networks on phenotype-enriched clusters.
- **Disease Prediction (DPMON)**: Utilize phenotype-associated modules for improved predictive accuracy.
- **Biomarker Discovery**: Extract features or modules strongly predictive of disease status.

Notes for Users
---------------

1. **Input Requirements**:

- Any **graph-based dataset** can be used as input, provided as an **adjacency matrix**.

- Phenotype data should be supplied in **numerical format** (e.g., disease severity scores, expression levels).

2. **Cluster Comparison**:

- **Correlated Louvain extracts phenotype-associated modules.**

- **Hybrid Louvain iteratively refines clusters using Correlated PageRank.**

- Users can compare results using **modularity scores and phenotype correlation metrics**.

3. **Method Selection**:
User Recommendations
--------------------
- **Correlated PageRank**: Best for prioritizing individual high-impact features or nodes.
- **Correlated Louvain**: Ideal for extracting phenotype-associated functional communities efficiently.
- **Hybrid Louvain**: Recommended for maximal biological interpretability, particularly in complex multi-omics scenarios.

- **Correlated PageRank** is ideal for **ranking high-impact nodes in a phenotype-aware manner**.

- **Correlated Louvain** is best for **detecting phenotype-associated communities**.

- **Hybrid Louvain** provides the most refined, **biologically meaningful clusters**.
Reference and Further Reading
-----------------------------
For detailed methodology and benchmarking, refer to our publication:

Conclusion
----------
- Abdel-Hafiz et al., Frontiers in Big Data, 2022. [1]_

The **correlated clustering methods** implemented in BioNeuralNet provide a **powerful, flexible framework** for extracting **highly structured, phenotype-associated modules** from any network.
By integrating **phenotype correlation directly into the clustering process**, these methods enable **more biologically relevant and disease-informative network analysis**.
Return to :doc:`../index`

paper link: https://doi.org/10.3389/fdata.2022.894632
.. [1] Abdel-Hafiz, M., Najafi, M., et al. "Significant Subgraph Detection in Multi-omics Networks for Disease Pathway Identification." *Frontiers in Big Data*, 5 (2022). DOI: `10.3389/fdata.2022.894632 <https://doi.org/10.3389/fdata.2022.894632>`_.

Return to :doc:`../index`
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -7,7 +7,7 @@
try:
release = metadata.version("bioneuralnet")
except metadata.PackageNotFoundError:
release = "1.0.9"
release = "1.1.0"

project = "BioNeuralNet"
version = release