hyper-vem

Simultaneous clustering of both nodes and group interactions in higher-order networks using a Variational Expectation-Maximization algorithm.

This repository implements a scikit-learn compatible Variational Expectation-Maximization (VEM) algorithm for co-clustering higher-order networks (hypergraphs), where interactions involve groups rather than pairs. It jointly clusters nodes and interactions, and is demonstrated on roll-call voting data from the 119th U.S. House of Representatives

Figure 1: Raw (left) and co-clustered (right) voting matrix of the 119th U.S. House of Representatives.

Key Features

Standard scikit-learn estimator API compatibility, allowing integration with tools like GridSearchCV for parallelized model selection.
Vectorized VE and M-steps using NumPy matrix operations.
Support for hypergraphs with missing values (NaN) as well as sparse matrices.
Custom plotting functions for visualizing the inferred cluster structure.

Note: The underlying model and VEM update rules were derived as part of the MSc thesis Co-clustering on Binary Hypergraphs.

Installation

The project was developed and tested on Python 3.11.

Clone the repository:

git clone https://github.com/i-Liolios/hyper-vem.git
cd hyper-vem

Install the packages:

# standard installation
pip install .

# Optional: Install with testing and KModes initialization support
pip install .[kmodes,test]

Requirements:

matplotlib>=3.6.2
numpy>=1.23.5
pandas>=2.3.3
scikit-learn>=1.8.0
scipy>=1.17.0
seaborn>=0.13.2

Quick start

from hypervem import HypergraphCoClustering

# Fit model on an existing incidence matrix X
model = HypergraphCoClustering(n_node_clusters=2, n_hyperedge_clusters=2, n_init=3, random_state=42)
model.fit(X)

# Access results
model.node_clusters_
model.hyperedge_clusters_
model.connection_probabilities_

Overview: Why Hypergraphs?

Standard networks model only pairwise interactions. However, real-world interactions often involve groups (hyperedges):

Shopping baskets: multiple items bought together
Research papers: multiple co-authors on a paper
Legislative votes: multiple legislators voting on a bill

A hypergraph preserves these interactions using a binary incidence matrix X (shape: n_hyperedges × n_nodes):

	Node 1	Node 2	Node 3	Description
Hyperedge 1	1	1	0	Pairwise interaction between Node 1 & 2
Hyperedge 2	1	1	1	Higher-order interaction between all nodes

X[j, i] = 1 if node i participates in hyperedge j, else 0
A hyperedge is simply a set of nodes participating in the same interaction

Model

The model is defined by a two-step mixture model:

Each Node belongs to a latent cluster $$k$$ with probability $\gamma_k$, $$k \in \{1, ..., K\}$$ (e.g., "Rock Music Fans").
Each Hyperedge belongs to a latent cluster $$g$$ with probability $\delta_g$, $$g \in \{1, ..., G\}$$ (e.g., "Rock Concerts", "Symphonies").
Each node decides whether to join each hyperedge by performing a Bernoulli trial with parameter $\theta_{k,g}$.

Example: A "Rock music fan" has a high probability of joining a "Rock Concert" hyperedge, but a low probability of joining a "Symphony" hyperedge.

The number of node/hyperedge clusters $G,K$ are user-specified hyperparameters. They can be inferred through model selection.

Variational Expectation-Maximization

We iteratively maximize a lower bound (ELBO) of the log-likelihood using coordinate ascent:

VE-step: Update soft cluster assignments
M-step: Update model parameters (weights and connection probabilities)
Repeat until convergence

Both steps have closed-form updates available in math.md.

Input:

Incidence Matrix X, shape (M, N)

Hyperparameters:
- K = number of node clusters (n_node_clusters)
- G = number of hyperedge clusters (n_hyperedge_clusters)

Output:

Attribute	Shape	Description
`node_responsibilities_`	$(N, K)$	Soft cluster assignments for nodes.
`hyperedge_responsibilities_`	$(M, G)$	Soft cluster assignments for hyperedges.
`node_clusters_`	$(N,)$	Hard cluster assignments for nodes (argmax of responsibilities).
`hyperedge_clusters_`	$(M,)$	Hard cluster assignments for hyperedges (argmax of responsibilities).
`node_weights_`	$(K,)$	Estimated mixing proportions of node clusters.
`hyperedge_weights_`	$(G,)$	Estimated mixing proportions of hyperedge clusters.
`connection_probabilities_`	$(K, G)$	Probability matrix of participation between node clusters and hyperedge clusters.

Visual diagram of the VEM:

%%{init: {'flowchart': {'curve': 'stepwise'}}}%%
graph LR
    A[Initialize parameters] --> B(VE-Step: <br/> Update soft cluster assignments)
    B --> C(M-Step: <br/> Update parameters)
    C -- Convergence not met --> B
    C -- Converged --> E[Inferred Clusters & Parameters]

    style A fill:#ffffff,color:#000000,stroke:#333,stroke-width:2px
    style E fill:#ffffff,color:#000000,stroke:#333,stroke-width:2px

Complexity:

Dense incidence matrix X:
O(M · N · K · G)
Sparse incidence matrix X:
O(nnz(X) · K · G), where nnz(X) is the number of non-zero entries

Model selection

Choose the number of node clusters K and hyperedge clusters G using the Integrated Completed Likelihood (ICL).
GridSearchCV is used only to parallelize the search, not for cross-validation, as the ICL is an information criterion.

from hypervem import HypergraphCoClustering
from sklearn.model_selection import GridSearchCV

param_grid = {'n_node_clusters': range(1,5),
              'n_hyperedge_clusters': range(1,5)}

# Use a single “train=test” split for ICL (no CV)
grid = GridSearchCV(HypergraphCoClustering(random_state=42),
                    param_grid=param_grid,
                    cv=[(range(X.shape[0]), range(X.shape[0]))])
grid.fit(X)

best_model = grid.best_estimator_

Project Structure

The project is organized as a Python package (hypervem) with separate directories for tests, notebooks, and executable scripts.

hypervem_project/
├── hypervem/                  # Source code 
├── examples/                  # Scripts to produce synthetic experiments
├── notebooks/                 # Interactive analysis 
├── tests/                     # Unit tests
└── setup.py                   # Installation configuration

Testing

This repository includes a suite of unit tests located in the tests/ directory.

To run the tests, install pytest and execute:

pip install pytest
pytest tests/

Applications

Some potential application areas of the model:

E-commerce: Identify consumer profiles from simultaneous product bundle purchases.
Governance: Uncover legislative voting blocs and bill categories from co-sponsorship data (see example).

Example: Congressional Voting Analysis

The model was applied to roll-call voting data from the 119th U.S. House of Representatives:

Nodes: 439 representatives
Hyperedges: 361 bills

A node joining a hyperedge corresponds to a “Yes” vote.

The model jointly identifies blocs of legislators and categories of legislation at multiple resolutions.

Reproducible notebook: /notebooks/voting_analysis.ipynb

License

This project is licensed under the MIT License - see the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.github/workflows		.github/workflows
examples		examples
hypervem		hypervem
notebooks		notebooks
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
coclust.png		coclust.png
math.md		math.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hyper-vem

Key Features

Table of Contents

Installation

Quick start

Overview: Why Hypergraphs?

Model

Variational Expectation-Maximization

Model selection

Project Structure

Testing

Applications

Example: Congressional Voting Analysis

License

About

Uh oh!

Releases

Packages

Languages

License

i-Liolios/hyper-vem

Folders and files

Latest commit

History

Repository files navigation

hyper-vem

Key Features

Table of Contents

Installation

Quick start

Overview: Why Hypergraphs?

Model

Variational Expectation-Maximization

Model selection

Project Structure

Testing

Applications

Example: Congressional Voting Analysis

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages