Supervised Independent Subspace Principal Component Analysis (sisPCA)

sispca is a Python package for learning linear representations that capture variation associated with factors of interest in high-dimensional data. It extends Principal Component Analysis (PCA) to multiple subspaces and encourages subspace disentanglement using the Hilbert-Schmidt Independence Criterion (HSIC). The model is implemented in PyTorch and uses the Lightning framework for training. See the documentation for more details.

For theoretical background and applications, please refer to our paper, "Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis".
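For readers unfamiliar with HSIC, the following is a minimal NumPy sketch of the standard biased empirical estimator, tr(K_x H K_y H) / (n - 1)^2, with linear kernels. It is only an illustration of the dependence measure, not the package's implementation; the helper names `linear_kernel` and `hsic` are hypothetical and are not part of the sispca API.

```python
import numpy as np

def linear_kernel(a):
    """Gram matrix of a linear kernel, shape (n_sample, n_sample)."""
    return a @ a.T

def hsic(k_x, k_y):
    """Biased empirical HSIC estimate: tr(K_x H K_y H) / (n - 1)^2."""
    n = k_x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix H = I - 11^T / n
    return np.trace(k_x @ h @ k_y @ h) / (n - 1) ** 2

rng = np.random.default_rng(0)
z = rng.normal(size=(200, 3))            # a toy latent representation
y_dep = z @ rng.normal(size=(3, 2))      # target that is a linear function of z
y_ind = rng.normal(size=(200, 2))        # target drawn independently of z

hsic_dep = hsic(linear_kernel(z), linear_kernel(y_dep))
hsic_ind = hsic(linear_kernel(z), linear_kernel(y_ind))
print(hsic_dep, hsic_ind)
```

The estimate for the dependent pair should be clearly larger than for the independent pair, which is the property a supervised subspace method can exploit: a high value indicates alignment with a target, and a low value indicates approximate independence.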

What's New

v1.1.0 (2025-02-27): Memory-efficient handling of supervision kernel for large datasets.
v1.0.0 (2024-10-11): Initial release.

Installation

Via GitHub (latest version):

pip install git+https://github.com/JiayuSuPKU/sispca.git#egg=sispca

Via PyPI (stable version):

pip install sispca

Getting Started

Basic usage:

import numpy as np
import torch
from sispca import Supervision, SISPCADataset, SISPCA

# simulate random inputs
x = torch.randn(100, 20)
y_cont = torch.randn(100, 5) # continuous target
y_group = np.random.choice(['A', 'B', 'C'], 100) # categorical target

# simulate custom kernel K_y
# in general, K_y should be either sparse (e.g. a graph Laplacian kernel) or low-rank (e.g. K_y = L @ L.T)
L = torch.randn(100, 20)
K_y = L @ L.T # (n_sample, n_sample)

# create a dataset with supervision
sdata = SISPCADataset(
    data = x.float(), # (n_sample, n_feature)
    target_supervision_list = [
        Supervision(target_data=y_cont, target_type='continuous'),
        Supervision(target_data=y_group, target_type='categorical'),
        # Supervision(target_data=None, target_type='custom', target_kernel_K = K_y)
        Supervision(target_data=None, target_type='custom', target_kernel_Q = L) # equivalent to the above
    ]
)

# fit the sisPCA model
sispca = SISPCA(
    sdata,
    n_latent_sub=[3, 3, 3, 3], # the last subspace will be unsupervised
    lambda_contrast=10,
    kernel_subspace='linear',
    solver='eig'
)
sispca.fit(batch_size = -1, max_epochs = 100, early_stopping_patience = 5)
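For intuition about the low-rank kernel form (K_y = L @ L.T) and an eigendecomposition-based solution with a linear kernel, here is a minimal NumPy sketch of classical supervised PCA for a single subspace. It is not sispca's implementation and it ignores the disentanglement penalty between subspaces; all variable names (`q`, `b`, `u`, and so on) are illustrative assumptions. The sketch shows that the (n_feature, n_feature) matrix X^T H K_y H X can be built from the low-rank factor alone, without materializing the (n_sample, n_sample) kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r, k = 100, 20, 5, 3
x = rng.normal(size=(n, p))   # data, (n_sample, n_feature)
q = rng.normal(size=(n, r))   # low-rank factor of the target kernel, K_y = q @ q.T

# Column-centering x and q is equivalent to applying H = I - 11^T / n.
x_c = x - x.mean(axis=0)
q_c = q - q.mean(axis=0)

# Low-rank route: X^T H K_y H X = B^T B with B = q_c^T x_c, shape (r, p).
b = q_c.T @ x_c
m_low_rank = b.T @ b

# Dense route, for comparison only: materializes the (n, n) kernel.
h = np.eye(n) - np.ones((n, n)) / n
m_dense = x.T @ h @ (q @ q.T) @ h @ x

# Top-k eigenvectors of the symmetric (p, p) matrix give the projection.
eigvals, eigvecs = np.linalg.eigh(m_low_rank)   # eigenvalues in ascending order
u = eigvecs[:, ::-1][:, :k]                     # (n_feature, k), orthonormal columns
z = x_c @ u                                     # latent representation, (n_sample, k)
```

The low-rank route only ever stores matrices of size (r, p) and (p, p), which is why a factor is preferable to a dense kernel when n_sample is large.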

Tutorials:

Feature selection using sisPCA on the Breast Cancer Wisconsin dataset.
Learning unsupervised residual subspace in simulation.
Learning interpretable infection subspaces in scRNA-seq data using sisPCA. Fitting a single sisPCA-linear model on a scRNA-seq dataset with 20,000 cells and 2,000 genes takes approximately 1 minute on an M1 MacBook Air.

For additional details, please refer to the documentation.

Citation

If you find sisPCA useful in your research, please consider citing our paper:

  @misc{su2024disentangling,
    title={Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis},
    author={Jiayu Su and David A. Knowles and Raul Rabadan},
    year={2024},
    eprint={2410.23595},
    archivePrefix={arXiv},
    primaryClass={stat.ML},
    url={https://arxiv.org/abs/2410.23595},
  }
