Supervised Independent Subspace Principal Component Analysis

Supervised Independent Subspace Principal Component Analysis (sisPCA)

Overview

sispca is a Python package for learning linear representations that capture the variation associated with factors of interest in high-dimensional data. It extends Principal Component Analysis (PCA) to multiple subspaces and uses the Hilbert-Schmidt Independence Criterion (HSIC) to encourage subspace disentanglement. The model is implemented in PyTorch and uses the Lightning framework for training. See the documentation for more details.
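
For readers unfamiliar with HSIC, the sketch below computes the standard biased empirical estimator, tr(K H L H) / (n - 1)^2 with H = I - 11^T / n, from two kernel matrices. It is an illustration only and is not taken from the sispca source: the helper names are hypothetical, and the delta kernel for categorical labels is an assumed choice rather than a confirmed detail of the package. It uses only the standard library so it runs without PyTorch.

```python
def linear_kernel(rows):
    """Gram matrix K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def delta_kernel(labels):
    """Assumed kernel for categorical labels: 1 if two samples share a label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def double_center(k):
    """Return H K H for a square matrix K, where H = I - 11^T / n."""
    n = len(k)
    row_mean = [sum(r) / n for r in k]
    col_mean = [sum(c) / n for c in zip(*k)]
    total_mean = sum(row_mean) / n
    return [
        [k[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(k, l):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(k)
    kc, lc = double_center(k), double_center(l)
    trace = sum(kc[i][j] * lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


labels = ["A", "A", "B", "B"]
z_aligned = [[-1.0], [-1.0], [1.0], [1.0]]    # separates A from B
z_unrelated = [[-1.0], [1.0], [-1.0], [1.0]]  # ignores the grouping

hsic_aligned = hsic(linear_kernel(z_aligned), delta_kernel(labels))
hsic_unrelated = hsic(linear_kernel(z_unrelated), delta_kernel(labels))
print(hsic_aligned, hsic_unrelated)
```

A one-dimensional representation that separates the two groups scores a strictly positive HSIC against the labels, while one that ignores the grouping scores zero. This is the sense in which HSIC measures statistical dependence between a subspace and a target, or between two subspaces.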

For the theoretical background and applications, please refer to our paper, "Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis".

What's New

  • v1.1.0 (2025-02-27): Memory-efficient handling of supervision kernel for large datasets.
  • v1.0.0 (2024-10-11): Initial release.

Installation

Via GitHub (latest version):

pip install git+https://github.com/JiayuSuPKU/sispca.git#egg=sispca

Via PyPI (stable version):

pip install sispca

Getting Started

Basic usage:

import numpy as np
import torch
from sispca import Supervision, SISPCADataset, SISPCA

# simulate random inputs
x = torch.randn(100, 20)
y_cont = torch.randn(100, 5) # continuous target
y_group = np.random.choice(['A', 'B', 'C'], 100) # categorical target

# simulate custom kernel K_y
# in general, K_y should be either sparse (e.g. a graph Laplacian kernel) or low-rank (e.g. K_y = L @ L.T)
L = torch.randn(100, 20)
K_y = L @ L.T # (n_sample, n_sample)

# create a dataset with supervision
sdata = SISPCADataset(
    data = x.float(), # (n_sample, n_feature)
    target_supervision_list = [
        Supervision(target_data=y_cont, target_type='continuous'),
        Supervision(target_data=y_group, target_type='categorical'),
        # Supervision(target_data=None, target_type='custom', target_kernel_K = K_y)
        Supervision(target_data=None, target_type='custom', target_kernel_Q = L) # equivalent to the above
    ]
)

# fit the sisPCA model
sispca = SISPCA(
    sdata,
    n_latent_sub=[3, 3, 3, 3], # the last subspace will be unsupervised
    lambda_contrast=10,
    kernel_subspace='linear',
    solver='eig'
)
sispca.fit(batch_size = -1, max_epochs = 100, early_stopping_patience = 5)
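
The comment above notes that a custom kernel can be passed either as the full matrix K_y or as a factor L with K_y = L @ L.T. The sketch below illustrates one reason a factor may be preferable on large datasets: a centered alignment term of the form tr(Z^T H K_y H Z) can be computed as the squared Frobenius norm of L^T H Z without ever forming the (n_sample, n_sample) matrix. This is a general linear-algebra identity, not a description of how sispca handles kernels internally, and all function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def center_columns(a):
    """Subtract each column mean, i.e. compute H A with H = I - 11^T / n."""
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]
    return [[v - m for v, m in zip(row, means)] for row in a]


def alignment_full(z, k_y):
    """tr(Z^T H K_y H Z) using the full n x n kernel."""
    zc = center_columns(z)
    prod = matmul(matmul(transpose(zc), k_y), zc)
    return sum(prod[i][i] for i in range(len(prod)))


def alignment_factor(z, l):
    """Same quantity as ||L^T H Z||_F^2, using only the n x r factor."""
    zc = center_columns(z)
    m = matmul(transpose(l), zc)  # shape (r, d)
    return sum(v * v for row in m for v in row)


z = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 2.0]]         # (n_sample, n_latent)
l_factor = [[1.0, 2.0], [0.0, 1.0], [1.0, 0.0], [2.0, 1.0]]  # (n_sample, rank)
k_y = matmul(l_factor, transpose(l_factor))                  # (n_sample, n_sample)

full = alignment_full(z, k_y)
factored = alignment_factor(z, l_factor)
print(full, factored)
```

Both routes give the same value, but the factored route stores O(n_sample × rank) numbers instead of O(n_sample²), which matters once n_sample reaches the tens of thousands.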

Tutorials:

For additional details, please refer to the documentation.

Citation

If you find sisPCA useful in your research, please consider citing our paper:

  @misc{su2024disentangling,
    title={Disentangling Interpretable Factors with Supervised Independent Subspace Principal Component Analysis},
    author={Jiayu Su and David A. Knowles and Raul Rabadan},
    year={2024},
    eprint={2410.23595},
    archivePrefix={arXiv},
    primaryClass={stat.ML},
    url={https://arxiv.org/abs/2410.23595},
  }
