Ballot: Balanced k-means clustering with optimal transport

Ballot (Balanced Lloyd with Optimal Transport) is a high-performance Python package for balanced clustering. It produces equal-sized clusters (or clusters that satisfy specific capacity constraints) by using optimal transport with entropic regularization (the Sinkhorn algorithm).
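
For orientation, the entropically regularized assignment problem is often written in the following textbook form. The symbols are assumptions introduced here, not notation taken from the package: $x_i$ are the $n$ data points, $c_j$ are the $k$ cluster centers, $P$ is the transport plan, and $\varepsilon > 0$ is the regularization strength. The equal-size case corresponds to uniform column marginals.

$$
\min_{P \in \mathbb{R}_{\ge 0}^{n \times k}} \; \sum_{i=1}^{n} \sum_{j=1}^{k} P_{ij} \, \lVert x_i - c_j \rVert^2 \;+\; \varepsilon \sum_{i=1}^{n} \sum_{j=1}^{k} P_{ij} \left( \log P_{ij} - 1 \right)
\quad \text{subject to} \quad \sum_{j=1}^{k} P_{ij} = \tfrac{1}{n}, \qquad \sum_{i=1}^{n} P_{ij} = \tfrac{1}{k}.
$$

Under this formulation the minimizer has the form $P = \operatorname{diag}(u) \, K \, \operatorname{diag}(v)$ with $K_{ij} = \exp(-\lVert x_i - c_j \rVert^2 / \varepsilon)$, and Sinkhorn iterations alternately rescale $u$ and $v$ until both marginal constraints hold.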

Features

  • Speed: Uses Sinkhorn iterations (E-BalLOT) for near-linear time complexity $O(n \log n)$, making it usable for large datasets ($n > 100{,}000$).
  • Simplicity: A precise, math-driven implementation without complex C++ dependencies.
  • Scikit-learn Compatible: Designed to fit seamlessly into existing ML pipelines.
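
To make the Sinkhorn idea concrete, here is a minimal NumPy sketch of a balanced Lloyd-style loop. It is an illustration under stated assumptions, not the package's implementation: the function names `sinkhorn_plan` and `balanced_lloyd_sketch`, the parameters `reg` and `n_iters`, and the cost rescaling are all hypothetical choices made for this example.

```python
import numpy as np


def sinkhorn_plan(X, centers, reg=0.05, n_iters=200):
    """Soft assignment of points to centers with equal mass per cluster.

    Hypothetical helper for illustration; not part of the ballot API.
    """
    n, k = X.shape[0], centers.shape[0]
    # Squared Euclidean cost between every point and every center, shape (n, k).
    cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Rescale to [0, 1] so `reg` does not depend on the scale of the data.
    cost = cost / cost.max()
    K = np.exp(-cost / reg)
    a = np.full(n, 1.0 / n)  # every point carries mass 1/n
    b = np.full(k, 1.0 / k)  # every cluster must receive mass 1/k
    u = np.ones(n)
    v = np.ones(k)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward the point marginal
        v = b / (K.T @ u)  # rescale columns toward the cluster marginal
    return u[:, None] * K * v[None, :]


def balanced_lloyd_sketch(X, n_clusters, n_rounds=10, seed=0):
    """Alternate Sinkhorn assignments with mass-weighted center updates."""
    rng = np.random.default_rng(seed)
    # Initialize centers from distinct data points.
    centers = X[rng.choice(X.shape[0], size=n_clusters, replace=False)]
    for _ in range(n_rounds):
        plan = sinkhorn_plan(X, centers)
        # Each center becomes the weighted mean of the mass routed to it.
        centers = (plan.T @ X) / plan.sum(axis=0)[:, None]
    return plan, centers


rng = np.random.default_rng(42)
X = rng.random((100, 2))
plan, centers = balanced_lloyd_sketch(X, n_clusters=2)
# Column sums give the mass per cluster; the last Sinkhorn step makes them equal.
print(plan.sum(axis=0))
```

The plan is a soft assignment. Taking `plan.argmax(axis=1)` gives hard labels, but in this sketch those labels are not guaranteed to be exactly balanced; a rounding step would be needed for that.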

Installation

Install via pip:

pip install ballot

Usage

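import numpy as np
from ballot.estimator import BalancedKMeans

# Generate 100 random 2-D points with a fixed seed for reproducibility.
np.random.seed(42)
X = np.random.rand(100, 2)

# Request two clusters of balanced size.
bkm = BalancedKMeans(n_clusters=2)

# Fit the model and return one cluster label per point.
labels = bkm.fit_predict(X)

# Count the points in each cluster to check the balance.
print(f"Cluster 0 count: {np.sum(labels == 0)}")
print(f"Cluster 1 count: {np.sum(labels == 1)}")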
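
For more than two clusters, counting each label by hand gets tedious. The sketch below uses only NumPy to summarize cluster sizes from any integer label vector. The helper names `cluster_sizes` and `is_balanced` are hypothetical and not part of the package, and the label array is a stand-in for the output of `fit_predict`.

```python
import numpy as np


def cluster_sizes(labels, n_clusters):
    """Return the number of points assigned to each cluster index."""
    return np.bincount(labels, minlength=n_clusters)


def is_balanced(labels, n_clusters):
    """True when the largest and smallest clusters differ by at most one point."""
    sizes = cluster_sizes(labels, n_clusters)
    return int(sizes.max() - sizes.min()) <= 1


# Hypothetical label vector standing in for the output of fit_predict.
example_labels = np.array([0, 1, 0, 1, 2, 2])
print(cluster_sizes(example_labels, 3))  # [2 2 2]
print(is_balanced(example_labels, 3))    # True
```

Passing `minlength=n_clusters` makes empty clusters show up as zeros instead of being silently dropped from the counts.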

Development

To install in editable mode for development:

git clone https://github.com/kuslavicek/ballot.git
cd ballot
pip install -e .

Run tests:

pytest

References

This project incorporates research from the following paper:

  • Wenyan Luo and Dustin G. Mixon. BalLOT: Balanced k-means clustering with optimal transport. arXiv:2512.05926.
