Ballot (Balanced Lloyd with Optimal Transport) is a high-performance Python package for balanced clustering. It creates equal-sized clusters (or clusters with specific capacity constraints) using optimal transport with entropic regularization, solved by the Sinkhorn algorithm.
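For orientation, the balanced assignment step can be written as an entropy-regularized optimal transport problem. What follows is the standard textbook formulation, given as a sketch only: the symbols ($P$ for the transport plan, $x_i$ for the data points, $c_j$ for the centroids, $\varepsilon$ for the regularization strength) are assumed here for illustration and may differ from the notation and exact objective used in the paper or the package.

$$
\min_{P \in \mathbb{R}_{\ge 0}^{n \times k}} \; \sum_{i=1}^{n} \sum_{j=1}^{k} P_{ij} \, \lVert x_i - c_j \rVert^2 \; + \; \varepsilon \sum_{i=1}^{n} \sum_{j=1}^{k} P_{ij} \left( \log P_{ij} - 1 \right)
\quad \text{s.t.} \quad \sum_{j=1}^{k} P_{ij} = \frac{1}{n}, \qquad \sum_{i=1}^{n} P_{ij} = \frac{1}{k}.
$$

The column constraint forces every cluster to receive the same total mass, which is what makes the clusters balanced. Replacing $1/k$ with other target fractions that sum to one expresses capacity constraints.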
- Speed: Uses Sinkhorn iterations (E-BalLOT) for near-linear time complexity $O(n \log n)$, making it usable for large datasets ($n > 100,000$).
- Simplicity: precise, math-driven implementation without complex C++ dependencies.
- Scikit-learn Compatible: Designed to fit seamlessly into existing ML pipelines.
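
To make the Sinkhorn idea concrete, here is a minimal NumPy sketch of a Lloyd-style loop whose assignment step is an entropic optimal transport plan. It is an illustration under stated assumptions, not the package's implementation: the function names `sinkhorn_plan` and `balanced_lloyd`, the parameters `eps`, `n_iters`, `n_rounds`, and `cluster_mass`, the soft centroid update, and the final `argmax` rounding are all hypothetical choices made for this example.

```python
import numpy as np


def sinkhorn_plan(cost, eps=0.05, n_iters=200, cluster_mass=None):
    """Entropic OT plan from n points (rows) to k clusters (columns).

    Assumption: every point carries mass 1/n; clusters receive `cluster_mass`
    (uniform 1/k by default, which corresponds to equal-sized clusters).
    """
    n, k = cost.shape
    a = np.full(n, 1.0 / n)
    if cluster_mass is None:
        b = np.full(k, 1.0 / k)
    else:
        b = np.asarray(cluster_mass, dtype=float)
    # Scale eps by the largest cost so the kernel entries do not underflow.
    K = np.exp(-cost / (eps * cost.max()))
    v = np.ones(k)
    for _ in range(n_iters):
        u = a / (K @ v)      # rescale rows toward the point masses
        v = b / (K.T @ u)    # rescale columns toward the cluster masses
    # Ending on the column update makes the column sums match b
    # up to floating-point rounding.
    return u[:, None] * K * v[None, :]


def balanced_lloyd(X, n_clusters, n_rounds=20, eps=0.05, seed=0):
    """Alternate a Sinkhorn assignment step with a weighted centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_rounds):
        # Squared Euclidean distance from every point to every centroid.
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        P = sinkhorn_plan(cost, eps=eps)
        # Each centroid becomes the plan-weighted mean of the points.
        centers = (P.T @ X) / P.sum(axis=0)[:, None]
    # Simple rounding: assign each point to its heaviest cluster.
    labels = P.argmax(axis=1)
    return labels, centers, P


rng = np.random.default_rng(42)
X = rng.random((100, 2))
labels, centers, P = balanced_lloyd(X, n_clusters=2)
print("soft cluster masses:", P.sum(axis=0))
print("hard cluster sizes:", np.bincount(labels, minlength=2))
```

In this sketch the soft plan is balanced by construction, but the `argmax` rounding only gives approximately equal hard cluster sizes. Exactly equal sizes require a dedicated rounding step, which is omitted here.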
Install via pip:

    pip install ballot

Basic usage:

    import numpy as np
    from ballot.estimator import BalancedKMeans

    # Generate 100 random 2-D points with a fixed seed
    np.random.seed(42)
    X = np.random.rand(100, 2)

    # Fit the model and get one cluster label per point
    bkm = BalancedKMeans(n_clusters=2)
    labels = bkm.fit_predict(X)

    # Report the size of each cluster
    print(f"Cluster 0 count: {np.sum(labels == 0)}")
    print(f"Cluster 1 count: {np.sum(labels == 1)}")

To install in editable mode for development:

    git clone https://github.com/kuslavicek/ballot.git
    cd ballot
    pip install -e .

Run tests:

    pytest

This project incorporates research from the following paper:
- BalLOT: Balanced k-means clustering with optimal transport. Wenyan Luo, Dustin G. Mixon. arXiv:2512.05926.