Commit 5e93be2
feat: add parallel distance computation and vectorized pipeline
Rewrite the distance computation engine from scratch on top of v0.3.5:
- Vectorize kNN distance computation with NumPy broadcasting, processing
  points in chunks to bound memory use and to support a progress bar
- Add an n_jobs parameter for cross-cluster multiprocessing via
  concurrent.futures (n_jobs=-1 uses all available cores)
- Restructure the Numba path around non-generator kernels that support
  numba.prange for thread-level parallelism
- Use scipy.spatial.distance.cdist and scipy.special.erf for optional
  acceleration when scipy is installed
- Vectorize the _standard_distances, _prob_distances, and
  _norm_prob_outlier_factor pipeline methods
- Fully backward-compatible: all existing API calls work unchanged
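The chunked broadcasting idea in the first bullet can be sketched as follows. This is a minimal, hypothetical sketch, not the code in this commit: the name knn_distances_chunked, the chunk_size default and the use of Euclidean distance are all assumptions. Each chunk of query rows is broadcast against the full data set, so peak memory scales with chunk_size rather than with the total number of points.

```python
import numpy as np


def knn_distances_chunked(data, k, chunk_size=256):
    """Hypothetical helper (not this commit's API): k nearest neighbours per row.

    Returns (distances, indices), both of shape (n_samples, k), sorted by distance.
    """
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n_samples")
    dists = np.empty((n, k))
    idx = np.empty((n, k), dtype=np.intp)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Broadcast (c, 1, d) against (1, n, d) -> (c, n, d); only one chunk is in memory
        diff = data[start:stop, None, :] - data[None, :, :]
        block = np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))
        # Exclude each point from its own neighbourhood
        rows = np.arange(stop - start)
        block[rows, start + rows] = np.inf
        # argpartition selects the k smallest per row without a full sort ...
        part = np.argpartition(block, k - 1, axis=1)[:, :k]
        part_d = np.take_along_axis(block, part, axis=1)
        # ... and then only those k columns are sorted
        order = np.argsort(part_d, axis=1)
        idx[start:stop] = np.take_along_axis(part, order, axis=1)
        dists[start:stop] = np.take_along_axis(part_d, order, axis=1)
        # A progress bar could be advanced here by (stop - start) rows
    return dists, idx
```

Smaller chunks trade a few more Python-level loop iterations for a lower memory ceiling; the per-chunk work itself stays fully vectorized.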
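One plausible shape for the n_jobs handling is sketched below. It is an illustration under assumptions: resolve_n_jobs, cluster_score and score_clusters are hypothetical names, and cluster_score is a stand-in for whatever per-cluster distance work the library actually does. Only the rule "n_jobs=-1 uses all cores" comes from the commit message.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_n_jobs(n_jobs):
    """Hypothetical helper: turn an n_jobs argument into a worker count."""
    if n_jobs is None:
        return 1
    if n_jobs == -1:
        return os.cpu_count() or 1  # os.cpu_count() may return None
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1, None, or a positive integer")
    return n_jobs


def cluster_score(cluster):
    # Placeholder for per-cluster work; must be a top-level function so it can be pickled
    return sum(x * x for x in cluster)


def score_clusters(clusters, n_jobs=1):
    """Score each cluster, in worker processes when more than one worker is requested."""
    workers = resolve_n_jobs(n_jobs)
    if workers == 1:
        # Serial path avoids process start-up and pickling overhead
        return [cluster_score(c) for c in clusters]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with clusters
        return list(pool.map(cluster_score, clusters))
```

Because clusters are independent, they are a natural unit of process-level parallelism. On platforms whose start method is spawn or forkserver, callers should invoke score_clusters from inside an `if __name__ == "__main__":` block so that worker processes can import the main module safely.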
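The Numba bullet refers to non-generator kernels. A kernel that fills and returns an array, instead of yielding values, can use numba.prange to split independent loop iterations across threads. The sketch below assumes, hypothetically, that one such kernel computes a per-point standard distance, sqrt(mean(d^2)) over each row of kNN distances. The function name is illustrative, and the fallback shims exist only so that the sketch still runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # Fallback shims so the sketch runs without Numba (serial, pure Python)
    prange = range

    def njit(*args, **kwargs):
        # Support both bare @njit and @njit(parallel=True) usage
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def standard_distances_kernel(knn_dists):
    """Hypothetical kernel: sqrt(mean of squared kNN distances) for each row."""
    n, k = knn_dists.shape
    out = np.empty(n)
    # Rows are independent, so prange may run them on different threads
    for i in prange(n):
        acc = 0.0
        for j in range(k):
            acc += knn_dists[i, j] * knn_dists[i, j]
        out[i] = np.sqrt(acc / k)
    return out
```

Each prange iteration writes only to its own out[i], so no locking or reduction is needed.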
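The optional scipy acceleration typically follows an import-with-fallback pattern. The following is a minimal sketch of that pattern, not this commit's code: pairwise_distances, erf_array and HAVE_SCIPY are hypothetical names. scipy.spatial.distance.cdist and scipy.special.erf are used when importable; otherwise the code falls back to NumPy broadcasting and the scalar math.erf.

```python
import math

import numpy as np

try:
    from scipy.spatial.distance import cdist
    from scipy.special import erf
    HAVE_SCIPY = True
except ImportError:
    HAVE_SCIPY = False


def pairwise_distances(a, b):
    """Hypothetical helper: Euclidean distances between rows of a and rows of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if HAVE_SCIPY:
        return cdist(a, b)  # cdist's metric defaults to "euclidean"
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt(np.einsum("ijk,ijk->ij", diff, diff))


def erf_array(x):
    """Hypothetical helper: element-wise error function over an array."""
    x = np.asarray(x, dtype=float)
    if HAVE_SCIPY:
        return erf(x)
    # math.erf is scalar-only; np.vectorize loops in Python, so this path is slower
    return np.vectorize(math.erf, otypes=[float])(x)
```

Keeping the fallback numerically equivalent is what allows scipy to remain an optional dependency without changing results.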
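The method names _standard_distances, _prob_distances and _norm_prob_outlier_factor suggest the Local Outlier Probabilities (LoOP) formulation. Assuming that is the case, the vectorized pipeline might look roughly like the standalone sketch below. The function names and the extent default (the LoOP lambda) are hypothetical, and the sketch does not guard against division by zero for duplicate points.

```python
import math

import numpy as np


def standard_distances(knn_dists):
    """sigma(o) = sqrt(mean of squared kNN distances), one value per point."""
    return np.sqrt(np.mean(np.square(knn_dists), axis=1))


def prob_distances(sigma, extent=3.0):
    """pdist(o) = extent * sigma(o); extent plays the role of lambda (assumed default)."""
    return extent * sigma


def prob_outlier_factors(pdist, knn_idx):
    """PLOF(o) = pdist(o) / mean pdist of o's neighbours - 1."""
    # Fancy indexing gathers every neighbour's pdist at once: shape (n, k)
    return pdist / pdist[knn_idx].mean(axis=1) - 1.0


def norm_prob_outlier_factor(plof, extent=3.0):
    """nPLOF = extent * sqrt(mean(PLOF^2)) over the whole data set (a scalar)."""
    return extent * np.sqrt(np.mean(np.square(plof)))


def local_outlier_probabilities(knn_dists, knn_idx, extent=3.0):
    """Hypothetical end-to-end LoOP score with no per-point Python loop."""
    sigma = standard_distances(knn_dists)
    pdist = prob_distances(sigma, extent)
    plof = prob_outlier_factors(pdist, knn_idx)
    nplof = norm_prob_outlier_factor(plof, extent)
    erf = np.vectorize(math.erf, otypes=[float])
    # Negative PLOF (denser than neighbours) is clipped to probability 0
    return np.maximum(0.0, erf(plof / (nplof * np.sqrt(2.0))))
```

Every step is an array expression over all points, which is the kind of change the last bullet describes in place of per-point iteration.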
Closes #36
Made-with: Cursor
1 parent d0ace6f
3 files changed
Lines changed: 343 additions & 40 deletions