docs/src/index.md: 28 additions & 8 deletions
@@ -6,11 +6,11 @@ Depth = 4
 
 ## Motivation
 It's actually a funny story led to the development of this package.
-What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after into a heated discussion on the Julia Discourse forums after I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey Oskin offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
+What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
 
-Say hello to our baby, `ParallelKMeans`!
+Say hello to `ParallelKMeans`!
 
-This package aims to utilize the speed of Julia and parallelization (both CPU & GPU) by offering an extremely fast implementation of the K-Means clustering algorithm with user friendly interface.
+This package aims to utilize the speed of Julia and parallelization (both CPU & GPU) by offering an extremely fast implementation of the K-Means clustering algorithm with a friendly interface.
 
 
 ## K-Means Algorithm Implementation Notes
@@ -24,8 +24,9 @@ This implementation inherits this problem like every implementation does.
 As a result, it is useful in practice to restart it several times to get the correct results.
 
 ## Installation
-You can grab the latest stable version of this package by simply running in Julia.
-Don't forget to Julia's package manager with `]`
+You can grab the latest stable version of this package from Julia registries by simply running;
+
+*NB:* Don't forget to Julia's package manager with `]`
 
 ```julia
 pkg> add ParallelKMeans
@@ -50,7 +51,7 @@ git checkout experimental
 
 
 ## Pending Features
--[X] Implementation of Triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
+-[ ] Full Implementation of Triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
 -[ ] Even faster Kmeans implementation based on current literature.
 -[ ] Optimization of code base.
 -[ ] Improved Documentation
--[ ] More benchmark test beyond `Scikit-Learn` and `Clustering.jl`
+-[ ] More benchmark tests
 
 
 ## How To Use
## How To Use
@@ -68,7 +69,7 @@ Taking advantage of Julia's brilliant multiple dispatch system, the package expo
 ```julia
 using ParallelKMeans
 
-#Use all available CPU cores
+#Uses all available CPU cores by default
 multi_results =kmeans(X, 3; max_iters=300)
 
 # Use only 1 core of CPU
@@ -124,14 +125,33 @@ e = [ParallelKMeans.kmeans(LightElkan(), X, i;
 
 
 ## Benchmarks
+Currently, this package is benchmarked against similar implementation in both Python and Julia. All reproducible benchmarks can be found in [ParallelKMeans/extras](https://github.com/PyDataBlog/ParallelKMeans.jl/tree/master/extras) directory. More tests in various languages are planned beyond the initial release version (`0.1.0`).
+
+*Note*: All benchmark tests are made on the same computer to help eliminate any bias.
+
+
+Currently, the benchmark speed tests are based on the search for optimal number of clusters using the [Elbow Method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) since this is a practical use case for most practioners employing the K-Means algorithm.
+
+
+
+| Package | Language | Input Data | Execution Time |
Ultimately, we see this package as potentially the one stop shop for everything related to KMeans algorithm and its speed up variants. We are open to new implementations and ideas from anyone interested in this project.
+
+Detailed contribution guidelines will be added in upcoming releases.