
Commit 61209f3

major update to prepare 0.1.0
1 parent 14683c5 commit 61209f3


9 files changed (+498 / -423 lines)


Project.toml

Lines changed: 2 additions & 1 deletion

````diff
@@ -13,6 +13,7 @@ julia = "1.3"
 [extras]
 Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+Suppressor = "fd094767-a336-5f1f-9728-57cf17d0bbfb"
 
 [targets]
-test = ["Test", "Random"]
+test = ["Test", "Random", "Suppressor"]
````

docs/src/index.md

Lines changed: 28 additions & 8 deletions

````diff
@@ -6,11 +6,11 @@ Depth = 4
 
 ## Motivation
 It's actually a funny story led to the development of this package.
-What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after into a heated discussion on the Julia Discourse forums after I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey Oskin offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
+What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after a heated discussion on the Julia Discourse forum when I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world as a maintained Julia pacakge.
 
-Say hello to our baby, `ParallelKMeans`!
+Say hello to `ParallelKMeans`!
 
-This package aims to utilize the speed of Julia and parallelization (both CPU & GPU) by offering an extremely fast implementation of the K-Means clustering algorithm with user friendly interface.
+This package aims to utilize the speed of Julia and parallelization (both CPU & GPU) by offering an extremely fast implementation of the K-Means clustering algorithm with a friendly interface.
 
 
 ## K-Means Algorithm Implementation Notes
````

````diff
@@ -24,8 +24,9 @@ This implementation inherits this problem like every implementation does.
 As a result, it is useful in practice to restart it several times to get the correct results.
 
 ## Installation
-You can grab the latest stable version of this package by simply running in Julia.
-Don't forget to Julia's package manager with `]`
+You can grab the latest stable version of this package from Julia registries by simply running;
+
+*NB:* Don't forget to Julia's package manager with `]`
 
 ```julia
 pkg> add ParallelKMeans
````

````diff
@@ -50,7 +51,7 @@ git checkout experimental
 
 
 ## Pending Features
-- [X] Implementation of Triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
+- [ ] Full Implementation of Triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
 K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf).
 - [ ] Implementation of current k-means acceleration algorithms.
 - [ ] Support for DataFrame inputs.
````

````diff
@@ -59,7 +60,7 @@ K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf).
 - [ ] Even faster Kmeans implementation based on current literature.
 - [ ] Optimization of code base.
 - [ ] Improved Documentation
-- [ ] More benchmark test beyond `Scikit-Learn` and `Clustering.jl`
+- [ ] More benchmark tests
 
 
 ## How To Use
````

````diff
@@ -68,7 +69,7 @@ Taking advantage of Julia's brilliant multiple dispatch system, the package expo
 ```julia
 using ParallelKMeans
 
-# Use all available CPU cores
+# Uses all available CPU cores by default
 multi_results = kmeans(X, 3; max_iters=300)
 
 # Use only 1 core of CPU
````
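
The How To Use snippet in this hunk notes that `kmeans` uses all available CPU cores by default. One common way to spread k-means over CPU cores is to split the assignment step across threads, because each sample's nearest-centroid search is independent of every other sample. The sketch below illustrates that idea in plain Julia; it is an assumption-labelled illustration, not necessarily how ParallelKMeans partitions its work. `assign_threaded` is a hypothetical helper, and the features × samples data layout is an assumption of this sketch. Julia must be started with more than one thread (for example via the `JULIA_NUM_THREADS` environment variable) for `@threads` to actually run in parallel; with a single thread the loop simply runs serially.

````julia
using Base.Threads

# Hypothetical helper, not part of ParallelKMeans: nearest-centroid assignment
# with the loop over samples split across Julia threads.
# X: features × samples, C: features × k centroids (layout assumed for this sketch).
function assign_threaded(X::AbstractMatrix{T}, C::AbstractMatrix{T}) where {T<:AbstractFloat}
    n, k = size(X, 2), size(C, 2)
    labels = Vector{Int}(undef, n)
    dists = Vector{T}(undef, n)
    @threads for i in 1:n
        # Each iteration writes only to slot i, so threads never share mutable state.
        best_j, best_d = 1, typemax(T)
        for j in 1:k
            d = zero(T)
            for f in axes(X, 1)
                d += (X[f, i] - C[f, j])^2   # squared Euclidean distance
            end
            if d < best_d
                best_j, best_d = j, d
            end
        end
        labels[i] = best_j
        dists[i] = best_d
    end
    # Total within-cluster sum of squares, reduced after all threads finish.
    return labels, sum(dists)
end

# Usage: 2 features × 4 samples, 2 centroids.
X = [0.0 0.1 5.0 5.1;
     0.0 0.0 5.0 5.0]
C = [0.0 5.0;
     0.0 5.0]
labels, cost = assign_threaded(X, C)
````

Writing into preallocated per-sample slots keeps the threaded loop free of data races; the only cross-thread combination is the final `sum`, done after the loop.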

````diff
@@ -124,14 +125,33 @@ e = [ParallelKMeans.kmeans(LightElkan(), X, i;
 
 
 ## Benchmarks
+Currently, this package is benchmarked against similar implementation in both Python and Julia. All reproducible benchmarks can be found in [ParallelKMeans/extras](https://github.com/PyDataBlog/ParallelKMeans.jl/tree/master/extras) directory. More tests in various languages are planned beyond the initial release version (`0.1.0`).
+
+*Note*: All benchmark tests are made on the same computer to help eliminate any bias.
+
+
+Currently, the benchmark speed tests are based on the search for optimal number of clusters using the [Elbow Method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) since this is a practical use case for most practioners employing the K-Means algorithm.
+
+
+
+| Package | Language | Input Data | Execution Time |
+|:-----------------:|:--------:|:---------------------------------:|:--------------:|
+| Clustering.jl | Julia | (1 Million examples, 30 features) | |
+| ParallelKMeans.jl | Julia | (1 Million examples, 30 features) | |
+| Scikit-Learn | Python | (1 Million examples, 30 features) | |
+| Knor | R | (1 Million examples, 30 features) | |
 
 
 ## Release History
 - 0.1.0 Initial release
 
 
 ## Contributing
+Ultimately, we see this package as potentially the one stop shop for everything related to KMeans algorithm and its speed up variants. We are open to new implementations and ideas from anyone interested in this project.
+
+Detailed contribution guidelines will be added in upcoming releases.
 
+<!--- Insert Contribution Guidelines --->
 
 ```@index
 ```
````
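
The benchmarks added in this commit time an Elbow-method search: k-means is run for a range of k, the total within-cluster sum of squares is recorded for each k, and the practitioner looks for the k where the cost stops dropping sharply. Below is a minimal, self-contained sketch of that workload using only the `Random` standard library. It is not the package's code: `lloyd`, `best_of` and `elbow_costs` are hypothetical names, the clustering step is textbook Lloyd iteration rather than the package's own implementations, the features × samples layout is an assumption, and each k keeps the best of several random restarts, following the implementation notes' advice that restarting k-means helps it avoid poor local minima.

````julia
using Random

# Plain Lloyd's k-means (a sketch, not ParallelKMeans' implementation).
# X is assumed to be features × samples (one column per observation).
# Returns (centroids, labels, total within-cluster sum of squares).
function lloyd(X::Matrix{Float64}, k::Int, rng::AbstractRNG; max_iters::Int = 300)
    n = size(X, 2)
    C = X[:, randperm(rng, n)[1:k]]      # k distinct random samples as initial centroids
    labels = zeros(Int, n)
    cost = Inf                           # SSE of the most recent assignment step
    for _ in 1:max_iters
        # Assignment step: attach each sample to its nearest centroid.
        cost, changed = 0.0, false
        for i in 1:n
            best_j, best_d = 0, Inf
            for j in 1:k
                d = sum(abs2(X[f, i] - C[f, j]) for f in axes(X, 1))
                if d < best_d
                    best_j, best_d = j, d
                end
            end
            changed |= labels[i] != best_j
            labels[i] = best_j
            cost += best_d
        end
        changed || break                 # assignments stable: converged
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for j in 1:k
            members = findall(==(j), labels)
            isempty(members) || (C[:, j] = vec(sum(X[:, members], dims = 2)) ./ length(members))
        end
    end
    return C, labels, cost
end

# Keep the lowest-cost result out of `n_init` random restarts (hypothetical helper).
function best_of(X::Matrix{Float64}, k::Int, rng::AbstractRNG; n_init::Int = 10)
    best = lloyd(X, k, rng)
    for _ in 2:n_init
        candidate = lloyd(X, k, rng)
        candidate[3] < best[3] && (best = candidate)
    end
    return best
end

# Elbow method: total cost for every k in `ks` (hypothetical helper).
elbow_costs(X, ks; seed = 0) = [best_of(X, k, MersenneTwister(seed))[3] for k in ks]

# Usage: three well-separated 2-D blobs, 100 samples each.
rng = MersenneTwister(1)
X = hcat(randn(rng, 2, 100), randn(rng, 2, 100) .+ 8.0, randn(rng, 2, 100) .- 8.0)
costs = elbow_costs(X, 1:6)
````

On these synthetic blobs the cost curve should fall steeply up to k = 3 and only slowly afterwards, which is the bend the Elbow Method looks for; on real data the bend is usually less clear-cut.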
