
Commit f4a44d1

initial WIP of documentation
1 parent 64e20f9 commit f4a44d1

2 files changed: +117 -52 lines changed

README.md

Lines changed: 7 additions & 51 deletions
@@ -10,25 +10,18 @@ ________________________________________________________________________________
 _________________________________________________________________________________________________________

 ## Table Of Content
-
-1. [Motivation](#Motivatiion)
+1. [Documentation](#Documentation)
 2. [Installation](#Installation)
 3. [Features](#Features)
-4. [Benchmarks](#Benchmarks)
-5. [Pending Features](#Pending-Features)
-6. [How To Use](#How-To-Use)
-7. [Release History](#Release-History)
-8. [How To Contribute](#How-To-Contribute)
-9. [Credits](#Credits)
-10. [License](#License)
+4. [License](#License)

 _________________________________________________________________________________________________________

-### Motivation
-It's a funny story actually led to the development of this package.
-What started off as a personal toy project trying to re-construct the K-Means algorithm in native Julia blew up after into a heated discussion on the Julia Discourse forums after I asked for Julia optimizaition tips. Long story short, Julia community is an amazing one! Andrey Oskin offered his help and together, we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing so we have decided to tidy up the implementation and share with the world.
+### Documentation
+- Stable Documentation: [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://PyDataBlog.github.io/ParallelKMeans.jl/stable)
+
+- Experimental Documentation: [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://PyDataBlog.github.io/ParallelKMeans.jl/dev)

-Say hello to our baby, `ParallelKMeans`!
 _________________________________________________________________________________________________________

 ### Installation
@@ -42,7 +35,7 @@ pkg> add TextAnalysis
 For the few (and selected) brave ones, one can simply grab the current experimental features by simply adding the experimental branch to your development environment after invoking the package manager with `]`:

 ```julia
-dev git@github.com:PyDataBlog/ParallelKMeans.jl.git
+pkg> dev git@github.com:PyDataBlog/ParallelKMeans.jl.git
 ```

 Don't forget to checkout the experimental branch and you are good to go with bleeding edge features and breaks!
@@ -60,43 +53,6 @@ ________________________________________________________________________________

 _________________________________________________________________________________________________________

-### Benchmarks
-
-_________________________________________________________________________________________________________
-
-### Pending Features
-- [X] Implementation of Triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
-K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
-- [ ] Support for DataFrame inputs.
-- [ ] Refactoring and finalizaiton of API desgin.
-- [ ] GPU support.
-- [ ] Even faster Kmeans implementation based on current literature.
-- [ ] Optimization of code base.
-
-_________________________________________________________________________________________________________
-
-### How To Use
-
-```Julia
-
-```
-
-_________________________________________________________________________________________________________
-
-### Release History
-
-- 0.1.0 Initial release
-
-_________________________________________________________________________________________________________
-
-### How To Contribue
-
-_________________________________________________________________________________________________________
-
-### Credits
-
-_________________________________________________________________________________________________________
-
 ### License

 [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2FPyDataBlog%2FParallelKMeans.jl.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2FPyDataBlog%2FParallelKMeans.jl?ref=badge_large)

docs/src/index.md

Lines changed: 110 additions & 1 deletion
@@ -1,16 +1,125 @@
-# ParallelKMeans.jl Documentation
+# ParallelKMeans.jl Package

 ```@contents
+Depth = 4
 ```

+## Motivation
+It's actually a funny story that led to the development of this package.
+What started off as a personal toy project trying to reconstruct the K-Means algorithm in native Julia blew up into a heated discussion on the Julia Discourse forums after I asked for Julia optimization tips. Long story short, the Julia community is an amazing one! Andrey Oskin offered his help and together we decided to push the speed limits of Julia with a parallel implementation of the most famous clustering algorithm. The initial results were mind blowing, so we decided to tidy up the implementation and share it with the world as a maintained Julia package.
+
+Say hello to our baby, `ParallelKMeans`!
+
+This package aims to utilize the speed of Julia and parallelization (both CPU & GPU) by offering an extremely fast implementation of the K-Means clustering algorithm with a user-friendly interface.
+
+
+## K-Means Algorithm Implementation Notes
+TODO: Explain the main algorithms implemented and add a few lines about the expected input dimensions.
+
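Pending that write-up, here is a minimal sketch of the input layout the examples on this page imply: the data matrix is features by observations (one column per data point), as in the iris example further down. The random data and variable names below are purely illustrative.

```julia
using ParallelKMeans

# One column per observation: 4 features x 150 data points, matching the
# layout produced by the iris example below, where
# collect(Matrix(iris[:, 1:4])') is a 4x150 matrix.
X = rand(4, 150)

# cluster the 150 points into 3 groups on a single thread
result = kmeans(X, 3, ParallelKMeans.SingleThread(), tol=1e-6, max_iters=300)
```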
 ## Installation
+You can grab the latest stable version of this package by running the following in Julia.
+Don't forget to invoke Julia's package manager with `]`:

+```julia
+pkg> add ParallelKMeans
+```
+
+For the few (and selected) brave ones, you can grab the current experimental features by adding the experimental branch to your development environment after invoking the package manager with `]`:
+
+```julia
+pkg> dev git@github.com:PyDataBlog/ParallelKMeans.jl.git
+```
+
+Don't forget to check out the experimental branch, and you are good to go with bleeding-edge features and breaks!
+```bash
+git checkout experimental
+```

 ## Features
+- Lightning fast implementation of the K-Means clustering algorithm even on a single thread in native Julia.
+- Support for a multi-threaded implementation of the K-Means clustering algorithm (see the threading sketch after this list).
+- K-Means++ initialization for faster and better convergence.
+- Modified version of Elkan's triangle inequality to speed up the K-Means algorithm.
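A minimal sketch of the threading prerequisite mentioned in the list above, for illustration only: `JULIA_NUM_THREADS` is standard Julia rather than anything specific to ParallelKMeans, and the thread count shown is just an example.

```julia
# Julia itself must be started with threads enabled before MultiThread() can use them,
# e.g. from the shell: JULIA_NUM_THREADS=4 julia
# Check how many threads the current session actually has:
println("Threads available: ", Threads.nthreads())
```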
+
+
+## Pending Features
+- [X] Implementation of triangle inequality based on [Elkan C. (2003) "Using the Triangle Inequality to Accelerate
+K-Means"](https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf)
+- [ ] Support for DataFrame inputs.
+- [ ] Refactoring and finalization of API design.
+- [ ] GPU support.
+- [ ] Even faster K-Means implementation based on current literature.
+- [ ] Optimization of code base.


 ## How To Use
+Taking advantage of Julia's brilliant multiple dispatch system, the package exposes a very easy-to-use API.
+
+```julia
+using ParallelKMeans
+
+# Use only 1 CPU core
+results = kmeans(X, 3, ParallelKMeans.SingleThread(), tol=1e-6, max_iters=300)
+
+# Use all available CPU cores
+multi_results = kmeans(X, 3, ParallelKMeans.MultiThread(), tol=1e-6, max_iters=300)
+```
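As a follow-up, a short sketch of inspecting the returned object, restricted to the fields this page itself uses (`assignments` in the iris example below and `totalcost` in the elbow example); no other fields are assumed. It continues from the `results` object computed above.

```julia
# cluster label assigned to each column (observation) of X; first five shown
@show results.assignments[1:5]

# total within-cluster cost of the returned clustering
@show results.totalcost
```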
+
+### Practical Usage Examples
+Some of the common usage examples of this package are as follows:
+
+#### Clustering With A Desired Number Of Groups
+
+```julia
+using ParallelKMeans, RDatasets, Plots
+
+# load the data
+iris = dataset("datasets", "iris");
+
+# features to use for clustering
+features = collect(Matrix(iris[:, 1:4])');
+
+result = kmeans(features, 3, ParallelKMeans.MultiThread());
+
+# plot with the point color mapped to the assigned cluster index
+scatter(iris.PetalLength, iris.PetalWidth, marker_z=result.assignments,
+        color=:lightrainbow, legend=false)
+
+# TODO: Add scatter plot image
+```
+
+#### Elbow Method For The Selection Of The Optimal Number Of Clusters
+```julia
+using ParallelKMeans
+
+# Single-thread implementation of Lloyd's algorithm
+b = [ParallelKMeans.kmeans(X, i, ParallelKMeans.SingleThread(),
+     tol=1e-6, max_iters=300, verbose=false).totalcost for i = 2:10]
+
+# Multi-thread implementation of Lloyd's algorithm
+c = [ParallelKMeans.kmeans(X, i, ParallelKMeans.MultiThread(),
+     tol=1e-6, max_iters=300, verbose=false).totalcost for i = 2:10]
+
+# Multi-thread implementation plus a modified version of Elkan's triangle inequality
+# to boost speed
+d = [ParallelKMeans.kmeans(ParallelKMeans.LightElkan(), X, i, ParallelKMeans.MultiThread(),
+     tol=1e-6, max_iters=300, verbose=false).totalcost for i = 2:10]
+
+# Single-thread implementation plus a modified version of Elkan's triangle inequality
+# to boost speed
+e = [ParallelKMeans.kmeans(ParallelKMeans.LightElkan(), X, i, ParallelKMeans.SingleThread(),
+     tol=1e-6, max_iters=300, verbose=false).totalcost for i = 2:10]
+```
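As a follow-up, a minimal sketch of how these cost curves might be visualized with Plots (already used in the iris example above); it continues from the `b` and `c` arrays computed above, and the labels are only illustrative.

```julia
using Plots

# total cost versus the candidate number of clusters;
# the "elbow" of the curve is a reasonable choice for k
plot(2:10, b, xlabel="number of clusters", ylabel="total cost", label="single thread")
plot!(2:10, c, label="multi thread")
```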
+
+
+## Benchmarks
+
+
+## Release History
+- 0.1.0 Initial release
+

+## Contributing


 ```@index
