
Commit 6e2eb64

Merge pull request #16 from PyDataBlog/fix-benchmark-test
The pointwise implementation is working. It needs further testing before it can be merged into the master branch.
2 parents 70a7803 + cda656f commit 6e2eb64

7 files changed: 358 additions, 133 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -9,4 +9,5 @@
 /benchmark/tune.json
 .benchmarkci/
 .idea/*
-.vscode/*
+.vscode/*
+test/experiments.jl

benchmark/bench01_distance.jl

Lines changed: 12 additions & 10 deletions
@@ -7,20 +7,22 @@ using Random
 suite = BenchmarkGroup()
 
 Random.seed!(2020)
-X = rand(100_000, 3)
-centroids = rand(2, 3)
-d = rand(100_000, 2)
-suite["100kx3"] = @benchmarkable ParallelKMeans.pairwise!($d, $X, $centroids)
+X = rand(3, 100_000)
+centroids = rand(3, 2)
+d = Vector{Float64}(undef, 100_000)
+suite["100kx3"] = @benchmarkable ParallelKMeans.colwise!($d, $X, $centroids)
 
-X = rand(100_000, 10)
-centroids = rand(2, 10)
-d = rand(100_000, 2)
-suite["100kx10"] = @benchmarkable ParallelKMeans.pairwise!($d, $X, $centroids)
+X = rand(10, 100_000)
+centroids = rand(10, 2)
+d = Vector{Float64}(undef, 100_000)
+suite["100kx10"] = @benchmarkable ParallelKMeans.colwise!($d, $X, $centroids)
 
 # for reference
 metric = SqEuclidean()
-suite["100kx10_distances"] = @benchmarkable Distances.pairwise!($d, $metric, $X, $centroids, dims = 1)
-
+#suite["100kx10_distances"] = @benchmarkable Distances.colwise!($d, $metric, $X, $centroids)
+dist = Distances.pairwise(metric, X, centroids, dims = 2)
+min = minimum(dist, dims=2)
+suite["100kx10_distances"] = @benchmarkable $d = min
 end # module
 
 BenchDistance.suite
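The change above moves the benchmark data from one point per row (`100_000 × 3`) to one point per column (`3 × 100_000`) and shrinks the output from a `100_000 × 2` distance matrix to a vector with one entry per point. The reference lines compute all point-to-centroid squared distances and then take the minimum per point. The sketch below shows that computation with plain loops. `min_sqdist!` is a hypothetical helper written for illustration; it is not the package's `colwise!`, whose body does not appear in this diff.

```julia
# Hypothetical helper, not part of ParallelKMeans: for every column (point) of X,
# store in d the squared Euclidean distance to the nearest centroid column.
function min_sqdist!(d::AbstractVector, X::AbstractMatrix, centroids::AbstractMatrix)
    size(X, 1) == size(centroids, 1) || throw(DimensionMismatch("feature counts differ"))
    length(d) == size(X, 2) || throw(DimensionMismatch("d needs one entry per point"))
    for j in axes(X, 2)                  # loop over points (columns)
        best = Inf
        for k in axes(centroids, 2)      # loop over centroids (columns)
            acc = 0.0
            for i in axes(X, 1)          # inner loop walks down a column
                acc += (X[i, j] - centroids[i, k])^2
            end
            best = min(best, acc)
        end
        d[j] = best
    end
    return d
end

X = [0.0 1.0 4.0;
     0.0 1.0 4.0]            # 2 features × 3 points
centroids = [0.0 3.0;
             0.0 3.0]        # 2 features × 2 centroids
d = Vector{Float64}(undef, size(X, 2))
min_sqdist!(d, X, centroids)
println(d)
```

Julia stores matrices in column-major order, so keeping each point in a column lets the innermost loop read contiguous memory, which is presumably the motivation for the layout change.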

benchmark/extras/README.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Skoffer comparison between Clustering, SingleThread mode of PKMeans and MultiThreadPKMeans

```julia
versioninfo()

Julia Version 1.3.1
Commit 2d5741174c (2019-12-30 21:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = atom -a
  JULIA_NUM_THREADS = 4
```

For `X = rand(60, 1_000_000); tol = 1e-6` output of `TimerOutputs`

```
                                        Time                   Allocations
                                ──────────────────────   ───────────────────────
        Tot / % measured:            1541s / 85.5%           19.5GiB / 99.4%

 Section                ncalls     time   %tot     avg     alloc   %tot      avg
 ───────────────────────────────────────────────────────────────────────────────
 Clustering                  1     662s  50.2%    662s   18.6GiB  96.1%  18.6GiB
   10 clusters               1    92.6s  7.03%   92.6s   2.35GiB  12.1%  2.35GiB
   9 clusters                1    89.7s  6.81%   89.7s   2.34GiB  12.1%  2.34GiB
   8 clusters                1    87.1s  6.62%   87.1s   2.33GiB  12.0%  2.33GiB
   7 clusters                1    85.3s  6.48%   85.3s   2.32GiB  12.0%  2.32GiB
   6 clusters                1    80.6s  6.12%   80.6s   2.32GiB  12.0%  2.32GiB
   5 clusters                1    78.3s  5.95%   78.3s   2.31GiB  11.9%  2.31GiB
   4 clusters                1    76.6s  5.82%   76.6s   2.30GiB  11.9%  2.30GiB
   3 clusters                1    50.3s  3.82%   50.3s   1.58GiB  8.16%  1.58GiB
   2 clusters                1    20.9s  1.59%   20.9s    732MiB  3.69%   732MiB
 PKMeans Singlethread        2     491s  37.3%    245s    208MiB  1.05%   104MiB
   9 clusters                1     131s  10.0%    131s   22.9MiB  0.12%  22.9MiB
   10 clusters               1    89.5s  6.80%   89.5s   22.9MiB  0.12%  22.9MiB
   7 clusters                1    77.3s  5.87%   77.3s   22.9MiB  0.12%  22.9MiB
   8 clusters                1    59.4s  4.51%   59.4s   22.9MiB  0.12%  22.9MiB
   6 clusters                1    44.1s  3.35%   44.1s   22.9MiB  0.12%  22.9MiB
   5 clusters                1    35.1s  2.67%   35.1s   22.9MiB  0.12%  22.9MiB
   4 clusters                1    32.9s  2.50%   32.9s   22.9MiB  0.12%  22.9MiB
   3 clusters                1    14.6s  1.11%   14.6s   22.9MiB  0.12%  22.9MiB
   2 clusters                2    6.52s  0.50%   3.26s   23.3MiB  0.12%  11.7MiB
 PKMeans Multithread         1     165s  12.5%    165s    575MiB  2.90%   575MiB
   9 clusters                1    37.2s  2.82%   37.2s   40.1MiB  0.20%  40.1MiB
   8 clusters                1    33.1s  2.51%   33.1s   23.9MiB  0.12%  23.9MiB
   10 clusters               1    25.8s  1.96%   25.8s   24.0MiB  0.12%  24.0MiB
   6 clusters                1    20.9s  1.59%   20.9s   23.6MiB  0.12%  23.6MiB
   7 clusters                1    16.4s  1.25%   16.4s   23.4MiB  0.12%  23.4MiB
   5 clusters                1    13.1s  1.00%   13.1s   23.4MiB  0.12%  23.4MiB
   4 clusters                1    9.90s  0.75%   9.90s   23.4MiB  0.12%  23.4MiB
   3 clusters                1    4.97s  0.38%   4.97s    370MiB  1.87%   370MiB
   2 clusters                1    3.26s  0.25%   3.26s   23.2MiB  0.12%  23.2MiB
 ───────────────────────────────────────────────────────────────────────────────
```
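
To put the section totals from the table in perspective, the short sketch below turns them into speedup ratios. The numbers are copied from the `time` column of the three top-level rows; the variable names are illustrative only. Treat the ratios as rough: this is a single run, and the single-thread section records 2 calls, so its total may include some repeated work.

```julia
# Section totals in seconds, copied from the TimerOutputs table above.
totals = (clustering = 662.0, pkmeans_single = 491.0, pkmeans_multi = 165.0)

# How many times faster the multithreaded run is than each alternative.
speedup_vs_clustering = totals.clustering / totals.pkmeans_multi
speedup_vs_single = totals.pkmeans_single / totals.pkmeans_multi

println("Multithread vs Clustering.jl:   ", round(speedup_vs_clustering, digits = 2), "x")
println("Multithread vs single-threaded: ", round(speedup_vs_single, digits = 2), "x")
```

With `JULIA_NUM_THREADS = 4`, a ratio of about 3 against the single-threaded run is below the ideal factor of 4, which is common when part of the work is serial or memory-bound.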

benchmark/extras/comparisons.jl

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
using Clustering
using ParallelKMeans
using Plots
using BenchmarkTools
using TimerOutputs
using Random
using ProgressMeter

# Create a TimerOutput, this is the main type that keeps track of everything.
const to = TimerOutput()

Random.seed!(2020)
X = rand(60, 1_000_000);
# Timed assignments
global a = Float64[]
global b = Float64[]
global c = Float64[]

p = Progress(9, 10, "Computing clustering...")
@timeit to "Clustering" begin
    for i in 2:10
        @timeit to "$i clusters" push!(a, Clustering.kmeans(X, i, tol=1e-6, maxiter=300).totalcost)
        next!(p)
    end
end

p = Progress(9, 10, "Computing singlethreaded ParallelKMeans...")
@timeit to "PKMeans Singlethread" begin
    for i in 2:10
        @timeit to "$i clusters" push!(b, ParallelKMeans.kmeans(X, i, tol=1e-6, max_iters=300, verbose=false).totalcost)
        next!(p)
    end
end

p = Progress(9, 10, "Computing multithreaded ParallelKMeans...")
@timeit to "PKMeans Multithread" begin
    for i in 2:10
        @timeit to "$i clusters" push!(c, ParallelKMeans.kmeans(X, i, ParallelKMeans.MultiThread(), tol=1e-6, max_iters=300, verbose=false).totalcost)
        next!(p)
    end
end

plot(a, label="Clustering.jl")
plot!(b, label="Single-Thread ParallelKmeans")
plot!(c, label="Multi-Thread ParallelKmeans")

print(to)
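
The multithreaded section of this script can only use as many threads as the Julia session was started with, so it is worth confirming the thread count before comparing timings. The sketch below uses only `Base.Threads` and does not touch ParallelKMeans; the toy loop is an illustration, not how the package parallelizes its work.

```julia
using Base.Threads

# Number of threads this session was started with
# (set via the JULIA_NUM_THREADS environment variable before launching Julia).
n = nthreads()
n == 1 && @warn "Julia is running with a single thread; a multithreaded run will not be faster."

# Toy threaded loop: each iteration writes to its own slot, so no locking is needed.
squares = zeros(Int, 8)
@threads for i in eachindex(squares)
    squares[i] = i^2
end
println("threads = ", n, ", squares = ", squares)
```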
