Bloom filter implementation for BIN-702
> Clone the repository.
> Run main.py to start the benchmarks.
Requirements:
Python 3.x
mmh3 library (install with: pip install mmh3)
> cd src/
> python main.py
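
For readers who have not seen the data structure being benchmarked, here is a minimal sketch of a Bloom filter with the three operations timed below (insert, find, clear). It is not the code in this repository: the class name `BloomFilter` and its method names are hypothetical, and it derives its bit positions from `hashlib.sha256` with double hashing instead of `mmh3`, so that it runs with the standard library only.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical, not the repository's class)."""

    def __init__(self, bit_count, hash_count):
        self.bit_count = bit_count
        self.hash_count = hash_count
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray(math.ceil(bit_count / 8))

    def _positions(self, item):
        # Double hashing: derive hash_count positions from two base hashes.
        # Assumption: sha256 stands in for mmh3 here.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.hash_count):
            yield (h1 + i * h2) % self.bit_count

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def find(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )

    def clear(self):
        self.bits = bytearray(len(self.bits))


bloom = BloomFilter(bit_count=9586, hash_count=7)
bloom.insert("ACGTACGT")
print(bloom.find("ACGTACGT"))  # an inserted item is always reported as present
```

A Bloom filter never returns a false negative, so an inserted item is always found. A `find` on an item that was never inserted can still return True, which is what the error columns in the Bloom tables count.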

| Set | MB | peak MB | insert ms | clear ms | find ms |
|---|---|---|---|---|---|
| 1000 | 0.000296 | 0.041276 | 0.9889 | 0 | 0 |
| 10000 | 0.000296 | 0.655676 | 1.9998 | 0 | 0 |
| 100000 | 0.000324 | 6.291772 | 19.9799 | 1.9973 | 8.9914 |
| 2000000 | 0.000324 | 100.663612 | 457.532 | 50.9484 | 203.3379 |

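
The columns above (memory, peak memory, and per-operation times) can be collected with the standard library alone. The sketch below is one way to do it, assuming `tracemalloc` for the memory figures and `time.perf_counter` for the timings. The function name `benchmark_set`, the order of the measurements, and the use of string items are all assumptions, not a description of what main.py does.

```python
import time
import tracemalloc


def benchmark_set(n):
    """Time insert, find and clear on a built-in set of n items (hypothetical helper)."""
    # Build the test items before tracing so that only the set itself is measured.
    items = [str(i) for i in range(n)]

    tracemalloc.start()
    container = set()

    start = time.perf_counter()
    for item in items:
        container.add(item)
    insert_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    hits = sum(1 for item in items if item in container)
    find_ms = (time.perf_counter() - start) * 1000

    start = time.perf_counter()
    container.clear()
    clear_ms = (time.perf_counter() - start) * 1000

    # get_traced_memory() returns (current, peak) in bytes.
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {
        "mb": current / 1e6,
        "peak_mb": peak / 1e6,
        "insert_ms": insert_ms,
        "clear_ms": clear_ms,
        "find_ms": find_ms,
        "hits": hits,
    }


print(benchmark_set(1000))
```

Timings from a single run are noisy at small sizes, which may explain the zero entries in the first rows of the table.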

| Bloom | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.010876 | 0.011449 | 15.9934 | 0 | 15.9833 | 0.001 | 7 | 9586 | 1 |
| 10000 | 0.0993 | 0.099879 | 179.8307 | 1.9978 | 155.8389 | 0.0021 | 7 | 95851 | 21 |
| 100000 | 1.04424 | 1.044825 | 1634.4944 | 44.9578 | 1571.1436 | 0.00154 | 7 | 958506 | 154 |
| 2000000 | 19.83744 | 19.838031 | 35077.5659 | 1017.1684 | 35077.5659 | 0.0016655 | 7 | 19170117 | 3331 |

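
The bit count and hash count columns above are consistent with the standard Bloom filter sizing formulas, m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2), evaluated at a target error probability of p = 0.01. That target is inferred from the numbers and is not stated in this README, and the function name `bloom_parameters` is hypothetical. A short sketch of the calculation:

```python
import math


def bloom_parameters(n_items, target_error):
    """Return (bit_count, hash_count) from the standard sizing formulas."""
    # m = ceil(-n * ln(p) / (ln 2)^2)
    bit_count = math.ceil(-n_items * math.log(target_error) / (math.log(2) ** 2))
    # k = round((m / n) * ln 2)
    hash_count = round(bit_count / n_items * math.log(2))
    return bit_count, hash_count


# Assumption: p = 0.01 is inferred from the table, not taken from the source.
for n in (1000, 10000, 100000, 2000000):
    print(n, bloom_parameters(n, 0.01))
# 1000 (9586, 7)
# 10000 (95851, 7)
# 100000 (958506, 7)
# 2000000 (19170117, 7)
```

These values match the bit count and hash count reported for the 7-hash configuration.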

| Bloom | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.000296 | 0.041276 | 11.9879 | 0 | 12.9987 | 0.003 | 5 | 9586 | 3 |
| 10000 | 0.0993 | 0.099882 | 170.8237 | 1.9972 | 126.8831 | 0.0037 | 5 | 95851 | 37 |
| 100000 | 1.04424 | 1.044825 | 1246.7373 | 48.9547 | 1311.182 | 0.00221 | 5 | 958506 | 221 |
| 2000000 | 19.83744 | 19.838034 | 25255.1331 | 1037.1826 | 23715.8606 | 0.002236 | 5 | 19170117 | 4472 |


| Bloom | MB | peak MB | insert ms | clear ms | find ms | error prob | hash count | bit count | error count |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | 0.00398 | 0.004556 | 12.9867 | 0.9983 | 15.9839 | 0.108 | 5 | 2949 | 108 |
| 10000 | 0.030796 | 0.031378 | 122.8838 | 0.9991 | 115.8721 | 0.1099 | 5 | 29492 | 1099 |
| 100000 | 0.321784 | 0.322372 | 1205.3608 | 11.9547 | 1442.8068 | 0.10759 | 5 | 294924 | 10759 |
| 2000000 | 6.10908 | 6.109674 | 24957.1912 | 330.6803 | 24417.6577 | 0.1064285 | 5 | 5898497 | 212857 |

Possible Applications in Bioinformatics
Sequence characterization
Genome assembly
Sequencing error correction
RNA-Seq
Tristan Deschamps