[Question] regression test for error rate

``` python
from pybloom_live import BloomFilter, ScalableBloomFilter

def test_bf():
    error_rate = 0.01
    capacity = 5000
    bf = BloomFilter(capacity, 0.01)
    items = list(range(capacity))
    for i in items:
        bf.add(i)

    test_items = list(range(bf.num_bits))
    false_positive_count = 0
    for i in test_items:
        if i in bf and i not in items:
            false_positive_count += 1
    p = false_positive_count / len(test_items)
    print(p)
    assert p < error_rate


def test_sbf():
    error_rate = 0.01
    sbf = ScalableBloomFilter(100, error_rate, 2)
    items = list(range(5000))
    for i in items:
        sbf.add(i)
    
    test_items = list(range(sum(f.num_bits for f in sbf.filters)))
    false_positive_count = 0
    for i in test_items:
        if i in sbf and i not in items:
            false_positive_count += 1
    p = false_positive_count / len(test_items)
    print(p)
    assert p < error_rate
```

I try to do regression test for error rate for BloomFilter and ScalableBloomFilter. 

For simple BloomFilter, the error_rate is in expected `<0.01`.

But for ScalableBloomFilter, error_rate always around 0.02 (expected 0.01)

In test, I allocated  `test_items` in the total size of underlying bit arrays, if filter returned True, but item not in original inserted set, I thought it's a false positive item.

Wondering whether my test approach is right?

Thanks.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[Question] regression test for error rate #24

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

[Question] regression test for error rate #24

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions