Skip to content

yash045/Project-2-DSA-group-41-

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bloom Filter vs. Cuckoo Filter (C++20)

IMPORTANT WINDOWS SETUP & TROUBLESHOOTING

  • USE “DEVELOPER POWERSHELL FOR VS 2022” SO CL IS AVAILABLE.
  • INSTALL VISUAL STUDIO COMMUNITY WITH “DESKTOP DEVELOPMENT WITH C++”.
  • IF CL IS NOT FOUND, YOU ARE NOT IN THE DEVELOPER SHELL.
  • IF /STD:C++20 IS REJECTED, UPDATE MSVC OR USE /STD:C++LATEST.
  • IF CMAKE IS NOT FOUND, INSTALL CMAKE:
  • IF WINGET IS NOT FOUND, DOWNLOAD INSTALLERS DIRECTLY FROM THE LINKS ABOVE.
  • PYTHON PLOTTING REQUIRES MATPLOTLIB. IF YOU SEE MODULE NOT FOUND: MATPLOTLIB, RUN:
    • python -m pip install --upgrade pip
    • python -m pip install matplotlib
  • IF PLOTTING FAILS WITH “FILE NOT FOUND results/menu-run.csv”, RUN MENU OPTION [3] → “QUICK SWEEP” FIRST, THEN [4].

Build

  • Windows (PowerShell):
    • cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    • cmake --build build --config Release
  • Linux/macOS:
    • cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    • cmake --build build -j

Run

  • Example:
    • ./build/Release/benchmark.exe --ds both --n_keys 1000000 --present_ratio 0.5 --bloom_p 0.01 --cf_fbits 12 --out results/run.csv

Parameters

  • --ds: bloom|cuckoo|both
  • --n_keys: number of unique present keys (100k–1M+)
  • --n_queries: queries total (default = n_keys)
  • --present_ratio: fraction of queries that are present (0..1)
  • Bloom: --bloom_p target FP rate
  • Cuckoo: --cf_fbits, --cf_bucket, --cf_load, --cf_max_kicks
  • --seed: RNG seed
  • --out: CSV path

Output CSV Columns

structure,n_items,n_inserted,n_queries,present_ratio,bloom_p,cf_fbits,cf_bucket,bucket_load,cf_max_kicks,seed, insert_ops_per_sec,query_ops_per_sec,insert_ns_p50,insert_ns_p95,insert_ns_p99,query_ns_p50,query_ns_p95,query_ns_p99, false_positive_rate,mem_bytes,mem_per_item_bits,cuckoo_insert_failures

Notes

  • Implementations are from scratch: custom hashing (SplitMix64), Bloom with double hashing (Kirsch–Mitzenmacher), and Cuckoo with fingerprints, two buckets, bounded kicks, and delete support.
  • Data set is synthetic 64-bit integers (>=100,000 unique keys). Queries mix present/absent keys.
  • Memory per item is reported as bytes*8 / n_items. For Cuckoo, storage uses 16-bit fingerprints for simplicity; adjust if you want tighter packing.

About

This is our repo for project 2, Bloom Filter vs Cuckoo FIlter.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors