A comparative study of multiple Approximate Nearest Neighbor (ANN) search algorithms, implementing and benchmarking various graph-based approaches including HNSW, FlatNav, and ONNG.
This project implements 30+ variants of ANN algorithms to find the K nearest neighbors in high-dimensional vector spaces. It is meant as a convenient sandbox: drop in your own ANN code, run a one-liner, and instantly compare build/search speed and recall against the bundled baselines.
| Algorithm | Description | Location | Type |
|---|---|---|---|
| Brute Force | Exact search baseline (O(n)) | method/bruteforce/ | Custom |
| HNSW | Hierarchical Navigable Small World (multi-layer) | method/hnsw1/ | Custom |
| HNSW + Adaptive Search | HNSW with NGT's adaptive search strategy | method/hnsw2/ | Custom |
| FlatNav | Single-layer HNSW graph | method/flatnav*/ | Custom |
| ONNG | FlatNav + ONNG edge pruning | method/onng*/ | Custom |
| hnswlib | Official hnswlib library | method/hnswlib/ | Library |
| flatnavlib | Official FlatNav library | method/flatnavlib/ | Library |
- FlatNav: A single-layer variant of HNSW in which `random_level()` always returns 0. This simplifies the graph structure while preserving the core navigation properties.
- ONNG: Built on top of FlatNav (single-layer graph). It initially implemented NGT's edge-pruning strategy, but our latest findings indicate this step can be removed entirely without sacrificing, and in some cases even improving, the performance trade-off.
- HNSW2: Uses NGT's adaptive search strategy with a gamma parameter that controls early termination based on the ratio between the current candidate's distance and the worst distance in the result set (see the sketch below).
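A minimal sketch of this termination test, assuming the ratio rule described above (the actual hnsw2 code may differ in its details):

```cpp
// Hedged sketch of the gamma-based early-termination check: stop expanding
// the candidate beam once the closest unexplored candidate is more than
// (1 + gamma) times farther than the worst element currently in the top-k
// result set. With gamma = 0 this reduces to the usual greedy stopping rule.
bool should_terminate(float candidate_dist, float worst_result_dist, float gamma) {
    return candidate_dist > worst_result_dist * (1.0f + gamma);
}
```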
Important: Our experiments consistently show that the edge pruning step primarily affects graph storage size and does not significantly reduce distance computations. More importantly, we found that omitting this step altogether leads to a simpler, faster, and equally effective algorithm. The onng1_test2_sq16_3 and _4 directories contain our best-performing implementations, which are pure single-layer graphs without any explicit edge pruning.
- SIMD: Vectorized distance computation (`flatnav1_SIMD/`). Note: this implementation is x86-specific; a hedged sketch follows this list.
- Multi-threading: Parallel search (`flatnav1_threads*/`).
- Graph Reordering: G-order and RCM optimization (`flatnav1_threads1_gorder/`, `flatnav1_threads1_rcm/`).
- Quantization: SQ8 and SQ16 compression (`onng1_test2_sq*/`).
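To make the SIMD item concrete, here is a hedged sketch of an AVX-vectorized squared-L2 distance; it illustrates the idea, and is not the exact code in `flatnav1_SIMD/`:

```cpp
#include <immintrin.h>  // AVX intrinsics (x86-only)
#include <cstddef>

// Hedged sketch: process 8 floats per iteration, then handle any remaining
// dimensions with a scalar tail (SIFT's 128 divides evenly, GLOVE's 100 does not).
float l2_sq_avx(const float* a, const float* b, size_t dim) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= dim; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 d  = _mm256_sub_ps(va, vb);
        acc = _mm256_add_ps(acc, _mm256_mul_ps(d, d));  // acc += d * d
    }
    // Horizontal sum of the 8 partial sums.
    float buf[8];
    _mm256_storeu_ps(buf, acc);
    float sum = 0.0f;
    for (int j = 0; j < 8; ++j) sum += buf[j];
    // Scalar tail for leftover dimensions.
    for (; i < dim; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```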
| Dataset | Base Vectors | Dimensions | Queries | Note |
|---|---|---|---|---|
| SIFT | 1,000,000 | 128 | 10,000 | |
| GLOVE | 1,183,514 | 100 | 10,000 | Normalized, L2 distance |
| DEBUG | 1,000 | 16 | 100 | |
Note: Only binary format (`.bin`) files are included in this repository. Text format (`.txt`) files are excluded due to their large size (~800 MB). You can convert between formats using the scripts in `checker/`.
Results on SIFT:

| Method | Search Latency (ms) | Recall@10 | Pass |
|---|---|---|---|
| Brute Force | 128.45 | 99.94% | ✓ |
| HNSW (hnsw1) | 1.27 | 99.57% | ✓ |
| HNSW + Adaptive (hnsw2) | 1.32 | 99.84% | ✓ |
| FlatNav | 1.36 | 99.84% | ✓ |
| FlatNav + SIMD | 1.09 | 99.84% | ✓ |
| FlatNav + Threads | 0.66 | 99.89% | ✓ |
| FlatNav + RCM | 0.54 | 99.86% | ✓ |
| FlatNav + RCM + Path | 0.43 | 99.57% | ✓ |
| ONNG | 0.36 | 99.28% | ✓ |
| ONNG (onng1_test2) | 0.33 | 99.33% | ✓ |
| ONNG + SQ16 | 0.24 | 99.32% | ✓ |
| No Pruning (onng1_test2_sq16_2) | 0.39 | 99.54% | ✓ |
Results on GLOVE:

| Method | Search Latency (ms) | Recall@10 | Pass |
|---|---|---|---|
| Brute Force | 133.90 | 100.00% | ✓ |
| HNSW (hnsw1) | 1.74 | 86.71% | ✗ |
| HNSW + Adaptive (hnsw2) | 11.67 | 99.14% | ✓ |
| FlatNav | 11.79 | 99.12% | ✓ |
| FlatNav + Threads | 5.86 | 99.24% | ✓ |
| FlatNav + RCM + Path | 3.93 | 99.31% | ✓ |
| ONNG | 3.47 | 99.13% | ✓ |
| ONNG (onng1_test2) | 3.06 | 99.32% | ✓ |
| ONNG + SQ16 | 2.12 | 99.26% | ✓ |
| No Pruning (onng1_test2_sq16_2) | 2.78 | 99.42% | ✓ |
Note: GLOVE vectors are normalized, so L2 distance is used. This is a harder dataset - standard HNSW without adaptive search fails to reach 99% recall.
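For unit-normalized vectors, ranking by L2 distance is equivalent to ranking by inner product (cosine similarity), since with $\|q\| = \|x\| = 1$:

$$\|q - x\|^2 = \|q\|^2 + \|x\|^2 - 2\,q^\top x = 2 - 2\,q^\top x$$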
```
.
|-- method/              # Algorithm implementations
|   |-- bruteforce/      # Baseline brute force search
|   |-- hnsw*/           # HNSW variants (custom)
|   |-- flatnav*/        # FlatNav variants (custom)
|   |-- onng*/           # ONNG variants (custom)
|   |-- hnswlib/         # Library baseline (official hnswlib)
|   `-- flatnavlib/      # Library baseline (FlatNav + Cereal/RapidJSON)
|-- ablation/            # Ablation study variants
|   |-- hnsw2/           # HNSW ablation experiments
|   `-- onng1_test2/     # ONNG ablation experiments
|-- checker/             # Basic evaluation tools
|-- checker1/            # High-precision evaluation
|-- checker2/            # Ablation study framework
|   |-- evaluate.cpp     # Ablation evaluation program
|   |-- run_eval.bat     # Windows script
|   `-- plot_ablation.py # Result visualization
`-- data/                # Datasets (binary format)
    |-- sift/
    |-- glove/
    `-- debug/
```
- `checker/`: baseline evaluation harness; focuses on core build/search timing and the Recall@10 gate (>= 0.99). See the sketch below.
- `checker1/`: same harness with higher-precision distance handling for all graph algorithms; still being completed.
- `checker2/`: ablation-study framework with parameter sweeping (gamma/efSearch), graph caching, and detailed statistics collection.
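For reference, a minimal sketch of how the Recall@10 gate can be computed, assuming K result ids and at least K ground-truth ids per query (an illustration, not the code in `checker/evaluate.cpp`):

```cpp
#include <algorithm>
#include <vector>

// Recall@K: fraction of returned ids that appear in the ground-truth top-K,
// averaged over all queries. Hedged sketch, not the checker implementation.
double recall_at_k(const std::vector<std::vector<int>>& results,
                   const std::vector<std::vector<int>>& ground_truth,
                   int k) {
    size_t hits = 0;
    for (size_t q = 0; q < results.size(); ++q) {
        const auto& gt = ground_truth[q];
        for (int i = 0; i < k; ++i) {
            if (std::find(gt.begin(), gt.begin() + k, results[q][i]) != gt.begin() + k)
                ++hits;
        }
    }
    return static_cast<double>(hits) / (results.size() * k);
}
// A run "passes" the gate when recall_at_k(...) >= 0.99 with k = 10.
```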
The ablation/ directory contains controlled experiments to measure the impact of individual algorithm components.
| Variant | Description |
|---|---|
| `baseline` | Full HNSW with adaptive gamma search |
| `baseline_with_fixed_beam` | Fixed efSearch instead of adaptive gamma |
| `baseline_without_hierarchy` | Single-layer graph (FlatNav-like) |
| Variant | Description |
|---|---|
| `baseline` | Full-precision float (base) |
| `baseline_with_sq16` | Add SQ16 quantization |
| `baseline_with_fixed_enterpoint` | Remove 100-point entry scan |
| `baseline_without_last_shortcut_reduction` | Remove second edge pruning pass |
| `baseline_without_rcm` | Remove RCM graph reordering |
| `baseline_without_simd` | Force scalar distance computation |
| `baseline_without_two_shortcut_reductions` | Remove all edge pruning |
We performed extensive ablation studies to isolate the impact of each optimization technique.
Based on our quantitative research and ablation studies, we have reached several critical conclusions regarding high-performance graph indexing.
1. The "Optimization Illusion" of Complexity

Our experiments disprove the common belief that complex graph structures are necessary for high performance.
- Hierarchy is Redundant: We found that the multi-layer structure of HNSW becomes redundant as the base layer itself achieves high navigability. A single-layer graph (FlatNav) simplifies code complexity and memory layout with zero loss in the recall-latency trade-off.
- Explicit Pruning is Unnecessary: The rigorous edge pruning strategies (shortcut reduction) adapted from NGT/ONNG proved to be ineffective in our tests. HNSW's native heuristic for neighbor selection is robust enough; adding explicit pruning steps is an "optimization illusion" that consumes build time without tangible search benefits.
2. The Real Drivers of Performance
- $\gamma$-Adaptive Search: The most significant algorithmic gain comes from replacing the rigid `efSearch` with NGT's $\gamma$-adaptive search. This "soft filtering" mechanism allows the search to escape local optima dynamically, providing a superior QPS profile compared to fixed-beam search.
Figure: The chart demonstrates that the simplified single-layer graph (blue line) combined with adaptive search achieves state-of-the-art performance, outperforming the complex baseline. Note how fixed_beam (green line) lags significantly, highlighting the importance of the $\gamma$-search strategy.
- SQ16 & SIMD: Among low-level optimizations, 16-bit Scalar Quantization (SQ16) and SIMD instructions are confirmed to be highly effective, drastically reducing memory-bandwidth pressure and distance-computation cost (see the SQ16 sketch after the figure below).
- Ineffective Optimizations: Techniques like Graph Reordering (RCM) and complex entry-point selection strategies (0 vs. 100 random points) showed negligible impact on high-dimensional vectors in our setup.
Figure: QPS vs Recall (Log Scale) on GloVe-1.18M. Note how SQ16 dominates the high-recall regime.
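To illustrate the SQ16 idea, here is a hedged sketch of a possible encoding and quantized-domain distance; the actual `onng1_test2_sq*/` implementation may differ (e.g., per-dimension ranges):

```cpp
#include <cstdint>
#include <vector>

// Hedged sketch of 16-bit scalar quantization (SQ16): map each float
// coordinate to a uint16_t over a global [lo, hi] range, halving memory
// traffic relative to float32.
std::vector<uint16_t> sq16_encode(const std::vector<float>& v, float lo, float hi) {
    const float scale = 65535.0f / (hi - lo);
    std::vector<uint16_t> out(v.size());
    for (size_t i = 0; i < v.size(); ++i) {
        float x = (v[i] - lo) * scale;
        if (x < 0.0f) x = 0.0f;
        if (x > 65535.0f) x = 65535.0f;
        out[i] = static_cast<uint16_t>(x + 0.5f);
    }
    return out;
}

// Squared L2 distance computed directly on quantized codes; the constant
// scale factor does not change neighbor ranking, so it can be dropped.
uint64_t l2_sq_sq16(const std::vector<uint16_t>& a, const std::vector<uint16_t>& b) {
    uint64_t sum = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        int64_t d = static_cast<int64_t>(a[i]) - static_cast<int64_t>(b[i]);
        sum += static_cast<uint64_t>(d * d);
    }
    return sum;
}
```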
3. Hyperparameter Insights
We identified that the maximum degree of the base layer (`Mmax0`) can be significantly smaller than the construction search depth (`efConstruction`), e.g., 64 vs. 256. This configuration reduces index size and build time while maintaining >0.99 recall.
Figure: QPS vs Recall (Log Scale) on GloVe-1.18M. baseline with efConstruction=256 and Mmax0=256.
HNSW2 on SIFT (Impact of Hierarchy & Beam Search)

HNSW2 on GLOVE (Harder Dataset)

ONNG on SIFT (Impact of SQ16, SIMD, RCM)

- CPU: Intel Core i5-1135G7 @ 2.40GHz (Tiger Lake)
- Cores: 4 Cores, 8 Logical Processors
- Cache: L2 5.0 MB / L3 8.0 MB
- OS: Windows (Virtualization Enabled)
Note: All benchmarks were run on this single-node environment.
To achieve minimal variance and reproduce the strict latency numbers reported above, we recommend the following environment configuration. While not mandatory, these steps prevent OS scheduling jitter and thermal throttling.
- Disable Hyper-Threading: Reduces L1/L2 cache contention.
- Disable Turbo Boost / SpeedStep / C-States: Prevents frequency scaling and context switching delays.
- Fix CPU Frequency: Lock the CPU to its base frequency (or a static high performance state).
- Close all unnecessary background applications.
- (Optional) Reboot to ensure a clean OS state.
- Important: Remember to re-enable Hyper-Threading and dynamic frequency scaling in BIOS after benchmarking to restore full system performance for daily usage.
```
cd checker2

# Run single variant
.\run_eval.bat hnsw2\baseline DEBUG

# Run all variants
.\run_eval.bat hnsw2\baseline;.\run_eval.bat hnsw2\baseline_with_fixed_beam;...

# Plot results
python plot_ablation.py hnsw2 SIFT
```

Features:

- Graph caching: built graphs are saved to `graph_*.bin` for reuse
- Checkpoint resume: if interrupted, evaluation resumes from the last completed parameter
- Parameter sweep: automatically sweeps gamma (0.00-0.50) or efSearch (10-91004)
- Early stopping: the search terminates automatically if Recall@10 reaches 1.0, if it remains stable (plateaus) for more than 3 consecutive iterations, or if it unexpectedly drops. This makes parameter sweeps much more efficient (a sketch of this loop follows below).
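A minimal sketch of what such a sweep with early stopping can look like (hypothetical helper names; not the actual checker2/evaluate.cpp logic):

```cpp
#include <vector>

// Hedged sketch of a parameter sweep with early stopping: stop once recall
// reaches 1.0, plateaus for more than 3 consecutive iterations, or drops.
// evaluate_recall is a hypothetical callback returning Recall@10 for one
// parameter value (gamma or efSearch); it stands in for a full build/search run.
template <typename EvalFn>
std::vector<double> sweep_with_early_stop(const std::vector<double>& params,
                                          EvalFn evaluate_recall) {
    std::vector<double> recalls;
    double best = 0.0;
    int plateau = 0;
    for (double p : params) {
        double r = evaluate_recall(p);
        recalls.push_back(r);
        if (r >= 1.0) break;          // perfect recall: nothing left to gain
        if (r < best) break;          // unexpected drop: abort the sweep
        if (r - best < 1e-4) {        // negligible improvement counts as a plateau
            if (++plateau > 3) break; // plateaued for more than 3 iterations
        } else {
            plateau = 0;
            best = r;
        }
    }
    return recalls;
}
```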
- C++17 compatible compiler (g++ recommended)
- Python 3.8+ (for data processing scripts)
```
# Navigate to checker directory
cd checker

# Run evaluation for a specific method
# Windows:
.\run_eval.bat hnsw1
# Linux:
./run_eval.sh hnsw1
```

Why does DEBUG run twice in `checker/evaluate.cpp`? During early development we intentionally evaluate the small DEBUG set twice back-to-back. The duplicate pass makes silent crashes or uninitialized-state bugs surface earlier (e.g., a use-after-free after the first loop). The lighter `checker1` variant drops this double run to save a few seconds when you just need a clean pass.
- Binary format (`.bin`): optimized for fast loading
- Text format (`.txt`): human-readable, one vector per line
- Create a new directory under `method/`
- Implement `MySolution.h` and `MySolution.cpp` with the required interface:

```cpp
class MySolution {
public:
    void build(const std::vector<std::vector<float>>& base);
    std::vector<int> search(const std::vector<float>& query, int k);
};
```

- Run evaluation:

```
.\run_eval.bat your_method_name
```
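As a usage example, a deliberately simple brute-force implementation of this interface could look like the following sketch; it only illustrates the expected contract, not a recommended method:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal implementation of the required interface: exact
// brute-force search, useful as a template when adding a new method/ entry.
class MySolution {
public:
    void build(const std::vector<std::vector<float>>& base) { base_ = base; }

    std::vector<int> search(const std::vector<float>& query, int k) {
        std::vector<std::pair<float, int>> scored;  // (squared L2 distance, id)
        scored.reserve(base_.size());
        for (int i = 0; i < static_cast<int>(base_.size()); ++i) {
            float d = 0.0f;
            for (size_t j = 0; j < query.size(); ++j) {
                float diff = base_[i][j] - query[j];
                d += diff * diff;
            }
            scored.emplace_back(d, i);
        }
        // Keep the k closest ids.
        size_t keep = std::min<size_t>(static_cast<size_t>(k), scored.size());
        std::partial_sort(scored.begin(), scored.begin() + keep, scored.end());
        std::vector<int> ids;
        for (size_t i = 0; i < keep; ++i) ids.push_back(scored[i].second);
        return ids;
    }

private:
    std::vector<std::vector<float>> base_;
};
```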
This project includes or references the following open-source libraries:
| Library | Usage | License | Location |
|---|---|---|---|
| hnswlib | HNSW reference implementation | Apache-2.0 | method/hnswlib/ |
| FlatNav | FlatNav reference implementation | Apache-2.0 | method/flatnavlib/ |
| Cereal | Serialization | BSD-3-Clause | method/flatnavlib/cereal/ |
| RapidJSON | JSON parsing | MIT / BSD / JSON | method/flatnavlib/cereal/external/rapidjson/ |
This project is licensed under the MIT License - see the LICENSE file for details.
- SIFT Dataset: Corpus-Texmex
- GloVe dataset: Stanford NLP
- AI assistants: Claude (Anthropic), Gemini (Google)
