ANN - Approximate Nearest Neighbor Search

A comparative study of multiple Approximate Nearest Neighbor (ANN) search algorithms, implementing and benchmarking various graph-based approaches including HNSW, FlatNav, and ONNG.

Overview

This project implements 30+ variants of ANN algorithms to find the K nearest neighbors in high-dimensional vector spaces. It is meant as a convenient sandbox: drop in your own ANN code, run a one-liner, and instantly compare build/search speed and recall against the bundled baselines.

Algorithms Implemented

| Algorithm | Description | Location | Type |
|---|---|---|---|
| Brute Force | Exact search baseline (O(n)) | method/bruteforce/ | Custom |
| HNSW | Hierarchical Navigable Small World (multi-layer) | method/hnsw1/ | Custom |
| HNSW + Adaptive Search | HNSW with NGT's adaptive search strategy | method/hnsw2/ | Custom |
| FlatNav | Single-layer HNSW graph | method/flatnav*/ | Custom |
| ONNG | FlatNav + ONNG edge pruning | method/onng*/ | Custom |
| hnswlib | Official hnswlib library | method/hnswlib/ | Library |
| flatnavlib | Official FlatNav library | method/flatnavlib/ | Library |

Implementation Notes

FlatNav: A single-layer variant of HNSW where random_level() always returns 0. This simplifies the graph structure while maintaining the core navigation properties.
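As a hedged sketch of the difference (not the repository's code): standard HNSW draws the insertion level from an exponential-like distribution, while FlatNav pins it to zero so every point lives in the base layer.

#include <cmath>
#include <random>

// Hedged sketch, not the repository's code: level assignment in standard HNSW
// versus the FlatNav variant described above.
int random_level_hnsw(std::mt19937& rng, double mL) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double x = 1.0 - u(rng);                  // in (0, 1], avoids log(0)
    return static_cast<int>(-std::log(x) * mL);
}

int random_level_flatnav() {
    return 0;  // single layer: no upper navigation layers are ever built
}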

ONNG: Built on top of FlatNav (single-layer graph). It initially added an edge pruning strategy adapted from NGT, but our latest findings indicate that this step can be removed entirely without hurting, and in some cases even improving, the recall-latency trade-off.

HNSW2: Uses NGT's adaptive search strategy with a gamma parameter to control early termination based on the ratio between current candidate distance and the worst distance in the result set.
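For illustration, a hedged sketch of that termination rule (the names are invented for the example and not taken from the repository): with gamma = 0 it reduces to the classic HNSW stop condition, while gamma > 0 keeps expanding candidates that are only slightly worse than the current worst result.

#include <cstdio>

// Hedged sketch of the adaptive termination test described above; not the
// repository's code. nearest_candidate_dist is the closest unexplored
// candidate, worst_result_dist is the k-th (worst) distance in the result set.
static bool keep_searching(float nearest_candidate_dist,
                           float worst_result_dist,
                           float gamma) {
    return nearest_candidate_dist <= worst_result_dist * (1.0f + gamma);
}

int main() {
    std::printf("%d\n", keep_searching(10.5f, 10.0f, 0.00f));  // 0: stop (classic rule)
    std::printf("%d\n", keep_searching(10.5f, 10.0f, 0.10f));  // 1: keep exploring
}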

Important: Our experiments consistently show that the edge pruning step primarily affects graph storage size and does not significantly reduce distance computations. More importantly, we found that omitting this step altogether leads to a simpler, faster, and equally effective algorithm. The onng1_test2_sq16_3 and _4 directories contain our best-performing implementations, which are pure single-layer graphs without any explicit edge pruning.

Optimizations

  • SIMD: Vectorized distance computation (flatnav1_SIMD/). Note: this implementation is x86-specific.
  • Multi-threading: Parallel search (flatnav1_threads*/).
  • Graph Reordering: G-order and RCM optimization (flatnav1_threads1_gorder/, flatnav1_threads1_rcm/).
  • Quantization: SQ8 and SQ16 compression (onng1_test2_sq*/); see the sketch after this list.
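For the quantization item above, here is a hedged SQ16 sketch; the exact encoding used by the onng1_test2_sq*/ variants may differ. Each coordinate is mapped linearly from a fixed [lo, hi] range onto 16 bits, and squared L2 distances are then accumulated directly on the integer codes.

#include <algorithm>
#include <cstdint>
#include <vector>

// Hedged SQ16 sketch; the encoding in onng1_test2_sq*/ may differ.
struct SQ16 {
    float lo = 0.0f, hi = 1.0f;  // per-dataset value range

    uint16_t encode(float x) const {
        float t = std::clamp((x - lo) / (hi - lo), 0.0f, 1.0f);
        return static_cast<uint16_t>(t * 65535.0f + 0.5f);
    }

    float decode(uint16_t q) const {
        return lo + (static_cast<float>(q) / 65535.0f) * (hi - lo);
    }
};

// Squared L2 distance on the codes (equal to the float distance up to a
// constant scale), so no decoding is needed in the inner loop.
long long l2_sq_quantized(const std::vector<uint16_t>& a,
                          const std::vector<uint16_t>& b) {
    long long acc = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        long long d = static_cast<long long>(a[i]) - static_cast<long long>(b[i]);
        acc += d * d;
    }
    return acc;
}

Relative to 32-bit floats this halves the bytes per coordinate, which matches the memory-bandwidth argument in the Findings section below.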

Datasets

| Dataset | Base Vectors | Dimensions | Queries | Note |
|---|---|---|---|---|
| SIFT | 1,000,000 | 128 | 10,000 | |
| GLOVE | 1,183,514 | 100 | 10,000 | Normalized, L2 distance |
| DEBUG | 1,000 | 16 | 100 | |

Note: Only binary format (.bin) files are included in this repository. Text format (.txt) files are excluded due to large file size (~800MB). You can convert between formats using the scripts in checker/.

Performance Results

SIFT Dataset (1M vectors, 128-dim, 10K queries)

| Method | Search Latency (ms) | Recall@10 |
|---|---|---|
| Brute Force | 128.45 | 99.94% |
| HNSW (hnsw1) | 1.27 | 99.57% |
| HNSW + Adaptive (hnsw2) | 1.32 | 99.84% |
| FlatNav | 1.36 | 99.84% |
| FlatNav + SIMD | 1.09 | 99.84% |
| FlatNav + Threads | 0.66 | 99.89% |
| FlatNav + RCM | 0.54 | 99.86% |
| FlatNav + RCM + Path | 0.43 | 99.57% |
| ONNG | 0.36 | 99.28% |
| ONNG (onng1_test2) | 0.33 | 99.33% |
| ONNG + SQ16 | 0.24 | 99.32% |
| No Pruning (onng1_test2_sq16_2) | 0.39 | 99.54% |

GLOVE Dataset (1.18M vectors, 100-dim, 10K queries)

| Method | Search Latency (ms) | Recall@10 |
|---|---|---|
| Brute Force | 133.90 | 100.00% |
| HNSW (hnsw1) | 1.74 | 86.71% |
| HNSW + Adaptive (hnsw2) | 11.67 | 99.14% |
| FlatNav | 11.79 | 99.12% |
| FlatNav + Threads | 5.86 | 99.24% |
| FlatNav + RCM + Path | 3.93 | 99.31% |
| ONNG | 3.47 | 99.13% |
| ONNG (onng1_test2) | 3.06 | 99.32% |
| ONNG + SQ16 | 2.12 | 99.26% |
| No Pruning (onng1_test2_sq16_2) | 2.78 | 99.42% |

Note: GLOVE vectors are normalized, so L2 distance is used. This is a harder dataset: standard HNSW without adaptive search fails to reach 99% recall.

Project Structure

.
|-- method/           # Algorithm implementations
|   |-- bruteforce/   # Baseline brute force search
|   |-- hnsw*/        # HNSW variants (custom)
|   |-- flatnav*/     # FlatNav variants (custom)
|   |-- onng*/        # ONNG variants (custom)
|   |-- hnswlib/      # Library baseline (official hnswlib)
|   `-- flatnavlib/   # Library baseline (FlatNav + Cereal/RapidJSON)
|-- ablation/         # Ablation study variants
|   |-- hnsw2/        # HNSW ablation experiments
|   `-- onng1_test2/  # ONNG ablation experiments
|-- checker/          # Basic evaluation tools
|-- checker1/         # High-precision evaluation
|-- checker2/         # Ablation study framework
|   |-- evaluate.cpp  # Ablation evaluation program
|   |-- run_eval.bat  # Windows script
|   `-- plot_ablation.py  # Result visualization
`-- data/             # Datasets (binary format)
    |-- sift/
    |-- glove/
    `-- debug/

Checkers

  • checker/ — baseline evaluation harness; focuses on core build/search timing and the Recall@10 gate (>=0.99). A recall-computation sketch follows this list.
  • checker1/ — the same harness with higher-precision distance handling for all graph algorithms; still a work in progress.
  • checker2/ — ablation study framework with parameter sweeping (gamma/efSearch), graph caching, and detailed statistics collection.
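A minimal sketch of the Recall@10 metric that the checkers gate on, assuming per-query ground-truth neighbor IDs are available; this is illustrative, not the checkers' actual code.

#include <algorithm>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hedged sketch, not the checkers' code: fraction of the true top-k neighbors
// that appear in the returned top-k, averaged over all queries. Assumes each
// ground_truth[q] holds at least k exact neighbor IDs.
double recall_at_k(const std::vector<std::vector<int>>& results,
                   const std::vector<std::vector<int>>& ground_truth,
                   int k) {
    std::size_t hits = 0;
    for (std::size_t q = 0; q < results.size(); ++q) {
        std::unordered_set<int> truth(ground_truth[q].begin(),
                                      ground_truth[q].begin() + k);
        int kept = std::min<int>(k, static_cast<int>(results[q].size()));
        for (int i = 0; i < kept; ++i)
            hits += truth.count(results[q][i]);
    }
    return static_cast<double>(hits) / (static_cast<double>(results.size()) * k);
}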

Ablation Studies

The ablation/ directory contains controlled experiments to measure the impact of individual algorithm components.

HNSW2 Ablation (ablation/hnsw2/)

| Variant | Description |
|---|---|
| baseline | Full HNSW with adaptive gamma search |
| baseline_with_fixed_beam | Fixed efSearch instead of adaptive gamma |
| baseline_without_hierarchy | Single-layer graph (FlatNav-like) |

ONNG Ablation (ablation/onng1_test2/)

| Variant | Description |
|---|---|
| baseline | Full-precision float (base) |
| baseline_with_sq16 | Add SQ16 quantization |
| baseline_with_fixed_enterpoint | Remove the 100-point entry scan |
| baseline_without_last_shortcut_reduction | Remove the second edge pruning pass |
| baseline_without_rcm | Remove RCM graph reordering |
| baseline_without_simd | Force scalar distance computation |
| baseline_without_two_shortcut_reductions | Remove all edge pruning |

We performed extensive ablation studies to isolate the impact of each optimization technique.

Findings

Based on our quantitative research and ablation studies, we have reached several critical conclusions regarding high-performance graph indexing.

1. The "Optimization Illusion" of Complexity

Our experiments disprove the common belief that complex graph structures are necessary for high performance.

  • Hierarchy is Redundant: We found that the multi-layer structure of HNSW is redundant because the base layer alone already achieves high navigability. A single-layer graph (FlatNav) reduces code complexity and simplifies the memory layout with no loss in the recall-latency trade-off.
  • Explicit Pruning is Unnecessary: The rigorous edge pruning strategies (shortcut reduction) adapted from NGT/ONNG proved to be ineffective in our tests. HNSW's native heuristic for neighbor selection is robust enough; adding explicit pruning steps is an "optimization illusion" that consumes build time without tangible search benefits.

2. The Real Drivers of Performance

  • $\gamma$-Adaptive Search: The most significant algorithmic gain comes from replacing the rigid efSearch with NGT's $\gamma$-adaptive search. This "soft filtering" mechanism allows the search to escape local optima dynamically, providing a superior QPS profile compared to fixed-beam search.

Figure (ablation study): The chart demonstrates that the simplified single-layer graph (blue line) combined with adaptive search achieves state-of-the-art performance, outperforming the complex baseline. Note how fixed_beam (green line) lags significantly, highlighting the importance of the $\gamma$-search strategy.

  • SQ16 & SIMD: Among low-level optimizations, 16-bit Scalar Quantization (SQ16) and SIMD instructions are confirmed to be highly effective, drastically reducing memory bandwidth pressure and distance computation costs.

  • Ineffective Optimizations: Techniques like Graph Reordering (RCM) and more elaborate entry-point selection (a fixed entry point vs. scanning 100 random points) showed negligible impact on high-dimensional vectors in our setup.

Figure (ablation study): QPS vs Recall (log scale) on GloVe-1.18M. Note how SQ16 dominates the high-recall regime.

3. Hyperparameter Insights

We identified that the maximum degree of the base layer (Mmax0) can be significantly smaller than the construction search depth (efConstruction), e.g., 64 vs 256. This configuration reduces index size and build time while maintaining >0.99 recall.

Figure (ONNG ablation on SIFT): QPS vs Recall (log scale); baseline with efConstruction=256 and Mmax0=256.
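As a hypothetical illustration of this configuration (parameter names follow common HNSW conventions; the struct itself is not the repository's code):

// Hypothetical illustration, not the repository's parameter struct.
struct BuildParams {
    int Mmax0          = 64;   // maximum out-degree of the base layer
    int efConstruction = 256;  // candidate-pool size while inserting a point
};
// Finding above: Mmax0 can be much smaller than efConstruction (64 vs 256)
// while still keeping Recall@10 above 0.99, which shrinks the index and
// shortens build time.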

Detailed Analysis

Figure: HNSW2 on SIFT (impact of hierarchy & beam search).

Figure: HNSW2 on GLOVE (harder dataset).

Figure: ONNG on SIFT (impact of SQ16, SIMD, RCM).

Figure: ONNG on GLOVE.

Benchmark Environment

  • CPU: Intel Core i5-1135G7 @ 2.40GHz (Tiger Lake)
  • Cores: 4 Cores, 8 Logical Processors
  • Cache: L2 5.0 MB / L3 8.0 MB
  • OS: Windows (Virtualization Enabled)

Note: All benchmarks were run on this single-node environment.

Benchmarking Best Practices

To achieve minimal variance and reproduce the strict latency numbers reported above, we recommend the following environment configuration. While not mandatory, these steps prevent OS scheduling jitter and thermal throttling.

1. BIOS Configuration (Reduce Jitter)

  • Disable Hyper-Threading: Reduces L1/L2 cache contention.
  • Disable Turbo Boost / SpeedStep / C-States: Prevents frequency scaling and C-state wake-up latency.
  • Fix CPU Frequency: Lock the CPU to its base frequency (or a static high performance state).

2. Preparation

  • Close all unnecessary background applications.
  • (Optional) Reboot to ensure a clean OS state.

3. Restoration

  • Important: Remember to re-enable Hyper-Threading and dynamic frequency scaling in BIOS after benchmarking to restore full system performance for daily usage.

Running Ablation Experiments

cd checker2

# Run single variant
.\run_eval.bat hnsw2\baseline DEBUG

# Run all variants
.\run_eval.bat hnsw2\baseline;.\run_eval.bat hnsw2\baseline_with_fixed_beam;...

# Plot results
python plot_ablation.py hnsw2 SIFT

Features:

  • Graph caching: Built graphs are saved to graph_*.bin for reuse
  • Checkpoint resume: If interrupted, evaluation resumes from the last completed parameter
  • Parameter sweep: Automatically sweeps gamma (0.00-0.50) or efSearch (10-91004)
  • Early Stopping: A sweep terminates automatically if Recall@10 reaches 1.0, if it stays flat (plateaus) for more than 3 consecutive iterations, or if it unexpectedly drops. This makes parameter sweeps much more efficient (see the sketch after this list).
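A hedged sketch of that early-stopping rule, with invented names rather than checker2's actual code:

#include <cstddef>
#include <vector>

// Hedged sketch, not checker2's code. history holds Recall@10 for each
// parameter value evaluated so far, in sweep order.
bool stop_sweep(const std::vector<double>& history) {
    if (history.empty()) return false;
    double last = history.back();
    if (last >= 1.0) return true;                                  // perfect recall
    if (history.size() >= 2 && last < history[history.size() - 2])
        return true;                                               // recall dropped
    // Plateau: recall unchanged for more than 3 consecutive iterations.
    int stable = 0;
    for (std::size_t i = history.size(); i-- > 1; ) {
        if (history[i] == history[i - 1]) ++stable;
        else break;
    }
    return stable > 3;
}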

Quick Start

Prerequisites

  • C++17 compatible compiler (g++ recommended)
  • Python 3.8+ (for data processing scripts)

Build and Run

# Navigate to checker directory
cd checker

# Run evaluation for a specific method
# Windows:
.\run_eval.bat hnsw1

# Linux:
./run_eval.sh hnsw1

Why does DEBUG run twice in checker/evaluate.cpp? During early development we intentionally evaluate the small DEBUG set twice, back-to-back. The duplicate pass makes silent crashes or uninitialized-state bugs surface earlier (e.g., a use-after-free after the first loop). The lighter checker1 variant drops this double run to save a few seconds when you just need a clean pass.

Data Format

  • Binary format (.bin): Optimized for fast loading
  • Text format (.txt): Human-readable, one vector per line

Implementing Your Own Algorithm

  1. Create a new directory under method/
  2. Implement MySolution.h and MySolution.cpp with the required interface:

#include <vector>

class MySolution {
public:
    // Build the index over the base vectors (called once).
    void build(const std::vector<std::vector<float>>& base);

    // Return the IDs of the k approximate nearest neighbors of query.
    std::vector<int> search(const std::vector<float>& query, int k);
};

  3. Run evaluation: .\run_eval.bat your_method_name
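As a concrete starting point, here is a hedged example that fills in the interface with an exact brute-force search (squared L2 distance plus a partial sort). It is illustrative rather than code from the repository, but it is a quick way to confirm the harness wiring before plugging in a real index.

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative brute-force implementation of the interface above;
// replace search() with a real ANN index once the harness runs cleanly.
class MySolution {
public:
    void build(const std::vector<std::vector<float>>& base) {
        base_ = base;  // a real method would construct its graph/index here
    }

    std::vector<int> search(const std::vector<float>& query, int k) {
        std::vector<std::pair<float, int>> dist(base_.size());
        for (std::size_t i = 0; i < base_.size(); ++i) {
            float acc = 0.0f;
            for (std::size_t d = 0; d < query.size(); ++d) {
                float diff = base_[i][d] - query[d];
                acc += diff * diff;
            }
            dist[i] = {acc, static_cast<int>(i)};
        }
        std::size_t kk = std::min<std::size_t>(k, dist.size());
        std::partial_sort(dist.begin(), dist.begin() + kk, dist.end());
        std::vector<int> ids(kk);
        for (std::size_t i = 0; i < kk; ++i) ids[i] = dist[i].second;
        return ids;
    }

private:
    std::vector<std::vector<float>> base_;
};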

Dependencies

This project includes or references the following open-source libraries:

| Library | Usage | License | Location |
|---|---|---|---|
| hnswlib | HNSW reference implementation | Apache-2.0 | method/hnswlib/ |
| FlatNav | FlatNav reference implementation | Apache-2.0 | method/flatnavlib/ |
| Cereal | Serialization | BSD-3-Clause | method/flatnavlib/cereal/ |
| RapidJSON | JSON parsing | MIT / BSD / JSON | method/flatnavlib/cereal/external/rapidjson/ |

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
