
Graph-Vector Database Performance Benchmarks (2025 #2)

Since the release of our v1 benchmarks, we've received a lot of feedback on the Postgres implementation. In this round we've made the necessary amendments to make the comparison as fair as possible. Please note, our oversights last time were not intentional, and we're committed to accuracy and consistency across all our future benchmarks. Your feedback is greatly appreciated!

We run HelixDB on a completely standard configuration, the same one anyone would get after installing it locally on their own machine. HelixDB remains fully ACID, and we have not relaxed any consistency settings to improve performance. We also have not changed any default settings for Neo4j or Postgres. However, we have changed how the Postgres and Neo4j benchmarks are conducted to make the comparison fairer.

We've heard you: for Postgres, we weren't pooling connections and weren't taking advantage of prepared statements. This was an oversight stemming from an incorrect assumption about the tokio-postgres crate.

We now use deadpool-postgres to manage a connection pool and cache prepared statements. For the pool size, we use 20 as that is the default for PgBouncer, a service used to pool Postgres connections. We have opted not to use PgBouncer as we are only benchmarking from a single machine and can maintain a consistent number of connections ourselves, reducing unnecessary latency that would be caused by PgBouncer. As Postgres spawns processes per connection, we believe 20 is sensible as the server running Postgres has 8 vCPUs, so 2.5 processes per core on average.
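
For illustration, here is a minimal sketch of that setup, assuming placeholder connection details and an `items` table (the actual benchmark driver is in the repo):

```rust
use deadpool_postgres::{Config, ManagerConfig, PoolConfig, RecyclingMethod, Runtime};
use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Pool of 20 connections, matching PgBouncer's default pool size.
    let mut cfg = Config::new();
    cfg.host = Some("127.0.0.1".into());
    cfg.user = Some("bench".into());   // placeholder credentials
    cfg.dbname = Some("bench".into());
    cfg.manager = Some(ManagerConfig { recycling_method: RecyclingMethod::Fast });
    cfg.pool = Some(PoolConfig::new(20));
    let pool = cfg.create_pool(Some(Runtime::Tokio1), NoTls)?;

    // Check a connection out of the pool; `prepare_cached` parses the
    // statement once per connection and reuses it on subsequent calls.
    let client = pool.get().await?;
    let stmt = client.prepare_cached("SELECT name FROM items WHERE id = $1").await?;
    let row = client.query_one(&stmt, &[&42_i64]).await?;
    println!("{}", row.get::<_, String>(0));
    Ok(())
}
```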

We've also used this opportunity to check and update our Neo4j benchmarks. We found that Neo4j has a Bolt worker pool of 400 by default, while we were previously using the neo4rs crate's default of 16 connections. This has now been increased to a maximum of 400 connections to maximise the number of concurrent requests while keeping the database configuration at its defaults.
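
A corresponding sketch for neo4rs, with placeholder URI and credentials:

```rust
use neo4rs::{query, ConfigBuilder, Graph};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ConfigBuilder::default()
        .uri("bolt://127.0.0.1:7687")
        .user("neo4j")
        .password("password")    // placeholder credentials
        .max_connections(400)    // raised from the crate default of 16
        .build()?;
    let graph = Graph::connect(config).await?;

    // Issue a trivial query to verify the pool is usable.
    let mut rows = graph.execute(query("RETURN 1 AS n")).await?;
    while let Ok(Some(row)) = rows.next().await {
        println!("{}", row.get::<i64>("n").unwrap());
    }
    Ok(())
}
```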

We're still working on Vector benchmarks - stay tuned!

TL;DR

Dataset: 10,000 users, 500,000 items, ~4M edges.

  • HelixDB crushes graph workloads and is your best bet for production graph use cases (GraphRAG, recommendations, social graphs)
  • Neo4j is 3.2–10.2x slower than HelixDB
  • Postgres (pooled connections, prepared statements) is 1.6–7.8x slower than HelixDB

Dataset hash ffed7c34a46dc90e · Conducted November 2025 · Raw data in repo


How to Read This

| Term | Meaning |
| --- | --- |
| P50 / P95 / P99 | 50th / 95th / 99th percentile latency |
| ops/sec | Successful operations per second (throughput) |
| FixedConcurrency | Sustained load at N concurrent clients |
| FixedQPS | Constant request rate; measures latency stability |
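
As a rough sketch of how the two load modes differ (illustrative, not the repo's actual driver; `do_request` and `record_latency` are hypothetical stand-ins):

```rust
use std::time::Duration;
use tokio::time::{interval, Instant};

async fn do_request() { /* hypothetical: issue one query against the target DB */ }
fn record_latency(_elapsed: Duration) { /* hypothetical: push into a histogram */ }

// FixedConcurrency: N workers issue requests back-to-back, so the
// offered load rises as the database answers faster.
async fn fixed_concurrency(n: usize) {
    for _ in 0..n {
        tokio::spawn(async {
            loop {
                let start = Instant::now();
                do_request().await;
                record_latency(start.elapsed());
            }
        });
    }
    // ...run through the 2s warmup + 5s measurement window, then stop.
}

// FixedQPS: requests are released on a fixed schedule (one every 1/qps
// seconds), independent of in-flight requests, exposing latency stability.
async fn fixed_qps(qps: u64) {
    let mut ticker = interval(Duration::from_secs_f64(1.0 / qps as f64));
    loop {
        ticker.tick().await;
        tokio::spawn(async {
            let start = Instant::now();
            do_request().await;
            record_latency(start.elapsed());
        });
    }
}
```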

Test Environment

Hardware: AWS c6g.2xlarge (eu-west-2) · 8 vCPUs (ARM Neoverse-N1) · 16 GB RAM · 500 GB gp3 EBS

Software: Ubuntu 24.04 LTS · HelixDB v2.1.0 · Neo4j 2025.09.0 (G1GC) · PostgreSQL 16.10

Benchmark: 2s warmup · 5s measurement window · FixedConcurrency (100/200/400/800) + FixedQPS (400/800/1600)

Dataset: 10k users across 25 countries · 500k items across 1k categories · ~4M edges (~400/user)

The same hardware was used to run all benchmarks in this writeup.


Workloads Tested

| ID | Name | Description | Example Use |
| --- | --- | --- | --- |
| 1 | PointGet | Retrieve entity by ID | Product detail fetch |
| 2 | OneHop | Traverse all user -> item edges | Watch history |
| 3 | OneHopFilter | Traverse + filter by category | "Action movies watched" |
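
To make these concrete, here are plausible relational forms of the three queries (a sketch; the schema, `items(id, name, category_id)` and `interactions(user_id, item_id)`, is our assumption, not necessarily the repo's exact DDL):

```rust
// Hypothetical SQL forms of the three workloads.
const POINT_GET: &str = "SELECT id, name FROM items WHERE id = $1";

const ONE_HOP: &str = "
    SELECT i.id, i.name
    FROM interactions e
    JOIN items i ON i.id = e.item_id
    WHERE e.user_id = $1";

const ONE_HOP_FILTER: &str = "
    SELECT i.id, i.name
    FROM interactions e
    JOIN items i ON i.id = e.item_id
    WHERE e.user_id = $1 AND i.category_id = $2";
```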

Results Summary

| Workload | Winner | Performance Gap |
| --- | --- | --- |
| PointGet | HelixDB | 10.2x vs Neo4j, 7.8x vs Postgres |
| OneHop | HelixDB | 5.5x vs Neo4j, 1.6x vs Postgres |
| OneHopFilter | HelixDB | 3.2x vs Neo4j, 2.6x vs Postgres |

Detailed Results

1 · PointGet — Simple ID Lookup

Retrieve single item by ID (product detail, user profile).

Winner: HelixDB — 10.2x Neo4j, 7.8x Postgres

PointGet Performance

PointGet FixedQPS

FixedConcurrency Results

| Database | Concurrency | Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 100 | 90,238.4 | 1.07 | 1.29 | 1.41 |
| Neo4j | 100 | 17,591.4 | 5.51 | 7.47 | 9.99 |
| Postgres | 100 | 23,540.2 | 4.22 | 4.41 | 4.51 |
| HelixDB | 200 | 156,136.0 | 1.20 | 1.62 | 1.94 |
| Neo4j | 200 | 18,246.4 | 10.61 | 15.81 | 19.75 |
| Postgres | 200 | 23,971.2 | 8.32 | 8.48 | 8.56 |
| HelixDB | 400 | 186,642.4 | 1.92 | 2.81 | 3.44 |
| Neo4j | 400 | 17,516.6 | 22.33 | 31.78 | 37.66 |
| Postgres | 400 | 23,085.4 | 17.30 | 17.50 | 17.61 |
| HelixDB | 800 | 186,590.8 | 3.57 | 6.73 | 8.62 |
| Neo4j | 800 | 17,434.2 | 45.25 | 55.75 | 62.03 |
| Postgres | 800 | 23,516.4 | 33.82 | 34.85 | 35.24 |

FixedQPS Results

| Database | Target QPS | Actual Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 400 | 400.2 | 0.87 | 0.90 | 0.97 |
| Neo4j | 400 | 400.2 | 2.71 | 3.00 | 3.04 |
| Postgres | 400 | 400.2 | 0.82 | 0.91 | 0.96 |
| HelixDB | 800 | 800.2 | 0.87 | 0.90 | 0.96 |
| Neo4j | 800 | 800.2 | 2.92 | 3.11 | 3.20 |
| Postgres | 800 | 800.2 | 0.84 | 0.92 | 0.95 |
| HelixDB | 1600 | 1,600.0 | 0.93 | 0.99 | 1.10 |
| Neo4j | 1600 | 1,600.0 | 2.74 | 3.06 | 3.18 |
| Postgres | 1600 | 1,600.2 | 0.84 | 0.92 | 0.95 |

Changes from Benchmark v1

Postgres improved ~53% in throughput (15.4K → 23.5K ops/sec) thanks to connection pooling and prepared statement caching. Single-row lookups benefit significantly from eliminating connection setup overhead and statement parsing on each request.

Neo4j improved ~50% (11.7K → 17.6K ops/sec) by increasing max connections from 16 to 400, allowing better utilization of Bolt worker threads.

Tradeoffs:

  • Postgres (20 connections): Balances performance with resource usage (~2.5 processes per vCPU). More connections would increase memory overhead and context switching. Fewer connections would bottleneck concurrent requests.
  • Neo4j (400 max connections): Matches the default Bolt worker pool size but consumes significant memory per connection. In production, this would require careful tuning based on workload and available RAM.

2 · OneHop — Graph Traversal

Fetch all items a user interacted with (~400 edges/user).

Winner: HelixDB — 5.5x Neo4j, 1.6x Postgres

OneHop Performance

OneHop FixedQPS

FixedConcurrency Results

| Database | Concurrency | Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 100 | 14,964.6 | 6.09 | 11.87 | 15.30 |
| Neo4j | 100 | 2,760.2 | 35.33 | 54.01 | 63.60 |
| Postgres | 100 | 9,319.0 | 10.61 | 11.71 | 12.33 |
| HelixDB | 200 | 15,129.2 | 12.72 | 18.74 | 22.21 |
| Neo4j | 200 | 2,752.2 | 70.34 | 110.89 | 129.82 |
| Postgres | 200 | 9,473.4 | 21.00 | 22.14 | 22.76 |
| HelixDB | 400 | 15,122.8 | 25.96 | 32.00 | 35.50 |
| Neo4j | 400 | 2,686.4 | 143.48 | 233.31 | 292.81 |
| Postgres | 400 | 9,506.0 | 41.99 | 43.20 | 43.86 |
| HelixDB | 800 | 14,996.8 | 52.84 | 58.88 | 62.44 |
| Neo4j | 800 | 2,713.2 | 290.08 | 368.36 | 415.16 |
| Postgres | 800 | 9,231.2 | 86.61 | 87.87 | 88.49 |

FixedQPS Results

| Database | Target QPS | Actual Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 400 | 400.2 | 1.70 | 2.51 | 2.65 |
| Neo4j | 400 | 400.2 | 6.21 | 7.10 | 7.32 |
| Postgres | 400 | 400.2 | 1.62 | 1.72 | 1.79 |
| HelixDB | 800 | 800.2 | 1.51 | 2.45 | 2.67 |
| Neo4j | 800 | 800.2 | 6.48 | 7.63 | 8.05 |
| Postgres | 800 | 800.2 | 1.68 | 1.90 | 2.01 |
| HelixDB | 1600 | 1,600.0 | 1.53 | 2.43 | 2.73 |
| Neo4j | 1600 | 1,600.0 | 6.91 | 8.43 | 10.00 |
| Postgres | 1600 | 1,600.0 | 1.69 | 1.87 | 1.96 |

Changes from Benchmark v1

Postgres improved dramatically, by ~718% in throughput (1.1K → 9.3K ops/sec). Graph traversals require many sequential queries, and the previous unpooled approach opened a new connection for every traversal. Connection pooling eliminates this overhead, and prepared statement caching removes query parsing costs for repeated patterns.

Neo4j improved ~8% (2.5K → 2.7K ops/sec) with more connections. The smaller gain suggests Neo4j's graph traversal performance was already well-optimized and less bottlenecked by connection limits for this workload.

Tradeoffs:

  • Postgres (20 connections, prepared statements): The massive improvement shows Postgres can handle graph traversals effectively when properly configured. However, each traversal still requires multiple sequential queries (JOIN operations), limiting scalability compared to native graph storage. The 20-connection limit prevents overwhelming the process-based architecture while maximizing throughput.
  • Neo4j (400 max connections): Relatively small improvement indicates connection count wasn't the primary bottleneck. The traversal itself is the dominant cost factor. More connections primarily help with concurrent request handling rather than single-query performance.

Why the gap narrowed so much: v1's unpooled Postgres was essentially benchmarking connection setup overhead, not traversal performance. With proper pooling, Postgres demonstrates respectable graph query capabilities, though still slower than purpose-built graph databases.
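
To make the change concrete, here is a hedged sketch of the two access patterns (the exact v1 driver may have differed; table names are illustrative):

```rust
use deadpool_postgres::Pool;
use tokio_postgres::NoTls;

// v1-style: open a fresh connection (TCP + auth handshake) per operation,
// then parse the query text from scratch every time.
async fn one_hop_v1(user_id: i64) -> Result<usize, tokio_postgres::Error> {
    let (client, conn) =
        tokio_postgres::connect("host=127.0.0.1 user=bench dbname=bench", NoTls).await?;
    tokio::spawn(conn); // drive the connection in the background
    let rows = client
        .query("SELECT item_id FROM interactions WHERE user_id = $1", &[&user_id])
        .await?;
    Ok(rows.len())
}

// v2-style: reuse a pooled connection and a cached prepared statement,
// so only the traversal itself is measured.
async fn one_hop_v2(pool: &Pool, user_id: i64) -> Result<usize, Box<dyn std::error::Error>> {
    let client = pool.get().await?;
    let stmt = client
        .prepare_cached("SELECT item_id FROM interactions WHERE user_id = $1")
        .await?;
    Ok(client.query(&stmt, &[&user_id]).await?.len())
}
```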


3 · OneHopFilter — Filtered Traversal

Find items a user interacted with in a specific category.

Winner: HelixDB — 3.2x Neo4j, 2.6x Postgres

OneHopFilter Performance

OneHopFilter FixedQPS

FixedConcurrency Results

| Database | Concurrency | Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 100 | 32,030.2 | 2.94 | 5.19 | 7.54 |
| Neo4j | 100 | 10,470.2 | 9.18 | 13.76 | 18.40 |
| Postgres | 100 | 12,750.0 | 7.75 | 8.47 | 8.86 |
| HelixDB | 200 | 33,277.0 | 5.84 | 8.96 | 12.93 |
| Neo4j | 200 | 10,416.8 | 18.49 | 29.16 | 38.25 |
| Postgres | 200 | 12,883.4 | 15.44 | 16.20 | 16.63 |
| HelixDB | 400 | 33,540.8 | 11.67 | 16.31 | 19.85 |
| Neo4j | 400 | 10,254.8 | 38.02 | 57.32 | 70.43 |
| Postgres | 400 | 12,890.2 | 30.96 | 31.74 | 32.16 |
| HelixDB | 800 | 33,438.6 | 23.64 | 28.32 | 32.03 |
| Neo4j | 800 | 10,240.0 | 77.00 | 97.08 | 110.20 |
| Postgres | 800 | 12,812.8 | 62.39 | 63.26 | 63.66 |

FixedQPS Results

| Database | Target QPS | Actual Throughput (ops/sec) | P50 Latency (ms) | P95 Latency (ms) | P99 Latency (ms) |
| --- | --- | --- | --- | --- | --- |
| HelixDB | 400 | 400.2 | 1.16 | 1.21 | 1.23 |
| Neo4j | 400 | 400.2 | 3.21 | 3.33 | 3.41 |
| Postgres | 400 | 400.2 | 1.34 | 1.44 | 1.50 |
| HelixDB | 800 | 800.2 | 1.09 | 1.15 | 1.19 |
| Neo4j | 800 | 800.2 | 3.01 | 3.27 | 3.35 |
| Postgres | 800 | 800.0 | 1.35 | 1.47 | 1.54 |
| HelixDB | 1600 | 1,600.2 | 1.09 | 1.28 | 1.38 |
| Neo4j | 1600 | 1,600.2 | 3.08 | 3.34 | 3.47 |
| Postgres | 1600 | 1,599.6 | 1.33 | 1.44 | 1.53 |

Changes from Benchmark v1

Postgres improved massively by ~700% in throughput (1.6K → 12.8K ops/sec). Filtered graph traversals combine the benefits of connection pooling with prepared statement caching for the filtering predicates, making a dramatic difference when the same query patterns repeat.

Neo4j improved ~31% (7.8K → 10.2K ops/sec) with increased connections. The moderate improvement suggests this workload benefits from parallelization but is still bound by traversal + filtering costs.

Tradeoffs:

  • Postgres (20 connections, prepared statements): The massive speedup demonstrates how critical proper database client configuration is for Postgres. Prepared statements are especially valuable here because filter patterns repeat frequently. However, Postgres still needs to execute JOINs and filter operations across normalized tables, which is inherently less efficient than graph-native storage with embedded adjacency lists.
  • Neo4j (400 max connections): Better improvement than simple traversal (OneHop) because filtering benefits from parallel execution across more worker threads. However, each connection consumes memory for caching and buffers, creating a tradeoff between throughput and resource usage.

Why the gap narrowed: v1 Postgres was severely handicapped by connection overhead and repeated query parsing. With pooling and prepared statements, Postgres shows its strength in filtered queries, though it still can't match the architectural advantages of graph databases that store relationships as first-class citizens.

Production implications: If you're using Postgres for graph workloads, connection pooling and prepared statements are non-negotiable. Without them, performance degrades by 7-8x for traversals. Even optimized, Postgres requires more careful query tuning and indexing to approach graph database performance.


Performance Highlights

| Database | Best At | Latency (P50) |
| --- | --- | --- |
| HelixDB | Graph operations | 0.9–2.5 ms |
| Neo4j | Graph operations | 2.0–2.1 ms |
| Postgres | Graph operations | 0.8–1.7 ms |

Limitations & Reproducibility

What we didn't test:

  • Cold-start latency
  • Insertion times
  • Memory footprint during ingestion
  • Operational complexity

How to reproduce:

  • Dataset hash: ffed7c34a46dc90e
  • Raw JSON results, configs, and benchmark driver in repo
  • Tests: November 2025, AWS c6g.2xlarge (eu-west-2)

Conclusion

Despite these necessary adjustments, HelixDB still comes out on top, outperforming Postgres on throughput by up to 7.8x and Neo4j by more than 10x. We're really excited that Postgres has managed to match our read latency under the updated configuration, because we still have many optimisations in hand to push this number even lower. It's also worth noting that we have NOT benchmarked multi-hop traversals in this writeup; that will come in our next set of benchmarks.

Some of the optimisations to be implemented in the coming weeks and months:

  • Removing allocations incurred by the `json!` macro
  • Using QUIC instead of HTTP/1
  • Using a binary protocol instead of JSON
  • Zero-copy reads from the KV store

So, in conclusion, just use Helix.


Benchmark data + scripts: github.com/helixdb/graph-vector-bench