# strfry Query Performance: Deep Historical Queries

## Observation
Queries with `until` filters become progressively slower when fetching older events. Working against a relay containing 200k+ events of a specific kind/author, query times increase from <1s for recent data to 30+s for data only a few hours old.
## Environment
- Remote relay: strfry (wss://nip85.brainstorm.world)
- Hardware: DigitalOcean droplet, 8GB RAM, SSD storage
- Client: nostr-tools SimplePool.querySync()
- Query: `{ kinds: [30382], authors: ["48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de"], until: <timestamp>, limit: 500 }`
- Known event count: ~200,000 events matching this filter
## Observed Behavior

### Query Duration by Depth

Pagination working backward from the present (`until` = now, decrementing):
| Timestamp Range | Duration | Events/page | Notes |
|---|---|---|---|
| Recent (now) | 0.8-1s | 490-500 | Fast, likely in cache |
| -1hr | 7-8s | 450-500 | Moderate slowdown |
| -2hr | 12-14s | 400-480 | Significant slowdown |
| -3hr | 17s | 400-470 | Approaching timeout limits |
| -4.5hr | 24-25s | 400-476 | Near timeout limit (30s) |
| -5hr+ | 26-27s | 410-490 | Consistently near timeout |
| -6hr+ | 28-29s | 450-485 | 96% of timeout limit |
| Older | Timeout | 0 | Eventually returns empty |
### Actual Results
- With 4.4s timeout: Retrieved 28,000 events before timing out
- With 8.8s timeout: Retrieved 40,000 events before timing out
- With 30s timeout: Retrieved 92,000+ events (ongoing), queries now at 28-29s
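The per-page timeouts above were enforced client-side by racing each query against a deadline. A minimal sketch of such a wrapper (the `withTimeout` helper is my own illustration, not a nostr-tools API; `pool.querySync` resolves with an array of events as in the repro below):

```javascript
// Illustrative helper: race a relay query against a deadline so a slow
// page rejects instead of hanging indefinitely.
function withTimeout(promise, ms) {
  let timer
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms)
  })
  // Clear the timer either way so the process isn't kept alive by it.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer))
}

// Usage, mirroring the 30s limit above:
// const events = await withTimeout(pool.querySync(relays, filter), 30_000)
```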
Query duration grows steadily (roughly linearly, per the table above) as pagination moves further back in time. At 92k events retrieved (roughly 6 hours of history), queries take 28-29 seconds, i.e. 96% of the 30-second timeout limit. That is only ~46% of the known 200k matching events; at this rate, queries will exceed the timeout well before retrieving the remaining ~108k.
Hardware specs (8GB RAM, SSD) should be adequate for query performance, suggesting the bottleneck may be LMDB-specific behavior rather than raw storage speed.
## Hypothesis
This could be due to:
- LMDB page cache behavior (recent data cached, older data requires disk I/O)
- B+ tree traversal characteristics for deep historical queries
- Index structure for compound filters (author+kind+timestamp)
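With shell access to the relay host, timing the same filter through `strfry scan` (strfry's CLI command that runs a filter directly against the local LMDB) would help separate DB behavior from websocket/protocol overhead. A sketch, to be run on the relay host (the `until` value is an example; substitute the depth being probed):

```shell
# Times the raw LMDB query with no websocket layer involved.
time strfry scan '{
  "kinds": [30382],
  "authors": ["48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de"],
  "until": 1700000000,
  "limit": 500
}' > /dev/null
```

If `strfry scan` at deep `until` values is fast while remote queries are slow, the bottleneck is likely not LMDB itself.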
## Reproduction Steps
```javascript
// Using nostr-tools SimplePool (or a similar client)
import { SimplePool } from 'nostr-tools'

const pool = new SimplePool()
const relays = ['wss://nip85.brainstorm.world']

// Start with recent data
const filter = {
  kinds: [30382],
  authors: ['48ec018359cac3c933f0f7a14550e36a4f683dcf55520c916dd8c61e7724f5de'],
  until: Math.floor(Date.now() / 1000),
  limit: 500
}

// Get first page -- completes in ~1s
let events = await pool.querySync(relays, filter)

// Paginate backward: set `until` to the oldest timestamp from the previous page
const oldest = Math.min(...events.map(e => e.created_at))
filter.until = oldest
events = await pool.querySync(relays, filter)
// Still completes in ~1s

// Continue pagination...
// After fetching ~50k events over a few hours of history,
// query duration has increased to 15-20s+
```
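One detail worth ruling out in the loop above: per NIP-01, `until` is inclusive (`created_at <= until`), so reusing the oldest `created_at` as the next `until` re-fetches the boundary events that share that timestamp. A sketch of the loop with id-based dedup (`queryPage` stands in for `pool.querySync` bound to the relay list; the function and parameter names are illustrative):

```javascript
// Backward pagination with dedup by event id. Because `until` is
// inclusive, events sharing the oldest created_at come back again on
// the next page; tracking seen ids keeps the result set clean and
// lets us detect a page that makes no progress.
async function backfill(queryPage, baseFilter, maxPages = 1000) {
  const seen = new Set()
  const all = []
  let until = baseFilter.until
  for (let page = 0; page < maxPages; page++) {
    const events = await queryPage({ ...baseFilter, until })
    const fresh = events.filter(e => !seen.has(e.id))
    if (fresh.length === 0) break // no progress: done (or stalled on one timestamp)
    for (const e of fresh) { seen.add(e.id); all.push(e) }
    until = Math.min(...events.map(e => e.created_at))
  }
  return all
}
```

Note the stall case: if more than `limit` events share a single `created_at`, this loop cannot advance past it, which is a separate pagination hazard from the slowdown reported here.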
## Questions
- Is this behavior expected for deep historical queries with LMDB?
- Are there strfry configuration options that could help (cache size, etc.)?
- Is there a better query pattern for backfilling large historical datasets?
- Would a local strfry sync + negentropy copy improve query performance significantly?