feat: add NIP-50 support #160
Open
Overview
This PR implements NIP-50 (Search Capability) for strfry, enabling full-text search across Nostr events using BM25 ranking. The implementation includes:
Architecture
Core Components
- Search Provider Interface (`src/search/SearchProvider.h`)
- LMDB Search Backend (`src/search/LmdbSearchProvider.h`)
- Background Indexer (in `LmdbSearchProvider::runCatchupIndexer()`)
  - `SearchState.lastIndexedLevId`
- Search Runner (`src/search/SearchRunner.h`)
Database Schema
New LMDB tables (defined in `golpe.yaml`):
Configuration
Key settings in `strfry.conf` (`relay.search`):
Supported `candidateRanking` orders (desc for each component):
- `terms-tf-recency` (default)
- `terms-recency-tf`
- `tf-terms-recency`
- `tf-recency-terms`
- `recency-terms-tf`
- `recency-tf-terms`
Configuration Parameters
- `enabled`: Master switch for search functionality
- `backend`: Search provider implementation ("lmdb" or "noop")
- `indexedKinds`: Pattern of kinds to index (numbers/ranges/*/exclusions)
- `maxQueryTerms`: Maximum query terms parsed
- `maxPostingsPerToken`: Max postings per token key (upper bound during fetch; pruning TBD)
- `maxCandidateDocs`: Maximum candidates for scoring
- `overfetchFactor`: Candidate over-fetch before post-filtering
- `recencyBoostPercent`: Recency tie-breaker percent (0–100; 1 = 1%)
- `candidateRankMode`: `order` or `weighted`
- `candidateRanking`: Order used when mode = `order` (list above)
- `rankWeightTerms` / `rankWeightTf` / `rankWeightRecency`: Weights for mode = `weighted`
Usage
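For the `candidateRanking` setting described under Configuration, one way to read an order such as `terms-tf-recency` is as a lexicographic sort key, descending on each component. The sketch below illustrates that reading only. The `Candidate` type and the meaning given to each component (matched query terms, term frequency, event timestamp) are assumptions for illustration and are not taken from the PR's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record; field meanings are assumptions."""
    event_id: str
    terms: int    # assumed: number of distinct query terms matched
    tf: int       # assumed: summed term frequency of the matched terms
    recency: int  # assumed: event created_at (larger means newer)


def rank_by_order(candidates, ranking="terms-tf-recency"):
    """Sort candidates descending on each component, in the configured order."""
    fields = ranking.split("-")
    if sorted(fields) != ["recency", "terms", "tf"]:
        raise ValueError(f"unsupported candidateRanking: {ranking!r}")
    # Tuples compare lexicographically, so the first field dominates and
    # later fields only break ties.
    return sorted(
        candidates,
        key=lambda c: tuple(getattr(c, f) for f in fields),
        reverse=True,
    )


candidates = [
    Candidate("x", terms=2, tf=5, recency=100),
    Candidate("y", terms=3, tf=1, recency=50),
    Candidate("z", terms=2, tf=5, recency=200),
]
default_order = [c.event_id for c in rank_by_order(candidates)]
recency_first = [c.event_id for c in rank_by_order(candidates, "recency-terms-tf")]
```

Under this reading, the default order puts `y` first because it matches the most terms, while `recency-terms-tf` puts the newest event `z` first.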
Enabling Search
Build strfry:
make -j$(nproc)
Update `strfry.conf`:
Start strfry:
Indexing behavior:
Search Queries
Clients can issue NIP-50 search queries using the `search` filter field:
{ "kinds": [1], "search": "bitcoin lightning network", "limit": 100 }
Search features:
Monitoring
Background indexer logs:
Query metrics include search-specific timings when `relay.logging.dbScanPerf = true` (`scan=Search`).
Performance Characteristics
Indexing Performance
Query Performance
`maxCandidateDocs` and result set size
Tuning guidelines:
- Lower `maxCandidateDocs` for faster queries with slightly lower recall
- Raise `overfetchFactor` to improve recall for multi-token queries
Benchmark Suite
Put something together for benchmarks, but didn't finish. Will likely remove it before marking ready for review.
A comprehensive benchmark suite is included under `bench/`:
Running Benchmarks
Prepare a test database:
This generates cryptographically valid Nostr events using `nak` and ingests them into a fresh database.
Run the benchmark:
bench/scripts/run.sh -s scenarios/small.yml --out bench/results/raw/small-$(date +%s)
Generate reports:
Benchmark Metrics
Testing
Manual Testing
Index a test database:
Issue search queries via WebSocket:
Verify results are returned in relevance order
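When checking relevance order by hand on a small test set, it can help to have a reference ranking to compare against. The sketch below is a minimal, textbook Okapi BM25 scorer, not the PR's implementation. The whitespace tokenizer, the `k1` and `b` values, and the `bm25_rank` helper are assumptions, and it ignores the recency boost and candidate limits described under Configuration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the real tokenizer may differ.
    return text.lower().split()


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Return (doc_id, score) pairs sorted by descending Okapi BM25 score."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: longer-than-average documents are penalized.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


events = {
    "a": "bitcoin lightning network payments",
    "b": "lightning strikes twice",
    "c": "cooking recipes",
}
ranked = bm25_rank(events, "bitcoin lightning network")
```

Here event `a` matches all three query terms and should rank above `b`, which matches only one; `c` matches nothing and is left out of the results.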
Integration Points
- `DBQuery.h`: Search queries execute alongside traditional index scans
- `ActiveMonitors.h`: Search filters excluded from live subscription indexes (one-shot queries)
- `QueryScheduler.h`: Search provider injected into query execution path
- `cmd_relay.cpp`: Background indexer lifecycle management
Migration Notes
Existing Databases
For existing strfry installations:
cd golpe && ./build.sh && cd .. && make
The indexer will automatically catch up on all existing events. Monitor logs for progress.
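As a rough illustration of why a catch-up pass can resume without a manual reindex, the sketch below assumes the indexer keeps a cursor like the `SearchState.lastIndexedLevId` field named under Core Components and only processes events with a larger `levId`. The `run_catchup` function, the in-memory `events` and `state` dicts, and the batch size are hypothetical stand-ins; the PR's actual loop is not shown in this description.

```python
def run_catchup(events, state, index_fn, batch_size=1000):
    """Index every event whose levId is above the stored cursor.

    events:   dict mapping levId -> event content (stand-in for the event DB)
    state:    dict holding 'lastIndexedLevId' (stand-in for SearchState)
    index_fn: callable(lev_id, content) that adds one event to the index
    Returns the number of events indexed in this pass.
    """
    pending = sorted(lev for lev in events if lev > state["lastIndexedLevId"])
    indexed = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for lev_id in batch:
            index_fn(lev_id, events[lev_id])
        # Advance the cursor after each batch so that a restart resumes
        # from the last completed batch rather than from the beginning.
        state["lastIndexedLevId"] = batch[-1]
        indexed += len(batch)
    return indexed


events = {1: "hello nostr", 2: "bitcoin lightning", 3: "relay search"}
state = {"lastIndexedLevId": 0}
seen = []
first = run_catchup(events, state, lambda lev, text: seen.append(lev), batch_size=2)
second = run_catchup(events, state, lambda lev, text: seen.append(lev))
```

In this sketch the first pass indexes all three events and the second pass finds nothing to do, because the cursor already points at the newest `levId`.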
Rollback
To disable search without data loss:
Set `relay.search.enabled = false` in config
The search tables remain in the database but are not used. They can be manually removed using the `mdb` command-line tools if desired.
Known Limitations
- Search covers only the `content` field of events (does not index tags or metadata)
- `maxCandidateDocs` for optimal performance
Future Enhancements
Potential improvements for future iterations:
Related Issues