Binary cache Bloom filter #488
Open
edolstra wants to merge 25 commits into main from eelcodolstra/nix-423
22c182d createCache(): Take CacheInfo as argument (edolstra)
d94497a Deduplicate Cache/CacheInfo (edolstra)
343b7b1 nix serve: publish a bloom filter of valid store paths (edolstra)
6658cb9 BinaryCacheStore: consult bloom filter to skip definite misses (edolstra)
885295d libstore: factor bloom-filter bit positions into a shared helper (edolstra)
0d82d91 Formatting (edolstra)
4bf9be5 libstore: hoist buildBloomFilter and add nix store generate-bloom-filter (edolstra)
c41ff96 Test false-positive rate (edolstra)
bc71363 tests: cover bloom-filter rule-out and disk-cache reuse via nix serve (edolstra)
e12c37e nix serve: Implement ETag for the bloom filter (edolstra)
696e27e tests: loop fake hashparts until one is ruled out by the bloom filter (edolstra)
1dbe07a nix serve: Add --false-positive-rate flag for the bloom filter (edolstra)
8781bbe Formatting (edolstra)
4b4ff7f bloom filter: switch wire integers to u64 and use StringSink/StringSo… (edolstra)
55577fa Move ConditionalGetResult (edolstra)
4f41029 bloom filter -> Bloom filter (edolstra)
38d11a1 bloom filter: validate falsePositiveRate and floor mBits at 8 (edolstra)
1e5fe74 bloom filter: support absolute BloomFilter URLs (edolstra)
3f8dc57 bloom filter: fix doc comments (edolstra)
bf3cec8 bloom filter: drop BloomState; store raw blob; combine lookup+probe (edolstra)
538e107 bloom filter: inline maybeDisableBloomFilter as a local lambda (edolstra)
cbe4c3f Drop catch all (edolstra)
c400002 Drop testing code (edolstra)
063501b binary cache: add use-bloom-filter setting (default true) (edolstra)
cfce945 Drop noexcept (edolstra)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,76 @@ | ||
| # Binary Cache Bloom Filter Format | ||
|
|
||
| A [binary cache](@docroot@/package-management/binary-cache-substituter.md) may publish a Bloom filter of all store paths it contains. | ||
| The filter's URL is announced through the [`BloomFilter`](@docroot@/protocols/nix-cache-info.md#bloomfilter) field of the cache's [`nix-cache-info`](@docroot@/protocols/nix-cache-info.md) file — either as an absolute URL or as a path relative to the cache root. | ||
| A cache that does not advertise the field does not provide a Bloom filter; clients must not probe for one at a default path. | ||
|
|
||
| A Bloom filter lets a client decide that a store path is **definitely not** in the cache without issuing a `.narinfo` request. | ||
| Membership tests are one-sided: a "not present" answer is authoritative, while a "possibly present" answer must still be confirmed by fetching the `.narinfo`. | ||
| False positives occur at a configurable rate; false negatives do not. | ||
|
|
||
| MIME type: `application/octet-stream` | ||
|
|
||
| ## Format | ||
|
|
||
| The response is binary, little-endian, with a fixed 32-byte header followed by the raw bit array: | ||
|
|
||
| | Offset | Size | Field | Description | | ||
| |-------:|-----------:|-----------|----------------------------------------------------------| | ||
| | 0 | 8 | `magic` | ASCII bytes `NixBloom` (no terminating NUL). | | ||
| | 8 | 8 | `version` | `uint64` format version. Currently `1`. | | ||
| | 16 | 8 | `k` | `uint64` number of hash functions. | | ||
| | 24 | 8 | `m` | `uint64` size of the bit array, in bits. Multiple of 8. | | ||
| | 32 | `m / 8` | `bits` | The bit array. Bit at position `p` is `bits[p / 8] >> (p % 8)` masked with `1`. | | ||
|
|
||
| The total response size is `32 + m / 8` bytes. | ||
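The header layout above can be sketched as a small standalone parser. This is only an illustration and not the PR's `parseBloomFilterHeader`: the names `BloomHeader`, `kHeaderLen`, `readLE64`, `parseFilter` and `bitAt` are hypothetical, and rejecting a body whose length differs from `32 + m / 8` is an assumption derived from the stated total response size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical type and constant names, chosen for this sketch only.
struct BloomHeader {
    uint64_t version;
    uint64_t k;
    uint64_t mBits;
};

constexpr size_t kHeaderLen = 32;

// Read a little-endian uint64 from 8 bytes, independent of host endianness.
inline uint64_t readLE64(const unsigned char * p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// Parse a complete response body (header followed by the bit array).
inline std::optional<BloomHeader> parseFilter(std::string_view body)
{
    if (body.size() < kHeaderLen || body.substr(0, 8) != "NixBloom")
        return std::nullopt;
    auto * p = reinterpret_cast<const unsigned char *>(body.data());
    BloomHeader h{readLE64(p + 8), readLE64(p + 16), readLE64(p + 24)};
    if (h.version != 1 || h.mBits == 0 || h.mBits % 8 != 0)
        return std::nullopt;
    // Assumption: the body must be exactly 32 + m / 8 bytes long.
    if (body.size() - kHeaderLen != h.mBits / 8)
        return std::nullopt;
    return h;
}

// Bit at position `pos`: bits[pos / 8] >> (pos % 8), masked with 1.
inline bool bitAt(std::string_view body, const BloomHeader & h, uint64_t pos)
{
    assert(pos < h.mBits);
    auto byte = static_cast<unsigned char>(body[kHeaderLen + pos / 8]);
    return (byte >> (pos % 8)) & 1;
}
```

A caller would pass the whole HTTP response body to `parseFilter` and only use `bitAt` once a header has been returned.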
|
|
||
| ## Membership test | ||
|
|
||
| A client tests whether a store path *might* be in the cache as follows: | ||
|
|
||
| 1. Take the path's [hash part](@docroot@/protocols/store-path.md) — the first 32 [Nix32](@docroot@/protocols/nix32.md) characters of its base name. | ||
| 2. Decode it into a 20-byte (160-bit) sequence using Nix32 decoding. | ||
| 3. Read two 64-bit unsigned values from the decoded bytes, little-endian: | ||
| - `h1` from bytes `0..8` | ||
| - `h2` from bytes `8..16` | ||
| (The trailing 4 bytes are unused.) | ||
| 4. For each `i` in `0, 1, …, k − 1`, compute the bit position | ||
| ``` | ||
| pos = ((h1 + i * h2) mod 2^64) mod m | ||
| ``` | ||
| The intermediate addition and multiplication wrap modulo 2^64 (standard unsigned 64-bit overflow) before the modulo by `m`. | ||
| 5. If every `bits[pos / 8] >> (pos % 8)` has its low bit set, the path is *possibly* present; otherwise it is *definitely not* present. | ||
|
|
||
| This is the standard Kirsch-Mitzenmacher double-hashing scheme. | ||
| Because a store path's hash part is already a cryptographic hash, no further hashing is required. | ||
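Steps 3 to 5 of the membership test can be sketched as follows. The sketch assumes the hash part has already been Nix32-decoded into 20 bytes (decoding is not shown), and the names `HashPart`, `loadLE64`, `forEachBitPosition`, `addPath` and `mightContain` are hypothetical rather than taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: the 20 bytes obtained by Nix32-decoding the hash part.
using HashPart = std::array<uint8_t, 20>;

// Read a little-endian uint64 from 8 bytes.
inline uint64_t loadLE64(const uint8_t * p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; --i)
        v = (v << 8) | p[i];
    return v;
}

// Call fn(pos) for each of the k bit positions of `hash` in an mBits-bit array.
template<typename Fn>
void forEachBitPosition(const HashPart & hash, uint64_t k, uint64_t mBits, Fn && fn)
{
    assert(mBits != 0);
    uint64_t h1 = loadLE64(hash.data());     // bytes 0..8
    uint64_t h2 = loadLE64(hash.data() + 8); // bytes 8..16; the last 4 bytes are unused
    for (uint64_t i = 0; i < k; ++i)
        fn((h1 + i * h2) % mBits);           // uint64_t arithmetic wraps modulo 2^64
}

// Server side: OR in the bits for one path.
inline void addPath(std::vector<uint8_t> & bits, const HashPart & hash, uint64_t k)
{
    forEachBitPosition(hash, k, bits.size() * 8, [&](uint64_t pos) {
        bits[pos / 8] = static_cast<uint8_t>(bits[pos / 8] | (1u << (pos % 8)));
    });
}

// Client side: false means "definitely not present", true means "possibly present".
inline bool mightContain(const std::vector<uint8_t> & bits, const HashPart & hash, uint64_t k)
{
    bool all = true;
    forEachBitPosition(hash, k, bits.size() * 8, [&](uint64_t pos) {
        if (!((bits[pos / 8] >> (pos % 8)) & 1))
            all = false;
    });
    return all;
}
```

Sharing one position helper between `addPath` and `mightContain` keeps the server and client in agreement on the bit layout, which is the property the format depends on.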
|
|
||
| ## Server-side construction | ||
|
|
||
| The server populates the filter by performing the same membership procedure for every valid store path and OR-ing in the resulting bits. | ||
|
|
||
| Parameters are chosen from the count `n` of valid paths and a target false-positive rate `p`: | ||
|
|
||
| ``` | ||
| m = ceil(-n * ln(p) / (ln 2)^2), rounded up to a multiple of 8 | ||
| k = max(1, round((m / n) * ln 2)) | ||
| ``` | ||
|
|
||
| If `n` is zero, the server may emit a minimal filter (e.g., `m = 8`, `k = 1`, all bits zero), which correctly reports every query as "not present". | ||
|
|
||
| The choice of `p` is server-defined and not advertised separately: a client that knows the number of paths in the cache can estimate the expected false-positive rate from `k` and `m`, but does not need to in order to use the filter. | ||
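For readers who do want that estimate, the usual approximation for a Bloom filter with `k` hash functions, `m` bits and `n` inserted items is `(1 - e^(-k·n/m))^k`. A minimal sketch, assuming `n` is known from some other source (the filter itself does not carry it) and using the hypothetical name `estimateFalsePositiveRate`:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Expected false-positive rate of a Bloom filter holding n items,
// using the standard approximation (1 - e^(-k*n/m))^k.
inline double estimateFalsePositiveRate(uint64_t k, uint64_t mBits, uint64_t n)
{
    assert(mBits != 0);
    double fill = 1.0 - std::exp(-double(k) * double(n) / double(mBits));
    return std::pow(fill, double(k));
}
```

With `k = 7`, about 4.79 million bits and 500 000 paths, the estimate should land close to the 1% target.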
|
|
||
| ## Caching | ||
|
|
||
| The Bloom filter changes whenever the cache's path set changes. | ||
| Clients should refetch periodically; an HTTP cache lifetime on the order of minutes to hours is typically appropriate. | ||
|
|
||
| ## Example | ||
|
|
||
| A cache containing roughly 500 000 paths, with a 1% target false-positive rate, produces a filter with `k = 7` and `m ≈ 4.8 × 10^6` bits, roughly 600 KB on the wire including the header. | ||
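The example figures can be checked against the formulas in "Server-side construction" with a short sketch. It follows the documented formulas rather than reproducing `buildBloomFilter`; the names `FilterParams` and `chooseParams` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical result type for this sketch.
struct FilterParams {
    uint64_t mBits;
    uint64_t k;
};

// m = ceil(-n * ln(p) / (ln 2)^2), rounded up to a multiple of 8
// k = max(1, round((m / n) * ln 2))
inline FilterParams chooseParams(uint64_t n, double p)
{
    assert(p > 0 && p < 1);
    if (n == 0)
        return {8, 1}; // minimal filter: every query reports "not present"
    const double ln2 = std::log(2.0);
    double mF = -double(n) * std::log(p) / (ln2 * ln2);
    uint64_t mBits = std::max<uint64_t>(8, (uint64_t(std::ceil(mF)) + 7) / 8 * 8);
    long k = std::lround(double(mBits) / double(n) * ln2);
    return {mBits, uint64_t(std::max<long>(1, k))};
}
```

For `n = 500000` and `p = 0.01` this should give `k = 7` and an `m` of about 4.79 × 10^6 bits, so `32 + m / 8` comes to roughly 599 000 bytes.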
|
|
||
| ## See Also | ||
|
|
||
| - [Nix Cache Info Format](@docroot@/protocols/nix-cache-info.md) | ||
| - [Store Path Specification](@docroot@/protocols/store-path.md) | ||
| - [Nix32 Encoding](@docroot@/protocols/nix32.md) | ||
| - [HTTP Binary Cache Store](@docroot@/store/types/http-binary-cache-store.md) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| #include "nix/store/bloom-filter.hh" | ||
| #include "nix/util/serialise.hh" | ||
|
|
||
| #include <cmath> | ||
|
|
||
| namespace nix { | ||
|
|
||
| std::optional<BloomFilterParams> parseBloomFilterHeader(std::string_view header) | ||
| { | ||
| using namespace std::string_view_literals; | ||
| if (header.size() < bloomFilterHeaderLen || header.substr(0, 8) != "NixBloom"sv) | ||
| return std::nullopt; | ||
|
|
||
| StringSource source(header.substr(8)); | ||
| uint64_t version; | ||
| uint32_t k; | ||
| uint64_t mBits; | ||
| try { | ||
| source >> version >> k >> mBits; | ||
| } catch (SerialisationError &) { | ||
| return std::nullopt; | ||
| } | ||
|
|
||
| if (version != 1 || mBits == 0 || mBits % 8 != 0) | ||
| return std::nullopt; | ||
|
|
||
| return BloomFilterParams{.k = k, .mBits = mBits}; | ||
| } | ||
|
|
||
| std::string buildBloomFilter(const StorePathSet & paths, double falsePositiveRate) | ||
| { | ||
| /* Rejects NaN as well, because all comparisons with NaN are false. */ | ||
| if (!(falsePositiveRate > 0 && falsePositiveRate < 1)) | ||
| throw Error("Bloom filter false positive rate must be between 0 and 1, got %f", falsePositiveRate); | ||
|
|
||
| size_t n = paths.size(); | ||
|
|
||
| uint64_t mBits = 8; | ||
| uint32_t k = 1; | ||
| if (n) { | ||
| constexpr double ln2 = 0.6931471805599453; | ||
| double mF = -double(n) * std::log(falsePositiveRate) / (ln2 * ln2); | ||
| /* For any `falsePositiveRate` in (0, 1), `mF` is positive, so rounding up to a multiple of 8 already yields at least 8 bits; | ||
| the explicit floor of 8 bits is a defensive guard so we never modulo by zero later. */ | ||
| mBits = std::max<uint64_t>(8, ((uint64_t(std::ceil(mF)) + 7) / 8) * 8); | ||
| long kL = std::lround((double(mBits) / double(n)) * ln2); | ||
| k = uint32_t(std::max<long>(1, kL)); | ||
| } | ||
|
|
||
|
|
||
| StringSink sink(bloomFilterHeaderLen + mBits / 8); | ||
|
|
||
| using namespace std::string_view_literals; | ||
| sink("NixBloom"sv); | ||
| sink << 1; // version | ||
| sink << k; | ||
| sink << mBits; | ||
| assert(sink.s.size() == bloomFilterHeaderLen); | ||
|
|
||
| sink.s.resize(bloomFilterHeaderLen + mBits / 8); | ||
| char * bits = sink.s.data() + bloomFilterHeaderLen; | ||
| for (auto & path : paths) | ||
| forEachBloomBitPosition(path, k, mBits, [&](uint64_t pos) { bits[pos / 8] |= uint8_t(1) << (pos % 8); }); | ||
|
|
||
| return std::move(sink.s); | ||
| } | ||
|
|
||
| } // namespace nix | ||