BIPs: SwiftSync Specification #2152

```
BIP: ?
Layer: Peer Services
Title: Peer sharing of block spent coins
Authors: Robert Netzke <rob@2140.dev>
Deputies: Ruben Somsen <ruben@2140.dev>
Status: Draft
Type: Specification
Assigned: ?
License: BSD-3-Clause
Discussion: https://groups.google.com/g/bitcoindev/c/FpSWUxItXQs/m/pnfjP6rFCgAJ
```

## Abstract

Each input of a Bitcoin block references the output it spends only by an outpoint (a transaction ID and output index).
This poses a limitation during initial block download (IBD): a client must process blocks sequentially to validate the
chain history. The SwiftSync protocol allows blocks to be evaluated in arbitrary order; however, doing so requires
additional data. This document specifies how that data is shared over the peer-to-peer network.

## Motivation

In the common approach to IBD, blocks are validated sequentially. This is because the height, coinbase flag, script, and
amount of the coins spent by a block's inputs are not part of the data committed to by the current block's proof of
work; thus, this data cannot be trusted if it is naively received over the wire. Using the SwiftSync protocol, a client
is able to verify the correctness of this data, even if it is served by a potentially untrusted party. This allows a
significant improvement in IBD performance, as block validation no longer needs to proceed sequentially and may be
parallelized.

Member: Given the postulation or existence of alternative syncing models, that feels a bit loaded. Maybe mention that this specifically refers to Bitcoin Core or alternatively consider something along the lines of: "The common approach to IBD is to process blocks sequentially as that ensures the existence of TXO details when input validation requires them to be available."

This is jumping several steps from the prior statement at once. Maybe you could segue that a bit more, e.g., by mentioning that fields you introduce are TXO details, before going into them being only implicitly or not at all committed to by transaction inputs, before explaining how that makes it impossible to verify what is provided by a peer.

Member: This is a bit imprecise: block download is always done in parallel, it’s just validation that is sequential. Do you mean that block validation can be parallelized?

## Specification

In Bitcoin Core, to roll back the chain state in the event of a block reorganization, the height, coinbase flag, script,
and amount of each transaction output spent by a block are stored in a data structure known colloquially as "undo data".
This terminology stems from its use to "undo" the effect of a block by repopulating the UTXO set with the coins that were
spent by the reorganized block. To remain general in language, this document refers to this data as "spent coins."

Full archival Bitcoin Core nodes store spent coins for all blocks. This is useful in the context of SwiftSync, as no
additional index needs to be created or maintained to serve this data to peers. There are, however, some discrepancies
between how Bitcoin Core serializes this data on disk and how this proposal serializes it over the peer-to-peer
protocol; these are detailed in the Rationale section.

This section defines how to request and serve block spent coins over the peer-to-peer protocol, as well as how to signal
support for this feature to peers.

### Definitions

- `[]byte`: arbitrary sequence of bytes with no fixed length
- `<N bytes>`: byte vector of fixed size N, where N is specified inline and known at compile time (e.g. `<32 bytes>`)
- `vector<Foo>`: vector of arbitrary length of elements of type Foo
- `CompactSize`: variable-length encoding of unsigned integers used in peer-to-peer messages, as defined in the Function
  Appendix section
- `CompressAmount`: compression function for integer amounts, as defined in the Function Appendix section

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in RFC 2119.

### Data structures

#### Height Code

Member: Why is it called "Height Code" when it encodes both height and coinbase flag?

When validating a block, a client must confirm that any coinbase outputs being spent are mature (created at least 100
blocks earlier), which requires knowing the height at which each spent coin was created. The height and coinbase flag are
encoded together as a 32-bit unsigned integer. To encode them, shift the height left by one bit and place the coinbase
flag (1 for a coinbase output, 0 otherwise) in the newly vacated least significant bit. To decode the height, shift the
code right by one bit. To decode the coinbase flag, mask the least significant bit of the height code and interpret it
as a boolean.

Take, for example, height 39, whose least significant byte is `0010 0111` (the upper three bytes of the 32-bit integer
are zero and omitted here). To encode a coinbase output at this height, one begins with a left shift: `0100 1110`, and
places the coinbase flag in the least significant bit: `0100 1111`.

Member: Maybe you could add a footnote why both are stored in one data structure, and/or mention here that even with sacrificing one bit, heights up to 2,147,483,647 can be expressed and 30,000+ years of blocks is plenty planning horizon? :p

Member: Nit: that’s not a 32-bit integer. ;)
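
The bit manipulation described above can be sketched in Python as follows. This is a minimal illustration rather than code from the reference implementation; `encode_height_code` and `decode_height_code` are hypothetical names, and the range checks assume the 32-bit code described above, which leaves 31 bits (heights up to 2,147,483,647) for the height.

```python
def encode_height_code(height: int, is_coinbase: bool) -> int:
    """Pack a coin's creation height and coinbase flag into one 32-bit code."""
    # One bit is reserved for the flag, so the height must fit in 31 bits.
    if not 0 <= height < 2**31:
        raise ValueError("height must fit in 31 bits")
    return (height << 1) | int(is_coinbase)


def decode_height_code(code: int) -> tuple[int, bool]:
    """Split a height code back into (height, is_coinbase)."""
    if not 0 <= code < 2**32:
        raise ValueError("height code must be a 32-bit unsigned integer")
    return code >> 1, bool(code & 1)


# Worked example from the text: a coinbase output at height 39 (0b0010_0111).
code = encode_height_code(39, True)
print(f"{code:08b}")             # 01001111
print(decode_height_code(code))  # (39, True)
```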

#### Reconstructable Script Format

Scripts are serialized by concatenating a one-byte `Prefix` with the `Format` field given in the table below. A receiving
client rebuilds the original script from the `Expansion` column, in which bare numbers (e.g. `20`) denote a push of that
many bytes. Scripts matching none of the listed templates use prefix `0x00` and are carried verbatim.

Member: This table could use a bit more context. E.g., you could mention:

| Prefix | Script   | Format                               | Expansion                                                    |
| :----- | :------- | :----------------------------------- | :----------------------------------------------------------- |
| `0x00` | Unknown  | `CompactSize(Len([]byte)) + []byte`  | `[]byte`                                                     |
| `0x01` | `P2PKH`  | `<20 bytes>`                         | `OP_DUP OP_HASH160 20 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG` |
| `0x02` | `P2PK`   | `<32-byte public key (0x02 parity)>` | `33 0x02 <32-byte public key> OP_CHECKSIG`                   |
| `0x03` | `P2PK`   | `<32-byte public key (0x03 parity)>` | `33 0x03 <32-byte public key> OP_CHECKSIG`                   |
| `0x04` | `P2PK`   | `<64-byte public key>`               | `65 0x04 <64-byte public key> OP_CHECKSIG`                   |
| `0x05` | `P2SH`   | `<20 bytes>`                         | `OP_HASH160 20 <20 bytes> OP_EQUAL`                          |
| `0x06` | `P2WSH`  | `<32 bytes>`                         | `OP_0 32 <32 bytes>`                                         |
| `0x07` | `P2WPKH` | `<20 bytes>`                         | `OP_0 20 <20 bytes>`                                         |
| `0x08` | `P2TR`   | `<32-byte X-only public key>`        | `OP_1 32 <32 bytes>`                                         |

Comment on lines +79 to +81

Member: Given that they are pretty uncommon these days, and it only saves two bytes, have you considered grouping the P2PK outputs with the other bare scripts and just encoding these per the
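
The table lends itself to a template-driven encoder and decoder. The sketch below is hypothetical (`TEMPLATES`, `encode_script`, and `decode_script` are illustrative names, not taken from the reference implementation) and assumes each encoded script is handled as a standalone byte string. It reproduces the published reconstructable script test vectors, including the `0x00` fallback for an `OP_RETURN` script.

```python
# Hypothetical sketch of the reconstructable script format.
# prefix -> (bytes before the payload, payload length, bytes after the payload)
TEMPLATES = {
    0x01: (bytes.fromhex("76a914"), 20, bytes.fromhex("88ac")),  # P2PKH
    0x02: (bytes.fromhex("2102"), 32, bytes.fromhex("ac")),      # P2PK, 0x02 parity
    0x03: (bytes.fromhex("2103"), 32, bytes.fromhex("ac")),      # P2PK, 0x03 parity
    0x04: (bytes.fromhex("4104"), 64, bytes.fromhex("ac")),      # P2PK, uncompressed
    0x05: (bytes.fromhex("a914"), 20, bytes.fromhex("87")),      # P2SH
    0x06: (bytes.fromhex("0020"), 32, b""),                      # P2WSH
    0x07: (bytes.fromhex("0014"), 20, b""),                      # P2WPKH
    0x08: (bytes.fromhex("5120"), 32, b""),                      # P2TR
}


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def read_compact_size(buf: bytes) -> tuple[int, int]:
    """Return (value, number of bytes consumed)."""
    if buf[0] < 0xFD:
        return buf[0], 1
    width = {0xFD: 2, 0xFE: 4, 0xFF: 8}[buf[0]]
    return int.from_bytes(buf[1:1 + width], "little"), 1 + width


def encode_script(script: bytes) -> bytes:
    """Return Prefix + Format for a raw output script."""
    for prefix, (head, size, tail) in TEMPLATES.items():
        if (len(script) == len(head) + size + len(tail)
                and script.startswith(head) and script.endswith(tail)):
            return bytes([prefix]) + script[len(head):len(head) + size]
    # Any other script is carried verbatim behind a CompactSize length.
    return b"\x00" + compact_size(len(script)) + script


def decode_script(data: bytes) -> bytes:
    """Expand exactly one encoded script back to its raw form."""
    prefix, rest = data[0], data[1:]
    if prefix in TEMPLATES:
        head, size, tail = TEMPLATES[prefix]
        if len(rest) != size:
            raise ValueError("payload length does not match prefix")
        return head + rest + tail
    if prefix == 0x00:
        length, used = read_compact_size(rest)
        if len(rest) - used != length:
            raise ValueError("length prefix does not match payload")
        return rest[used:]
    raise ValueError(f"unknown prefix {prefix:#04x}")


print(encode_script(bytes.fromhex("6a")).hex())  # 00016a
print(decode_script(bytes.fromhex("076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd")).hex())
# 00146262b97a514ea54d12f51e0a4fe4c09fb74ff7bd
```

Because a script that does not match a template exactly falls back to prefix `0x00`, the encoding stays lossless. When parsing a stream, the prefix alone determines how many payload bytes follow, except for `0x00`, where the `CompactSize` length does.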

#### Amount Format

Amounts, represented as 64-bit unsigned integers, are compressed by first applying the `CompressAmount` function defined
in the Function Appendix and then serializing the result with `CompactSize`.
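
Combining the two steps gives the on-wire amount encoding. The sketch below inlines `CompressAmount` from the Function Appendix and a `CompactSize` encoder so that it runs on its own; `serialize_amount` is a hypothetical helper name.

```python
def compress_amount(n: int) -> int:
    """CompressAmount, as in the Function Appendix."""
    if n == 0:
        return 0
    e = 0
    while n % 10 == 0 and e < 9:  # strip up to 9 trailing decimal zeros
        n //= 10
        e += 1
    if e < 9:
        d = n % 10  # last non-zero digit, 1..9
        n //= 10
        return 1 + (n * 9 + d - 1) * 10 + e
    return 1 + (n - 1) * 10 + 9


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_amount(sats: int) -> bytes:
    """Amount Format: CompressAmount, then CompactSize."""
    return compact_size(compress_amount(sats))


print(serialize_amount(50 * 100_000_000).hex())          # 32
print(serialize_amount(21_000_000 * 100_000_000).hex())  # fe406f4001
```

For example, 50 BTC (5,000,000,000 satoshis) compresses to 50 and therefore serializes to the single byte `0x32`, matching the compressed amount test vectors, whereas a raw 64-bit integer would take 8 bytes.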

#### Coin

| Field                    | Type                          | Serialization                       | Description                                                              |
| :----------------------- | :---------------------------- | :---------------------------------- | :----------------------------------------------------------------------- |
| Input index              | 32-bit unsigned integer       | Little endian                       | Index of the spending input among the block's inputs, coinbase excluded. |
| Height and coinbase flag | Height code                   | Defined above                       | Creation height of the spent coin and whether it is a coinbase output.   |
| Script                   | Reconstructable script format | Defined above                       | Output script of the spent coin.                                         |
| Amount                   | 64-bit unsigned integer       | `CompressAmount` then `CompactSize` | Satoshi-denominated value.                                               |
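
Putting the fields together, a single `Coin` entry could be serialized as sketched below. `serialize_coin` is a hypothetical helper, and its script argument is assumed to already be in reconstructable script format. The 4-byte little-endian encoding of the height code is an assumption: this document specifies a 32-bit integer but not its byte order.

```python
import struct


def compact_size(n: int) -> bytes:
    if n < 0xFD:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def compress_amount(n: int) -> int:
    """CompressAmount, as in the Function Appendix."""
    if n == 0:
        return 0
    e = 0
    while n % 10 == 0 and e < 9:
        n //= 10
        e += 1
    if e < 9:
        d = n % 10
        n //= 10
        return 1 + (n * 9 + d - 1) * 10 + e
    return 1 + (n - 1) * 10 + 9


def serialize_coin(input_index: int, height: int, is_coinbase: bool,
                   encoded_script: bytes, amount: int) -> bytes:
    """Serialize one Coin entry (hypothetical helper)."""
    height_code = (height << 1) | int(is_coinbase)
    return (
        struct.pack("<I", input_index)            # input index, little endian
        + struct.pack("<I", height_code)          # ASSUMPTION: 4-byte little endian
        + encoded_script                          # Prefix + Format, already encoded
        + compact_size(compress_amount(amount))   # Amount Format
    )


# Illustrative values: input 0 spends a coinbase output created at height 39,
# using a P2WPKH script from the reconstructable script test vectors.
coin = serialize_coin(0, 39, True,
                      bytes.fromhex("076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd"),
                      50 * 100_000_000)
print(coin.hex())
# 000000004f000000076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd32
```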

### Messages

#### MSG_GET_SPENT_COINS

Contributor: Is the idea that a peer would issue this request for every block in the chain? If we assume mainnet at height, and a 150 ms round trip time, then a peer would spend nearly 80 hours just downloading this undo data. You may want to consider a batched variant, similar to the way messages like

Member (Author): We've found that bandwidth throughput is the limiting factor when downloading blocks in parallel. Not all spent coins have to be downloaded if a client keeps a cache, as this document describes. In the batched variant, the cache is not possible and the bandwidth requirement increases significantly.

`MSG_GET_SPENT_COINS` defines a request for the coins spent by the inputs of a block.

Define `cmdString` as `getbspent`. Define BIP-324 message type as ???.

| Field       | Type                    | Description                                                          |
| :---------- | :---------------------- | :------------------------------------------------------------------- |
| `blockhash` | `<32 bytes>`            | Hash of the block whose spent coins are requested.                   |
| `cutoff`    | 32-bit unsigned integer | If greater than zero, include only coins created before this height. |

Member: I could see coins created after a specific height (due to having already processed all blocks up to that height), but why only coins created before a height? Perhaps add an explanation what the

The rationale for the `cutoff` field is detailed in the Rationale section below.
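
Per the table above, a `getbspent` payload is 36 bytes: the 32-byte block hash followed by the 32-bit `cutoff`. The sketch below assumes the hash is sent in the same internal byte order used by other peer-to-peer messages and that `cutoff` is little endian, as is conventional in the Bitcoin peer-to-peer protocol; neither is stated in this document, and `build_getbspent` and `parse_getbspent` are hypothetical names. Message framing (`cmdString` or BIP-324 type) is not shown.

```python
import struct

GETBSPENT_PAYLOAD_LEN = 32 + 4  # blockhash + cutoff


def build_getbspent(block_hash: bytes, cutoff: int = 0) -> bytes:
    """Serialize a getbspent payload (hypothetical helper).

    ASSUMPTIONS: block_hash is already in internal (wire) byte order, and
    cutoff is a little-endian uint32; cutoff == 0 requests every spent coin.
    """
    if len(block_hash) != 32:
        raise ValueError("blockhash must be exactly 32 bytes")
    return block_hash + struct.pack("<I", cutoff)


def parse_getbspent(payload: bytes) -> tuple[bytes, int]:
    """Inverse of build_getbspent: returns (blockhash, cutoff)."""
    if len(payload) != GETBSPENT_PAYLOAD_LEN:
        raise ValueError("unexpected getbspent payload length")
    (cutoff,) = struct.unpack("<I", payload[32:])
    return payload[:32], cutoff


# Request coins spent in some block that were created before height 1001.
payload = build_getbspent(bytes.fromhex("11" * 32), cutoff=1001)
print(len(payload), parse_getbspent(payload)[1])  # 36 1001
```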

#### MSG_SPENT_COINS

`MSG_SPENT_COINS` defines the response carrying the coins spent by the inputs of a block.

Contributor: Probably not really y'all's intended use case, but if you optionally make it possible to include merkle proofs for the set of coins, then this message can be used to obtain a proof that an output was spent in a given block.

Contributor: It would actually also be useful for BIP 157+158 peers, as the final version that shipped includes the script spent (instead of the outpoint), which means that if you're using the filters to find a block where a given script has been spent, you need to make some assumptions about what the prev script is for a given transaction.

Member (Author): The most recent response on this mailing list post mentions commitment to the UTXO set as part of the block header. There are additional ways to do this outside of a soft fork as well, i.e. utreexo proofs. For now I think it best to leave this unspecified in this version of the message while the community shares ideas, but I do think this is interesting.

Define `cmdString` as `bspent`. Define BIP-324 message type as ???.

| Field       | Type                             | Description                                                |
| :---------- | :------------------------------- | :--------------------------------------------------------- |
| `blockhash` | `<32 bytes>`                     | Hash of the block whose inputs spent these coins.          |
| `len`       | `CompactSize(Len(vector<Coin>))` | Number of elements in `coins`.                             |
| `coins`     | `vector<Coin>`                   | Spent coins, after filtering on the request's `cutoff`.    |

A client serving `bspent` messages MUST include exactly those spent coins created _before_ the `cutoff` height given in
the corresponding `getbspent` request, or every spent coin of the block if `cutoff` is zero. A client receiving a
`bspent` message with un-requested or missing coins MUST disconnect from the serving peer.
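
The `cutoff` rule can be sketched on both sides of the exchange. In the hypothetical Python below, `SpentCoin`, `select_coins`, and `has_unrequested_coins` are illustrative names. The serving side keeps only coins created before `cutoff` (or everything when `cutoff` is zero), and the requesting side flags any coin the cutoff should have excluded. Detecting _missing_ coins depends on client state, such as its cache, and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpentCoin:
    input_index: int  # index among the block's inputs, coinbase excluded
    height: int       # height at which the spent coin was created
    is_coinbase: bool
    script: bytes     # raw output script
    amount: int       # satoshis


def select_coins(spent: list[SpentCoin], cutoff: int) -> list[SpentCoin]:
    """Serving side: the coins a bspent response must carry for this cutoff."""
    if cutoff == 0:
        return list(spent)  # no filtering requested
    return [c for c in spent if c.height < cutoff]


def has_unrequested_coins(received: list[SpentCoin], cutoff: int) -> bool:
    """Requesting side: True if the response contains a coin the cutoff excludes."""
    return cutoff > 0 and any(c.height >= cutoff for c in received)


block_spent = [
    SpentCoin(0, 998, False, b"", 1_000),   # created before the cutoff
    SpentCoin(1, 1003, False, b"", 2_000),  # recent coin, expected in the cache
]
print([c.input_index for c in select_coins(block_spent, cutoff=1001)])  # [0]
print(has_unrequested_coins(block_spent, cutoff=1001))                   # True
```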

## Signaling

Support for serving historical block spent coins is advertised via a `feature` message, introduced by
[BIP-434](https://github.com/bitcoin/bips/blob/master/bip-0434.md).

| featureid           | featuredata |
| :------------------ | :---------- |
| `blockspentcoinsv1` | `0x00`      |

A client advertising this feature SHOULD respond to `getbspent` messages, subject to rate and bandwidth limits.

## Rationale

Coin lifetimes on the Bitcoin blockchain (the interval between a coin's creation height and its spending height) follow
an empirical pattern: the majority of coins are spent within 100 blocks of their creation. In fact, approximately 41
percent of coins are spent within 10 blocks at the time of writing[^1]. Clients may leverage this to reduce the
bandwidth required to fetch spent coins by using an in-memory cache. For example, a client may cache coins created
within a 5-block window and request, via the `cutoff` field, only coins created before that window. This significantly
reduces bandwidth, at the cost of a cache whose size the client can set dynamically depending on available memory.

Comment on lines +145 to +150

Reviewer: This cutoff cache optimization seems to nudge implementors back to sequentially processing blocks with the added burden of requesting extra data over the wire. Also with the current messages I still need to get the data for the block (even if there's only one unspent cache miss?), right? At Is this understanding correct?

Contributor:

Caching requires sequential processing, but you can have multiple sequential threads in parallel.

You're going to have to request the undo data regardless for non-assumevalid SwiftSync - it is not related to caching.

Caching does not prevent the need for requesting undo data. You can safely assume pretty much every block has cache misses. No cache misses is equivalent to having the full UTXO set (and impossible with multiple sequential threads), which defeats the point.

I have no strong opinion on batching, but round-trip latency won't add up sequentially if requests are sent out in parallel. Concrete example: Let's say you're starting another sequential thread from block height 1001 and you intend to cache the last 5 blocks worth of outputs. For the first block you'd request the full undo data. For block 1002 until 1005 you'd request everything created until block height 1000. From height 1006 onwards your 5-block window starts to shift so you'd request everything created until block height 1001, and so on. All this data can be requested in parallel. As long as your caching strategy is not based on what you witnessed during the previous block, at no point do you have to wait for one block to finish processing before requesting the data for upcoming blocks.

Comment on lines +149 to +150

Member: Ah cool. I was missing this context above when

Member (Author): I added a short note when introducing the request message that the
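
The windowing example from the review discussion above can be written as a cutoff schedule for one sequential worker. This is a hypothetical sketch of one possible caching strategy, not part of the specification: `cutoff_for_height`, `start`, and `window` are illustrative names, and the worker is assumed to request the full spent coins (`cutoff` of zero) for its first block.

```python
def cutoff_for_height(height: int, start: int, window: int) -> int:
    """Hypothetical cutoff schedule for one sequential worker.

    The worker starts at block `start` and caches the outputs of the most
    recent `window` blocks (including the block being processed), so it only
    asks peers for coins created before that window.
    """
    if window < 1:
        raise ValueError("window must be at least one block")
    if height < start:
        raise ValueError("height precedes the worker's first block")
    if height == start:
        return 0  # empty cache: request every spent coin of the first block
    return max(start, height - window + 1)


for h in (1001, 1002, 1005, 1006, 1007):
    print(h, cutoff_for_height(h, start=1001, window=5))
# 1001 0
# 1002 1001
# 1005 1001
# 1006 1002
# 1007 1003
```

Because the schedule depends only on the block height, and not on what the worker observed in earlier blocks, all of these requests can be issued in parallel, as the review comment points out.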

Beyond the use of a dynamic coin height filter, there are additional reasons not to simply read spent coins from disk and
send them over the wire. Legacy fields (`nVersion`) are written and read as `0x00` only to keep the disk format
compatible with old clients. Furthermore, the amount compression specified above reduces bandwidth by 11 GB. Using
`VARINT` instead of `CompactSize` would save a further 4 GB; however, the `VARINT` primitive is currently a Bitcoin Core
implementation detail. Since reusing existing network primitives achieves the majority of the savings, this
specification opts to lower the implementation burden for clients. The reconstructable script format saves around
12 GB. The scheme is lossless and can be upgraded by appending new script variants. For reference, the naive encoding of
block spent coins is 118 GB as of block 930,000[^1][^2][^3].

Comment on lines +155 to +156

Member: Given the confusing terminology in regard to

## Function Appendix

Bitcoin Core uses a technique that removes trailing zeros from the representation of amounts, which significantly
reduces the size of serialized amounts. The amount functions below are reproduced from the Bitcoin Core
[test framework](https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/compressor.py).

### Compress Amount

```python
def compress_amount(n):
    if n == 0:
        return 0
    e = 0
    while ((n % 10) == 0) and (e < 9):
        n //= 10
        e += 1
    if e < 9:
        d = n % 10
        assert (d >= 1 and d <= 9)
        n //= 10
        return 1 + (n*9 + d - 1)*10 + e
    else:
        return 1 + (n - 1)*10 + 9
```

### Decompress Amount

```python
def decompress_amount(x):
    if x == 0:
        return 0
    x -= 1
    e = x % 10
    x //= 10
    n = 0
    if e < 9:
        d = (x % 9) + 1
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    while e > 0:
        n *= 10
        e -= 1
    return n
```

`CompactSize` is commonly used to represent the size of collections in peer-to-peer messages.

### Encode Compact Size

```python
def encode_compactsize(n):
    # Values below 0xfd fit in a single byte; larger values use a marker byte
    # followed by a 2-, 4-, or 8-byte little-endian integer.
    if n < 0xfd:
        return bytes([n])
    elif n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    else:
        return b"\xff" + n.to_bytes(8, "little")
```

### Decode Compact Size

```python
def decode_compactsize(b):
    # Returns only the decoded value; the number of bytes consumed (1, 3, 5,
    # or 9) follows from the prefix byte.
    prefix = b[0]
    if prefix < 0xfd:
        return prefix
    elif prefix == 0xfd:
        return int.from_bytes(b[1:3], "little")
    elif prefix == 0xfe:
        return int.from_bytes(b[1:5], "little")
    else:
        return int.from_bytes(b[1:9], "little")
```
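
`decode_compactsize` above returns only the value, so a parser walking a `bspent` payload also needs to know how many bytes were consumed. The variant below is a hypothetical helper (not from the test framework) that returns the next offset and, mirroring Bitcoin Core's behavior, rejects truncated and non-canonical encodings, i.e. values that would have fit in a shorter form.

```python
def read_compact_size(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a CompactSize at buf[offset:]; return (value, next_offset)."""
    if offset >= len(buf):
        raise ValueError("truncated CompactSize")
    prefix = buf[offset]
    if prefix < 0xFD:
        return prefix, offset + 1
    # (payload width in bytes, smallest value that requires this width)
    width, minimum = {0xFD: (2, 0xFD), 0xFE: (4, 0x10000), 0xFF: (8, 0x100000000)}[prefix]
    end = offset + 1 + width
    if end > len(buf):
        raise ValueError("truncated CompactSize")
    value = int.from_bytes(buf[offset + 1:end], "little")
    if value < minimum:
        raise ValueError("non-canonical CompactSize")
    return value, end


# The len field of a bspent payload follows the 32-byte blockhash.
payload = bytes(32) + b"\xfd\x2c\x01"  # len = 300
print(read_compact_size(payload, 32))  # (300, 35)
```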

## Compatibility

Clients seeking to perform fully validating SwiftSync require peers that serve spent coins. Serving this data requires
no additional index, and support is advertised via the `feature` message.

## Reference Implementation and Test Vectors

### Reference Implementation

- [Bitcoin Core](https://github.com/rustaceanrob/bitcoin/tree/bip-block-undo)

### Test Vectors

- [Reconstructable script](test_vectors/block_undo/reconstructable_script.json)
- [Compressed Amount](test_vectors/block_undo/compressed_amount.json)

Each reconstructable script vector pairs a raw script with its encoding (both in hex), and each compressed amount vector
pairs an amount in satoshis with its `CompressAmount` output. The reconstructable script vectors appear in the
following order:
`P2PKH, P2SH, P2TR, P2WPKH, P2WSH, P2PK (odd), P2PK (even), P2PK (uncompressed), OP_RETURN (unspendable/unknown)`

## Copyright

This BIP is licensed under the 3-clause BSD license.

## Footnotes

[^1]: Relevant statistics may be generated via binaries in the
    [`swiftsync-research`](https://github.com/rustaceanrob/swiftsync-research) repository.
[^2]: Reconstructable scripts are borrowed from [Utreexo](https://github.com/bitcoin/bips/pull/1923), which in turn
    borrows them from Cory Fields' UHS proposal.
[^3]: Astute readers may notice that uncompressed public keys could be compressed before they are sent and decompressed
    by the receiving client. Although this would slightly reduce bandwidth, it would increase the complexity of client
    code, as a `secp256k1` context would be required to decode the message, which is not currently a requirement. As of
    height 936,212, the number of uncompressed public keys spent in blocks is 853,515. This represents a very modest
    bandwidth saving of around 30 MB. As such, this technique is omitted for implementation simplicity.

`test_vectors/block_undo/compressed_amount.json`:

```json
[
  [0, "0x0"],
  [1, "0x1"],
  [1000000, "0x7"],
  [100000000, "0x9"],
  [5000000000, "0x32"],
  [2100000000000000, "0x1406f40"]
]
```

`test_vectors/block_undo/reconstructable_script.json`:

```json
[
  ["76a9142365e46227cc171083ea275f45ea8646c61d1fbb88ac", "012365e46227cc171083ea275f45ea8646c61d1fbb"],
  ["a914b472a266d0bd89c13706a4132ccfb16f7c3b9fcb87", "05b472a266d0bd89c13706a4132ccfb16f7c3b9fcb"],
  ["5120720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705", "08720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705"],
  ["00146262b97a514ea54d12f51e0a4fe4c09fb74ff7bd", "076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd"],
  ["00200000000000000000000000000000000000000000000000000000000000000000", "060000000000000000000000000000000000000000000000000000000000000000"],
  ["210334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fbac", "0334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fb"],
  ["210299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11edeac", "0299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11ede"],
  ["410441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59eac", "0441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59e"],
  ["6a", "00016a"]
]
```