273 changes: 273 additions & 0 deletions bip-xxxx-block-undo.md
@@ -0,0 +1,273 @@
```
BIP: ?
Layer: Peer Services
Title: Peer sharing of block spent coins
Authors: Robert Netzke <rob@2140.dev>
Deputies: Ruben Somsen <ruben@2140.dev>
Status: Draft
Type: Specification
Assigned: ?
License: BSD-3-Clause
Discussion: https://groups.google.com/g/bitcoindev/c/FpSWUxItXQs/m/pnfjP6rFCgAJ
```

## Abstract

Inputs of a Bitcoin block reference previous outputs by the outpoint data structure. This imposes a limitation during
initial block download (IBD): a client must process blocks sequentially to validate the chain history. The SwiftSync
protocol allows blocks to be evaluated in arbitrary order, but doing so requires additional data that must be served
over the peer-to-peer network. This document describes how to share this data between peers.

## Motivation

A current limitation of IBD is that it must be done sequentially. This is a result of the height, coinbase flag, input

Member

A current limitation of IBD is that it must be done sequentially.

Given the postulation or existence of alternative syncing models, that feels a bit loaded. Maybe mention that this specifically refers to Bitcoin Core or alternatively consider something along the lines of: "The common approach to IBD is to process blocks sequentially as that ensures the existence of TXO details when input validation requires them to be available."

This is a result of the height, coinbase flag, input script, and amount of the block inputs being omitted from the data committed to by proof of work in the current block

This is jumping several steps from the prior statement at once. Maybe you could segue that a bit more, e.g., by mentioning that fields you introduce are TXO details, before going into them being only implicitly or not at all committed to by transaction inputs, before explaining how that makes it impossible to verify what is provided by a peer.

script, and amount of the block inputs being omitted from the data committed to by proof of work in the current block,
and, thus, this data cannot be trusted if received over the wire naively. Using the SwiftSync protocol, a client is able
to verify the correctness of this data, even if served by a potentially untrusted party. This allows a significant
improvement in IBD performance, as block downloads may be done in parallel.

Member

This is a bit imprecise: block download is always done in parallel, it’s just validation that is sequential. Do you mean that block validation can be parallelized?


## Specification

In Bitcoin Core, to roll back the chain state in the event of a block reorganization, the height, coinbase flag, script
and amount metadata for each spent transaction output of a block are stored in a data structure known colloquially as
"undo data". This terminology stems from its use to "undo" the effect of a block by repopulating the UTXO set with the
coins that were spent by the reorganized block. To remain general in language, this data will be referred to as "spent
coins."

Bitcoin Core full archival nodes store spent coins for all blocks. This is useful in the context of SwiftSync, as no
additional index must be created or maintained to serve this data to peers. There are, however, some discrepancies
between how this data is serialized on disk in Bitcoin Core and how this proposal seeks to serialize this data over the
peer-to-peer protocol, which are detailed in the rationale section.

This section defines how to request and serve block spent coins over the peer-to-peer protocol, as well as signaling
support of this feature to peers.

### Definitions

- `[]byte`: arbitrary sequence of bytes with no fixed length
- `<N bytes>`: byte vector of size N, where N is specified inline. N is fixed length and known at compile time (e.g.
\<32 bytes>)
- `vector<Foo>`: vector of arbitrary length of elements of type Foo
- `CompactSize`: encoding of unsigned integers used in peer-to-peer messages, as defined in the Function Appendix
section
- `CompressAmount`: compression function for integer amounts, as defined in the Function Appendix section

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in RFC 2119.

### Data structures

#### Height Code

Member

Why is it called "Height Code" when it encodes both height and coinbase flag?


When validating a block, a client must confirm that spent coinbase outputs are mature, which is determined by the height
of the coin. The height and coinbase flag are encoded as a 32-bit integer. To encode the height and flag, binary left shift the

Member

Maybe you could add a footnote why both are stored in one data structure, and/or mention here that even with sacrificing one bit, heights up to 2,147,483,647 can be expressed and 30,000+ years of blocks is plenty planning horizon? :p

height by one bit, treat the coinbase flag as a bit, and insert it into the newly opened least significant bit position.
To decode the height, binary right shift the code by one bit. To decode the coinbase flag, mask the least significant
bit of the height code and interpret the bit as a boolean.

Take, for example, a height with binary encoding `0010 0111`. To encode a coinbase output at this height, one begins

Member

Nit: that’s not a 32-bit integer. ;)

with a left shift: `0100 1110`, and places the coinbase flag in the least significant bit: `0100 1111`.
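
The shift-and-mask steps above can be sketched in Python. The function names and the range checks are illustrative
assumptions and are not part of this specification.

```python
COINBASE_FLAG = 0x01  # the least significant bit carries the coinbase flag


def encode_height_code(height: int, is_coinbase: bool) -> int:
    """Pack a height and a coinbase flag into a single 32-bit height code."""
    if not 0 <= height < (1 << 31):
        raise ValueError("height must fit in 31 bits")
    # Shift the height left by one bit and place the flag in the opened bit.
    return (height << 1) | int(is_coinbase)


def decode_height_code(code: int) -> tuple[int, bool]:
    """Split a height code back into (height, is_coinbase)."""
    if not 0 <= code < (1 << 32):
        raise ValueError("height code must fit in 32 bits")
    return code >> 1, bool(code & COINBASE_FLAG)


# The worked example from the text: height 0010 0111 with the coinbase flag set.
code = encode_height_code(0b0010_0111, True)
assert code == 0b0100_1111
assert decode_height_code(code) == (0b0010_0111, True)
```

Because one bit is reserved for the flag, the sketch rejects heights that do not fit in the remaining 31 bits.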

#### Reconstructable Script Format

Scripts are serialized in this format by concatenating the `Prefix` and `Format` fields specified below.

Member

This table could use a bit more context. E.g., you could mention:

  • What the columns refer to
  • that the table introduces a serialization that allows compressing standard output types with addresses and P2PK
  • Bare scripts and future output types are not compressed


| Prefix | Script | Format | Expansion |
| :----- | :------- | :------------------------------------ | :----------------------------------------------------------- |
| `0x00` | Unknown | `CompactSize(Len([]bytes)) + []bytes` | `[]bytes` |
| `0x01` | `P2PKH` | `<20 bytes>` | `OP_DUP OP_HASH160 20 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG` |
| `0x02` | `P2PK` | `<32-byte public key (0x02 parity)>` | `33 0x02 <32 byte public key> OP_CHECKSIG` |
| `0x03` | `P2PK` | `<32-byte public key (0x03 parity)>` | `33 0x03 <32 byte public key> OP_CHECKSIG` |
| `0x04` | `P2PK` | `<64 byte public key>` | `65 0x04 <64 byte public key> OP_CHECKSIG` |
Comment on lines +79 to +81

@murchandamus murchandamus Jun 27, 2026

Member

Given that they are pretty uncommon these days, and it only saves two bytes, have you considered grouping the P2PK outputs with the other bare scripts and just encoding these per the 0x00 prefix?

| `0x05` | `P2SH` | `<20 bytes>` | `OP_HASH160 20 <20 bytes> OP_EQUAL` |
| `0x06` | `P2WSH` | `<32 bytes>` | `OP_0 32 <32 bytes>` |
| `0x07` | `P2WPKH` | `<20 bytes>` | `OP_0 20 <20 bytes>` |
| `0x08` | `P2TR` | `<32-byte X-only public key>` | `OP_1 32 <32 bytes>` |
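
As a rough illustration of the table, the following Python sketch maps a script to its `Prefix` and `Format` bytes by
matching on byte layout alone. The name `compress_script` is hypothetical, and the sketch assumes no public key validity
checks are performed; it is not the reference implementation.

```python
def encode_compactsize(n: int) -> bytes:
    """CompactSize encoding, as given in the Function Appendix."""
    if n < 0xfd:
        return bytes([n])
    elif n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def compress_script(script: bytes) -> bytes:
    """Serialize a script as Prefix + Format (hypothetical helper)."""
    n = len(script)
    # P2PKH: OP_DUP OP_HASH160 20 <20 bytes> OP_EQUALVERIFY OP_CHECKSIG
    if n == 25 and script[:3] == b"\x76\xa9\x14" and script[23:] == b"\x88\xac":
        return b"\x01" + script[3:23]
    # P2SH: OP_HASH160 20 <20 bytes> OP_EQUAL
    if n == 23 and script[:2] == b"\xa9\x14" and script[22:] == b"\x87":
        return b"\x05" + script[2:22]
    # Compressed P2PK: 33 <0x02 or 0x03> <32 bytes> OP_CHECKSIG.
    # The parity byte of the key doubles as the prefix.
    if n == 35 and script[0] == 0x21 and script[1] in (0x02, 0x03) and script[34] == 0xac:
        return script[1:34]
    # Uncompressed P2PK: 65 0x04 <64 bytes> OP_CHECKSIG
    if n == 67 and script[0] == 0x41 and script[1] == 0x04 and script[66] == 0xac:
        return script[1:66]
    # P2WSH: OP_0 32 <32 bytes>
    if n == 34 and script[:2] == b"\x00\x20":
        return b"\x06" + script[2:]
    # P2WPKH: OP_0 20 <20 bytes>
    if n == 22 and script[:2] == b"\x00\x14":
        return b"\x07" + script[2:]
    # P2TR: OP_1 32 <32 bytes>
    if n == 34 and script[:2] == b"\x51\x20":
        return b"\x08" + script[2:]
    # Anything else is sent verbatim behind a CompactSize length.
    return b"\x00" + encode_compactsize(n) + script


# A bare OP_RETURN falls through to the unknown (0x00) encoding.
assert compress_script(bytes.fromhex("6a")).hex() == "00016a"
```

Unrecognized scripts grow by the prefix and length bytes, while the recognized templates shrink to the prefix plus their
hash or key.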

#### Amount Format

The 64-bit unsigned integers representing amounts are compressed by first applying the `CompressAmount` function defined
below, and then serializing the result with `CompactSize`.
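
A minimal sketch of the two-step amount serialization follows. It repeats the `compress_amount` and
`encode_compactsize` functions from the Function Appendix so that it runs on its own; the wrapper name
`serialize_amount` is an assumption made for illustration.

```python
def compress_amount(n: int) -> int:
    """Strip trailing decimal zeros, as in the Function Appendix."""
    if n == 0:
        return 0
    e = 0
    while ((n % 10) == 0) and (e < 9):
        n //= 10
        e += 1
    if e < 9:
        d = n % 10
        n //= 10
        return 1 + (n * 9 + d - 1) * 10 + e
    return 1 + (n - 1) * 10 + 9


def encode_compactsize(n: int) -> bytes:
    """CompactSize encoding, as in the Function Appendix."""
    if n < 0xfd:
        return bytes([n])
    elif n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def serialize_amount(satoshis: int) -> bytes:
    """Apply CompressAmount, then CompactSize (hypothetical wrapper)."""
    return encode_compactsize(compress_amount(satoshis))


# 50 BTC compresses to 0x32 and fits in a single byte on the wire.
assert serialize_amount(5_000_000_000) == b"\x32"
```

Round amounts benefit the most: 50 BTC occupies one byte instead of the eight bytes of a raw 64-bit integer.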

#### Coin

| Field | Type | Serialization | Description |
| :----------------------- | :---------------------------- | :---------------------------------- | :------------------------------------------------ |
| Input index | 32-bit unsigned integer | Little endian | The index in the block inputs, coinbase excluded. |
| Height and coinbase flag | Height code | Defined above | — |
| Script | Reconstructable script format | Defined above | — |
| Amount | 64-bit unsigned integer | `CompressAmount` then `CompactSize` | Satoshi-denominated value. |

### Messages

#### MSG_GET_SPENT_COINS

Contributor

Is the idea that a peer would issue this request for every block in the chain? If we assume mainnet at height, and a 150 ms round trip time, then a peer would spend nearly 80 hours just downloading this undo data.

You may want to consider a batched variant, similar to the way messages like getheaders works.

@rustaceanrob rustaceanrob Jun 22, 2026

Member Author

We've found that bandwidth throughput is the limiting factor when downloading blocks in parallel. Not all spent coins have to be downloaded if a client keeps a cache, as this document describes. In the batched variant, the cache is not possible and the bandwidth requirement increases significantly.


`MSG_GET_SPENT_COINS` defines a request for the inputs of a block.

Define `cmdString` as `getbspent`. Define BIP-324 message type as ???.

| Field | Type | Description |
| :---------- | :---------------------- | :------------------------------------------------------------------- |
| `blockhash` | `<32 bytes>` | Hash of the block for which inputs are requested. |
| `cutoff` | 32-bit unsigned integer | If greater than zero, include only coins created before this height. |
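
The request payload could be assembled as in the sketch below. The table does not state a byte order for `cutoff`, so
little endian is an assumption here, made by analogy with the input index of a coin. The function names are
hypothetical, and the block hash is passed through exactly as supplied by the caller.

```python
import struct


def serialize_getbspent(blockhash: bytes, cutoff: int = 0) -> bytes:
    """Build a `getbspent` payload: 32-byte block hash followed by cutoff."""
    if len(blockhash) != 32:
        raise ValueError("blockhash must be exactly 32 bytes")
    if not 0 <= cutoff <= 0xFFFFFFFF:
        raise ValueError("cutoff must fit in 32 bits")
    # ASSUMPTION: cutoff is serialized as a little-endian 32-bit integer.
    return blockhash + struct.pack("<I", cutoff)


def parse_getbspent(payload: bytes) -> tuple[bytes, int]:
    """Split a `getbspent` payload into (blockhash, cutoff)."""
    if len(payload) != 36:
        raise ValueError("payload must be exactly 36 bytes")
    (cutoff,) = struct.unpack("<I", payload[32:])
    return payload[:32], cutoff


# A cutoff of zero requests every coin spent by the block.
request = serialize_getbspent(bytes(32), 0)
assert parse_getbspent(request) == (bytes(32), 0)
```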

Member

I could see coins created after a specific height (due to having already processed all blocks up to that height), but why only coins created before a height? Perhaps add an explanation what the cutoff is useful for?


Rationale of the `cutoff` field is detailed in the rationale section below.

#### MSG_SPENT_COINS

`MSG_SPENT_COINS` defines the data structure for inputs of a block.

Contributor

Probably not really y'all's intended use case, but if you optionally make it possible to include merkle proofs for the set of coins, then this message can be used to obtain a proof that an output was spent in a given block.

Contributor

It would actually also be useful for BIP 157+158 peers, as the final version that shipped includes the script spent (instead of the outpoint), which means that if you're using the filters to find a block where a given script has been spent, you need to make some assumptions about what the prev script is for a given transaction.

@rustaceanrob rustaceanrob Jun 23, 2026

Member Author

The most recent response on this mailing list post mentions commitment to the UTXO set as part of the block header. There are additional ways to do this outside of a soft fork as well, i.e. utreexo proofs. For now I think it best to leave this unspecified in this version of the message while the community shares ideas, but I do think this is interesting.


Define `cmdString` as `bspent`. Define BIP-324 message type as ???.

| Field | Type | Description |
| :---------- | :------------------------------- | :------------------------------------------------ |
| `blockhash` | `<32 bytes>` | Block hash these coins are spent from. |
| `len` | `CompactSize(Len(vector<Coin>))` | Length of the coins vector. |
| `coins` | `vector<Coin>` | Coins spent, after filtering on request `cutoff`. |

A client serving `bspent` MUST include only the coins created _before_ the height given in the `cutoff` field of the
corresponding `getbspent` request, or all coins if `cutoff` is zero. A client receiving a `bspent` message with
unrequested or missing coins MUST disconnect from the serving peer.

## Signaling

Support for serving historical block spent coins is advertised by a feature message, introduced by
[BIP-434](https://github.com/bitcoin/bips/blob/master/bip-0434.md).

| featureid | featuredata |
| :------------------ | :---------- |
| `blockspentcoinsv1` | `0x00` |

A client advertising this feature SHOULD respond to `getbspent` messages, subject to rate-limiting and bandwidth
limiting.

## Rationale

The lifetime of a coin, or the interval between its creation and spending heights, follows an empirical pattern on the
Bitcoin blockchain: the majority of coins are spent within 100 blocks. In fact, approximately 41 percent of coins
are spent within 10 blocks at the time of writing[^1]. Clients may leverage this to reduce the bandwidth required to
fetch spent coins by using an in-memory cache. For example, a client may store coins that were created in a 5 block
window, and request only coins that are older than this window via the `cutoff` filter. This results in a significant
bandwidth reduction at the cost of a cache whose size can be set dynamically by the client depending on available memory.
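
One way to derive the `cutoff` for each request in a sequential run of blocks is sketched below. The helper
`cutoff_for` and its parameters are hypothetical, and the arithmetic is one reading of a "5 block window" in which the
window includes the block being processed.

```python
def cutoff_for(height: int, start: int, window: int) -> int:
    """Return the `cutoff` to send in `getbspent` for the block at `height`.

    `start` is the first block of this sequential run and `window` is the
    number of most recent blocks, including the current one, whose created
    coins are kept in the cache.
    """
    if height < start or window < 1:
        raise ValueError("height must be >= start and window must be >= 1")
    if height == start:
        # Nothing is cached yet, so request every spent coin (no filter).
        return 0
    # Coins created at or after the returned height are expected in the cache;
    # the cache can never reach back before the start of the run.
    return max(start, height - window + 1)


# A run starting at block 1001 with a 5 block cache window.
assert [cutoff_for(h, 1001, 5) for h in range(1001, 1008)] == [
    0, 1001, 1001, 1001, 1001, 1002, 1003,
]
```

The cutoff depends only on the block height and the run parameters, so requests for upcoming blocks can be issued
before earlier blocks finish processing.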
Comment on lines +145 to +150

This cutoff cache optimization seems to nudge implementors back to sequentially processing blocks with the added burden of requesting extra data over the wire.

Also with the current messages I still need to get the data for the block (even if there's only one unspent cache miss?), right?
If that's true, wouldn't a parameter for inputs of interest (delta encoded index) help here?

At 150ms RTT * 955233 blocks that's ~39.8 hrs of round-trip latency for uncached requests alone, before counting download time (as Roasbeef noted in an earlier review). It seems to me that the cache mitigates this but at the cost of reintroducing the very sequentiality it aims to eliminate.

Is this understanding correct?

@RubenSomsen RubenSomsen Jun 25, 2026

Contributor

seems to nudge implementors back to sequentially processing blocks

Caching requires sequential processing, but you can have multiple sequential threads in parallel.

added burden of requesting extra data over the wire

You're going to have to request the undo data regardless for non-assumevalid SwiftSync - it is not related to caching.

I still need to get the data for the block (even if there's only one unspent cache miss?), right?

It seems to me that the cache mitigates [round-trip latency]

Caching does not prevent the need for requesting undo data. You can safely assume pretty much every block has cache misses. No cache misses is equivalent to having the full UTXO set (and impossible with multiple sequential threads), which defeats the point.

round-trip latency for uncached requests

I have no strong opinion on batching, but round-trip latency won't add up sequentially if requests are sent out in parallel.

Concrete example: Let's say you're starting another sequential thread from block height 1001 and you intend to cache the last 5 blocks worth of outputs. For the first block you'd request the full undo data. For block 1002 until 1005 you'd request everything created until block height 1000. From height 1006 onwards your 5-block window starts to shift so you'd request everything created until block height 1001, and so on.

All this data can be requested in parallel. As long as your caching strategy is not based on what you witnessed during the previous block, at no point do you have to wait for one block to finish processing before requesting the data for upcoming blocks.

Comment on lines +149 to +150

Member

Ah cool. I was missing this context above when cutoff was introduced.

Member Author

I added a short note when introducing the request message that the cutoff field is motivated in the rationale section.


Beyond the use of a dynamic coin height filter, there are additional reasons to not simply read the spent coins from
disk and send them over the wire. Legacy fields (`nVersion`) are set to `0x00` when writing and reading this data to
maintain compatibility of disk format with old clients. Furthermore, using the amount compression specified above, an
11 GB reduction in bandwidth is achieved. The application of `VARINT` as opposed to `CompactSize` offers a further
reduction of 4 GB; however, the `VARINT` primitive is currently a Bitcoin Core implementation detail. Reusing existing
Comment on lines +155 to +156

Member

Given the confusing terminology in regard to CompactSize and VARINT and Bitcoin Core, you probably want to define these terms more concretely.

network primitives results in the majority of savings, so this specification opts to lower implementation burden for
clients. With respect to reconstructable script, utilizing this format results in a savings of around 12 GB. The scheme
is lossless, and may be upgraded by appending script variants. For reference, the naive encoding of block spent coins
is 118 GB as of block 930,000[^1][^2][^3].

## Function Appendix

Bitcoin Core utilizes a technique to remove trailing zeros from the representation of amounts. This technique offers a
significant size reduction in amount serialization. These functions are duplicated from the
[test framework](https://github.com/bitcoin/bitcoin/blob/master/test/functional/test_framework/compressor.py).

### Compress Amount

```python
def compress_amount(n):
    if n == 0:
        return 0
    e = 0
    while ((n % 10) == 0) and (e < 9):
        n //= 10
        e += 1
    if e < 9:
        d = n % 10
        assert (d >= 1 and d <= 9)
        n //= 10
        return 1 + (n*9 + d - 1)*10 + e
    else:
        return 1 + (n - 1)*10 + 9
```

### Decompress Amount

```python
def decompress_amount(x):
    if x == 0:
        return 0
    x -= 1
    e = x % 10
    x //= 10
    n = 0
    if e < 9:
        d = (x % 9) + 1
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    while e > 0:
        n *= 10
        e -= 1
    return n
```

`CompactSize` is commonly used to represent the size of collections in peer-to-peer messages.

### Encode compact size

```python
def encode_compactsize(n):
    if n < 0xfd:
        return bytes([n])
    elif n <= 0xffff:
        return b"\xfd" + n.to_bytes(2, "little")
    elif n <= 0xffffffff:
        return b"\xfe" + n.to_bytes(4, "little")
    else:
        return b"\xff" + n.to_bytes(8, "little")
```

### Decode compact size

```python
def decode_compactsize(b):
    prefix = b[0]
    if prefix < 0xfd:
        return prefix
    elif prefix == 0xfd:
        return int.from_bytes(b[1:3], "little")
    elif prefix == 0xfe:
        return int.from_bytes(b[1:5], "little")
    else:
        return int.from_bytes(b[1:9], "little")
```

## Compatibility

Clients seeking to perform fully-validating SwiftSync require peers that serve undo data. Serving data requires no
additional index and may be enabled via advertising the `feature` message.

## Reference Implementation and Test Vectors

### Reference Implementation

- [Bitcoin Core](https://github.com/rustaceanrob/bitcoin/tree/bip-block-undo)

### Test Vectors

- [Reconstructable script](test_vectors/block_undo/reconstructable_script.json)
- [Compressed Amount](test_vectors/block_undo/compressed_amount.json)

In order:
`P2PKH, P2SH, P2TR, P2WPKH, P2WSH, P2PK (odd), P2PK (even), P2PK (uncompressed), OP_RETURN (unspendable/unknown)`

## Copyright

This BIP is licensed under the 3-clause BSD license.

## Footnotes

[^1]: Relevant statistics may be generated via binaries in the
[`swiftsync-research`](https://github.com/rustaceanrob/swiftsync-research) repository
[^2]: Reconstructable scripts are borrowed from [UTREEXO](https://github.com/bitcoin/bips/pull/1923), which in turn
borrows them from Cory Fields' UHS proposal
[^3]: Astute readers may notice that uncompressed public keys could be compressed before they are sent and
decompressed by the receiving client. Although this would slightly reduce bandwidth, it would increase the complexity of
client code, as a `secp256k1` context would be required to decode the message, which is not currently a requirement. As
of height 936,212 the number of uncompressed public keys spent in blocks is 853,515. This represents a very modest
savings in bandwidth, around 30 MB. As such, this technique is omitted for implementation simplicity.
8 changes: 8 additions & 0 deletions bip-xxxx-block-undo/test_vectors/compressed_amount.json
@@ -0,0 +1,8 @@
[
[0, "0x0"],
[1, "0x1"],
[1000000, "0x7"],
[100000000, "0x9"],
[5000000000, "0x32"],
[2100000000000000, "0x1406f40"]
]
11 changes: 11 additions & 0 deletions bip-xxxx-block-undo/test_vectors/reconstructable_script.json
@@ -0,0 +1,11 @@
[
["76a9142365e46227cc171083ea275f45ea8646c61d1fbb88ac", "012365e46227cc171083ea275f45ea8646c61d1fbb"],
["a914b472a266d0bd89c13706a4132ccfb16f7c3b9fcb87", "05b472a266d0bd89c13706a4132ccfb16f7c3b9fcb"],
["5120720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705", "08720b1ffb2c63684973c5e9898b188c9d367fa2bc1ce76b8ea02872b5e3ffe705"],
["00146262b97a514ea54d12f51e0a4fe4c09fb74ff7bd", "076262b97a514ea54d12f51e0a4fe4c09fb74ff7bd"],
["00200000000000000000000000000000000000000000000000000000000000000000", "060000000000000000000000000000000000000000000000000000000000000000"],
["210334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fbac", "0334ed84e3c579d5ff9122fb4215210ec5aaad51c3f60bf971d939db1c5b56a9fb"],
["210299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11edeac", "0299745a46d9f42b4f578e32d5582120a4688b4224f7e20081f781efc198d11ede"],
["410441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59eac", "0441a5367189b64cc1601c2a708556e37ade94ec808be746e45e35d86d2ee0cb9cd3b2e65ee51baf285cda78589605c3a59ba0492d577349ad3f0afaac862aa59e"],
["6a", "00016a"]
]