docs: add recommended metadata encodings for Partial Messages extension #13
Open
adarsh-7-satyam wants to merge 1 commit into
Signed-off-by: Adarsh Satyam <adarsh5.satyam@gmail.com>
Description
The Partial Messages extension for Gossipsub 1.4 introduces the `partsMetadata` field as a mechanism for peers to communicate which parts of a large message they hold. However, the current `partial-messages.md` specification deliberately leaves the encoding of this field as "application defined", providing no concrete guidance on what encoding formats to use. While this flexibility is intentional, it creates a practical problem: when two independent implementations like `py-libp2p` and `nim-libp2p` attempt to exchange partial messages, they must agree on how `partsMetadata` is encoded, but the specification gives them no shared vocabulary to do so.
Problem
Without standard encoding recommendations, every application and implementation team invents their own encoding independently. This leads to fragmentation across the ecosystem, makes interoperability testing between implementations significantly harder, and forces every new implementer to rediscover the same tradeoffs from scratch. The specification currently mentions bitmaps, ranges, and bloom filters only as passing examples in a single sentence, without any technical detail on how to implement or choose between them.
Changes Made
Added a new document, `pubsub/gossipsub/metadata-encodings.md`, that provides detailed technical documentation for three recommended standard encoding approaches for `partsMetadata`:
Bitmask Encoding: Represents part availability as a fixed-length bit array where each bit position maps to a part index. Includes a concrete 8-part worked example in binary and hex, wire format specification using big-endian byte packing, and tradeoff analysis highlighting its suitability for Ethereum DAS columns with 32-64 parts.
Range-based Encoding: Represents contiguous blocks of held parts as start/length pairs encoded as unsigned varints. Includes a concrete 100-part worked example, wire format using the multiformats unsigned varint spec, and tradeoff analysis noting its efficiency for streaming applications where data arrives in ordered chunks.
Bloom Filter Encoding: A probabilistic data structure for representing very large sparse part sets. Includes a concrete worked example with m=8 bits and k=2 hash functions, tradeoff analysis covering false positive rates and the impossibility of false negatives, and guidance on when the size savings justify the probabilistic accuracy tradeoff.
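For reviewers who want to see the three encodings side by side, here is a minimal Python sketch. It is illustrative only and is not taken from `metadata-encodings.md`: the function names are hypothetical, the bit order within each byte (part 0 in the most significant bit of byte 0) is an assumption, and the Bloom filter's SHA-256-based hash family is a stand-in for whatever hash functions the document actually specifies.

```python
import hashlib
from typing import Iterable, List


def encode_bitmask(held: Iterable[int], total_parts: int) -> bytes:
    """Bit array with one bit per part. ASSUMPTION: part 0 is the MSB of byte 0."""
    out = bytearray((total_parts + 7) // 8)
    for i in held:
        if not 0 <= i < total_parts:
            raise ValueError(f"part index {i} out of range")
        out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def decode_bitmask(data: bytes, total_parts: int) -> List[int]:
    """Inverse of encode_bitmask under the same assumed bit order."""
    if len(data) < (total_parts + 7) // 8:
        raise ValueError("bitmask too short for total_parts")
    return [i for i in range(total_parts) if data[i // 8] & (0x80 >> (i % 8))]


def encode_uvarint(n: int) -> bytes:
    """Unsigned varint: 7 bits per byte, low group first, high bit = continuation."""
    if n < 0:
        raise ValueError("unsigned varint cannot encode negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_ranges(held: Iterable[int]) -> bytes:
    """Collapse held indices into (start, length) runs, each written as two uvarints."""
    indices = sorted(set(held))
    out = bytearray()
    pos = 0
    while pos < len(indices):
        start = indices[pos]
        length = 1
        while pos + length < len(indices) and indices[pos + length] == start + length:
            length += 1
        out += encode_uvarint(start) + encode_uvarint(length)
        pos += length
    return bytes(out)


def bloom_positions(part: int, m: int, k: int) -> List[int]:
    """HYPOTHETICAL hash family: SHA-256 over (seed byte, part index), reduced mod m."""
    positions = []
    for seed in range(k):
        digest = hashlib.sha256(bytes([seed]) + part.to_bytes(8, "big")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_encode(held: Iterable[int], m: int, k: int) -> bytes:
    """Set k bits per held part in an m-bit filter (same assumed bit order as above)."""
    bits = bytearray((m + 7) // 8)
    for part in held:
        for p in bloom_positions(part, m, k):
            bits[p // 8] |= 0x80 >> (p % 8)
    return bytes(bits)


def bloom_might_hold(data: bytes, part: int, m: int, k: int) -> bool:
    """True may be a false positive; False is always definitive."""
    return all(data[p // 8] & (0x80 >> (p % 8)) for p in bloom_positions(part, m, k))
```

Under these assumptions, `encode_bitmask([0, 2, 3, 7], 8)` yields the single byte `0xb1`, and `encode_ranges` applied to parts 0-9 and 50-59 yields the four bytes `00 0a 32 0a`. Two implementations that pick a different bit order or hash family would produce incompatible bytes, which is the interoperability gap this PR is meant to close.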
The document also includes an Encoding Selection Guide table that maps message part count, distribution pattern, and accuracy requirements to the recommended encoding, and an Interoperability Considerations section explaining that implementations MUST treat `partsMetadata` as opaque bytes at the Gossipsub layer and that topic-level specifications SHOULD document their chosen encoding to ensure cross-implementation compatibility.
How It Solves The Problem
By providing concrete technical documentation, worked examples, wire format specifications, and a selection guide, this document gives implementers a shared reference point for encoding decisions. Teams building `py-libp2p` and `nim-libp2p` implementations can now refer to the same document to ensure their encoding choices are compatible without requiring out-of-band coordination for common use cases.
Impact
`py-libp2p` and `nim-libp2p` as outlined in the DMP 2026 project
Related Issue
Closes #12