gossipsub: experimental draft for large message segmentation extension #2

Open
theUtkarshRaj wants to merge 5 commits into seetadev:master from theUtkarshRaj:experimental/large-message-segmentation

@theUtkarshRaj

Summary

This is a draft exploratory spec note for transparent segmentation of large Gossipsub payloads, plus minimal protobuf registry hooks so implementations can experiment in a consistent way.

  • Adds pubsub/gossipsub/extensions/experimental/large-message-segmentation.md (discussion-oriented; non-normative).
  • Extends pubsub/gossipsub/extensions/extensions.proto with an experimental capability flag, an RPC extension field, and LargeMessageSegmentationExtension.

Context

The goal is to sketch how segmentation could sit alongside the existing extension framework and how it differs from Partial Messages (complementary use cases: “no one has the full blob yet” vs “most of the message is already held”). Peer scoring is discussed at the reconstructed message level only; retry/retransmission is explicitly out of scope here.

Notes

Co-authored-by: Cursor <cursoragent@cursor.com>
theUtkarshRaj force-pushed the experimental/large-message-segmentation branch from 27507cd to f909f55 on May 8, 2026 11:48
@theUtkarshRaj
Author

Pushed a follow-up commit (f909f55) addressing review-readiness gaps in the draft:

  • Security Considerations — covers reassembly buffer exhaustion, segment flooding under forged messageID, last-segment withholding, and inconsistent totalSegments, each with mitigations.
  • Wire format clarification: checksum is now specified as SHA-256 over messageID || segmentIndex || payload for per-segment integrity before reassembly.
  • Interaction with Existing Gossipsub Mechanics — clarifies that segments are themselves gossipsub messages subject to standard MessageID/IHAVE/IWANT/IDONTWANT semantics, with the extension's messageID identifying only the parent payload.
  • Tentative answers to the two open questions — protocol-generated messageID via SHA-256(publisherPeerID || topic || nonce)[:16], and a 1 MiB max segment size with publisher choice below. Original questions are kept visible for discussion.
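
For readers who want to see the two hash constructions above side by side, here is a minimal Python sketch. It is not taken from the draft or from py-libp2p. The function names are hypothetical, and the byte encodings (topic as UTF-8, segmentIndex as a 4-byte big-endian unsigned integer, plain concatenation without length prefixes) are assumptions, since the thread only gives the `||` notation.

```python
import hashlib
import hmac
import struct

MESSAGE_ID_LEN = 16  # the draft truncates the SHA-256 digest to 16 bytes


def derive_message_id(publisher_peer_id: bytes, topic: str, nonce: bytes) -> bytes:
    """SHA-256(publisherPeerID || topic || nonce)[:16].

    Assumption: the topic is encoded as UTF-8 and the three fields are
    concatenated as raw bytes.
    """
    digest = hashlib.sha256(publisher_peer_id + topic.encode("utf-8") + nonce).digest()
    return digest[:MESSAGE_ID_LEN]


def segment_checksum(message_id: bytes, segment_index: int, payload: bytes) -> bytes:
    """SHA-256(messageID || segmentIndex || payload).

    Assumption: segmentIndex is serialized as a uint32 in big-endian order.
    """
    index_bytes = struct.pack(">I", segment_index)
    return hashlib.sha256(message_id + index_bytes + payload).digest()


def verify_segment(message_id: bytes, segment_index: int, payload: bytes, checksum: bytes) -> bool:
    """Check one segment's integrity before it enters the reassembly buffer."""
    expected = segment_checksum(message_id, segment_index, payload)
    return hmac.compare_digest(expected, checksum)
```

Binding the checksum to both messageID and segmentIndex means a valid segment cannot be replayed at a different position or under a different parent message without failing verification.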

Also worth flagging: a py-libp2p reference implementation tracking this draft was opened today at libp2p/py-libp2p#1323 by @shivv23. There are two minor divergences (RPC field number, default segment size) — I've responded over there with my thoughts and we should be able to converge quickly. Cross-implementation activity at this stage is exactly what the experimental-extension lifecycle is meant to surface.

cc @MarcoPolo @cskiraly — would appreciate early signal on whether the experimental-extension framing is the right entry point here, or whether segmentation should target a separate protocol ID. Related to #1.

shivv23 and others added 2 commits May 8, 2026 22:47
- Change largeMessageSegmentation field from 8473921 to 6492435
  in both ControlExtensions and RPC to match py-libp2p (PR libp2p/py-libp2p#1323)
- Rename RPC.largeSegmentation to RPC.largeMessageSegmentation for consistency
- Note py-libp2p's 256 KiB default segment size under Open Question 2
align field number and naming with py-libp2p reference implementation
@theUtkarshRaj
Author

Quick status update: merged shivv23/specs#align-field-number into the spec branch. The protobuf field number is now 6492435 (matching py-libp2p), the RPC field is consistently named largeMessageSegmentation, and Open Question 2 in the spec text now notes the py-libp2p reference default of 256 KiB segment payload under the 1 MiB ceiling. Spec and reference implementation are now aligned on field number, segment-size policy, and (per shivv23's matching update on libp2p/py-libp2p#1323) messageID derivation.
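
To make the segment-size policy concrete, the following Python sketch splits a payload using the 256 KiB default and the 1 MiB ceiling mentioned above. The helper name and the `(index, total, chunk)` tuple shape are hypothetical and are not the draft's wire format.

```python
from typing import List, Tuple

DEFAULT_SEGMENT_SIZE = 256 * 1024  # py-libp2p reference default noted in the thread
MAX_SEGMENT_SIZE = 1024 * 1024     # 1 MiB ceiling from the draft


def split_payload(
    payload: bytes, segment_size: int = DEFAULT_SEGMENT_SIZE
) -> List[Tuple[int, int, bytes]]:
    """Split payload into (segment_index, total_segments, chunk) tuples.

    The publisher may pick any size at or below the ceiling; larger values
    are rejected. An empty payload yields an empty list.
    """
    if not 0 < segment_size <= MAX_SEGMENT_SIZE:
        raise ValueError("segment_size must be in (0, MAX_SEGMENT_SIZE]")
    chunks = [payload[i:i + segment_size] for i in range(0, len(payload), segment_size)]
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks)]
```

With the default size, a 600 KiB payload would be split into three segments: two of 256 KiB and a final one of 88 KiB.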

…tion)

Addresses the implementer-raised gap from py-libp2p#1323. Defines
normative MUSTs/SHOULDs for per-peer caps, per-messageID memory bounds,
timeouts, inconsistency handling, successful reassembly, and eviction.
Promotes existing security mitigations from inferred to normative.
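
As an illustration of the kinds of bounds this commit message lists, here is a small Python sketch of a reassembly buffer with a per-peer cap, a per-messageID memory bound, a timeout, and inconsistency handling. All limit values, class names, and the choice to charge a pending message to the peer that delivered its first segment are assumptions made for the example. The normative values and rules are those in the spec text, which is not reproduced in this thread.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

MAX_PENDING_PER_PEER = 4                  # hypothetical per-peer cap
MAX_BYTES_PER_MESSAGE = 16 * 1024 * 1024  # hypothetical per-messageID memory bound
REASSEMBLY_TIMEOUT_S = 30.0               # hypothetical timeout


@dataclass
class _Pending:
    origin_peer: bytes  # peer that delivered the first segment (assumed accounting rule)
    total: int
    started: float
    size: int = 0
    segments: Dict[int, bytes] = field(default_factory=dict)


class Reassembler:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.pending: Dict[bytes, _Pending] = {}

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [k for k, p in self.pending.items() if now - p.started > REASSEMBLY_TIMEOUT_S]
        for key in expired:
            del self.pending[key]

    def add(self, peer: bytes, message_id: bytes, index: int, total: int,
            payload: bytes) -> Optional[bytes]:
        """Return the full payload once all segments arrived, else None.

        Raises ValueError (and drops buffered state) on a limit violation.
        """
        self._evict_expired()
        entry = self.pending.get(message_id)
        if entry is None:
            if total <= 0:
                raise ValueError("totalSegments must be positive")
            in_flight = sum(1 for p in self.pending.values() if p.origin_peer == peer)
            if in_flight >= MAX_PENDING_PER_PEER:
                raise ValueError("per-peer pending cap exceeded")
            entry = _Pending(origin_peer=peer, total=total, started=self._clock())
            self.pending[message_id] = entry
        if total != entry.total or not 0 <= index < entry.total:
            del self.pending[message_id]
            raise ValueError("inconsistent totalSegments or segment index")
        if index in entry.segments:
            return None  # duplicate segment, ignored
        if entry.size + len(payload) > MAX_BYTES_PER_MESSAGE:
            del self.pending[message_id]
            raise ValueError("per-message memory bound exceeded")
        entry.segments[index] = payload
        entry.size += len(payload)
        if len(entry.segments) < entry.total:
            return None
        del self.pending[message_id]  # successful reassembly frees the buffer
        return b"".join(entry.segments[i] for i in range(entry.total))
```

Injecting the clock keeps the timeout path testable without sleeping. A real implementation would also verify each segment's checksum before calling `add`.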

2 participants