feat(unixfs): configurable CID Profiles from IPIP-499 #1088
add configurable size estimation modes for determining when to switch between BasicDirectory and HAMTDirectory:
- SizeEstimationLinks: legacy mode using len(name) + len(CID), default
- SizeEstimationBlock: full serialized dag-pb block size (accurate)
- SizeEstimationDisabled: link-count only via MaxLinks, ignores size

includes:
- HAMTSizeEstimation global for default mode
- WithSizeEstimationMode option for per-directory override
- helper functions for accurate protobuf size calculation

part of IPIP-499 UnixFS CID Profiles implementation.
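To make the three modes concrete, here is a minimal, self-contained sketch of how such a decision could be structured. It is not the boxo implementation: the `link` type, `linksEstimate`, and `shouldShard` are hypothetical names, the serialized block size is passed in rather than computed from protobuf, and applying `maxLinks` only in the disabled mode is a simplifying assumption.

```go
package main

import "fmt"

// SizeEstimationMode is a local, illustrative type that reuses the mode
// names from the commit message. It is not the boxo definition.
type SizeEstimationMode int

const (
	SizeEstimationLinks    SizeEstimationMode = iota // len(name) + len(CID) per link
	SizeEstimationBlock                              // full serialized block size
	SizeEstimationDisabled                           // link count only
)

// link is a hypothetical stand-in for a directory entry.
type link struct {
	name   string
	cidLen int // byte length of the entry's binary CID
}

// linksEstimate sums len(name) + len(CID) over all entries (legacy mode).
func linksEstimate(links []link) int {
	total := 0
	for _, l := range links {
		total += len(l.name) + l.cidLen
	}
	return total
}

// shouldShard reports whether a directory would switch to a HAMT in this
// sketch. blockSize is the caller-supplied serialized size, used only in
// block mode. Assumption: maxLinks is consulted only in disabled mode.
func shouldShard(mode SizeEstimationMode, links []link, blockSize, threshold, maxLinks int) bool {
	switch mode {
	case SizeEstimationLinks:
		return linksEstimate(links) > threshold
	case SizeEstimationBlock:
		return blockSize > threshold
	default: // SizeEstimationDisabled: ignore size, count links
		return len(links) > maxLinks
	}
}

func main() {
	links := []link{{"a", 34}, {"bb", 34}}
	fmt.Println(linksEstimate(links))                                   // 71
	fmt.Println(shouldShard(SizeEstimationLinks, links, 0, 70, 0))      // true
	fmt.Println(shouldShard(SizeEstimationDisabled, links, 0, 0, 2))    // false
}
```

The legacy estimate undercounts because it ignores protobuf framing and the per-link size field, which is why the block-based mode is described as the accurate one.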
introduces UnixFSProfile struct with predefined profiles:
- UnixFS_v0_2015: legacy CIDv0 settings (256 KiB chunks, 174 links/node)
- UnixFS_v1_2025: modern CIDv1 settings (1 MiB chunks, 1024 links/node)

profiles control file chunking, DAG width, and HAMT sharding parameters. ApplyGlobals() sets all relevant global variables at once.

part of IPIP-499 implementation.
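The profile-plus-globals pattern can be sketched in isolation as follows. This is an illustration only: the `profile` type, its field names, and the `chunkSize` and `linksPerNode` variables are hypothetical stand-ins, and only the chunk sizes and link counts are taken from the commit message above.

```go
package main

import "fmt"

// Package-level defaults standing in for library globals.
// Names are hypothetical; initial values match the legacy profile.
var (
	chunkSize    = 256 * 1024
	linksPerNode = 174
)

// profile bundles settings that should change together.
type profile struct {
	Name         string
	ChunkSize    int // bytes per file chunk
	LinksPerNode int // maximum links per DAG node
}

var (
	v0_2015 = profile{Name: "UnixFS_v0_2015", ChunkSize: 256 * 1024, LinksPerNode: 174}
	v1_2025 = profile{Name: "UnixFS_v1_2025", ChunkSize: 1024 * 1024, LinksPerNode: 1024}
)

// applyGlobals overwrites the package-level defaults in one step.
// Like any write to shared globals, it is not safe for concurrent use.
func (p profile) applyGlobals() {
	chunkSize = p.ChunkSize
	linksPerNode = p.LinksPerNode
}

func main() {
	v1_2025.applyGlobals()
	fmt.Println(chunkSize, linksPerNode) // prints "1048576 1024"
}
```

Grouping the values in one struct means a caller cannot accidentally apply half a profile, which matters when the goal is identical CIDs across implementations.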
Codecov Report❌ Patch coverage is
@@ Coverage Diff @@
## main #1088 +/- ##
==========================================
+ Coverage 61.11% 61.22% +0.11%
==========================================
Files 264 265 +1
Lines 26217 26340 +123
==========================================
+ Hits 16022 16127 +105
- Misses 8520 8526 +6
- Partials 1675 1687 +12
... and 9 files with indirect coverage changes
add SerialFileOptions and NewSerialFileWithOptions to control whether symlinks are preserved as UnixFS symlink nodes (Data.Type=4) or dereferenced and replaced with their target content during file traversal.
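The difference between preserving and dereferencing a symlink can be shown with the standard library alone. The sketch below does not use the boxo `files` package; `entryKind` and `demo` are hypothetical helpers that rely on `os.Lstat` (does not follow links) versus `os.Stat` (follows links). It assumes a platform where creating symlinks is permitted.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// entryKind reports how a path is classified when symlinks are either
// followed (dereference=true) or kept as links (dereference=false).
func entryKind(path string, dereference bool) (string, error) {
	stat := os.Lstat // does not follow symlinks
	if dereference {
		stat = os.Stat // follows symlinks to the target
	}
	info, err := stat(path)
	if err != nil {
		return "", err
	}
	switch {
	case info.Mode()&os.ModeSymlink != 0:
		return "symlink", nil
	case info.IsDir():
		return "directory", nil
	default:
		return "file", nil
	}
}

// demo creates a temporary file plus a symlink to it and classifies the
// symlink both ways.
func demo() (preserved, dereferenced string, err error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "target.txt")
	if err := os.WriteFile(target, []byte("hello"), 0o644); err != nil {
		return "", "", err
	}
	linkPath := filepath.Join(dir, "link")
	if err := os.Symlink(target, linkPath); err != nil {
		return "", "", err
	}
	if preserved, err = entryKind(linkPath, false); err != nil {
		return "", "", err
	}
	dereferenced, err = entryKind(linkPath, true)
	return preserved, dereferenced, err
}

func main() {
	preserved, dereferenced, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(preserved, dereferenced)
}
```

The two choices yield different DAGs for the same directory tree, so the option has to be part of any profile that aims for reproducible CIDs.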
Link.Size is already uint64, so the explicit conversions are redundant and flagged by golangci-lint unconvert check.
HAMT sharding threshold comparison was historically implemented as `>` in JS and `>=` in Go:
- JS: https://github.com/ipfs/helia/blob/005c2a7/packages/unixfs/src/commands/utils/is-over-shard-threshold.ts#L31
- Go: https://github.com/ipfs/boxo/blob/319662c/ipld/unixfs/io/directory.go#L438

This inconsistency meant a directory exactly at the 256 KiB threshold would stay basic in JS but convert to HAMT in Go, producing different CIDs for the same input.

This commit changes Go to use `>` (matching JS), so a directory exactly at the threshold now stays as a basic (flat) directory. This aligns cross-implementation behavior for CID determinism per IPIP-499.

Also adds SizeEstimationMode to MkdirOpts so MFS directories respect the configured estimation mode instead of always using the global default.
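The boundary case is easiest to see side by side. In this sketch the two function names are hypothetical and the threshold is the 256 KiB value mentioned above; the only difference between them is the comparison operator.

```go
package main

import "fmt"

// threshold is the 256 KiB sharding threshold from the commit message.
const threshold = 256 * 1024

// overThresholdOld models the previous Go comparison (>=).
func overThresholdOld(size int) bool { return size >= threshold }

// overThresholdNew models the comparison after this change (>).
func overThresholdNew(size int) bool { return size > threshold }

func main() {
	for _, size := range []int{threshold - 1, threshold, threshold + 1} {
		fmt.Printf("size=%d old=%v new=%v\n",
			size, overThresholdOld(size), overThresholdNew(size))
	}
}
```

Only the middle case differs: at exactly 262144 bytes the old comparison converts to a HAMT and the new one keeps the basic directory.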
b844fc9 to 6707376
- fix trailing newline in directory_test.go
- add #1088 PR references to changelog entries

- files: fix nil filter check in serialFile.Size()
- unixfs/io: document thread-safety for global vars and ApplyGlobals
- changelog: move DefaultBlockSize to Changed section with breaking marker
// Thread safety: this function modifies global variables and is not safe
// for concurrent use. Call it once during program initialization, before
// starting any imports. Do not call from multiple goroutines.
func (p UnixFSProfile) ApplyGlobals() {
ℹ️ boxo already had globals (DefaultBlockSize, DefaultLinksPerBlock, etc.).
This is a very surgical way of having predefined UnixFS profiles in boxo itself that users can apply programmatically at startup of their app.
I don't like it, to be honest, but other Go libraries that we already use (like certmagic, or even net.DefaultResolver) manage global defaults in a similar way, so maybe it's just me not being a fan of globals.
This is a compromise that delivers the ability to set-and-forget a profile on startup, implemented in the smallest amount of code that avoids breaking every user of the existing APIs.
gammazero left a comment
I added a circular symlink test to show that it works when not dereferencing symlinks, otherwise it fails.
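Adds building blocks for reproducible CID generation across IPFS implementations, based on IPIP-499.

Users of boxo can now
- choose how directory size is estimated for HAMT sharding: legacy link-based (`SizeEstimationLinks`), accurate block-based (`SizeEstimationBlock`), or disable size-based thresholds entirely (`SizeEstimationDisabled`)
- apply the `UnixFS_v0_2015` or `UnixFS_v1_2025` profiles from IPIP-499 via `ApplyGlobals()` and `CidBuilder()`
- control whether symlinks are preserved or dereferenced via `SerialFileOptions.DereferenceSymlinks`
- rely on the HAMT sharding threshold comparison matching JS (`>=` to `>`), see 6707376

Related: IPIP-499, used by kubo#11148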