feat(provide): +unique and +entities strategy modifiers #11245
- config: ParseProvideStrategy returns an error, rejects "all" mixed with selective strategies, and removes the dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses a bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused an fx.Provide panic
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs and are incompatible with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider, NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
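The parsing and validation rules above can be illustrated with a small bitmask parser. This is a sketch under stated assumptions, not kubo's code: the Strategy type, the bit values, and parseStrategy are hypothetical stand-ins for ParseProvideStrategy, and the exact error messages are invented. It encodes only the rules listed in these commits: unknown or empty tokens fail, "all" cannot be mixed with selective strategies, +entities implies +unique, and the modifiers need pinned or mfs and reject roots.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Strategy is a hypothetical bitmask; kubo's real constants and values may differ.
type Strategy uint8

const (
	StrategyAll Strategy = 1 << iota
	StrategyPinned
	StrategyRoots
	StrategyMFS
	StrategyUnique
	StrategyEntities
)

var tokens = map[string]Strategy{
	"all": StrategyAll, "pinned": StrategyPinned, "roots": StrategyRoots,
	"mfs": StrategyMFS, "unique": StrategyUnique, "entities": StrategyEntities,
}

// parseStrategy is a hypothetical stand-in for ParseProvideStrategy.
func parseStrategy(s string) (Strategy, error) {
	var st Strategy
	for _, tok := range strings.Split(s, "+") {
		if tok == "" {
			return 0, errors.New("empty strategy token")
		}
		bit, ok := tokens[tok]
		if !ok {
			return 0, fmt.Errorf("unknown strategy token %q", tok)
		}
		st |= bit
	}
	if st&StrategyEntities != 0 {
		st |= StrategyUnique // +entities implies +unique
	}
	if st&StrategyAll != 0 && st != StrategyAll {
		return 0, errors.New(`"all" cannot be combined with other strategies`)
	}
	if st&StrategyUnique != 0 {
		if st&StrategyRoots != 0 {
			return 0, errors.New("+unique/+entities are incompatible with roots")
		}
		if st&(StrategyPinned|StrategyMFS) == 0 {
			return 0, errors.New("+unique/+entities require pinned or mfs")
		}
	}
	return st, nil
}

func main() {
	for _, s := range []string{"pinned+mfs+entities", "all+pinned", "unique", "roots+unique", "pinned++mfs", "bogus"} {
		st, err := parseStrategy(s)
		fmt.Printf("%-22s bits=%06b err=%v\n", s, st, err)
	}
}
```

Failing at parse time is what lets ValidateProvideConfig reject a bad Provide.Strategy at startup instead of silently ignoring a token.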
pure rename, no behavior change. prepares for ExecuteFastProvideDAG which will walk the DAG according to Provide.Strategy.
adds ExecuteFastProvideRoot calls to pin add and pin update, matching the behavior of ipfs add and ipfs dag import. respects Import.FastProvideRoot and Import.FastProvideWait config options. previously, pin add/update did not trigger any immediate providing, leaving pinned content invisible to the DHT until the next reprovide cycle (up to 22h).
when Provide.Strategy includes +unique, the reprovide cycle uses a shared BloomTracker across all sub-walks (MFS, recursive pins, direct pins). duplicate sub-DAG branches across recursive pins are detected and skipped, reducing traversal from O(pins * total_blocks) to O(unique_blocks).
- readLastUniqueCount / persistUniqueCount: persist the bloom sizing count between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: the +unique bit is checked first; non-unique strategies fall through to the existing switch unchanged
- per-cycle fresh BloomTracker sized from the previous cycle's count
- channel wrapper persists the count on successful cycle completion
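A minimal sketch of why one shared tracker turns O(pins * total_blocks) into O(unique_blocks): every root is walked depth-first in pre-order, and a CID already seen under any earlier pin ends that branch. Assumptions: the DAG is a toy map of CID strings, walkUnique and walkStats are hypothetical names, and a plain map replaces boxo's BloomTracker, so this sketch never has false positives (with a real bloom filter, a false positive would wrongly skip a branch). The counters mirror the providedCIDs and skippedBranches log fields.

```go
package main

import "fmt"

// DAG is a toy CID -> children map; real walks load blocks from the blockstore.
type DAG map[string][]string

// walkStats mirrors the providedCIDs / skippedBranches counters logged by the PR.
type walkStats struct {
	Provided, Skipped int
}

// walkUnique walks every root in DFS pre-order with ONE visited set shared
// across all roots. A map stands in for the bloom filter (no false positives).
func walkUnique(dag DAG, roots []string, provide func(string)) walkStats {
	visited := map[string]struct{}{}
	var st walkStats
	var visit func(c string)
	visit = func(c string) {
		if _, seen := visited[c]; seen {
			st.Skipped++ // this whole sub-DAG was already walked via another pin
			return
		}
		visited[c] = struct{}{}
		provide(c)
		st.Provided++
		for _, child := range dag[c] {
			visit(child)
		}
	}
	for _, r := range roots {
		visit(r)
	}
	return st
}

func main() {
	// Two directory pins sharing one 10 KiB file split into two 5 KiB chunks.
	dag := DAG{
		"dirA": {"file"},
		"dirB": {"file"},
		"file": {"chunk1", "chunk2"},
	}
	st := walkUnique(dag, []string{"dirA", "dirB"}, func(c string) { fmt.Println("provide", c) })
	fmt.Printf("providedCIDs=%d skippedBranches=%d\n", st.Provided, st.Skipped)
}
```

For this layout the sketch prints providedCIDs=5 skippedBranches=1: dirA, file, chunk1, chunk2 and dirB are provided, and the second visit to file is skipped together with its chunks.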
when Provide.Strategy includes +entities (which implies +unique), the reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only entity roots (files, directories, HAMT shards) and skipping internal file chunks.
- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select the walker based on the +entities flag via function references (makePinProv / makeMFSProv) to avoid duplicating the stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities
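A sketch of the entity-roots idea, assuming nodes are already classified by kind. Kind, Node and walkEntityRoots are hypothetical names (boxo's WalkEntityRoots decodes blocks to decide this; its real signature is not shown here). Directories and HAMT shards are emitted and descended into, a file root is emitted but its chunk blocks are not, and a shared visited set supplies the +unique behavior that +entities implies.

```go
package main

import "fmt"

// Kind is a hypothetical UnixFS node classification.
type Kind int

const (
	Directory Kind = iota
	HAMTShard
	File
	Chunk // internal block of a chunked file
)

type Node struct {
	Kind     Kind
	Children []string
}

// walkEntityRoots emits directories, HAMT shards and file roots, and never
// descends below a file root, so a file's chunk blocks are not emitted.
func walkEntityRoots(dag map[string]Node, roots []string, emit func(string)) {
	visited := map[string]bool{} // shared across roots: the +unique part
	var visit func(c string)
	visit = func(c string) {
		if visited[c] {
			return
		}
		visited[c] = true
		emit(c)
		if dag[c].Kind == File {
			return // stop at the file root; its chunks stay unannounced
		}
		for _, child := range dag[c].Children {
			visit(child)
		}
	}
	for _, r := range roots {
		visit(r)
	}
}

func main() {
	// root/subdir/largefile, with largefile split into chunks.
	dag := map[string]Node{
		"root":      {Kind: Directory, Children: []string{"subdir"}},
		"subdir":    {Kind: Directory, Children: []string{"largefile"}},
		"largefile": {Kind: File, Children: []string{"chunk1", "chunk2", "chunk3"}},
		"chunk1":    {Kind: Chunk},
		"chunk2":    {Kind: Chunk},
		"chunk3":    {Kind: Chunk},
	}
	walkEntityRoots(dag, []string{"root"}, func(c string) { fmt.Println("provide", c) })
}
```

The trade-off is discoverability: peers can find providers for root, subdir and largefile, but a lookup by an individual chunk CID finds nothing, which is the range-request caveat documented for +entities.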
- config.md: document the +unique and +entities modifiers with caveats (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update fast-provide, and hardened strategy parsing
per-block providing during ipfs add is now opt-in via --fast-provide-dag (or Import.FastProvideDAG config, default: false). without it, only the root CID is fast-provided after add, and the reprovide cycle handles the rest. this changes the default for Provide.Strategy=pinned: previously every block was provided during write, now only the root is immediate. use --fast-provide-dag=true to restore the previous behavior. Provide.Strategy=all is unaffected (blockstore hook provides on Put).
pin add and pin update now accept the same --fast-provide-root and --fast-provide-wait CLI flags as ipfs add and ipfs dag import, with the same config fallbacks (Import.FastProvideRoot, Import.FastProvideWait). previously these were config-only with no CLI override.
--fast-provide-dag is now available on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update (matching --fast-provide-root).
- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (the DAG walk emits the root CID first, via DFS pre-order)
- wait parameter: when true, blocks until the walk completes; when false, runs the walk in a background goroutine
- Import.FastProvideDAG config option (default: false)
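The precedence between the fast-provide flags can be summarized in a few lines. This is a hypothetical sketch, not kubo's command code: chooseFastProvide and resolve are invented names, a nil *bool models "flag not passed" so the Import.FastProvideRoot / Import.FastProvideDAG config values act as fallbacks, and --only-hash disables everything because nothing was stored. Provide.Strategy=all is deliberately not modeled, since the blockstore hook already provides every block on write.

```go
package main

import "fmt"

type fastProvide int

const (
	none fastProvide = iota
	rootOnly
	fullDAG
)

func (f fastProvide) String() string { return [...]string{"none", "root", "dag"}[f] }

// resolve returns the CLI value when the flag was set, else the config fallback.
func resolve(flag *bool, cfg bool) bool {
	if flag != nil {
		return *flag
	}
	return cfg
}

// chooseFastProvide is a hypothetical sketch of the flag precedence described above.
func chooseFastProvide(onlyHash bool, rootFlag, dagFlag *bool, cfgRoot, cfgDAG bool) fastProvide {
	if onlyHash {
		return none // nothing was written to the blockstore, nothing to announce
	}
	if resolve(dagFlag, cfgDAG) {
		return fullDAG // DFS pre-order emits the root first, so this supersedes root-only
	}
	if resolve(rootFlag, cfgRoot) {
		return rootOnly
	}
	return none
}

func main() {
	yes, no := true, false
	fmt.Println(chooseFastProvide(false, nil, nil, true, false))   // config fallback
	fmt.Println(chooseFastProvide(false, &no, &yes, true, false))  // DAG flag wins
	fmt.Println(chooseFastProvide(false, &no, nil, true, false))   // CLI overrides config
	fmt.Println(chooseFastProvide(true, &yes, &yes, true, true))   // --only-hash
}
```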
- strategy section: clearer trade-offs, suggested configurations, memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists
…-roots-with-dedup
when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process libp2p hosts on loopback, each running a DHT server with a shared in-memory ProviderStore. kubo daemons bootstrap to them over real TCP, exercising the full DHT code path without public internet. tests opt in via h.SetStubBootstrap(nodes) after Init(). on the daemon side, WAN DHT filters (AddressFilter, QueryFilter, RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted to accept loopback peers when TEST_DHT_STUB is set. depends on: github.com/libp2p/go-libp2p-kad-dht#1241
add sweep reprovide tests for all strategies (all, pinned, roots, mfs, pinned+mfs). each test waits for two reprovide cycles to confirm the schedule runs repeatedly. sweep tests use a short Provide.DHT.Interval and poll provide stat --enc=json.

harden negative assertions:
- roots: test excludes child blocks of a recursive pin (not just unpinned content), using --only-hash to learn the child CID
- mfs: test that pinned content outside MFS is not provided

fix: ipfs add --only-hash no longer triggers fast-provide or pinning (it was providing CIDs for data that was never stored)

rename SetStubBootstrap to BootstrapWithStubDHT with lazy init (ephemeral peers are created on the first call, not on harness creation)
…-roots-with-dedup

# Conflicts:
#   docs/changelogs/v0.41.md
strategy tests for pinned+mfs+unique and pinned+mfs+entities, covering both provide-at-add-time and reprovide (two cycles). content uses a nested DAG (root/subdir/largefile with 1 MiB chunks) to exercise the walker on multi-level structures.

BootstrapWithStubDHT is now self-contained: it always creates 20 ephemeral DHT peers on loopback and sets TEST_DHT_STUB=1 in each node's environment so the daemon lifts WAN DHT filters. no external env var is needed. the sweep provider requires >=20 DHT peers to estimate network size (prefix length); without enough peers it stays offline and never provides.

TEST_DHT_STUB on the daemon side lifts WAN DHT filters (AddressFilter, QueryFilter, RoutingTableFilter, RoutingTablePeerDiversityFilter) to accept loopback peers. this is set automatically by BootstrapWithStubDHT.

other changes:
- Provide.DHT.Interval=30s in sweep reprovide tests (was 1m)
- uniq() helper for unique CIDs across parallel subtests
- ipfs add --only-hash disables fast-provide and pinning
…-roots-with-dedup
- ipfs add --help: rewrite the fast-provide section with clear structure (content discoverability, flag defaults, strategy=all behavior)
- ipfs routing reprovide: mark as deprecated, note that it returns an error with the sweep provider, log the error with actionable guidance
- changelog: fix missing --fast-provide-dag flag on pin commands, use "routing system" instead of "DHT" where applicable, link to docs/config.md as the source of truth for defaults
- environment-variables.md: note that BootstrapWithStubDHT sets TEST_DHT_STUB automatically, so no external env var is needed
the fork (NoopMessageSender, MsgSenderBuilder) is no longer used. the ephemeral peer pool in BootstrapWithStubDHT replaced the NoopMessageSender approach.
log providedCIDs and skippedBranches after each unique reprovide cycle and fast-provide-dag walk. tests verify exact counts with two dir pins sharing a 10 KiB file (5 KiB chunks): fast-provide-dag asserts 5 provided + 1 skipped branch, reprovide asserts 6 provided + 1 skipped branch (includes empty MFS root pin). both assert bloom tracker created and no autoscale. updates boxo to pick up Deduplicated() counter, bloom creation/autoscale logging, and review feedback fixes.
…-roots-with-dedup

# Conflicts:
#   docs/changelogs/v0.41.md
#   docs/examples/kubo-as-a-library/go.mod
#   docs/examples/kubo-as-a-library/go.sum
#   go.mod
#   go.sum
#   test/dependencies/go.mod
#   test/dependencies/go.sum
boxo#1124 landed on master; point to the merge commit instead of the PR branch.
ipfs add --pin --fast-provide-dag wrapped the DAGService with providingDagService, which announced every block as it was written regardless of strategy modifiers. ExecuteFastProvideDAG ran in parallel as the post-add walker. Net effect:
- pinned+entities: chunks reached the DHT despite +entities saying they should be skipped (correctness bug)
- pinned+unique: every block announced twice; the post-walk bloom only dedups against its own pass
- pinned (plain): every block announced twice

ExecuteFastProvideDAG already has bloom dedup, entity-roots support, and unbuffered backpressure, so it is now the single mechanism for --fast-provide-dag across ipfs add, dag import, pin add, and pin update.

Provide.Strategy=all is untouched: every block is provided at the blockstore level via the blockstore.Provider hook in core/node/storage.go, which is independent of coreapi. The Pinned strategy bit gated providingDagService and the parser rejects combining "all" with other strategies, so "all" never set that bit in the first place.

- core/coreapi/unixfs.go: drop the wrap, the providingDagService struct, and the now-unused mh and boxo/provider imports
- core/coreiface/options/unixfs.go: drop FastProvideDAG option
- core/coreapi/coreapi.go: drop now-dead providingStrategy field
- core/commands/add.go: drop the FastProvideDAG option pass-through
- test/cli/provider_test.go: regression test using ipfs add --fast-provide-dag with pinned+entities; fails on the previous code and passes here
Operators tuning +unique or +entities strategies on memory-constrained
or extra-large repos previously had no way to trade bloom filter memory
against false-positive rate -- both the reprovide cycle and
fast-provide-dag walks hardcoded walker.DefaultBloomFPRate.
Provide.BloomFPRate is the target false positive rate (1/N) for the
shared bloom tracker. Has no effect on Provide.Strategy=all or other
strategies that do not walk DAGs through the tracker. Validation
rejects values below 1_000_000 (~1 in 1M); below that the bloom
becomes lossy enough to drop a meaningful fraction of CIDs from each
reprovide cycle.
The single source of truth for the default value is
config.DefaultProvideBloomFPRate; docs reference it descriptively
(~1 in 4.75M, ~4 bytes/CID) so the literal lives in exactly one place.
- config/provide.go: BloomFPRate field, DefaultProvideBloomFPRate
and MinProvideBloomFPRate constants, validation
- config/provide_test.go: round-trip + validation cases
- core/node/provider.go: plumb fpRate through setReproviderKeyProvider
and createKeyProvider
- core/commands/cmdenv/env.go: ExecuteFastProvideDAG takes fpRate
- core/commands/{add,dag/import,pin/pin}.go: resolve from cfg and
pass through to ExecuteFastProvideDAG
- docs/config.md: new Provide.BloomFPRate section after Provide.DHT.*
with memory tradeoff table and minimum-value note
- docs/changelogs/v0.41.md: link to the new option from the +unique/
+entities section
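The "~4 bytes/CID" figure follows from the standard optimal bloom filter sizing, where a false positive rate p needs -ln(p)/(ln 2)^2 bits per element; with p = 1/N that is ln(N)/(ln 2)^2. The sketch below assumes boxo sizes its filter close to this optimum (its exact rounding of bits and hash count may differ), uses 4_750_000 as an approximation of the default described only as "~1 in 4.75M", and hard-codes the 1_000_000 minimum from this commit. bitsPerCID and validateFPRate are hypothetical helpers, not the config package's API.

```go
package main

import (
	"fmt"
	"math"
)

// minFPRate mirrors the documented minimum: values below ~1 in 1M are rejected.
const minFPRate = 1_000_000

// bitsPerCID returns the optimal bloom filter bits per element for a
// false positive rate of 1/n: ln(n) / (ln 2)^2.
func bitsPerCID(n uint64) float64 {
	return math.Log(float64(n)) / (math.Ln2 * math.Ln2)
}

// validateFPRate is a hypothetical sketch of the validation rule described above.
func validateFPRate(n uint64) error {
	if n < minFPRate {
		return fmt.Errorf("Provide.BloomFPRate %d is below the minimum %d", n, minFPRate)
	}
	return nil
}

func main() {
	for _, n := range []uint64{1_000_000, 4_750_000, 100_000_000} {
		bits := bitsPerCID(n)
		fmt.Printf("1/%d: %.1f bits (%.2f bytes) per CID, valid=%v\n",
			n, bits, bits/8, validateFPRate(n) == nil)
	}
}
```

Around the default, memory grows only logarithmically with N: 1 in 4.75M costs about 32 bits (4 bytes) per CID and 1 in 1M about 28.8 bits, so each tenfold reduction in false positives adds roughly 4.8 bits per CID.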
readLastUniqueCount and persistUniqueCount were exercised only indirectly via CLI tests, leaving the 8-byte length check and the "missing key" fallback without direct coverage.
- empty datastore returns 0 (no previous cycle)
- round trip across the full uint64 range (0, 1, 1k, 1M, 1B, MaxUint64)
- overwrite returns the most recent value (matches per-cycle persist)
- corrupt length (empty, short, long, single byte) returns 0 instead of panicking
Background fast-provide goroutines were implicitly bound to req.Context, which go-ipfs-cmds cancels on handler exit, so async --fast-provide-dag aborted as soon as the command returned, while --fast-provide-root, parented on context.Background, could outlive the node. Parent both paths off the IpfsNode lifetime context instead.
- ExecuteFastProvideRoot: the async goroutine now derives from ipfsNode.Context(), so it cancels on daemon shutdown rather than potentially touching a closed DHT client.
- ExecuteFastProvideDAG: takes cmdCtx and nodeCtx; wait=true runs inline under cmdCtx (Ctrl+C still cancels the walk), wait=false runs in a goroutine under nodeCtx so the walk survives command exit but still stops on shutdown.
- add, dag import, pin add/update: pass node.Context() as the new nodeCtx argument.
- changelog: note the behavior change for opt-in strategies.
Adds TestProviderFastProvideDAGAsyncSurvives: ipfs add with --fast-provide-dag=true but no --fast-provide-wait must walk the full DAG in a background goroutine that outlives the command handler, announce every block, and leave chunk CIDs findable by peers via findprovs. A long Provide.DHT.Interval ensures the scheduled reprovide cycle cannot be the source of the chunk announcements.
…-roots-with-dedup

# Conflicts:
#   docs/examples/kubo-as-a-library/go.mod
#   docs/examples/kubo-as-a-library/go.sum
#   go.mod
#   go.sum
#   test/dependencies/go.mod
#   test/dependencies/go.sum
The headline feature is … Along the way, … 👉 Default behavior (…) … Merging now and shipping in 0.41-rc1 for wider testing.
Summary
- Provide.Strategy modifiers (+unique and +entities) for nodes with large, overlapping pin sets (e.g. https://collab.ipfscluster.io hosting https://github.com/ipfs/distributions)
- pin add/pin update, new --fast-provide-dag flag
- ipfs add --only-hash bug fix

Changes
+unique and +entities strategy modifiers
New opt-in modifiers for Provide.Strategy:
- +unique: bloom filter dedup across recursive pins. Shared subtrees are traversed once per reprovide cycle instead of once per pin. ~4 bytes/CID memory. Logs providedCIDs and skippedBranches after each cycle.
- +entities: announces only entity roots (files, directories, HAMT shards), skipping internal file chunks. Implies +unique.

Example:
Provide.Strategy = "pinned+mfs+entities"

Default Provide.Strategy=all is unchanged. See docs/config.md#providestrategy for details.

Fast-provide on pin add and pin update
Both commands now accept --fast-provide-root, --fast-provide-dag, and --fast-provide-wait, matching ipfs add and ipfs dag import. The root CID is announced immediately after pinning. See docs/config.md#import for defaults.

--fast-provide-dag flag
New flag on ipfs add, ipfs dag import, ipfs pin add, and ipfs pin update. Walks and provides the full DAG immediately using the active strategy. No effect with Provide.Strategy=all (the blockstore already provides every block on write). Configurable via Import.FastProvideDAG (default: false).

Hardened strategy parsing
Unknown tokens, empty tokens, and invalid combinations now produce clear errors at startup instead of being silently ignored.
ipfs routing reprovide deprecated
Marked as deprecated. Returns an error with the sweep provider (default). Use ipfs provide stat -a to monitor reprovide progress.

Bug fix: ipfs add --only-hash
--only-hash no longer triggers fast-provide or pinning.

Provider strategy test suite
Full test coverage for both legacy and sweep providers across all strategies (all, pinned, roots, mfs, pinned+mfs, pinned+mfs+unique, pinned+mfs+entities):
- +unique dedup tests assert exact providedCIDs and skippedBranches counts
- +entities tests use nested DAGs with chunked files to verify chunks are skipped
- roots tests verify child blocks of a pin are excluded; mfs tests verify pinned content outside MFS is excluded
- BootstrapWithStubDHT(nodes) creates ephemeral DHT peers on loopback for the sweep provider (needs >=20 peers to estimate network size)

Compatibility
- … (Provide.Strategy=all)
- +unique and +entities are opt-in
- --fast-provide-dag defaults to false

Depends on
- boxo#1124: dag/walker (BloomTracker, WalkEntityRoots, WalkDAG), pinning/dspinner (NewUniquePinnedProvider, NewPinnedEntityRootsProvider)

Context
- … (+entities strategy modifier announces only file/directory roots and HAMT shards, the codec-aware DAG strategy this issue requested; already in PR description)
- … (+entities provides only a subset of UnixFS nodes by skipping chunk blocks, fulfilling the "provide fewer nodes" request; the separate bitswap walk-up requirement remains tracked in Leverage Content Path Affinity in routing #10251)