feat(provide): +unique and +entities strategy modifiers #11245

Merged
lidel merged 31 commits into master from feat/provide-entity-roots-with-dedup on Apr 10, 2026
lidel (Member) commented Mar 20, 2026

Summary

  • Experimental Provide.Strategy modifiers (+unique and +entities) for nodes with large, overlapping pin sets (e.g. https://collab.ipfscluster.io hosting https://github.com/ipfs/distributions)
  • Fast-provide extended to pin add/pin update, new --fast-provide-dag flag
  • Hardened strategy parsing, ipfs add --only-hash bug fix
  • Provider strategy test suite covering both legacy and sweep providers

Changes

+unique and +entities strategy modifiers

New opt-in modifiers for Provide.Strategy:

  • +unique: bloom filter dedup across recursive pins. Shared subtrees traversed once per reprovide cycle instead of once per pin. ~4 bytes/CID memory. Logs providedCIDs and skippedBranches after each cycle.
  • +entities: announces only entity roots (files, directories, HAMT shards), skipping internal file chunks. Implies +unique.

Example: Provide.Strategy = "pinned+mfs+entities"

Default Provide.Strategy=all is unchanged. See docs/config.md#providestrategy for details.

Fast-provide on pin add and pin update

Both commands now accept --fast-provide-root, --fast-provide-dag, and --fast-provide-wait, matching ipfs add and ipfs dag import. Root CID is announced immediately after pinning. See docs/config.md#import for defaults.

--fast-provide-dag flag

New flag on ipfs add, ipfs dag import, ipfs pin add, ipfs pin update. Walks and provides the full DAG immediately using the active strategy. No effect with Provide.Strategy=all (blockstore already provides every block on write). Configurable via Import.FastProvideDAG (default: false).

Hardened strategy parsing

Unknown tokens, empty tokens, and invalid combinations now produce clear errors at startup instead of being silently ignored.

ipfs routing reprovide deprecated

Marked as deprecated. Returns an error with the sweep provider (default). Use ipfs provide stat -a to monitor reprovide progress.

Bug fix: ipfs add --only-hash

--only-hash no longer triggers fast-provide or pinning.

Provider strategy test suite

Full test coverage for both legacy and sweep providers across all strategies (all, pinned, roots, mfs, pinned+mfs, pinned+mfs+unique, pinned+mfs+entities):

  • Provide-at-add-time and reprovide (two cycles) for each strategy
  • +unique dedup tests assert exact providedCIDs and skippedBranches counts
  • +entities tests use nested DAGs with chunked files to verify chunks are skipped
  • roots tests verify child blocks of a pin are excluded; mfs tests verify pinned content outside MFS is excluded
  • BootstrapWithStubDHT(nodes) creates ephemeral DHT peers on loopback for the sweep provider (needs >=20 peers to estimate network size)

Compatibility

  • Default behavior unchanged (Provide.Strategy=all)
  • +unique and +entities are opt-in
  • --fast-provide-dag defaults to false
  • Strategy parsing is stricter: previously-ignored typos will now error at startup

Depends on

  • boxo#1124: dag/walker (BloomTracker, WalkEntityRoots, WalkDAG), pinning/dspinner (NewUniquePinnedProvider, NewPinnedEntityRootsProvider)

Context

- config: ParseProvideStrategy returns error, rejects "all" mixed with
  selective strategies, removes dead strategy==0 check
- config: add MustParseProvideStrategy for pre-validated call sites
- config: ValidateProvideConfig validates strategy at startup
- config: ShouldProvideForStrategy uses bitmask check for ProvideStrategyAll
- core/node: downstream callers use MustParseProvideStrategy
- core/node: fix Pinning() nil return that caused fx.Provide panic
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 420b111 to 4468527 on March 24, 2026 00:34
lidel added 8 commits March 24, 2026 01:47
- ProvideStrategyUnique: bloom filter cross-DAG deduplication
- ProvideStrategyEntities: entity-aware traversal (implies Unique)
- parser: "unique" and "entities" tokens recognized
- validation: modifiers must combine with pinned/mfs, incompatible
  with all/roots
- go.mod: update boxo to feat/provide-entity-roots-with-dedup
  (VisitedTracker, WalkDAG, WalkEntityRoots, NewConcatProvider,
  NewUniquePinnedProvider, NewPinnedEntityRootsProvider)
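
The parsing and validation rules described above can be illustrated with a minimal sketch. The type, constant, and function names here are hypothetical and are not kubo's actual identifiers; the sketch only assumes a bitmask representation, "+"-separated tokens, and the rules listed in these commit messages (unknown or empty tokens error out, "all" cannot be mixed, modifiers need pinned or mfs and cannot combine with roots, and entities implies unique). How the real parser treats an empty string is not covered here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Strategy is a bitmask. Names are illustrative, not kubo's constants.
type Strategy uint

const (
	All Strategy = 1 << iota
	Pinned
	Roots
	MFS
	Unique
	Entities
)

var tokens = map[string]Strategy{
	"all": All, "pinned": Pinned, "roots": Roots, "mfs": MFS,
	"unique": Unique, "entities": Entities,
}

// parseStrategy is a hypothetical stand-in for the hardened parser.
// In this sketch an empty input string is reported as an empty token.
func parseStrategy(s string) (Strategy, error) {
	var out Strategy
	for _, tok := range strings.Split(s, "+") {
		if tok == "" {
			return 0, errors.New("empty token in strategy")
		}
		bit, ok := tokens[tok]
		if !ok {
			return 0, fmt.Errorf("unknown strategy token %q", tok)
		}
		out |= bit
	}
	if out&Entities != 0 {
		out |= Unique // +entities implies +unique
	}
	if out&All != 0 && out != All {
		return 0, errors.New(`"all" cannot be combined with other strategies`)
	}
	if out&(Unique|Entities) != 0 {
		if out&Roots != 0 {
			return 0, errors.New("modifiers are incompatible with roots")
		}
		if out&(Pinned|MFS) == 0 {
			return 0, errors.New("modifiers must combine with pinned or mfs")
		}
	}
	return out, nil
}

func main() {
	inputs := []string{"pinned+mfs+entities", "all+pinned", "pinned+unqiue", "unique", "pinned++mfs"}
	for _, s := range inputs {
		st, err := parseStrategy(s)
		fmt.Printf("%-22q bits=%06b err=%v\n", s, st, err)
	}
}
```

Only the first input is accepted; the typo "unqiue" shows the kind of previously-ignored mistake that now fails at startup.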
pure rename, no behavior change. prepares for ExecuteFastProvideDAG
which will walk the DAG according to Provide.Strategy.
adds ExecuteFastProvideRoot calls to pin add and pin update,
matching the behavior of ipfs add and ipfs dag import. respects
Import.FastProvideRoot and Import.FastProvideWait config options.

previously, pin add/update did not trigger any immediate providing,
leaving pinned content invisible to the DHT until the next reprovide
cycle (up to 22h).
when Provide.Strategy includes +unique, the reprovide cycle uses a
shared BloomTracker across all sub-walks (MFS, recursive pins, direct
pins). duplicate sub-DAG branches across recursive pins are detected
and skipped, reducing traversal from O(pins * total_blocks) to
O(unique_blocks).

- readLastUniqueCount / persistUniqueCount: persist bloom sizing count
  between cycles at /reprovideLastUniqueCount
- uniqueMFSProvider: MFS walker with shared tracker + locality check
- createKeyProvider restructured: +unique bit checked first, non-unique
  strategies fall through to existing switch unchanged
- per-cycle fresh BloomTracker sized from previous cycle's count
- channel wrapper persists count on successful cycle completion
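
As a rough illustration of the shared-tracker idea, the sketch below walks two pins that share a file through one visited set. It is not the boxo walker: an exact map stands in for the probabilistic BloomTracker, string IDs stand in for CIDs, and the way the two counters are incremented is an assumption about what providedCIDs and skippedBranches mean.

```go
package main

import "fmt"

// dag maps a block ID to its child IDs. String IDs stand in for CIDs.
type dag map[string][]string

// tracker is an exact visited set standing in for the bloom tracker.
type tracker struct {
	seen     map[string]struct{}
	provided int // blocks emitted once
	skipped  int // branches cut because their root was already seen
}

// walk visits id in DFS pre-order and emits every unseen block once.
func (t *tracker) walk(d dag, id string, emit func(string)) {
	if _, ok := t.seen[id]; ok {
		t.skipped++ // the whole sub-DAG was covered by an earlier walk
		return
	}
	t.seen[id] = struct{}{}
	t.provided++
	emit(id)
	for _, child := range d[id] {
		t.walk(d, child, emit)
	}
}

// walkPins walks all pins with one shared tracker.
func walkPins(d dag, pins []string) *tracker {
	t := &tracker{seen: map[string]struct{}{}}
	for _, pin := range pins {
		t.walk(d, pin, func(id string) { fmt.Println("provide", id) })
	}
	return t
}

// demoDAG has two directory pins sharing one two-chunk file.
func demoDAG() dag {
	return dag{
		"dirA": {"file"},
		"dirB": {"file"},
		"file": {"chunk1", "chunk2"},
	}
}

func main() {
	t := walkPins(demoDAG(), []string{"dirA", "dirB"})
	fmt.Println("providedCIDs:", t.provided, "skippedBranches:", t.skipped)
	// last line printed: providedCIDs: 5 skippedBranches: 1
}
```

Without the shared tracker the file and its chunks would be traversed once per pin; with it, the second pin stops at the shared file.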
when Provide.Strategy includes +entities (which implies +unique), the
reprovide cycle uses WalkEntityRoots instead of WalkDAG, emitting only
entity roots (files, directories, HAMT shards) and skipping internal
file chunks.

- mfsEntityRootsProvider: MFS walk with entity root detection
- createKeyProvider: select walker based on +entities flag via function
  references (makePinProv / makeMFSProv) to avoid duplicating the
  stream wiring logic
- all combinations: pinned+entities, mfs+entities, pinned+mfs+entities
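
A toy version of the entity-root traversal may help show what "skipping internal file chunks" means. This is a sketch under assumptions, not WalkEntityRoots itself: node kinds are hard-coded rather than decoded from UnixFS, string IDs stand in for CIDs, and an exact set replaces the bloom tracker. The DAG shape (root/subdir/largefile with chunks) mirrors the one the tests in this PR describe.

```go
package main

import "fmt"

type kind int

const (
	dir kind = iota
	file
	hamtShard
)

// node is a toy UnixFS-like node; string IDs stand in for CIDs.
type node struct {
	kind     kind
	children []string
}

// entityRoots walks from root in DFS pre-order and returns entity roots only.
// A file root is emitted, but its children (internal chunks) are never visited.
func entityRoots(d map[string]node, root string) []string {
	var out []string
	seen := map[string]bool{} // exact set standing in for the bloom tracker
	var visit func(id string)
	visit = func(id string) {
		if seen[id] {
			return
		}
		seen[id] = true
		out = append(out, id)
		n := d[id]
		if n.kind == file {
			return // do not descend into file chunks
		}
		for _, child := range n.children {
			visit(child)
		}
	}
	visit(root)
	return out
}

// sampleDAG models root/subdir/largefile, where largefile has three chunks.
func sampleDAG() map[string]node {
	return map[string]node{
		"root":      {kind: dir, children: []string{"subdir"}},
		"subdir":    {kind: dir, children: []string{"largefile"}},
		"largefile": {kind: file, children: []string{"chunk0", "chunk1", "chunk2"}},
	}
}

func main() {
	fmt.Println(entityRoots(sampleDAG(), "root"))
	// prints: [root subdir largefile]
}
```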
- config.md: document +unique, +entities modifiers with caveats
  (range request limitation, roots vs entities distinction)
- changelog v0.41: add entries for strategy modifiers, pin add/update
  fast-provide, and hardened strategy parsing
per-block providing during ipfs add is now opt-in via
--fast-provide-dag (or Import.FastProvideDAG config, default: false).

without it, only the root CID is fast-provided after add, and the
reprovide cycle handles the rest. this changes the default for
Provide.Strategy=pinned: previously every block was provided during
write, now only the root is immediate.

use --fast-provide-dag=true to restore the previous behavior.
Provide.Strategy=all is unaffected (blockstore hook provides on Put).
pin add and pin update now accept the same --fast-provide-root and
--fast-provide-wait CLI flags as ipfs add and ipfs dag import,
with the same config fallbacks (Import.FastProvideRoot,
Import.FastProvideWait).

previously these were config-only with no CLI override.
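
The "CLI flag with config fallback" behavior can be sketched as a three-level lookup. This is a hypothetical helper, not kubo's option handling; nil pointers model "not set", and the only default used is the one this PR states (Import.FastProvideDAG is false by default).

```go
package main

import "fmt"

// resolve picks the CLI flag when it was passed, then the config value when
// it is set, then the built-in default. A nil pointer means "not set".
func resolve(cliFlag, cfgValue *bool, def bool) bool {
	if cliFlag != nil {
		return *cliFlag
	}
	if cfgValue != nil {
		return *cfgValue
	}
	return def
}

// ptr returns a pointer to b, for building optional values.
func ptr(b bool) *bool { return &b }

func main() {
	const defaultFastProvideDAG = false // default stated in this PR

	fmt.Println(resolve(nil, nil, defaultFastProvideDAG))              // nothing set: default
	fmt.Println(resolve(nil, ptr(true), defaultFastProvideDAG))        // config only
	fmt.Println(resolve(ptr(false), ptr(true), defaultFastProvideDAG)) // CLI overrides config
}
```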
@lidel changed the title from "fix(config): harden provide strategy parsing" to "feat(provide): +unique and +entities strategy modifiers" on Mar 24, 2026
--fast-provide-dag now available on ipfs add, ipfs dag import,
ipfs pin add, and ipfs pin update (matching --fast-provide-root).

- ExecuteFastProvideDAG accepts []cid.Cid so multiple roots share
  one bloom tracker (cross-root dedup for dag import and pin add)
- --fast-provide-dag supersedes --fast-provide-root (DAG walk
  includes the root CID as the first emitted via DFS pre-order)
- wait parameter: when true blocks until walk completes, when false
  runs in background goroutine
- Import.FastProvideDAG config option (default: false)
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 05f8870 to 07d7c66 on March 24, 2026 03:33
lidel added 4 commits March 25, 2026 23:38
- strategy section: clearer trade-offs, suggested configurations,
  memory comparison with concrete numbers
- Import.FastProvideDAG: new config option documentation
- Import.FastProvideRoot/Wait: updated to mention pin commands
- all three Import.FastProvide* options: consistent "Applies to" lists
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 800a1ef to a858eb1 on March 26, 2026 23:31
when TEST_DHT_STUB=1, the CLI test harness creates 20 in-process
libp2p hosts on loopback, each running a DHT server with a shared
in-memory ProviderStore. kubo daemons bootstrap to them over real
TCP, exercising the full DHT code path without public internet.

tests opt in via h.SetStubBootstrap(nodes) after Init().

on the daemon side, WAN DHT filters (AddressFilter, QueryFilter,
RoutingTableFilter, RoutingTablePeerDiversityFilter) are lifted
to accept loopback peers when TEST_DHT_STUB is set.

depends on: github.com/libp2p/go-libp2p-kad-dht#1241
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from a858eb1 to 4a47439 on March 27, 2026 00:06
lidel added 2 commits March 27, 2026 22:41
add sweep reprovide tests for all strategies (all, pinned, roots,
mfs, pinned+mfs). each test waits for two reprovide cycles to
confirm the schedule runs repeatedly. sweep uses short
Provide.DHT.Interval and polls provide stat --enc=json.

harden negative assertions:
- roots: test excludes child blocks of a recursive pin (not just
  unpinned content), using --only-hash to learn the child CID
- mfs: test that pinned content outside MFS is not provided

fix: ipfs add --only-hash no longer triggers fast-provide or
pinning (was providing CIDs for data that was never stored)

rename SetStubBootstrap to BootstrapWithStubDHT with lazy-init
(ephemeral peers created on first call, not on harness creation)
…-roots-with-dedup

# Conflicts:
#	docs/changelogs/v0.41.md
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from d52b242 to 8ae795c on March 28, 2026 00:37
strategy tests for pinned+mfs+unique and pinned+mfs+entities,
covering both provide-at-add-time and reprovide (two cycles).
content uses a nested DAG (root/subdir/largefile with 1 MiB
chunks) to exercise the walker on multi-level structures.

BootstrapWithStubDHT is now self-contained: it always creates
20 ephemeral DHT peers on loopback and sets TEST_DHT_STUB=1 on
each node's environment so the daemon lifts WAN DHT filters.
no external env var needed. the sweep provider requires >=20
DHT peers to estimate network size (prefix length); without
enough peers it stays offline and never provides.

TEST_DHT_STUB on the daemon side lifts WAN DHT filters
(AddressFilter, QueryFilter, RoutingTableFilter,
RoutingTablePeerDiversityFilter) to accept loopback peers.
this is set automatically by BootstrapWithStubDHT.

other changes:
- Provide.DHT.Interval=30s in sweep reprovide tests (was 1m)
- uniq() helper for unique CIDs across parallel subtests
- ipfs add --only-hash disables fast-provide and pinning
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 8ae795c to 0243a1c on March 29, 2026 15:04
lidel added 2 commits March 29, 2026 18:03
ipfs add --help: rewrite fast-provide section with clear structure
(content discoverability, flag defaults, strategy=all behavior)

ipfs routing reprovide: mark as deprecated, note it returns an error
with sweep provider, log error with actionable guidance

changelog: fix missing --fast-provide-dag flag on pin commands,
use "routing system" instead of "DHT" where applicable, link to
docs/config.md as source of truth for defaults

environment-variables.md: note that BootstrapWithStubDHT sets
TEST_DHT_STUB automatically, no external env var needed
lidel added 4 commits March 29, 2026 21:43
the fork (NoopMessageSender, MsgSenderBuilder) is no longer used.
the ephemeral peer pool in BootstrapWithStubDHT replaced the
NoopMessageSender approach.
log providedCIDs and skippedBranches after each unique reprovide
cycle and fast-provide-dag walk.

tests verify exact counts with two dir pins sharing a 10 KiB file
(5 KiB chunks): fast-provide-dag asserts 5 provided + 1 skipped
branch, reprovide asserts 6 provided + 1 skipped branch (includes
empty MFS root pin). both assert bloom tracker created and no
autoscale.

updates boxo to pick up Deduplicated() counter, bloom
creation/autoscale logging, and review feedback fixes.
…-roots-with-dedup

# Conflicts:
#	docs/changelogs/v0.41.md
#	docs/examples/kubo-as-a-library/go.mod
#	docs/examples/kubo-as-a-library/go.sum
#	go.mod
#	go.sum
#	test/dependencies/go.mod
#	test/dependencies/go.sum
boxo#1124 landed on master; point to the merge commit
instead of the PR branch.
@lidel lidel marked this pull request as ready for review April 8, 2026 21:50
@lidel lidel requested a review from a team as a code owner April 8, 2026 21:50
lidel added 7 commits April 9, 2026 19:53
ipfs add --pin --fast-provide-dag wrapped the DAGService with
providingDagService, which announced every block as it was written
regardless of strategy modifiers. ExecuteFastProvideDAG ran in
parallel as the post-add walker. Net effect:

- pinned+entities: chunks reached the DHT despite +entities saying
  they should be skipped (correctness bug)
- pinned+unique: every block announced twice; the post-walk bloom
  only dedups against its own pass
- pinned (plain): every block announced twice

ExecuteFastProvideDAG already has bloom dedup, entity-roots support,
and unbuffered backpressure, so it is now the single mechanism for
--fast-provide-dag across ipfs add, dag import, pin add, and pin
update.

Provide.Strategy=all is untouched: every block is provided at the
blockstore level via the blockstore.Provider hook in
core/node/storage.go, which is independent of coreapi. The Pinned
strategy bit gated providingDagService and the parser rejects
combining "all" with other strategies, so "all" never set that bit
in the first place.

- core/coreapi/unixfs.go: drop the wrap, the providingDagService
  struct, and the now-unused mh and boxo/provider imports
- core/coreiface/options/unixfs.go: drop FastProvideDAG option
- core/coreapi/coreapi.go: drop now-dead providingStrategy field
- core/commands/add.go: drop the FastProvideDAG option pass-through
- test/cli/provider_test.go: regression test using ipfs add
  --fast-provide-dag with pinned+entities -- fails on the previous
  code and passes here
Operators tuning +unique or +entities strategies on memory-constrained
or extra-large repos previously had no way to trade bloom filter memory
against false-positive rate -- both the reprovide cycle and
fast-provide-dag walks hardcoded walker.DefaultBloomFPRate.

Provide.BloomFPRate is the target false positive rate (1/N) for the
shared bloom tracker. Has no effect on Provide.Strategy=all or other
strategies that do not walk DAGs through the tracker. Validation
rejects values below 1_000_000 (~1 in 1M); below that the bloom
becomes lossy enough to drop a meaningful fraction of CIDs from each
reprovide cycle.

The single source of truth for the default value is
config.DefaultProvideBloomFPRate; docs reference it descriptively
(~1 in 4.75M, ~4 bytes/CID) so the literal lives in exactly one place.

- config/provide.go: BloomFPRate field, DefaultProvideBloomFPRate
  and MinProvideBloomFPRate constants, validation
- config/provide_test.go: round-trip + validation cases
- core/node/provider.go: plumb fpRate through setReproviderKeyProvider
  and createKeyProvider
- core/commands/cmdenv/env.go: ExecuteFastProvideDAG takes fpRate
- core/commands/{add,dag/import,pin/pin}.go: resolve from cfg and
  pass through to ExecuteFastProvideDAG
- docs/config.md: new Provide.BloomFPRate section after Provide.DHT.*
  with memory tradeoff table and minimum-value note
- docs/changelogs/v0.41.md: link to the new option from the +unique/
  +entities section
readLastUniqueCount and persistUniqueCount were exercised only
indirectly via CLI tests, leaving the 8-byte length check and the
"missing key" fallback without direct coverage.

- empty datastore returns 0 (no previous cycle)
- round trip across the full uint64 range (0, 1, 1k, 1M, 1B, MaxUint64)
- overwrite returns the most recent value (matches per-cycle persist)
- corrupt length (empty, short, long, single byte) returns 0 instead
  of panicking
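
The behaviors listed above (missing key returns 0, round trip over the full uint64 range, overwrite keeps the latest value, wrong length returns 0) can be sketched with a fixed 8-byte encoding. This is not the kubo implementation: the in-memory map stands in for the datastore, the function names are shortened, and big-endian byte order is an assumption since the commit message only mentions the 8-byte length check.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// store is a minimal in-memory stand-in for the datastore.
type store map[string][]byte

// countKey is the datastore key named in this PR.
const countKey = "/reprovideLastUniqueCount"

// persistCount writes n as 8 bytes (big-endian is an assumption).
func persistCount(s store, n uint64) {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, n)
	s[countKey] = buf
}

// readCount returns the stored count, or 0 when the key is missing or the
// value does not have exactly 8 bytes.
func readCount(s store) uint64 {
	buf, ok := s[countKey]
	if !ok || len(buf) != 8 {
		return 0
	}
	return binary.BigEndian.Uint64(buf)
}

// roundTrip persists n into a fresh store and reads it back.
func roundTrip(n uint64) uint64 {
	s := store{}
	persistCount(s, n)
	return readCount(s)
}

func main() {
	fmt.Println(readCount(store{}))                            // missing key
	fmt.Println(roundTrip(1_000_000))                          // round trip
	fmt.Println(readCount(store{countKey: []byte{1, 2, 3}}))   // corrupt length
	fmt.Println(roundTrip(^uint64(0)) == ^uint64(0))           // MaxUint64 survives
}
```

Returning 0 on a bad length means a corrupt value only costs one cycle of default bloom sizing instead of a crash.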
Background fast-provide goroutines were parented on the wrong
contexts. Async --fast-provide-dag was implicitly bound to
req.Context, which go-ipfs-cmds cancels on handler exit, so the
walk was aborted when the command returned. Async
--fast-provide-root was parented on context.Background, so it
could outlive the node. Parent both paths off the IpfsNode
lifetime context instead.

- ExecuteFastProvideRoot: async goroutine now derives from
  ipfsNode.Context(), so it cancels on daemon shutdown rather
  than potentially touching a closed DHT client.
- ExecuteFastProvideDAG: takes cmdCtx and nodeCtx; wait=true
  runs inline under cmdCtx (Ctrl+C still cancels the walk),
  wait=false runs in a goroutine under nodeCtx so the walk
  survives command exit but still stops on shutdown.
- add, dag import, pin add/update: pass node.Context() as
  the new nodeCtx argument.
- changelog: note the behavior change for opt-in strategies.
Adds TestProviderFastProvideDAGAsyncSurvives: ipfs add with
--fast-provide-dag=true but no --fast-provide-wait must walk the
full DAG in a background goroutine that outlives the command
handler, announce every block, and leave chunk CIDs findable by
peers via findprovs. A long Provide.DHT.Interval ensures the
scheduled reprovide cycle cannot be the source of the chunk
announcements.
…-roots-with-dedup

# Conflicts:
#	docs/examples/kubo-as-a-library/go.mod
#	docs/examples/kubo-as-a-library/go.sum
#	go.mod
#	go.sum
#	test/dependencies/go.mod
#	test/dependencies/go.sum
@lidel force-pushed the feat/provide-entity-roots-with-dedup branch from 68ecb80 to cdd9103 on April 10, 2026 00:39
lidel (Member, Author) commented Apr 10, 2026

The headline feature is +entities (#8676 from 2022): announce only file/directory roots and HAMT shards instead of every block, which collapses provider record counts on big UnixFS repos.

Along the way, +unique adds bloom-dedup reprovide for overlapping pinsets (roughly an order of magnitude less reprovide-cycle RAM), --fast-provide-root and --fast-provide-dag are now wired consistently across add/dag import/pin add/pin update, and a new stub-DHT harness exercises the sweep provider end-to-end so tests are less flaky + the changelog claims are backed by real findprovs over loopback.

👉 Default behavior (Provide.Strategy=all) is unchanged.
This PR is effectively an opt-in experiment plus some performance fixes, and thus low risk.

Merging now and shipping in 0.41-rc1 for wider testing.

@lidel lidel merged commit 356d261 into master Apr 10, 2026
25 checks passed
@lidel lidel deleted the feat/provide-entity-roots-with-dedup branch April 10, 2026 00:53
Development

Successfully merging this pull request may close these issues.

Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files) Provide fewer nodes
