Add folder-path signal to community similarity matrix #46

Merged: raphasouthall merged 1 commit into main from feat/folder-path-signal on Jun 3, 2026
@raphasouthall
Owner

Problem

Embeddings treat "infrastructure" as one topic whether a note lives under work/ or home/, so distinct organisational areas collapsed into one community. The largest coarse cluster was a work+literature+home soup (258 notes: work 203, literature 26, home 16).

Change

Blend a fourth channel into _build_similarity_matrix: notes sharing a top-level folder prefix get a uniform similarity bump.

  • PATH_SIGNAL_WEIGHT = 0.3 (0 disables)
  • PATH_PREFIX_DEPTH = 1 (top-level: work/, home/, research/…)
  • S = 0.6·semantic + 0.25·cooc + 0.15·links + 0.3·path
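
A minimal sketch of how the path channel could be built and blended, assuming the three existing channels are already available as square matrices. The helper names (`path_prefix`, `build_path_signal`, `blend`) and the plain-list representation are illustrative assumptions, not the actual body of `_build_similarity_matrix`. The weights are applied exactly as listed above (they sum to 1.3); whether the real function renormalises afterwards is not stated here. Treating a note that is shallower than the requested depth as having no prefix is also an assumption.

```python
from pathlib import PurePosixPath

PATH_SIGNAL_WEIGHT = 0.3  # 0 disables the path channel
PATH_PREFIX_DEPTH = 1     # 1 = top-level folder (work/, home/, ...)


def path_prefix(note_path, depth=PATH_PREFIX_DEPTH):
    """Return the first `depth` folder components, or None if the note is too shallow."""
    folders = PurePosixPath(note_path).parts[:-1]  # drop the file name
    if len(folders) < depth:
        return None  # root-level files such as CLAUDE.md have no prefix
    return folders[:depth]


def build_path_signal(note_paths, depth=PATH_PREFIX_DEPTH):
    """S_path[i][j] is 1.0 when two distinct notes share a folder prefix, else 0.0."""
    prefixes = [path_prefix(p, depth) for p in note_paths]
    n = len(note_paths)
    s_path = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and prefixes[i] is not None and prefixes[i] == prefixes[j]:
                s_path[i][j] = 1.0
    return s_path


def blend(semantic, cooc, links, s_path, path_weight=PATH_SIGNAL_WEIGHT):
    """Weighted sum of the four channels, using the weights from the PR description."""
    n = len(semantic)
    return [
        [
            0.6 * semantic[i][j]
            + 0.25 * cooc[i][j]
            + 0.15 * links[i][j]
            + path_weight * s_path[i][j]
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical four-note vault: two work notes, one home note, one root-level file.
paths = ["work/infra.md", "work/oncall.md", "home/infra.md", "CLAUDE.md"]
s_path = build_path_signal(paths)
```

Here only the two `work/` notes receive the bump; the `home/` note and the root-level file get no path edges to them.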

Measurement (read-only sweep, ~490-note vault)

Purity = fraction of each community's notes from its dominant top-level folder.

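Q = modularity, n = community count; subscript c = coarse, f = fine.

| config | Q_c | n_c | purity_c | Q_f | n_f | purity_f |
|---|---|---|---|---|---|---|
| baseline (δ=0) | 0.308 | 9 | 0.68 | 0.257 | 13 | 0.76 |
| depth1 δ=0.3 | 0.340 | 7 | 0.85 | 0.289 | 9 | 0.93 |
| depth1 δ=0.5 | 0.424 | 5 | 0.94 | 0.317 | 9 | 0.94 |
| depth2 δ=0.2 | 0.341 | 10 | 0.65 | 0.268 | 10 | 0.78 |
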
  • depth=1 is the work/home grain; depth=2 reduced coarse purity to 0.65, below the 0.68 baseline (splitting home/ weakens reinforcement).
  • δ=0.3 lifts purity sharply while modularity holds/improves and counts stay sensible. δ≥0.5 over-purifies (clusters by folder, not topic) and collapses the count.
  • Largest coarse cluster goes from work:203, literature:26, home:16 → work:211, literature:8, home:0.
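
The purity metric defined above can be sketched as follows. The function names are hypothetical, and the size-weighted mean used to collapse per-community purity into one figure is an assumption, since the description does not say how the single purity_c / purity_f values are aggregated. The example reuses the baseline largest-cluster counts from the Problem section; the 13 `other/` notes stand in for the part of the 258 that the description does not itemise.

```python
from collections import Counter


def top_level_folder(note_path):
    """Top-level folder of a vault-relative path, or '' for root-level files."""
    head, sep, _ = note_path.partition("/")
    return head if sep else ""


def community_purity(note_paths):
    """Fraction of a (non-empty) community's notes from its dominant top-level folder."""
    counts = Counter(top_level_folder(p) for p in note_paths)
    return max(counts.values()) / len(note_paths)


def mean_purity(communities):
    """Size-weighted mean purity over all communities (the aggregation is an assumption)."""
    total = sum(len(c) for c in communities)
    return sum(community_purity(c) * len(c) for c in communities) / total


# Baseline largest coarse cluster: 258 notes, of which work 203, literature 26, home 16.
baseline_cluster = (
    ["work/n.md"] * 203
    + ["literature/n.md"] * 26
    + ["home/n.md"] * 16
    + ["other/n.md"] * 13  # placeholder for the remainder, not itemised in the PR
)
print(round(community_purity(baseline_cluster), 3))  # 203 / 258, about 0.787
```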

Safety

Root-level files (CLAUDE.md etc.) form no path edges; a flat single-folder vault yields an all-zero S_path; weight 0 disables. So it degrades gracefully as a default.

Tests

New TestPathSignal: same-folder bump, cross-folder isolation, root-file exclusion, disable path. Full suite: 566 passed.
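
For illustration, the four cases named above could be expressed roughly like this. It is a standalone sketch, not the contents of TestPathSignal: `share_prefix` and `path_bump` are hypothetical pairwise helpers written for this example, `README.md` is a made-up second root-level file, and the real tests presumably exercise `_build_similarity_matrix` directly.

```python
import unittest


def share_prefix(path_a, path_b, depth=1):
    """True when both notes sit at least `depth` folders deep and share those folders."""
    a = path_a.split("/")[:-1]  # folder components only
    b = path_b.split("/")[:-1]
    return len(a) >= depth and len(b) >= depth and a[:depth] == b[:depth]


def path_bump(path_a, path_b, weight=0.3, depth=1):
    """Similarity contributed by the path channel for one pair of notes."""
    return weight if share_prefix(path_a, path_b, depth) else 0.0


class TestPathSignalSketch(unittest.TestCase):
    def test_same_folder_bump(self):
        self.assertEqual(path_bump("work/infra.md", "work/oncall.md"), 0.3)

    def test_cross_folder_isolation(self):
        self.assertEqual(path_bump("work/infra.md", "home/infra.md"), 0.0)

    def test_root_file_exclusion(self):
        # Root-level files form no path edges, not even with each other.
        self.assertEqual(path_bump("CLAUDE.md", "README.md"), 0.0)
        self.assertEqual(path_bump("CLAUDE.md", "work/infra.md"), 0.0)

    def test_weight_zero_disables(self):
        self.assertEqual(path_bump("work/infra.md", "work/oncall.md", weight=0), 0.0)
```

Saved as a module, the sketch runs with `python -m unittest`.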

Embeddings see "infrastructure" as one topic regardless of whether a note
lives under work/ or home/, so distinct organisational areas collapsed into a
single community (the largest coarse cluster was work+literature+home soup).

Blend a fourth channel into _build_similarity_matrix: notes sharing a
top-level folder prefix get a uniform similarity bump (PATH_SIGNAL_WEIGHT=0.3,
PATH_PREFIX_DEPTH=1). A read-only sweep on a ~490-note vault picked these:
depth=1 is the work/home grain (depth=2 reduced cohesion); delta=0.3 lifts
folder purity 0.68->0.85 coarse / 0.76->0.93 fine while modularity holds or
improves; delta>=0.5 over-purifies and collapses community count.

Degrades gracefully: root-level files form no path edges, a flat single-folder
vault yields an all-zero signal, weight 0 disables. Tests cover same-folder
bump, cross-folder isolation, root-file exclusion, and the disable path.

raphasouthall merged commit 310cc62 into main on Jun 3, 2026
5 checks passed
raphasouthall deleted the feat/folder-path-signal branch on June 3, 2026 at 14:25