Merged
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,8 @@

### Fixed

- **Community hierarchy check no longer false-alarms on every healthy build.** The build asserted `Q(fine) > Q(coarse)` and logged "sanity check failed" whenever it didn't. But Newman modularity is resolution-dependent and maximised at a single scale, so a finer partition scores *lower* at the implicit γ=1 by construction — the assertion could never hold for a healthy hierarchy. Verified empirically: the fine partition only overtakes coarse at γ≈`BETA_FINE` (≈2.0). Replaced with `_hierarchy_health_warning()`, which checks what issue #33 is actually about — the fine level collapsing into *fewer* communities than coarse — plus a `MIN_HEALTHY_Q` floor for near-random partitions.

- **Community modularity no longer collapses toward random.** In a single-domain vault most note pairs share a moderate baseline embedding cosine (off-diagonal mean ≈0.36 on a ~490-note vault), so the semantic signal was a dense floor connecting nearly everything: Newman modularity sat at Q≈0.06, barely better than a random partition. `_build_similarity_matrix` now prunes that floor with an adaptive threshold of mean + K·std of the off-diagonal distribution (K = `SEMANTIC_THRESHOLD_K`, default 0.5), zeroing weak semantic edges before community detection. Measured: Q 0.06 → ~0.30 (coarse) / ~0.28 (fine) with stable community counts. Because the threshold is derived from each vault's own distribution rather than a fixed cosine, it self-tunes per vault; set `SEMANTIC_THRESHOLD_K = None` to disable. Diagnosis showed the co-occurrence graph was *not* the cause (entity document-frequency is healthy: 90% of entities appear in a single note), so co-occurrence pruning was a dead end.

- **`neurostack index` now prunes notes deleted from disk.** A full index was upsert-only: it added and updated notes but never removed DB rows for files that no longer existed. The only deletion path was the live watcher's per-event handler, so any file removed while the watcher was down orphaned its rows forever — inflating note counts, polluting co-occurrence and community detection with ghost nodes, and dragging modularity down. A full scan sees the whole vault, so it can now reconcile: anything in the DB but not on disk is pruned (FK cascades drop chunks/summaries/triples; sqlite-vec rows are cleared explicitly). An empty scan is treated as a misconfigured/unmounted vault and skips pruning rather than wiping the index.
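
The reconciliation described in the last entry can be sketched as a set difference between DB paths and scanned paths, with a guard for the empty scan. This is a minimal illustration only: the `notes`/`chunks` schema, the `path` column, and the `prune_deleted_notes` helper are hypothetical stand-ins, not NeuroStack's actual tables or functions, and the explicit sqlite-vec cleanup is not modelled.

```python
import sqlite3


def prune_deleted_notes(conn, paths_on_disk):
    """Delete rows for notes missing from disk; return how many were removed.

    Hypothetical helper: table and column names are assumptions.
    """
    on_disk = set(paths_on_disk)
    if not on_disk:
        # An empty scan looks like an unmounted or misconfigured vault,
        # so skip pruning instead of wiping the index.
        return 0
    in_db = {row[0] for row in conn.execute("SELECT path FROM notes")}
    orphans = in_db - on_disk
    conn.executemany("DELETE FROM notes WHERE path = ?", [(p,) for p in orphans])
    conn.commit()
    return len(orphans)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
conn.execute("CREATE TABLE notes (path TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE chunks ("
    " id INTEGER PRIMARY KEY,"
    " note_path TEXT REFERENCES notes(path) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO notes VALUES (?)", [("a.md",), ("b.md",), ("ghost.md",)])
conn.execute("INSERT INTO chunks (note_path) VALUES ('ghost.md')")
conn.commit()

removed = prune_deleted_notes(conn, ["a.md", "b.md"])  # ghost.md is gone from disk
remaining_chunks = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
skipped = prune_deleted_notes(conn, [])  # empty scan: nothing is pruned
remaining_notes = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

The empty-scan guard runs before any DB read, so a vault that fails to mount never triggers a delete.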
43 changes: 36 additions & 7 deletions src/neurostack/attractor.py
@@ -80,6 +80,10 @@
# Set to None to disable thresholding.
SEMANTIC_THRESHOLD_K = 0.5

# Minimum Newman modularity for a partition to count as non-trivial structure.
# Below this a partition is barely distinguishable from random.
MIN_HEALTHY_Q = 0.05


def _build_similarity_matrix(
    conn: sqlite3.Connection,
@@ -451,6 +455,35 @@ def _store_level_stats(
)


def _hierarchy_health_warning(
    n_coarse: int, n_fine: int, q_coarse: float, q_fine: float,
) -> str | None:
    """Return a warning string if the community hierarchy looks unhealthy.

    A healthy fine level REFINES the coarse one: more, smaller communities,
    both non-trivial fits. We deliberately do NOT require Q(fine) > Q(coarse):
    Newman modularity is resolution-dependent and maximised at a single scale,
    so a finer partition scores LOWER at the implicit γ=1 by construction
    (verified empirically — the fine partition only overtakes coarse at
    γ≈β_fine). The real failure mode (issue #33) is the fine level COLLAPSING
    into fewer communities than coarse; that, plus a minimum-quality floor, is
    what we check. Returns None when the hierarchy is healthy.
    """
    if n_fine < n_coarse:
        return (
            f"Community hierarchy inverted: n_fine={n_fine} < "
            f"n_coarse={n_coarse}. The fine level collapsed into fewer basins "
            f"than coarse (check β_fine / top_k_fine — see issue #33)."
        )
    if q_coarse <= MIN_HEALTHY_Q or q_fine <= MIN_HEALTHY_Q:
        return (
            f"Weak community structure: Q(coarse)={q_coarse:.4f}, "
            f"Q(fine)={q_fine:.4f} (≤ {MIN_HEALTHY_Q:.2f} is barely better "
            f"than random — the similarity matrix may be too dense or sparse)."
        )
    return None
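
The resolution argument in the docstring above can be reproduced on a toy graph. The sketch below is not the project's `_modularity`; it assumes the common generalised form Q_γ = Σ_c [L_c/m − γ·(d_c/2m)²], and the graph (four triangles paired into two halves joined by one edge) is made up for illustration.

```python
from collections import defaultdict


def modularity(edges, labels, gamma=1.0):
    """Generalised Newman modularity for an undirected, unweighted graph.

    Assumed form: sum over communities of L_c/m - gamma * (d_c / 2m)**2.
    """
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # total node degree inside community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - gamma * (degree[c] / (2 * m)) ** 2 for c in degree)


def triangle(a):
    return [(a, a + 1), (a, a + 2), (a + 1, a + 2)]


edges = []
for start in (0, 3, 6, 9):
    edges += triangle(start)
edges += [(0, 3), (1, 4), (2, 5)]    # triangles 0 and 1 form the left half
edges += [(6, 9), (7, 10), (8, 11)]  # triangles 2 and 3 form the right half
edges += [(5, 6)]                    # single bridge between the halves

coarse = {n: n // 6 for n in range(12)}  # 2 communities: the halves
fine = {n: n // 3 for n in range(12)}    # 4 communities: the triangles

q_coarse_g1 = modularity(edges, coarse, gamma=1.0)
q_fine_g1 = modularity(edges, fine, gamma=1.0)
q_coarse_g2 = modularity(edges, coarse, gamma=2.0)
q_fine_g2 = modularity(edges, fine, gamma=2.0)

# At gamma=1 the coarse partition scores higher; at gamma=2 the fine one does.
print(q_coarse_g1 > q_fine_g1, q_fine_g2 > q_coarse_g2)
```

On this graph the fine partition is a valid refinement yet scores lower at γ=1, which is the situation the old `Q(fine) > Q(coarse)` check flagged as a failure.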


def detect_communities(
    conn: sqlite3.Connection | None = None,
    db_path=None,
@@ -550,13 +583,9 @@ def detect_communities(
_store_level_stats(conn, 1, communities_fine, q_fine)
n_fine = len(communities_fine)

-if q_fine <= q_coarse:
-    log.warning(
-        "Community hierarchy sanity check failed:"
-        f" Q(fine)={q_fine:.4f} <= Q(coarse)={q_coarse:.4f}."
-        " The fine partition is not a tighter fit than coarse —"
-        " expect n_fine > n_coarse and Q(fine) > Q(coarse)."
-    )
+warning = _hierarchy_health_warning(n_coarse, n_fine, q_coarse, q_fine)
+if warning:
+    log.warning(warning)

conn.commit()
log.info(
24 changes: 24 additions & 0 deletions tests/test_attractor.py
@@ -15,6 +15,7 @@
    _assign_communities,
    _attractor_convergence,
    _build_similarity_matrix,
    _hierarchy_health_warning,
    _modularity,
    _size_stats,
    _sparsify_top_k,
@@ -656,3 +657,26 @@ def test_small_vault_not_thresholded(self, in_memory_db):
        emb = np.ones(8, dtype=np.float32)
        S = _build_similarity_matrix(conn, paths, np.stack([emb, emb]))
        assert S[0, 1] == pytest.approx(ALPHA_SEMANTIC, abs=1e-4)
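
For context on the thresholding this test touches: the changelog describes it as zeroing semantic edges below mean + K·std of the off-diagonal cosines. A minimal pure-Python sketch of that rule follows. `prune_weak_edges` is a hypothetical name rather than the project's `_build_similarity_matrix`, the population standard deviation and the `>=` comparison are assumptions, the matrix values are invented, and the small-vault exemption checked above is not modelled.

```python
from statistics import mean, pstdev


def prune_weak_edges(sim, k=0.5):
    """Zero off-diagonal entries below mean + k*std of the off-diagonal values.

    Passing k=None disables thresholding and returns an unchanged copy.
    """
    n = len(sim)
    if k is None or n < 2:
        return [row[:] for row in sim]
    off_diag = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    cutoff = mean(off_diag) + k * pstdev(off_diag)
    return [
        [sim[i][j] if i == j or sim[i][j] >= cutoff else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Invented data: a dense 0.36 baseline with two strongly related pairs at 0.8.
sim = [
    [1.0, 0.8, 0.36, 0.36],
    [0.8, 1.0, 0.36, 0.36],
    [0.36, 0.36, 1.0, 0.8],
    [0.36, 0.36, 0.8, 1.0],
]
pruned = prune_weak_edges(sim, k=0.5)
untouched = prune_weak_edges(sim, k=None)
```

Here the cutoff lands between the baseline and the strong pairs, so the 0.36 floor is removed while the 0.8 edges survive.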


# ---------------------------------------------------------------------------
# Hierarchy health check (issue #33)
# ---------------------------------------------------------------------------

class TestHierarchyHealthWarning:
    def test_healthy_hierarchy_no_warning(self):
        # Finer partition, both Q healthy. Q(fine) < Q(coarse) is EXPECTED at
        # gamma=1 and must NOT warn (the old check fired on every good build).
        assert _hierarchy_health_warning(7, 11, 0.339, 0.281) is None

    def test_inverted_count_warns(self):
        w = _hierarchy_health_warning(10, 6, 0.30, 0.25)
        assert w is not None and "inverted" in w

    def test_weak_structure_warns(self):
        w = _hierarchy_health_warning(7, 11, 0.04, 0.03)
        assert w is not None and "Weak" in w

    def test_equal_counts_ok(self):
        # n_fine == n_coarse is a valid (non-collapsed) refinement boundary.
        assert _hierarchy_health_warning(7, 7, 0.30, 0.20) is None
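
Two boundary behaviours follow from the function as written but are not pinned by the tests above: a Q exactly at `MIN_HEALTHY_Q` warns (the comparison is `<=`), and an inverted count takes precedence over weak structure. The standalone sketch below uses a local copy of the function with shortened messages so it runs without the package; the copy and the chosen inputs are illustrative, not part of the PR.

```python
from typing import Optional

MIN_HEALTHY_Q = 0.05  # same value as in the diff


def hierarchy_health_warning(
    n_coarse: int, n_fine: int, q_coarse: float, q_fine: float
) -> Optional[str]:
    """Local copy of the check from the diff, with abbreviated messages."""
    if n_fine < n_coarse:
        return f"Community hierarchy inverted: n_fine={n_fine} < n_coarse={n_coarse}."
    if q_coarse <= MIN_HEALTHY_Q or q_fine <= MIN_HEALTHY_Q:
        return f"Weak community structure: Q(coarse)={q_coarse:.4f}, Q(fine)={q_fine:.4f}."
    return None


# Q exactly at the floor still warns because the comparison is inclusive.
at_floor = hierarchy_health_warning(7, 11, MIN_HEALTHY_Q, 0.30)

# Inverted counts AND weak Q: the inversion message is returned first.
both_bad = hierarchy_health_warning(10, 6, 0.01, 0.01)

# Just above the floor with a proper refinement: healthy.
just_above = hierarchy_health_warning(7, 11, 0.051, 0.051)
```

If the inclusive floor is intentional, a test at exactly `MIN_HEALTHY_Q` would document it.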