Improvement: co-occurrence table is 62% of DB and 82% noise #34

## Summary

`entity_cooccurrence` holds 1.72M rows and occupies 365 MB (~62% of the 592 MB SQLite DB). Yet 82% of pairs have weight ≤ 1.1, i.e., the pair co-occurred in exactly one note and received at most one Hebbian bump (×1.1). The top-weight pairs are polluted by tool-output tokens (`"MEMORY.md" <-> "System reminder"` at 12.0, `"0" <-> "1"` at 9.0) rather than vault content. The co-occurrence boost at query time is at most +10% and rarely differentiates between results.

## Evidence

Weight distribution from a direct SQL query on the DB:

| bucket | count | share |
|---|---:|---:|
| weight == 1.0 | 1,085,370 | 62.9% |
| weight == 1.1 | 325,920 | 18.9% |
| 1.1 < w < 2.0 | 301,671 | 17.5% |
| 2 ≤ w < 5 | 12,043 | 0.70% |
| 5 ≤ w < 10 | 114 | 0.007% |
| ≥10 | 3 | 0.0002% |
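The buckets above can be reproduced with a single aggregate query. This is a sketch that assumes only a numeric `weight` column on `entity_cooccurrence`; `weight_histogram` is a hypothetical helper name. Rows that fall outside every listed bucket are reported as `other`, which makes gaps in the bucketing visible.

```python
import sqlite3

# Bucket labels mirror the evidence table. Only the `weight` column is assumed.
BUCKET_SQL = """
SELECT
  CASE
    WHEN weight = 1.0 THEN 'weight == 1.0'
    WHEN weight = 1.1 THEN 'weight == 1.1'
    WHEN weight > 1.1 AND weight < 2.0 THEN '1.1 < w < 2.0'
    WHEN weight >= 2.0 AND weight < 5.0 THEN '2 ≤ w < 5'
    WHEN weight >= 5.0 AND weight < 10.0 THEN '5 ≤ w < 10'
    WHEN weight >= 10.0 THEN '≥10'
    ELSE 'other'
  END AS bucket,
  COUNT(*) AS n
FROM entity_cooccurrence
GROUP BY bucket
"""


def weight_histogram(conn: sqlite3.Connection) -> dict[str, tuple[int, float]]:
    """Return {bucket: (row_count, share_percent)} for entity_cooccurrence."""
    rows = conn.execute(BUCKET_SQL).fetchall()
    total = sum(n for _, n in rows)
    # An empty table yields no rows, so the division never sees total == 0.
    return {bucket: (n, 100.0 * n / total) for bucket, n in rows}
```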

The top-weight entities are literal tool-output fragments leaking in through triple extraction.

## Proposed fix

1. **Prune `weight <= 1.0` after each rebuild.** In `cooccurrence.py:142`, right after the `executemany`, run `conn.execute("DELETE FROM entity_cooccurrence WHERE weight <= 1.0")`. A co-occurrence seen in a single note is noise. This frees ~225 MB of pages, although SQLite only returns them to the filesystem after a `VACUUM`.
2. **Switch the score from raw count to NPMI.** In `search.py:661-708` and `attractor.py:130-144`, replace the count-based weight with `npmi(a,b) = pmi(a,b) / (-log p(a,b))`, where `pmi(a,b) = log(p(a,b) / (p(a) p(b)))` and the probabilities are estimated from note counts. Cache per-entity note counts in a small companion table populated by `persist_cooccurrence`. Effect: `Azure` and `MEMORY.md` stop dominating; rare-but-specific pairs rise.
3. **Denylist plus upstream filter.** Add a denylist in `cooccurrence.py:113-124` covering numeric-only tokens (including single digits) and known system tokens such as `System reminder` and `vault_search`. Better: fix the triples extractor in `triples.py` so tool-output fragments never become entities.
4. **Hebbian LTD (long-term depression).** Add `decay_cooccurrence(conn, factor=0.99, floor=1.0)` to `cooccurrence.py` and call it from `cli/decay.py`. It multiplies all weights by 0.99, then runs `DELETE ... WHERE weight < floor`. This counters the unbounded growth from the `×1.1` reinforcement at `search.py:840-866`.
5. **Wire co-occurrence into `vault_related`.** `related.py:93-102` is currently pure cosine similarity. Fold in co-occurrence with the same blend the attractor uses (`0.6 cosine + 0.25 cooccur + 0.15 wikilinks`), so the signal becomes load-bearing instead of decorative.
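Items 1 and 4 are both plain SQL against `entity_cooccurrence`. A minimal sketch, assuming only a numeric `weight` column: `decay_cooccurrence` follows the signature proposed in item 4, while `prune_cooccurrence` and `reclaim_space` are hypothetical helper names. `DELETE` only moves pages to SQLite's freelist, so the on-disk shrink needs a separate `VACUUM`.

```python
import sqlite3


def prune_cooccurrence(conn: sqlite3.Connection, max_weight: float = 1.0) -> int:
    """Item 1: drop single-note pairs (weight <= max_weight). Returns rows deleted."""
    cur = conn.execute(
        "DELETE FROM entity_cooccurrence WHERE weight <= ?", (max_weight,)
    )
    conn.commit()
    return cur.rowcount


def decay_cooccurrence(
    conn: sqlite3.Connection, factor: float = 0.99, floor: float = 1.0
) -> int:
    """Item 4 (Hebbian LTD): scale every weight by `factor`, then drop pairs below `floor`.

    Returns the number of pairs deleted.
    """
    with conn:  # one transaction: both statements commit or roll back together
        conn.execute("UPDATE entity_cooccurrence SET weight = weight * ?", (factor,))
        cur = conn.execute("DELETE FROM entity_cooccurrence WHERE weight < ?", (floor,))
    return cur.rowcount


def reclaim_space(conn: sqlite3.Connection) -> None:
    """Shrink the DB file; DELETE alone leaves freed pages inside it."""
    conn.commit()  # VACUUM fails inside an open transaction
    conn.execute("VACUUM")
```

With the proposed defaults, a pair at exactly 1.0 becomes 0.99 and is deleted on the first decay pass, and an unreinforced pair at 1.1 survives nine passes and is deleted on the tenth (1.1 × 0.99^10 ≈ 0.995). Regular decay therefore also performs item 1's pruning for pairs that are never reinforced.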
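The NPMI score in item 2 needs only three counts per pair plus the corpus size. A minimal sketch, assuming `n_ab` is the number of notes containing both entities, `n_a` and `n_b` are the per-entity note counts the proposed companion table would cache, and `n_notes` is the total number of notes; the function and argument names are hypothetical. NPMI lies in [-1, 1]: -1 when the entities never co-occur, 0 when they are independent, and 1 when they always appear together.

```python
import math


def npmi(n_ab: int, n_a: int, n_b: int, n_notes: int) -> float:
    """Normalized PMI of entities a and b from note-level counts.

    p(a) = n_a / N, p(b) = n_b / N, p(a,b) = n_ab / N
    pmi  = log(p(a,b) / (p(a) * p(b)))
    npmi = pmi / (-log p(a,b))
    """
    if n_ab <= 0:
        return -1.0  # never co-occur: the limiting value of NPMI
    p_ab = n_ab / n_notes
    if p_ab >= 1.0:
        return 1.0  # both entities in every note; -log p(a,b) would be 0
    p_a = n_a / n_notes
    p_b = n_b / n_notes
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)
```

For example, with 1,000 notes, two entities that each appear in 3 notes and always together score 1.0, while the same 3 co-occurrences with an entity present in 600 notes score about 0.09. That is the rebalancing item 2 describes: a ubiquitous entity no longer wins on raw count.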
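A minimal sketch of the item 3 denylist, assuming entity names arrive as plain strings before pairs are counted. The system tokens are the examples named in the issue; `NOISE_TOKENS`, `is_noise_entity` and `filter_entities` are hypothetical names, and the real token list would come from inspecting the top pairs. This only keeps fragments out of pairs; fixing the extractor in `triples.py` stays the better fix.

```python
import re

# Examples named in the issue, stored case-folded; extend from observed top pairs.
NOISE_TOKENS = frozenset({"system reminder", "vault_search"})

# Numeric-only tokens such as "0", "1", "42", "3.14", "-7".
_NUMERIC = re.compile(r"[+-]?\d+(?:[.,]\d+)*")


def is_noise_entity(name: str) -> bool:
    """True if an extracted entity should be kept out of co-occurrence pairs."""
    token = name.strip()
    if not token:
        return True
    if _NUMERIC.fullmatch(token):
        return True  # covers single digits and other numeric-only strings
    return token.casefold() in NOISE_TOKENS


def filter_entities(names: list[str]) -> list[str]:
    """Drop noise entities before pairs are generated."""
    return [n for n in names if not is_noise_entity(n)]
```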
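Item 5's blend could be sketched as follows. The 0.6 / 0.25 / 0.15 weights are the attractor's, as quoted in the issue; everything else is an assumption, not the attractor's actual code: each term is kept in [0, 1], cosine is taken as already normalized, only positive NPMI contributes, and a wikilink is a 0/1 flag. `Candidate`, `blend_score` and `rank_related` are hypothetical names.

```python
from dataclasses import dataclass

# Blend weights quoted in the issue for the attractor; they sum to 1.0.
W_COSINE, W_COOCCUR, W_LINKS = 0.6, 0.25, 0.15


@dataclass
class Candidate:
    note_id: str
    cosine: float        # embedding similarity, assumed already in [0, 1]
    cooccur_npmi: float  # strongest NPMI among entity pairs linking the notes, in [-1, 1]
    wikilinked: bool     # True if the two notes are directly wikilinked


def blend_score(c: Candidate) -> float:
    """0.6 cosine + 0.25 cooccur + 0.15 wikilinks, each term in [0, 1]."""
    cooccur = max(0.0, c.cooccur_npmi)  # negative association adds nothing
    links = 1.0 if c.wikilinked else 0.0
    return W_COSINE * c.cosine + W_COOCCUR * cooccur + W_LINKS * links


def rank_related(cands: list[Candidate], top_k: int = 10) -> list[str]:
    """Note ids ordered by blended score, highest first."""
    ranked = sorted(cands, key=blend_score, reverse=True)
    return [c.note_id for c in ranked[:top_k]]
```

For a note `a` with cosine 0.80 and no positive co-occurrence, and a note `b` with cosine 0.70, NPMI 0.9 and a wikilink, pure cosine ranks `a` first, while the blend scores them 0.48 and 0.795 and ranks `b` first. That reordering is the behavior item 5 is after.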

## Expected effect

- DB shrinks by ~225 MB (once a `VACUUM` reclaims the freed pages).
- Top pairs reflect actual knowledge, not tool noise.
- NPMI rebalances the score toward specific associations.
- Self-limiting growth via decay.
- `vault_related` starts surfacing Hebbian-connected notes, not just embedding-similar ones, which is the neuroscience premise the project sells.

## Key files

- `src/neurostack/cooccurrence.py:17, 20-88, 91-148, 257-275`
- `src/neurostack/search.py:661-708, 840-866`
- `src/neurostack/attractor.py:43, 67-177`
- `src/neurostack/related.py:19-125` (currently doesn't read `entity_cooccurrence` at all)
- `src/neurostack/config.py:53`
