Make the exported knowledge graph intuitive (curation pass, flagged off) by swati510 · Pull Request #352 · repowise-dev/repowise

swati510 · 2026-06-03T13:13:05Z

Summary

Adds a curation/presentation pass over the deterministic KG skeleton so the exported knowledge-graph.json is intuitive on every index: bounded dependency-ordered layers, capped/ranked real entry points, one canonical layer-aware tour, typed infra/CI/data nodes, and never-empty summaries — plus an optional self-validated portable artifact.

All behaviour is behind REPOWISE_KG_CURATION (default off). With the flag off, the exported KG is byte-identical to today's. The NetworkX graph, centrality, community detection, and DB tables are never mutated (curation only writes the returned KnowledgeGraphResult; a regression test guards node/edge counts pre/post).
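For illustration, the flag-gated seam could look roughly like the sketch below. This is not the code in this PR: the accepted truthy values, the `curation_enabled` helper, and the dict standing in for `KnowledgeGraphResult` are assumptions; only the `REPOWISE_KG_CURATION` and `curate_knowledge_graph` names come from the description.

```python
import copy
import os

# Assumed truthy spellings; the PR only states that the default is off.
_TRUTHY = {"1", "true", "yes", "on"}


def curation_enabled(env=None):
    """Return True only when REPOWISE_KG_CURATION is explicitly switched on."""
    env = os.environ if env is None else env
    return env.get("REPOWISE_KG_CURATION", "").strip().lower() in _TRUTHY


def curate_knowledge_graph(result, env=None):
    """Return the result untouched when the flag is off, else a curated copy.

    `result` is a plain dict standing in for KnowledgeGraphResult
    (a hypothetical shape used only for this sketch).
    """
    if not curation_enabled(env):
        return result  # same object, so the flag-off export cannot change
    curated = copy.deepcopy(result)  # the sketch copies instead of mutating
    curated["curated"] = True  # placeholder for the real curation phases
    return curated
```

With the flag unset, the function hands back the very same object, which is the simplest way to keep the flag-off output identical.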

New module: packages/core/src/repowise/core/analysis/kg_curation.py, wired at pipeline/orchestrator.py:527 (runs in both FAST and STANDARD, before generation/persistence).

Phases

0 — seam: no-op curate_knowledge_graph + flag.
1 — layers: generation/layers.py:infer_layer spine replaces raw-community layers → bounded, partitioned, dependency-ordered. Edge A: CLI hint added to _LAYER_HINTS/_CANONICAL_RANK. Edge B: mega-layers (core/ui) get a two-level primary→sub-group split.
2 — entry points: re-export barrels demoted in the presentation view only (graph is_entry_point untouched); survivors ranked by pagerank+betweenness, capped at 8; full ranked list in project.entry_candidates.
3 — tour: single canonical layer-aware tour via tour.py:build_tour, README-first, each step carries layer_id. LLM enrichment now keeps the curated tour instead of regenerating it.
4 — typing + summaries: infra/CI/data nodes typed; never-empty deterministic summary floor (rich wiki-page summaries still win — backfill kept fill-empty, floor deferred to post-backfill in generate mode).
5 — C4: curated layers flow to the architecture view automatically (locked by test); Mermaid groups externals by category past a threshold.
6 — portable: build_portable_kg → self-contained artifact with meta + embedded validation; save_knowledge_graph_json(..., portable=True).
7 — invariants: validate_kg pure checker + cross-repo invariant suite (many-isolates regression, flat single-package, deep monorepo), AST-untouched guard, determinism.
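As a rough sketch of phase 2 (entry points), the demote, rank, and cap steps might be expressed as follows. The candidate dict shape, the `rank_entry_points` name, and the unweighted sum of pagerank and betweenness are assumptions for illustration; the PR does not state how the two scores are combined or whether barrels appear in `entry_candidates`.

```python
def rank_entry_points(candidates, cap=8):
    """Split candidates into a capped presentation list and the full ranking.

    Each candidate is a dict with `path`, `pagerank`, `betweenness`, and
    `is_barrel` keys (a hypothetical shape). Barrels are left out of the
    ranking only; the input dicts are not modified.
    """
    survivors = [c for c in candidates if not c["is_barrel"]]
    # Assumed scoring: an unweighted sum of the two centrality measures.
    # Ties break on path so the output is deterministic.
    ranked = sorted(
        survivors,
        key=lambda c: (-(c["pagerank"] + c["betweenness"]), c["path"]),
    )
    paths = [c["path"] for c in ranked]
    return {"entry_points": paths[:cap], "entry_candidates": paths}
```

Returning both lists from one call keeps the capped view and the full ranking consistent with each other.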

Tests

Full unit suite: 3261 passed, 2 xfailed, 0 failures. New tests in tests/unit/analysis/test_kg_curation.py, test_kg_invariants.py, kg_fixtures.py, and tests/unit/server/services/test_c4_curation.py.

Empirical validation

repowise itself: 103 → 11 layers, 0 singletons, largest layer 25.7%, Application catch-all 14.1%, CLI surfaced as its own layer.

Multi-repo (§7.2 acceptance gate): core wins hold on every repo (no layer explosion, 0 singletons, partition intact, count ≤15). However, the balance targets (no layer above 35%, catch-all below 20%) are not met on test-heavy repos (django: Test layer = 63%) or on flat single-package libraries (requests/flask: catch-all 42–59%, because everything sits in one package directory and falls into Application). validate_kg correctly warns on these.
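The balance checks discussed here can be sketched as a pure function over layer sizes. This is an illustration under assumptions, not `validate_kg` itself: the `check_layer_balance` name, the `{layer_name: node_count}` input, the warning wording, and the exact comparison boundaries are hypothetical; the 15 / 35% / 20% thresholds are taken from the targets quoted in this description.

```python
def check_layer_balance(layer_sizes, catch_all="Application",
                        max_layers=15, max_share=0.35, max_catch_all=0.20):
    """Return warning strings for a {layer_name: node_count} mapping."""
    total = sum(layer_sizes.values())
    warnings = []
    if total == 0:
        return warnings
    if len(layer_sizes) > max_layers:
        warnings.append(f"layer count {len(layer_sizes)} exceeds {max_layers}")
    singletons = sorted(name for name, size in layer_sizes.items() if size == 1)
    if singletons:
        warnings.append(f"singleton layers: {', '.join(singletons)}")
    # Largest layer must not hold more than max_share of all nodes.
    largest = max(layer_sizes, key=layer_sizes.get)
    share = layer_sizes[largest] / total
    if share > max_share:
        warnings.append(f"mega-layer {largest} holds {share:.0%} of nodes")
    # The catch-all layer must stay below max_catch_all.
    catch_share = layer_sizes.get(catch_all, 0) / total
    if catch_share >= max_catch_all:
        warnings.append(f"catch-all {catch_all} holds {catch_share:.0%} of nodes")
    return warnings
```

A test-heavy shape such as `{"Test": 63, "Core": 20, "Data": 17}` trips only the mega-layer warning, which mirrors the django case above.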

Why the flag stays OFF

The acceptance gate requires no repo exceeding the 35%/20% balance targets; that currently fails on the repo shapes above, so the default is intentionally left off (no flip commit). Two follow-ups before default-on:

Path-only infer_layer can't subdivide flat libraries (no directory signal) — add a filename-based hint fallback (models.py→Data, etc.).
Consider exempting the Test layer from the largest-primary-layer metric (tests legitimately dominate some repos).
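A possible shape for the first follow-up, sketched under assumptions: the hint tables and the `infer_layer_with_fallback` name are hypothetical, and the real `infer_layer` signature is not shown in this PR. Only the `models.py` to Data pairing and the Application catch-all come from the text above.

```python
from pathlib import PurePosixPath

# Hypothetical hint tables; only models.py -> Data is named in the PR.
_DIR_HINTS = {"cli": "CLI", "tests": "Test"}
_FILENAME_HINTS = {"models.py": "Data"}


def infer_layer_with_fallback(path, default="Application"):
    """Prefer a directory hint, then a filename hint, then the catch-all."""
    p = PurePosixPath(path)
    for part in p.parts[:-1]:  # directory components only
        if part in _DIR_HINTS:
            return _DIR_HINTS[part]
    # No directory signal (flat library): fall back to the file name.
    return _FILENAME_HINTS.get(p.name, default)
```

Checking directories first means the fallback only changes results for files that would otherwise land in the catch-all.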

Deliberate divergence from the plan

The plan said to extend to_dict() for the portable artifact; I kept to_dict() byte-identical and added build_portable_kg separately instead, otherwise a new meta key would break the flag-off byte-identity rule.
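The design choice can be illustrated with a small sketch: the serializer stays untouched and a separate builder wraps its output. The `to_dict` stand-in, the `build_portable_sketch` name, and the artifact keys are assumptions for illustration and do not reflect the actual `build_portable_kg` layout.

```python
def to_dict(kg):
    """Stand-in for the existing serializer, whose output must not change."""
    return {"nodes": list(kg["nodes"]), "edges": list(kg["edges"])}


def build_portable_sketch(kg, validate):
    """Wrap the unchanged to_dict() payload with meta and validation results.

    `validate` is any callable that takes the payload and returns a list
    of warning strings.
    """
    payload = to_dict(kg)
    return {
        "meta": {"portable": True},  # hypothetical meta contents
        "graph": payload,
        "validation": {"warnings": validate(payload)},
    }
```

Because the extra keys live only in the wrapper, anything that consumes `to_dict()` directly sees the same output as before.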

repowise-bot · 2026-06-05T07:36:48Z

✅ Health: 7.0 (unchanged)
5 hotspots · 5 hidden couplings · 2 with fix history

🚨 Change risk: high (riskier than 86% of this repo's commits · raw 9.7/10)
This change's risk is driven by:

large diff (many lines added)
scattered, high-entropy change

🩹 Review priority (files here with the most recent bug-fix history; defects cluster, so review these first)

.../pipeline/orchestrator.py — fixed 11× in the last ~6 months
.../c4_builder/mermaid.py — fixed once in the last ~6 months

🔥 Hotspots touched (5)

.../c4_builder/mermaid.py — 1 commits/90d, 1 dependents · primary owner: Raghav Chamadiya (100%)
.../pipeline/orchestrator.py — 33 commits/90d, 15 dependents · primary owner: Raghav Chamadiya (82%)
.../generation/knowledge_graph.py — 1 commits/90d, 3 dependents · primary owner: Swati Ahuja (100%)

.../generation/test_layers.py — 1 commits/90d, 0 dependents · primary owner: Raghav Chamadiya (100%)
.../generation/layers.py — 1 commits/90d, 3 dependents · primary owner: Raghav Chamadiya (100%)

🔗 Hidden coupling (1 file)

.../pipeline/orchestrator.py co-changes with these files (not in this PR):
- .../pipeline/persist.py (9× — 🟢 routine)
- .../commands/update_cmd.py (8× — 🟢 routine)
- .../persistence/models.py (7× — 🟢 routine)
- .../mcp_server/tool_overview.py (6× — 🟢 routine)
- README.md (6× — 🟢 routine)

Last updated 2026-06-05 07:36 UTC

swati510 added 13 commits June 3, 2026 17:39

feat(kg): add no-op curation seam in pipeline (flagged off)

81ce137

feat(kg): add CLI hint to layer spine (edge case A)

e04eb0b

feat(kg): curate layers from infer_layer spine with mega-layer sub-split

63f60c9

test(kg): layer count + partition + sub-split invariants

b640d7f

feat(kg): demote barrels, rank + cap entry points

2b72486

test(kg): entry-point precision invariants

e7dd8ef

feat(kg): export canonical layer-aware tour; keep it over LLM tour when curated

c7ca129

test(kg): layer-aware tour invariants

411ae25

feat(kg): type infra/CI/data nodes and add never-empty summary floor

da00d42

test(kg): node typing + summary floor invariants

a2f1ad4

feat(kg): group C4 externals by category; lock curated-layer inheritance

ae12c41

feat(kg): add validate_kg invariant checker + portable self-validated export

d6f1069

test(kg): cross-repo invariants, portable artifact, many-isolates regression

8d5acf3

swati510 requested a review from RaghavChamadiya as a code owner June 3, 2026 13:13

swati510 wants to merge 13 commits into main from kg-intuitiveness
