Make the exported knowledge graph intuitive (curation pass, flagged off) #352
Open
swati510 wants to merge 13 commits into
✅ Health: 7.0 (unchanged) 🚨 Change risk: high (riskier than 86% of this repo's commits · raw 9.7/10)
🩹 Review priority (files here with the most recent bug-fix history — defects cluster, so review these first)
🔥 Hotspots touched (5)
🔗 Hidden coupling (1 file)
Last updated 2026-06-05 07:36 UTC
Summary
Adds a curation/presentation pass over the deterministic KG skeleton so the exported `knowledge-graph.json` is intuitive on every index: bounded dependency-ordered layers, capped/ranked real entry points, one canonical layer-aware tour, typed infra/CI/data nodes, and never-empty summaries, plus an optional self-validated portable artifact.
All behaviour is behind `REPOWISE_KG_CURATION` (default off). With the flag off, the exported KG is byte-identical to today's. The NetworkX graph, centrality, community detection, and DB tables are never mutated: curation only writes the returned `KnowledgeGraphResult`, and a regression test guards node/edge counts pre/post.
New module: `packages/core/src/repowise/core/analysis/kg_curation.py`, wired at `pipeline/orchestrator.py:527` (runs in both FAST and STANDARD, before generation/persistence).
Phases
- `curate_knowledge_graph` + flag.
- `generation/layers.py:infer_layer` spine replaces raw-community layers → bounded, partitioned, dependency-ordered. Edge A: `CLI` hint added to `_LAYER_HINTS`/`_CANONICAL_RANK`. Edge B: mega-layers (core/ui) get a two-level primary→sub-group split.
- (`is_entry_point` untouched); survivors ranked by PageRank + betweenness, capped at 8; full ranked list in `project.entry_candidates`.
- `tour.py:build_tour`, README-first, each step carries `layer_id`. LLM enrichment now keeps the curated tour instead of regenerating it.
- `build_portable_kg` → self-contained artifact with `meta` + embedded `validation`; `save_knowledge_graph_json(..., portable=True)`.
- `validate_kg` pure checker + cross-repo invariant suite (many-isolates regression, flat single-package, deep monorepo), AST-untouched guard, determinism.
Tests
Full unit suite: 3261 passed, 2 xfailed, 0 failures. New tests in `tests/unit/analysis/test_kg_curation.py`, `test_kg_invariants.py`, `kg_fixtures.py`, and `tests/unit/server/services/test_c4_curation.py`.
Empirical validation
- repowise itself: 103 → 11 layers, 0 singletons, largest layer 25.7%, `Application` catch-all 14.1%, `CLI` surfaced as its own layer.
- Multi-repo (§7.2 acceptance gate): core wins hold on every repo (no layer explosion, 0 singletons, partition intact, count ≤15). But the balance targets (no mega-layer above 35%, catch-all below 20%) are not met on test-heavy repos (django `Test` layer = 63%) and flat single-package libraries (requests/flask catch-all 42–59%, because everything sits in one package dir and lands in `Application`). `validate_kg` correctly warns on these.
Why the flag stays OFF
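The balance targets cited in the multi-repo results (largest layer at most 35% of nodes, catch-all under 20%) can be sketched as a small check. This is only an illustration, not the `validate_kg` implementation: the function name `balance_warnings`, its parameters, the boundary handling, and the sample counts are all assumptions.

```python
def balance_warnings(
    layer_sizes: dict,
    catch_all: str = "Application",      # assumed name of the catch-all layer
    max_layer_share: float = 0.35,       # mega-layer target from the PR text
    max_catch_all_share: float = 0.20,   # catch-all target from the PR text
) -> list:
    """Return human-readable warnings for layers that break the balance targets."""
    total = sum(layer_sizes.values())
    if total == 0:
        return []

    warnings = []

    # Largest layer must not exceed the mega-layer share.
    largest_name, largest_size = max(layer_sizes.items(), key=lambda kv: kv[1])
    if largest_size / total > max_layer_share:
        warnings.append(
            f"mega-layer: {largest_name} holds {largest_size / total:.1%} of nodes"
        )

    # Catch-all layer must stay below its own, tighter share.
    catch_all_share = layer_sizes.get(catch_all, 0) / total
    if catch_all_share >= max_catch_all_share:
        warnings.append(
            f"catch-all: {catch_all} holds {catch_all_share:.1%} of nodes"
        )

    return warnings


# Illustrative node counts only (not measured data).
test_heavy = {"Test": 63, "Core": 20, "Application": 17}
flat_library = {"Application": 59, "Core": 41}
balanced = {"Core": 30, "Data": 30, "CLI": 26, "Application": 14}

for name, sizes in [("test_heavy", test_heavy),
                    ("flat_library", flat_library),
                    ("balanced", balanced)]:
    print(name, balance_warnings(sizes))
```

A test-heavy shape trips only the mega-layer check, while a flat single-package shape trips both, which matches the two failure modes described in the multi-repo results.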
The acceptance gate requires no repo exceeding the 35%/20% balance targets; that currently fails on the repo shapes above, so the default is intentionally left off (no flip commit). Two follow-ups before default-on:
- `infer_layer` can't subdivide flat libraries (no directory signal): add a filename-based hint fallback (`models.py` → Data, etc.).
- Exclude the `Test` layer from the largest-primary-layer metric (tests legitimately dominate some repos).
Deliberate divergence from the plan
The plan said to extend `to_dict()` for the portable artifact; I kept `to_dict()` byte-identical and added `build_portable_kg` separately instead, since a new `meta` key would otherwise break the flag-off byte-identity rule.
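
A minimal sketch of why the separate builder preserves byte identity, under stated assumptions: the `KnowledgeGraphResult` class below is a hypothetical stand-in with illustrative fields, and `build_portable_kg_sketch`, `validate_sketch`, and the `meta` contents are invented for this example rather than taken from the real module.

```python
import json
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraphResult:
    """Hypothetical stand-in for the real result type; fields are illustrative."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def to_dict(self) -> dict:
        # Unchanged serializer: no `meta` key, so flag-off output stays the same.
        return {"nodes": self.nodes, "edges": self.edges}


def validate_sketch(payload: dict) -> dict:
    """Hypothetical checker: counts edges that reference unknown node ids."""
    ids = {n["id"] for n in payload["nodes"]}
    dangling = [
        e for e in payload["edges"]
        if e["source"] not in ids or e["target"] not in ids
    ]
    return {"ok": not dangling, "dangling_edges": len(dangling)}


def build_portable_kg_sketch(result: KnowledgeGraphResult) -> dict:
    """Wrap the unchanged to_dict() payload with meta and embedded validation."""
    payload = result.to_dict()
    return {
        "meta": {"format": "portable-sketch", "node_count": len(payload["nodes"])},
        "validation": validate_sketch(payload),
        "graph": payload,
    }


def dump(data: dict) -> bytes:
    """Deterministic JSON encoding so two dumps can be compared byte for byte."""
    return json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")


result = KnowledgeGraphResult(
    nodes=[{"id": "a"}, {"id": "b"}],
    edges=[{"source": "a", "target": "b"}],
)
before = dump(result.to_dict())
portable = build_portable_kg_sketch(result)

# Building the portable artifact leaves the default export untouched.
assert dump(result.to_dict()) == before
assert "meta" not in result.to_dict()
```

Because the extra keys live only in the wrapper, the default export path never sees them, which is the property the flag-off byte-identity rule depends on.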