Cat 4c: edge count alongside component count, sparse annotation #37
jphein wants to merge 2 commits
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds per-edge-type edge counts to the ingestion integrity report and enhances the formatted output to annotate sparse edge types (Issue #15), with new tests to validate both the report field and rendering.
Changes:
- Added `per_edge_type_edge_counts` to `IngestionIntegrityReport` and populated it in `score_ingestion_integrity`
- Updated `format_report` to render per-edge-type edge counts alongside component counts and mark sparse edge types
- Added tests covering the new report field and sparse annotation behavior in formatted output
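To make the two numbers concrete, here is a minimal sketch of how per-edge-type edge counts and component counts could be computed side by side. It is not the code from `score_ingestion_integrity`: the function name `per_edge_type_stats`, the `(source, target, edge_type)` triple shape, and the choice to count components only over nodes touched by each edge type are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def per_edge_type_stats(edges):
    """Hypothetical helper: edges is an iterable of (source, target, edge_type).

    Returns (edge_counts, component_counts), each keyed by edge type.
    Components are counted over the subgraph induced by one edge type.
    """
    edge_counts = Counter()
    parents = defaultdict(dict)  # edge_type -> union-find parent map

    def find(parent, node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for source, target, edge_type in edges:
        edge_counts[edge_type] += 1
        parent = parents[edge_type]
        root_s, root_t = find(parent, source), find(parent, target)
        if root_s != root_t:
            parent[root_s] = root_t  # merge the two components

    component_counts = {
        edge_type: len({find(parent, node) for node in list(parent)})
        for edge_type, parent in parents.items()
    }
    return dict(edge_counts), component_counts


edges = [
    ("a", "b", "RELATED"), ("b", "c", "RELATED"), ("c", "d", "RELATED"),
    ("x", "y", "CITES"), ("p", "q", "CITES"),
]
edge_counts, component_counts = per_edge_type_stats(edges)
```

In this toy input `CITES` has two edges and two components. A component count read alone could look like fragmentation, while the edge count shows there is too little data to conclude anything, which is the case the sparse annotation is meant to flag.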
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `sme/categories/ingestion_integrity.py` | Extends report schema and updates formatting to include edge counts + sparse annotations |
| `tests/test_ingestion_integrity.py` | Adds regression tests for the new report field and `format_report` output |
    per_edge_type_edge_counts: dict[str, int] = field(default_factory=dict)

    sparse_threshold = 5
    combined = sorted(
        report.per_edge_type_components.items(),
        key=lambda kv: -report.per_edge_type_edge_counts.get(kv[0], 0),

    lines.append(" Per-edge-type edges + components (4c monoculture signal):")
    sparse_threshold = 5

Edge counts alongside component counts, plus the sparse annotation, is exactly the right fix for #15. The per-edge-type component count was structurally noisy on small populations, and this makes the signal/noise tradeoff visible per type.

CI is red only on one ruff finding: ambiguous variable name.

Small follow-ups (non-blocking, worth doing in the same touch):
- Rename ambiguous `l` to `line` in test (clears the only red ruff finding)
- Pull sparse edge-count threshold 5 into named `_SPARSE_EDGE_THRESHOLD` constant
- Tie-break the 4c render sort on edge_type name so ties are deterministic

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
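For readers skimming the thread, the named constant and the tie-break could look roughly like the sketch below. Only `_SPARSE_EDGE_THRESHOLD`, the value 5, the sort direction, and the `[sparse — <5 edges]` annotation come from this PR; the helper name `render_edge_type_lines`, its arguments, and the exact line layout are hypothetical and not taken from `format_report`.

```python
_SPARSE_EDGE_THRESHOLD = 5  # types with fewer edges than this are annotated as sparse


def render_edge_type_lines(components, edge_counts):
    """Hypothetical renderer: both arguments map edge type -> int."""
    ordered = sorted(
        components.items(),
        # Primary key: edge count descending. Secondary key: edge type name,
        # so types with equal edge counts always render in the same order.
        key=lambda kv: (-edge_counts.get(kv[0], 0), kv[0]),
    )
    lines = []
    for edge_type, n_components in ordered:
        n_edges = edge_counts.get(edge_type, 0)
        note = ""
        if n_edges < _SPARSE_EDGE_THRESHOLD:
            note = f"  [sparse — <{_SPARSE_EDGE_THRESHOLD} edges]"
        # Line layout below is illustrative only.
        lines.append(f"{edge_type}: {n_edges} edges, {n_components} components{note}")
    return lines


rendered = render_edge_type_lines(
    {"B": 2, "A": 1, "RELATED": 1},
    {"B": 3, "A": 3, "RELATED": 10},
)
```

Sorting on a `(negated count, name)` tuple keeps the descending-by-edge-count order while making the output independent of dict insertion order when two types have the same edge count.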
Pushed fa0ad5f:
🫏
Verified. One small forward thought, not for this PR:
Summary
Closes #15.
Adds edge count alongside component count in the Cat 4c "Edge-type monoculture" section and annotates sparse types (<5 edges) to distinguish real structural signals from noise.
- Adds a `per_edge_type_edge_counts: dict[str, int]` field to `IngestionIntegrityReport`
- `format_report()` now shows both edge count and component count per type, sorted by edge count descending
- Sparse types are annotated with `[sparse — <5 edges]`

Before:
After:
Test plan
- `test_per_edge_type_edge_counts_populated`: verifies edge counts match ground truth
- `test_format_report_sparse_annotation`: verifies sparse annotation appears for low-edge-count types and not for RELATED

🫏 Generated with Claude Code