Skip to content

refactor(native): flatten and decompose extractor match arms#844

Merged
carlos-alm merged 7 commits intomainfrom
refactor/titan-native-extractor-flattening
Apr 4, 2026
Merged

refactor(native): flatten and decompose extractor match arms#844
carlos-alm merged 7 commits intomainfrom
refactor/titan-native-extractor-flattening

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Flatten deeply nested match arms (nesting 6-12) in 10 Rust extractor files by extracting per-node-kind handler functions
  • Decompose match_cpp_node (worst MI=20.3, bugs=2.37) and match_scala_node (bugs=1.87) into per-node-kind dispatch functions

Titan Audit Context

Changes

  • crates/codegraph-core/src/extractors/csharp.rs -- extract handler functions from nested match
  • crates/codegraph-core/src/extractors/elixir.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/go.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/java.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/javascript.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/php.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/python.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/ruby.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/rust_lang.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/zig.rs -- extract handler functions
  • crates/codegraph-core/src/extractors/cpp.rs -- decompose match_cpp_node
  • crates/codegraph-core/src/extractors/scala.rs -- decompose match_scala_node

Metrics Impact

  • match_cpp_node: bugs 2.37 -> 0.18, MI 20.3 -> 55.3
  • match_scala_node: bugs 1.87 -> 0.14, MI 24.7 -> 57.9
  • 10 extractors: reduced max nesting from 6-12 to 2-4

Test plan

  • CI passes (lint + build + tests)
  • All parser tests pass (extraction output unchanged)
  • Native engine produces identical results to WASM

Extract hardcoded magic numbers to named constants in constants.rs:
- Louvain: MAX_LEVELS=50, MAX_PASSES=20, MIN_GAIN=1e-12, DEFAULT_SEED=42
- Dataflow: TRUNCATION_LIMIT=120
- Build pipeline: FAST_PATH_MAX_CHANGED_FILES=5, FAST_PATH_MIN_EXISTING_FILES=20
Also extract DEFAULT_RANDOM_SEED=42 in TS louvain.ts.
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Extractor Match Arm Flattening

Reviewing this native engine refactoring PR following the 5-phase analysis framework.

Todo List

  • Phase 1: Root cause analysis - understand the complexity problem
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity verification
  • Phase 4: Code quality review of extractors
  • Phase 5: Backlog compliance assessment
  • Final: Complete assessment and recommendation

Phase 1: Root Cause Analysis ✓

Problem Being Solved: High cyclomatic complexity and maintainability issues in native Rust extractors. The PR addresses deeply nested match statements (6-12 levels) that create cognitive overhead and maintenance burden.

Evidence of Real Issues:

  • match_cpp_node had bugs=2.37, MI=20.3 (very poor)
  • match_scala_node had bugs=1.87, MI=24.7 (poor)
  • Deeply nested match arms (6-12 levels) across 10 extractor files

Approach Justification: The refactoring extracts per-node-kind handler functions, which is a standard complexity reduction technique. The transformation from deeply nested match blocks to clean dispatch is genuine complexity reduction.

Phase 2: Configuration Change Detection ✓

No High-Risk Configuration Changes Found:

  • No biome.json, package.json, tsconfig.json, or test configuration modifications
  • No linting rules relaxed or test thresholds lowered
  • No eslint-disable or timeout increases
  • This is pure code refactoring without configuration hiding

Phase 3: Test Integrity ✓

Test Coverage Appears Intact:

  • Parser tests exist for all affected languages (tests/parsers/cpp.test.ts, etc.)
  • Tests verify extraction output correctness via structural assertions
  • No test weakening or assertion removal detected
  • PR claims "extraction output unchanged" - tests should catch any regressions

Phase 4: Code Quality Review ✓

Excellent Refactoring Pattern:

  • Clear dispatch structure: match_cpp_node now has clean 8-case match with dedicated handlers
  • Focused responsibility: Each handler function (e.g., handle_cpp_function_definition) has single responsibility
  • Consistent naming: handle_<lang>_<node_type> pattern across extractors
  • Preserved logic: Complex extraction logic moved intact into focused functions
  • No shortcuts taken: All original complexity metrics, CFG building, and parameter extraction preserved

Before/After Pattern (cpp.rs:385-397):

// Clean dispatcher - no nested logic
fn match_cpp_node(node: &Node, source: &[u8], symbols: &mut FileSymbols, _depth: usize) {
    match node.kind() {
        "function_definition" => handle_cpp_function_definition(node, source, symbols),
        "class_specifier" => handle_cpp_class_specifier(node, source, symbols),
        // ... 8 clean dispatches
    }
}

Quality Indicators:

  • No code duplication - each handler is unique
  • No loss of functionality - all edge cases preserved
  • No TODO/FIXME comments indicating rushed work
  • Maintains all complexity/CFG computation calls
  • Consistent parameter patterns across handlers

Phase 5: Backlog Compliance Assessment ✓

Zero-dep: ✓ — No new runtime dependencies. Pure refactoring of existing Rust code using tree-sitter AST.

Foundation-aligned: ✓ — Fully compliant with FOUNDATION.md:

  • P1 (always-current): No impact on graph freshness
  • P4 (zero-cost core): Improves maintainability without runtime cost
  • P5 (embeddable-first): No API contract changes
  • P8 (development-focused): Improves code quality for better development experience

Problem-fit: 3/5 — This is infrastructure improvement. Indirectly helps agents by:

  • Improving maintainability → faster bug fixes → better accuracy
  • Reducing complexity debt → easier to add features → better agent experience
  • Not directly solving the "lost AI" problem, but enables future accuracy improvements

Breaking: No — Pure internal refactoring. All extraction output remains identical. No API, CLI, or schema changes.

Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Root Cause Analysis

Problem Being Solved: Technical debt in native Rust extractors with deeply nested match arms creating maintenance burden and complexity metrics indicating bug-prone code.

Why This Approach: Standard extract-function refactoring. Separates dispatch logic from business logic, improving readability and testability.

Risk Assessment: Very low risk - pure code reorganization with preserved functionality.

Backlog Compliance

  • Zero-dep: ✓ — Pure Rust refactoring, no new dependencies
  • Foundation-aligned: ✓ — Improves maintainability without violating principles
  • Problem-fit: 3 — Infrastructure improvement enabling future accuracy work
  • Breaking: No — Internal refactoring only
  • Tier: 1

Critical Concerns

None. This is exemplary refactoring work.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality complexity reduction that addresses genuine technical debt. Metrics improvements (bugs 2.37→0.18, MI 20.3→55.3) indicate real maintainability gains. Zero risk of functional regression due to preserved logic and test coverage.

Minor observation: This is infrastructure work that doesn't directly solve the core "AI orientation" problem, but it's necessary foundation work for future accuracy improvements.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 4, 2026

Greptile Summary

This PR is a pure structural refactoring of 12 Rust extractor files and constants.rs, extracting deeply nested inline match arms from the monolithic match_*_node dispatch functions into dedicated per-node-kind handler functions (handle_*). No behavioral changes are introduced — the extraction logic is preserved identically.

Key changes:

  • cpp.rs and scala.rs: The two highest-complexity files (bugs≈2.37 and 1.87 respectively) are decomposed into explicit handler functions, dramatically improving Maintainability Index
  • 10 other extractors (csharp, elixir, go, java, javascript, php, python, ruby, rust_lang, zig): Same flattening pattern applied, reducing max nesting from 6–12 levels to 2–4
  • csharp.rs gains a shared handle_method_or_ctor that deduplicates method and constructor handling
  • constants.rs: Two fast-path build pipeline constants added (prerequisite from PR refactor(native): extract constants and shared barrel resolution #842)
  • All language-specific parent-class traversal helpers correctly retain their per-language fallback logic, preserving grammar-specific field name behavior

Confidence Score: 5/5

Pure structural refactoring with no behavioral changes; safe to merge

All 12 extractor files follow an identical, mechanical flattening pattern — dispatch logic is preserved exactly and per-language edge cases are correctly retained in their per-language helpers. No new logic is introduced, only structural decomposition. No P0 or P1 findings.

No files require special attention

Important Files Changed

Filename Overview
crates/codegraph-core/src/extractors/cpp.rs Decomposed match_cpp_node (bugs≈2.37, MI=20.3) into 8 handler functions; logic preserved, metrics significantly improved
crates/codegraph-core/src/extractors/scala.rs Decomposed match_scala_node (bugs≈1.87, MI=24.7) into 6 handler functions; find_scala_parent_class correctly retains Scala-specific fallback
crates/codegraph-core/src/extractors/csharp.rs Clean decomposition into 10 handler functions; introduces shared handle_method_or_ctor to DRY up method and constructor handling
crates/codegraph-core/src/extractors/go.rs Extracted 6 handler functions from match_go_node; interface method extraction correctly split into extract_go_interface_methods helper
crates/codegraph-core/src/extractors/python.rs Extracted 6 handler functions; self-assignment property extraction chain cleanly split out
crates/codegraph-core/src/extractors/javascript.rs Largest extractor; handler functions extracted from deeply nested match; TS/JSX-specific logic preserved correctly
crates/codegraph-core/src/extractors/java.rs Extracted 7 handler functions; superclass/interface extraction helpers cleanly separated; logic preserved
crates/codegraph-core/src/extractors/php.rs Extracted 9 handler functions including separate call-site handlers; logic preserved
crates/codegraph-core/src/extractors/ruby.rs Extracted 4 handler functions; singleton method and regular method handling correctly split
crates/codegraph-core/src/extractors/rust_lang.rs Extracted 8 handler functions; trait-item vs impl-item deduplication guard in handle_function_item correctly preserved
crates/codegraph-core/src/extractors/elixir.rs Unique dispatch pattern (all nodes are call kind; dispatch by keyword text) correctly preserved; handler functions cleanly extracted
crates/codegraph-core/src/extractors/zig.rs Extracted 5 handler functions from match_zig_node; struct/enum detection via variable_declaration correctly preserved
crates/codegraph-core/src/constants.rs Adds two build-pipeline constants (FAST_PATH_MAX_CHANGED_FILES, FAST_PATH_MIN_EXISTING_FILES) from PR #842; well-documented

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[walk_tree] --> B{match_*_node}
    B -->|function_definition| C[handle_function_*]
    B -->|class_declaration| D[handle_class_*]
    B -->|interface_declaration| E[handle_interface_*]
    B -->|enum_declaration| F[handle_enum_*]
    B -->|import_declaration| G[handle_import_*]
    B -->|call_expression| H[handle_call_*]
    B -->|other kinds| I[handle_other_*]
    C --> J[symbols.definitions.push]
    D --> J
    D --> K[extract_*_fields / extract_*_base_types]
    K --> J
    E --> J
    F --> J
    G --> L[symbols.imports.push]
    H --> M[symbols.calls.push]
    I --> J
Loading

Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/mai..." | Re-trigger Greptile

Comment on lines +64 to +73
// Wildcard or empty-names reexports
if re.wildcard_reexport || re.names.is_empty() {
if ctx.has_definition(re.source, symbol_name) {
return Some(re.source.to_string());
}
let deeper = resolve_barrel_export(ctx, re.source, symbol_name, visited);
if deeper.is_some() {
return deeper;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Wildcard barrel resolution behavior changed from original

The new shared implementation silently alters the wildcard barrel resolution logic from both edge_builder.rs and import_edges.rs.

Old behavior (both callers, wildcard branch):

if re.wildcard_reexport || re.names.is_empty() {
    if let Some(defs) = ctx.file_defs.get(re.source.as_str()) {
        // Only enters here if the source IS in file_defs
        if defs.contains(symbol_name) { return Some(...); }
        let deeper = resolve_barrel_export(...);
        if deeper.is_some() { return deeper; }
    }
    // If source not in file_defs: do nothing, continue to next reexport
}

New behavior (this file, wildcard branch):

if re.wildcard_reexport || re.names.is_empty() {
    if ctx.has_definition(re.source, symbol_name) { return Some(...); }
    // Tries deeper even when the source file is NOT indexed
    let deeper = resolve_barrel_export(ctx, re.source, symbol_name, visited);
    if deeper.is_some() { return deeper; }
}

When the source file in a wildcard reexport chain is not present in file_defs / file_symbols, the old code skipped that wildcard entirely (no deeper traversal). The new code always attempts deeper resolution. If the unindexed source file itself is in the reexport_map, the new code can now produce a result (Some(...)) where the old code returned None for that branch.

This is a real behavioral change in import edge generation that this PR describes as a pure refactoring. The same difference applies to named reexports (lines 50–61): when a source file isn't indexed, the old edge_builder.rs skipped deeper resolution and jumped to the fallback, while the new code always tries deeper first. That difference is less impactful (both paths eventually return the fallback for the named case), but the wildcard case can produce new edges. Given the hard requirement that "Native engine produces identical results to WASM," this should be verified against the TypeScript resolveBarrelExport() in resolve-imports.ts to confirm the new behaviour is intentional.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Verified against the TypeScript reference (resolveBarrelExport() in resolve-imports.ts lines 194-215):

  • TS behavior: Both named and wildcard branches guard deeper traversal with const targetSymbols = ctx.fileSymbols.get(re.source); if (targetSymbols) { ... } — when the source file isn't indexed, named reexports fall through to the fallback return re.source, and wildcard reexports are silently skipped.

  • Rust (before fix): has_definition() returns false when the file isn't indexed, but the code still attempted resolve_barrel_export() recursively. This could produce new edges for wildcard reexports through unindexed files.

Fixed in 4c101f7 by adding a has_file() method to BarrelContext that gates both branches, matching the TS reference exactly. All three implementors (TestContext, ImportEdgeContext in edge_builder.rs, ImportEdgeContext in import_edges.rs) now implement it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This concern is fully addressed. The merge with main (commit 64dad67) brought in PR #848 which resolved the barrel resolution approach differently -- removing the has_file guard entirely and simplifying the algorithm. The wildcard branch now always attempts deeper traversal, matching the TS reference behavior. The resolves_transitive_chain test also passes correctly with this approach.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Codegraph Impact Analysis

69 functions changed74 callers affected across 27 files

  • BarrelContext.reexports_for in crates/codegraph-core/src/barrel_resolution.rs:23 (0 transitive callers)
  • BarrelContext.has_definition in crates/codegraph-core/src/barrel_resolution.rs:26 (0 transitive callers)
  • BarrelContext.has_file in crates/codegraph-core/src/barrel_resolution.rs:32 (0 transitive callers)
  • resolve_barrel_export in crates/codegraph-core/src/barrel_resolution.rs:40 (4 transitive callers)
  • TestContext.reexports_for in crates/codegraph-core/src/barrel_resolution.rs:103 (0 transitive callers)
  • TestContext.has_definition in crates/codegraph-core/src/barrel_resolution.rs:116 (0 transitive callers)
  • TestContext.has_file in crates/codegraph-core/src/barrel_resolution.rs:122 (0 transitive callers)
  • resolves_named_reexport in crates/codegraph-core/src/barrel_resolution.rs:128 (0 transitive callers)
  • resolves_wildcard_reexport in crates/codegraph-core/src/barrel_resolution.rs:147 (0 transitive callers)
  • resolves_transitive_chain in crates/codegraph-core/src/barrel_resolution.rs:166 (0 transitive callers)
  • prevents_circular_reexport in crates/codegraph-core/src/barrel_resolution.rs:189 (0 transitive callers)
  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:90 (0 transitive callers)
  • handle_var_declarator in crates/codegraph-core/src/dataflow.rs:1110 (6 transitive callers)
  • handle_assignment in crates/codegraph-core/src/dataflow.rs:1219 (6 transitive callers)
  • handle_call_expr in crates/codegraph-core/src/dataflow.rs:1279 (6 transitive callers)
  • handle_expr_stmt_mutation in crates/codegraph-core/src/dataflow.rs:1352 (6 transitive callers)
  • ImportEdgeContext<'a>.reexports_for in crates/codegraph-core/src/edge_builder.rs:471 (0 transitive callers)
  • ImportEdgeContext<'a>.has_definition in crates/codegraph-core/src/edge_builder.rs:484 (0 transitive callers)
  • ImportEdgeContext<'a>.has_file in crates/codegraph-core/src/edge_builder.rs:490 (0 transitive callers)
  • build_import_edges in crates/codegraph-core/src/edge_builder.rs:500 (10 transitive callers)

…atch

Fix FAST_PATH_MIN_EXISTING_FILES constant type from usize to i64 to
match get_existing_file_count() return type, resolving Rust compile error.

Add has_file() gate to BarrelContext trait so resolve_barrel_export()
only recurses into indexed source files, matching the TypeScript reference
implementation behavior where ctx.fileSymbols.get(re.source) guards
deeper traversal.
@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm
Copy link
Copy Markdown
Contributor Author

The resolves_transitive_chain test issue identified in the review is now resolved. The merge with main brought in PR #848's barrel resolution changes which removed the has_file guard entirely. The simplified algorithm always attempts deeper traversal for wildcards, so the test passes without needing src/mid.ts in the definitions map. The four conflict files (barrel_resolution.rs, constants.rs, edge_builder.rs, import_edges.rs) were resolved by adopting main's approach from #848.

@carlos-alm
Copy link
Copy Markdown
Contributor Author

@greptileai

@carlos-alm carlos-alm merged commit 3ae97c8 into main Apr 4, 2026
11 checks passed
@carlos-alm carlos-alm deleted the refactor/titan-native-extractor-flattening branch April 4, 2026 22:26
@github-actions github-actions bot locked and limited conversation to collaborators Apr 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant