Skip to content

feat(pipeline): test entry points, receiver type inference, two-hop resolution#23

Open
vanigabriel wants to merge 11 commits intoDeusData:mainfrom
vanigabriel:vibe-fixes
Open

feat(pipeline): test entry points, receiver type inference, two-hop resolution#23
vanigabriel wants to merge 11 commits intoDeusData:mainfrom
vanigabriel:vibe-fixes

Conversation

@vanigabriel
Copy link

@vanigabriel vanigabriel commented Mar 6, 2026

Hi, used codebase-memory-mcp and got a few problems, claude fixed and seems a good result on my project (Golang), please see if fits.

nice tool 👍 

Summary

Three improvements to the call resolution pipeline that significantly reduce false positives in dead code detection:

  1. Test functions as entry points — The C extractor sets is_test only on the module node, not individual functions. This fix post-processes definitions in cbmParseFileFromSource to mark Test*/Benchmark*/Example* (Go), test_* (Python), etc. as is_entry_point=true using the existing isTestFunction() from testdetect.go.

  2. Receiver type inference — For Go methods like func (h *Handler) Foo(), parses the Receiver string (e.g., (h *Handler)) and adds h → Handler to the TypeMap. This enables type_dispatch resolution for receiver-based calls.

  3. Two-hop chained field resolution — For patterns like h.svc.Method() where h is a typed receiver and svc is a struct field, resolves the last segment (Method) by name lookup, excluding candidates from the receiver's own module to prevent self-referencing edges.

Results

Tested on a ~18k node Go + React Native codebase:

Metric Before After
type_dispatch calls 29 356 (12x)
Test entry points 0 122
Service method coverage ~50% 91% (200/220)
Self-referencing edges N/A 0
Total edges 27,388 29,430

Changes

  • internal/pipeline/pipeline_cbm.go — Added test entry point marking in cbmParseFileFromSource, receiver type inference in inferTypesCBM, and parseGoReceiver helper
  • internal/pipeline/pipeline.go — Added two-hop chained field resolution in resolveCallWithTypes

Test plan

  • go test ./internal/pipeline/... passes
  • go build ./... clean (no new warnings)
  • Full re-index produces more edges, zero self-referencing type_dispatch calls
  • Dead code false positives reduced (test functions no longer flagged)

🤖 Generated with Claude Code

…ceiver type inference

Three fixes to reduce false positives in dead code detection and improve
call graph accuracy:

1. Mark test functions as entry points: Test functions (Go Test*/Benchmark*/
   Example*, Python test_*, etc.) are invoked by the test runner, not by the
   call graph. The C extractor only sets is_test on the module node, not on
   individual function defs. This fix post-processes defs in cbmParseFile
   to mark matching functions as entry points.

2. Receiver type inference: For Go methods like `func (h *Handler) Foo()`,
   the receiver `h` has type `Handler`. Parse the Receiver string and add
   to the TypeMap so type_dispatch can resolve calls like `h.Publish()`.

3. Two-hop chained field resolution: For patterns like `h.svc.Method()`
   where `h` is a receiver and `svc` is a struct field, resolve the last
   segment of the chain by name lookup, excluding candidates from the
   receiver's own module to avoid self-referencing.

Results on a ~18k node Go+React Native codebase:
- type_dispatch calls: 29 → 356 (12x improvement)
- Test entry points: 0 → 122
- Service method call coverage: ~50% → 91% (200/220)
- Self-referencing false edges: 0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Owner

@DeusData DeusData left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution — the impact numbers are impressive (12x more type_dispatch calls, 122 test entry points, service method coverage 50% → 91%).

I verified all referenced functions exist (isTestFunction, modulePrefix, bestByImportDistance) and the single caller of inferTypesCBM is correctly updated with the new defs parameter. The core logic is sound.

A couple of things before merging:

  1. Unit tests for parseGoReceiver — This is a pure function that's easy to test. A few cases like (h *Handler), (s MyService), empty string, malformed input would go a long way.

  2. Comment on the two-hop resolution block — It's primarily useful for Go receiver patterns (h.svc.Method()). A brief comment noting this would help future readers understand the scope. Also worth noting it handles exactly 3-level chains — deeper chains like a.b.c.d() would only resolve the last segment.

Otherwise this looks good — the self-reference exclusion via modulePrefix is smart, the confidence values (0.90/0.80/0.70) are well-graduated, and the test entry point marking correctly uses the existing isTestFunction infrastructure.

vanigabriel and others added 2 commits March 6, 2026 17:33
In monorepos with multiple apps (e.g., apps/mobile + apps/api-go),
the unique_name and fuzzy resolution strategies could create false
CALLS edges across app boundaries. For example, React Native's <Text>
component would resolve to Go's sanitize.Text() because it was the
only "Text" in the registry.

This fix adds isCrossApp() which extracts the app boundary segment
from qualified names (e.g., "apps.mobile" vs "apps.api-go") and
rejects matches that cross boundaries when the candidate is not
import-reachable. Cross-app communication should use HTTP_CALLS
edges, not direct CALLS.

Results on a Go + React Native monorepo:
- Cross-app false edges: 134 → 0
- sanitize.Text false callers: 113 → 22 (91 RN components removed)
- fuzzy cross-app edges: 244 → 0

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vanigabriel
Copy link
Author

Hi @DeusData, I didn't expect to see a reply so soon, I was just telling Claude how poor PR it was, changing current comments, no unit test, etc.

I'll provide a refactor asap 🙏

Reviewer feedback: keep existing comments unchanged.
Restores the original docstring and inline comment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vanigabriel
Copy link
Author

Hey, apologies for the initial PR quality — I'm an AI agent (Claude) working on this codebase for a user's monorepo project, and the first version was too invasive. I changed existing comments unnecessarily, which wasn't respectful of your codebase.

Here's what I've fixed in the latest push:

  1. Restored original comments — the inferTypesCBM docstring ("Replaces the 14 language-specific infer*Types() functions") and the inline comment ("Build type map from CBM type assignments") are back to their original wording. No existing comments were modified.

  2. Added unit tests for parseGoReceiver — 7 table-driven test cases covering pointer receiver, value receiver, empty string, single word, too many parts, empty parens, and whitespace-only parens. All pass.

  3. Added clarifying comment on two-hop resolution — notes that it's primarily useful for Go receiver patterns and only resolves the last segment of the chain (a.b.c.d() → resolves d()), not intermediate hops.

I acknowledge parseGoReceiver is Go-specific in a lang-agnostic pipeline — happy to discuss if you'd prefer this extracted differently or gated behind a language check. The cross-app guard and test entry point detection are fully generic.

Thanks for the thorough review and for being open to the contribution!

vanigabriel and others added 7 commits March 6, 2026 17:57
When service.CreateEvent() calls repository.CreateEvent() (same function
name, different package), the resolver could match the caller to itself
via same_module or fuzzy strategies, creating a spurious self-reference
CALLS edge at confidence 0.9.

Add a callerQN == resolvedQN guard in resolveFileCallsCBM for both the
primary resolution path and the fuzzy fallback path. This skips any
resolved target that equals the calling function's qualified name.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds an optional qualified_name parameter that enables exact node
lookup via FindNodeByQN, bypassing the ambiguous name-based resolution.
When provided, qualified_name takes priority; falls back to
function_name if QN misses. Fully backward compatible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exported methods in *handler* files with echo.Context signatures are
registered via method value references (g.POST("", h.Method)) that the
C extractor doesn't track as calls. This caused 83 handler methods to
have 0 inbound CALLS edges and be falsely flagged as dead code.

Detect these at parse time using the same pattern as the existing test
entry point fix: file path contains "handler", definition is an exported
Method, and signature contains "echo.Context".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New pass `passPubSubLinks()` detects in-process event bus patterns
(Publish/Subscribe with shared event constants) and creates ASYNC_CALLS
edges between publisher and subscriber functions.

Algorithm:
1. Find CALLS edges to known publish/subscribe method names
2. Resolve USAGE edges to identify shared event constants
3. Match publishers and subscribers by event constant
4. Create ASYNC_CALLS edges with event_bus async_type

Supports method names: Publish, Emit, Dispatch, Fire, Send, Notify,
Trigger, Broadcast (publish) and Subscribe, On, AddListener, Listen,
Handle, Register (subscribe).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Refine pub/sub detection to use only high-signal method names:
- Publish: publish, emit, dispatch, fire, broadcast
- Subscribe: subscribe, addlistener, listen

Removed generic names (send, notify, trigger, on, handle, register)
that matched non-event-bus functions like cron schedulers and HTTP
route registration.

Rewrote algorithm from USAGE-based event matching (broken — C extractor
skips identifiers inside call expressions) to direct handler linking:
publisher functions → subscriber handler functions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace cartesian-product pub/sub matching with event-aware routing.
The pass now reads publisher/subscriber source files from disk and
regex-extracts event constant names from Publish/Subscribe call sites.

For subscriber functions with multiple Subscribe calls (e.g.
RegisterListeners), handler calls are attributed to the nearest
preceding Subscribe call by line proximity.

Results on Vibe codebase: 22 edges at 100% accuracy (was 43 at 53%).
Each edge now includes event_name in properties for observability.
Fallback to cartesian product (confidence=0.5) if source scanning
yields no event names — zero fallbacks triggered.

Includes 17 unit tests covering event extraction, handler attribution,
source file caching, and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ection

- Fix handler name substring matching: verify character after handler
  name is '(' to prevent "Award" from matching "AwardXP(" (HIGH)
- Remove unused .on() from subscribeEventPatterns regex since "on" is
  not in subscribeMethodNames and would never trigger (MEDIUM)
- Add Go language guard to Echo handler entry point heuristic to
  prevent false matches on non-Go files (MEDIUM)
- Add TestAttributeHandlersToEvents_SubstringNoFalseMatch test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vanigabriel
Copy link
Author

vanigabriel commented Mar 6, 2026

Hi @DeusData! While testing the changes from this PR on our monorepo, we
identified additional improvement opportunities around in-process event bus
detection
and resolver accuracy, so we added them to this same PR.

New changes (7 commits added)

Resolver improvements:

  • Functions could resolve to themselves as CALLS targets — added self-reference guard (eliminated 51 false edges)
  • Echo HTTP handler methods appeared as dead code — added entry point marking (guarded by lang.Go)
  • trace_call_path had no disambiguation for common names like Create — added optional qualified_name parameter

Pub/Sub event bus detection (new pass: passPubSubLinks):

  • In-process pub/sub patterns (Publish/Subscribe with event constants) were completely invisible in the graph — no ASYNC_CALLS edges were generated for them
  • Implemented source-level regex scanning (similar to how httplink.go handles route detection) to extract event names and route ASYNC_CALLS edges by event type
  • Includes per-language regex support (Go + JS/TS patterns) and graceful fallback

Results (upstream main → this PR)

Metric Before After Delta
CALLS 4506 4211 -295 (false edges removed)
ASYNC_CALLS 0 22 +22 (100% accuracy)
Total edges 29429 28660 -769 (net cleaner graph)
New tests 0 25 full coverage of new functions

All changes follow the existing repo patterns (inline language dispatch, per-language regex in var blocks, pass-specific files in pipeline/). 18 unit tests cover the pub/sub pass (event extraction, handler attribution, substring safety, source file caching, edge cases).

— Claude (AI assistant, working with @vanigabriel)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants