
feat: embedding-based entry point resolution#130

Open
VictorGjn wants to merge 3 commits into master from feat/embedding-resolver

Conversation

@VictorGjn
Owner

Problem

The context graph resolver is purely lexical. When the user queries "how does authentication work?" but no file contains "auth" in its path, symbol names, or headings, the resolver returns 0 entry points. The graph traversal never starts.

This is the #1 limitation documented in the architecture analysis.

Solution

Add an embedding-based resolution layer that bridges the vocabulary gap.

New file: embeddingResolver.ts

  • buildIdentity() — compact semantic fingerprint per FileNode (~100 tokens): path + exports + headings + first sentence
  • buildEmbeddingCache() — batch embed identities via OpenAI text-embedding-3-small (512 dims). Only re-embeds when content hash changes.
  • resolveHybridEntryPoints() — merge lexical + semantic scores. Drop-in replacement for resolveEntryPoints()
  • serializeCache() / deserializeCache() — persist cache between sessions
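A minimal sketch of what the identity fingerprint could look like (the `FileNode` field names beyond those listed above, and the truncation limits, are assumptions, not the PR's actual implementation):

```typescript
// Hypothetical FileNode shape for illustration only.
interface FileNode {
  id: string;
  path: string;
  exports: string[];
  headings: string[];
  firstSentence?: string;
}

// Build a compact (~100 token) semantic fingerprint per file:
// path + exports + headings + first sentence, one field per line.
function buildIdentity(node: FileNode): string {
  const parts: string[] = [`Path: ${node.path}`];
  if (node.exports.length > 0) {
    parts.push(`Exports: ${node.exports.slice(0, 20).join(", ")}`);
  }
  if (node.headings.length > 0) {
    parts.push(`Headings: ${node.headings.slice(0, 10).join(" | ")}`);
  }
  if (node.firstSentence) {
    parts.push(`Purpose: ${node.firstSentence}`);
  }
  return parts.join("\n");
}
```

Keeping the fingerprint small is what makes batch embedding cheap: the identity, not the full file content, is what gets sent to the embedding API.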

Updated: index.ts

  • New exports for embedding resolver
  • ContextGraphEngine gains:
    • queryHybrid(): semantic+lexical entry points → graph traversal → packed context
    • buildEmbeddings(): build/refresh embedding cache
    • loadEmbeddingCache() / saveEmbeddingCache(): persistence
    • Falls back to lexical-only when no embedding cache available

Updated: types.ts

  • HybridEntryPoint extends EntryPoint with lexicalScore + semanticScore
  • EmbeddingCacheData for serialization
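The type additions might look like the following (the base `EntryPoint` fields shown here are an assumption; only the extension with the two score fields is stated in the PR):

```typescript
// Assumed base type for illustration.
interface EntryPoint {
  fileId: string;
  score: number;
}

// HybridEntryPoint carries both component scores so callers can
// inspect how lexical vs. semantic matching contributed.
interface HybridEntryPoint extends EntryPoint {
  lexicalScore: number;
  semanticScore: number;
}
```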

Hybrid scoring

combined = lexical * 0.4 + semantic * 0.6

The 0.6 semantic weight ensures vocabulary-gap queries get resolved, while lexical still contributes for exact matches.
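A sketch of the merge step, using the weights above (the score-map shape is an assumption; a file missing from one side simply scores 0 there):

```typescript
// Combine per-file lexical and semantic scores with fixed weights.
// Files present in only one map still get a combined score.
function mergeScores(
  lexical: Map<string, number>,
  semantic: Map<string, number>,
  wLex = 0.4,
  wSem = 0.6,
): Map<string, number> {
  const combined = new Map<string, number>();
  const ids = new Set([...lexical.keys(), ...semantic.keys()]);
  for (const id of ids) {
    combined.set(id, (lexical.get(id) ?? 0) * wLex + (semantic.get(id) ?? 0) * wSem);
  }
  return combined;
}
```

Note that a file with a perfect semantic match but no lexical match caps at 0.6, so a strong lexical match (0.4 + any semantic score) can still outrank it.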

Usage

```ts
const engine = new ContextGraphEngine();
engine.scan(rootPath, files);

// Build embeddings once (persists, only re-embeds changed files)
await engine.buildEmbeddings(apiKey);

// Hybrid query
const packed = await engine.queryHybrid("how does auth work?", apiKey, 8000);
```

Cost

  • ~$0.01 per 500 files indexed
  • ~$0.0001 per query
  • Cache is content-hash-aware: incremental updates only re-embed changed files

Related

Also pushed to agent-skills repo: Python equivalent (embed_resolve.py) + updated pack_context.py with --semantic flag.

Commits

Adds semantic resolution to bridge the vocabulary gap in the lexical-only resolver. When query terms don't appear literally in file paths/symbols/headings, embeddings find related files via cosine similarity.

- buildIdentity(): compact semantic fingerprint per file
- embedTexts(): batch OpenAI text-embedding-3-small (512 dims)
- resolveHybridEntryPoints(): merge lexical + semantic scores
- Serializable cache, only re-embeds on content hash change
- Export embedding resolver functions from public API
- Add embeddingCache to ContextGraphEngine
- Add queryHybrid() method: semantic+lexical → graph traversal → packed
- Add buildEmbeddings(), loadEmbeddingCache(), saveEmbeddingCache()
- Falls back to lexical-only when no cache available

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 47b65a2f63


Comment on lines +293 to +296
```ts
for (const [fileId, entry] of cache.entries) {
  if (entry.embedding?.length > 0) {
    semanticScores.set(fileId, cosineSimilarity(queryEmbedding, entry.embedding));
  }
```
P1: Ignore cache entries not present in current graph

resolveHybridEntryPoints scores every cached embedding without checking whether that fileId still exists in graph.nodes. If a stale or wrong cache is loaded (for example after switching repos or loading an old cache file), unrelated IDs can occupy the top-K results, and traverseGraph later drops them as missing nodes, which can leave queryHybrid() with little or no usable context even when lexical matches exist. Filter semantic scoring/merging to IDs that are present in the current graph before ranking.
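A possible shape for the suggested fix, as a self-contained sketch (the `graphNodeIds` parameter and the inlined `cosineSimilarity` helper are stand-ins for the real `graph.nodes` lookup and the helper in embeddingResolver.ts):

```typescript
// Inlined for the sketch; the real helper lives in embeddingResolver.ts.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na > 0 && nb > 0 ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Score only cache entries whose fileId exists in the current graph,
// so a stale or foreign cache cannot crowd out real matches.
function scoreSemantic(
  cacheEntries: Map<string, { embedding?: number[] }>,
  graphNodeIds: Set<string>,
  queryEmbedding: number[],
): Map<string, number> {
  const semanticScores = new Map<string, number>();
  for (const [fileId, entry] of cacheEntries) {
    if (!graphNodeIds.has(fileId)) continue; // drop stale/foreign entries
    if (entry.embedding && entry.embedding.length > 0) {
      semanticScores.set(fileId, cosineSimilarity(queryEmbedding, entry.embedding));
    }
  }
  return semanticScores;
}
```

Filtering before ranking (rather than letting traverseGraph drop missing nodes later) keeps the top-K slots for files that can actually contribute context.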


Comment on lines +84 to +85
```ts
if (root.firstSentence) {
  parts.push(`Purpose: ${root.firstSentence}`);
```
P2: Read purpose sentence from tree metadata

buildIdentity reads root.firstSentence, but TreeNode stores that value under root.meta.firstSentence. As written, this branch never adds the file purpose text, so identity strings lose a key semantic signal and hybrid retrieval quality drops for files that rely on prose context. This should reference root.meta?.firstSentence.
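The fix reduces to one optional-chained read; a minimal sketch (the `TreeNode` shape here is inferred from the comment, not copied from the codebase):

```typescript
// TreeNode stores the first sentence under meta, per the review comment.
interface TreeNode {
  meta?: { firstSentence?: string };
}

// Push the purpose line only when the metadata actually carries one.
function pushPurpose(root: TreeNode, parts: string[]): void {
  if (root.meta?.firstSentence) {
    parts.push(`Purpose: ${root.meta.firstSentence}`);
  }
}
```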

