
Commit ece0d99

Merge pull request #18 from optave/feat/registry-hardening
feat: harden multi-repo registry and add structural analysis
2 parents 3fd551a + 85484e8 commit ece0d99

19 files changed

Lines changed: 1968 additions & 57 deletions

CLAUDE.md

Lines changed: 18 additions & 1 deletion
@@ -45,7 +45,7 @@ JS source is plain JavaScript (ES modules) in `src/`. No transpilation step. The
 | `queries.js` | Query functions: symbol search, file deps, impact analysis, diff-impact; `SYMBOL_KINDS` constant defines all node kinds |
 | `embedder.js` | Semantic search with `@huggingface/transformers`; multi-query RRF ranking |
 | `db.js` | SQLite schema and operations (`better-sqlite3`) |
-| `mcp.js` | MCP server exposing graph queries to AI agents |
+| `mcp.js` | MCP server exposing graph queries to AI agents; single-repo by default, `--multi-repo` to enable cross-repo access |
 | `cycles.js` | Circular dependency detection |
 | `export.js` | DOT/Mermaid/JSON graph export |
 | `watcher.js` | Watch mode for incremental rebuilds |
@@ -66,6 +66,7 @@ JS source is plain JavaScript (ES modules) in `src/`. No transpilation step. The
 - Non-required parsers (all except JS/TS/TSX) fail gracefully if their WASM grammar is unavailable
 - Import resolution uses a 6-level priority system with confidence scoring (import-aware → same-file → directory → parent → global → method hierarchy)
 - Incremental builds track file hashes in the DB to skip unchanged files
+- **MCP single-repo isolation:** `startMCPServer` defaults to single-repo mode — tools have no `repo` property and `list_repos` is not exposed. Passing `--multi-repo` or `--repos` to the CLI (or `options.multiRepo` / `options.allowedRepos` programmatically) enables multi-repo access. `buildToolList(multiRepo)` builds the tool list dynamically; the backward-compatible `TOOLS` export equals `buildToolList(true)`
 - **Credential resolution:** `loadConfig` pipeline is `mergeConfig → applyEnvOverrides → resolveSecrets`. The `apiKeyCommand` config field shells out to an external secret manager via `execFileSync` (no shell). Priority: command output > env var > file config > defaults. On failure, warns and falls back gracefully
 
 **Database:** SQLite at `.codegraph/graph.db` with tables: `nodes`, `edges`, `metadata`, `embeddings`
@@ -94,9 +95,25 @@ Releases are triggered via the `publish.yml` workflow (`workflow_dispatch`). By
 
 The workflow can be overridden with a specific version via the `version-override` input. Locally, `npm run release:dry-run` previews the bump and changelog.
 
+## Dogfooding — codegraph on itself
+
+Codegraph is **our own tool**. Use it to analyze this repository before making changes:
+
+```bash
+node src/cli.js build . # Build/update the graph
+node src/cli.js cycles # Check for circular dependencies
+node src/cli.js map --limit 20 # Module overview & coupling hotspots
+node src/cli.js diff-impact main # See impact of current branch changes
+node src/cli.js fn <name> # Trace function-level dependency chains
+node src/cli.js deps src/<file>.js # See what imports/depends on a file
+```
+
+If codegraph reports an error, crashes, or produces wrong results when analyzing itself, **fix the bug in the codebase** — don't just work around it. This is the best way to find and resolve real issues.
+
 ## Git Conventions
 
 - Never add AI co-authorship lines (`Co-Authored-By` or similar) to commit messages.
+- Never add "Built with Claude Code", "Generated with Claude Code", or any variation referencing Claude Code or Anthropic to commit messages, PR descriptions, code comments, or any other output.
 
 ## Node Version
 
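Reviewer note: the single-repo isolation bullet added to CLAUDE.md describes `buildToolList(multiRepo)` and the `TOOLS` export without showing their shape. The sketch below is one way that behavior could look, not the code in `src/mcp.js`. The tool names come from the docs in this commit, but the `BASE_TOOLS` array and every schema in it are placeholders assumed for illustration.

```javascript
// Illustrative sketch only. BASE_TOOLS and its schemas are placeholders,
// not the real definitions from src/mcp.js.
const BASE_TOOLS = [
  {
    name: 'file_deps',
    inputSchema: {
      type: 'object',
      properties: { file: { type: 'string' } },
      required: ['file'],
    },
  },
  { name: 'find_cycles', inputSchema: { type: 'object', properties: {} } },
];

function buildToolList(multiRepo) {
  if (!multiRepo) {
    // Single-repo mode: no `repo` property on any tool, and no list_repos tool.
    return BASE_TOOLS.map((tool) => ({ ...tool }));
  }
  // Multi-repo mode: every tool gains an optional `repo` property.
  const withRepo = BASE_TOOLS.map((tool) => ({
    ...tool,
    inputSchema: {
      ...tool.inputSchema,
      properties: {
        ...tool.inputSchema.properties,
        repo: { type: 'string', description: 'Name of a registered repository' },
      },
    },
  }));
  // list_repos is only exposed once multi-repo access has been opted into.
  return [...withRepo, { name: 'list_repos', inputSchema: { type: 'object', properties: {} } }];
}

// Backward-compatible export described in CLAUDE.md: the full multi-repo list.
const TOOLS = buildToolList(true);
```

Building the list per call, instead of filtering a shared array in place, keeps the single-repo schemas free of any `repo` property, so an agent has nothing to discover.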

README.md

Lines changed: 14 additions & 6 deletions
@@ -128,7 +128,7 @@ codegraph deps src/index.ts # file-level import/export map
 | 📤 | **Export** | DOT (Graphviz), Mermaid, and JSON graph export |
 | 🧠 | **Semantic search** | Embeddings-powered natural language search with multi-query RRF ranking |
 | 👀 | **Watch mode** | Incrementally update the graph as files change |
-| 🤖 | **MCP server** | 12-tool MCP server with multi-repo support for AI assistants |
+| 🤖 | **MCP server** | 13-tool MCP server for AI assistants; single-repo by default, opt-in multi-repo |
 | 🔒 | **Fully local** | No network calls, no data exfiltration, SQLite-backed |
 
 ## 📦 Commands
@@ -215,7 +215,7 @@ The model used during `embed` is stored in the database, so `search` auto-detect
 
 ### Multi-Repo Registry
 
-Manage a global registry of codegraph-enabled projects. AI agents can query any registered repo from a single MCP session using the `repo` parameter.
+Manage a global registry of codegraph-enabled projects. The registry stores paths to your built graphs so the MCP server can query them when multi-repo mode is enabled.
 
 ```bash
 codegraph registry list # List all registered repos
@@ -230,9 +230,13 @@ codegraph registry remove <name> # Unregister
 ### AI Integration
 
 ```bash
-codegraph mcp # Start MCP server for AI assistants
+codegraph mcp # Start MCP server (single-repo, current project only)
+codegraph mcp --multi-repo # Enable access to all registered repos
+codegraph mcp --repos a,b # Restrict to specific repos (implies --multi-repo)
 ```
 
+By default, the MCP server only exposes the local project's graph. AI agents cannot access other repositories unless you explicitly opt in with `--multi-repo` or `--repos`.
+
 ### Common Flags
 
 | Flag | Description |
@@ -324,13 +328,17 @@ Benchmarked on a ~3,200-file TypeScript project:
 
 ### MCP Server
 
-Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 12 tools, so AI assistants can query your dependency graph directly:
+Codegraph includes a built-in [Model Context Protocol](https://modelcontextprotocol.io/) server with 13 tools, so AI assistants can query your dependency graph directly:
 
 ```bash
-codegraph mcp
+codegraph mcp # Single-repo mode (default) — only local project
+codegraph mcp --multi-repo # Multi-repo — all registered repos accessible
+codegraph mcp --repos a,b # Multi-repo with allowlist
 ```
 
-All MCP tools accept an optional `repo` parameter to target any registered repository. Use `list_repos` to see available repos. When `repo` is omitted, the local `.codegraph/graph.db` is used (backwards compatible).
+**Single-repo mode (default):** Tools operate only on the local `.codegraph/graph.db`. The `repo` parameter and `list_repos` tool are not exposed to the AI agent.
+
+**Multi-repo mode (`--multi-repo`):** All tools gain an optional `repo` parameter to target any registered repository, and `list_repos` becomes available. Use `--repos` to restrict which repos the agent can access.
 
 ### CLAUDE.md / Agent Instructions
 
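Reviewer note: the README states that `--repos a,b` restricts access and implies `--multi-repo`. A minimal sketch of that access rule follows, assuming the allowlist is held as a `Set` of repo names. `parseRepoAllowlist` and `isRepoAllowed` are hypothetical helpers written for this note and may not match how the CLI or `mcp.js` actually enforces the restriction.

```javascript
// Hypothetical helpers: names and behavior are assumptions, not codegraph's API.

// Turn a comma-separated `--repos` value such as "a,b" into a Set of names.
function parseRepoAllowlist(value) {
  return new Set(
    value
      .split(',')
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
  );
}

// Decide whether a tool call may target `repo`.
// - no `repo` argument: the local graph is always available
// - single-repo mode: any explicit `repo` is rejected
// - multi-repo mode without an allowlist: any repo name is accepted
// - an allowlist (which implies multi-repo): only listed names are accepted
function isRepoAllowed(repo, { multiRepo = false, allowedRepos = null } = {}) {
  if (repo === undefined) return true;
  if (!multiRepo && allowedRepos === null) return false;
  return allowedRepos === null || allowedRepos.has(repo);
}
```

For example, `isRepoAllowed('c', { allowedRepos: parseRepoAllowlist('a,b') })` evaluates to `false`, while the same call with `'a'` evaluates to `true`.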

docs/dogfooding-guide.md

Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
+# Codegraph Dogfooding Guide
+
+Codegraph analyzing its own codebase. This guide documents findings from self-analysis and lists improvements — both automated fixes already applied and items requiring human judgment.
+
+## Running the Self-Analysis
+
+```bash
+# Build the graph (from repo root)
+node src/cli.js build .
+
+# Core analysis commands
+node src/cli.js cycles # Circular dependency check
+node src/cli.js cycles --functions # Function-level cycles
+node src/cli.js map --limit 20 --json # Module coupling overview
+node src/cli.js diff-impact main --json # Impact of current branch
+node src/cli.js deps src/<file>.js # File dependency inspection
+node src/cli.js fn <name> # Function call chain trace
+node src/cli.js fn-impact <name> # What breaks if function changes
+```
+
+## Action Items
+
+These findings require human judgment to address properly:
+
+### HIGH PRIORITY
+
+#### 1. parser.js is a 2200+ line monolith (47 function definitions)
+**Found by:** `codegraph deps src/parser.js` and `codegraph map`
+
+`parser.js` has the highest fan-in (14 files import it) and contains extractors for **all 11 languages** in a single file. Each language extractor (Python, Go, Rust, Java, C#, PHP, Ruby, HCL) has its own `walk()` function, creating duplicate names that confuse function-level analysis.
+
+**Recommendation:** Split per-language extractors into separate files under `src/extractors/`:
+```
+src/extractors/
+  javascript.js # JS/TS/TSX extractor (currently inline)
+  python.js # extractPythonSymbols + findPythonParentClass + walk
+  go.js # extractGoSymbols + walk
+  rust.js # extractRustSymbols + extractRustUsePath + walk
+  java.js # extractJavaSymbols + findJavaParentClass + walk
+  csharp.js # extractCSharpSymbols + extractCSharpBaseTypes + walk
+  ruby.js # extractRubySymbols + findRubyParentClass + walk
+  php.js # extractPHPSymbols + findPHPParentClass + walk
+  hcl.js # extractHCLSymbols + walk
+```
+**Impact:** Would improve codegraph's own function-level analysis (no more ambiguous `walk` matches), make each extractor independently testable, and reduce the cognitive load of the file.
+
+**Trade-off:** The Rust native engine already has this structure (`crates/codegraph-core/src/extractors/`). Aligning the WASM extractors would create parity.
+
+
+### MEDIUM PRIORITY
+
+#### 3. builder.js has the highest fan-out (7 dependencies)
+**Found by:** `codegraph map`
+
+`builder.js` imports from 7 modules: config, constants, db, logger, parser, resolve, and structure. As the build orchestrator this is somewhat expected, but it also means builder.js is exposed to changes in any of those seven modules.
+
+**Recommendation:** Consider whether the lazy-loading pattern used for the `structure.js` integration (a dynamic import) could apply to other optional post-build steps.
+
+#### 4. watcher.js fan-out vs fan-in imbalance (5 out, 2 in)
+**Found by:** `codegraph map`
+
+The watcher depends on 5 modules but only 2 modules reference it. This suggests it might be pulling in more than it needs.
+
+**Recommendation:** Review whether watcher.js can use more targeted imports or lazy-load some dependencies.
+
+#### 5. diff-impact runs git in temp directories (test fragility)
+**Found by:** Integration test output showing `git diff --no-index` errors in temp directories
+
+The `diff-impact` command runs `git diff`, which fails in the non-git temp directories used by tests. The error output is noisy but does not fail the test.
+
+**Recommendation:** Guard the git call or skip gracefully when not in a git repo.
+
+### LOW PRIORITY
+
+#### 6. Consider adding a `codegraph stats` command
+There's no single command that shows a quick overview of graph health: node/edge counts, cycle count, top coupling hotspots, fan-out outliers. Currently you need to run `map`, `cycles`, and read the build output separately.
+
+#### 7. Embed and search the codebase itself
+Running `codegraph embed .` and then `codegraph search "build dependency graph"` on the codegraph repo would exercise the embedding pipeline and could surface naming/discoverability issues in the API.
+
+## Known Environment Issue
+
+On this workstation, changes to files not already tracked as modified on the current git branch (`docs/architecture-audit`) get reverted by an external process (likely a VS Code extension). If you're applying the parser.js cycle fix, do it from a fresh branch or commit immediately.
+
+## Periodic Self-Check Routine
+
+Run this after significant changes:
+
+```bash
+# 1. Rebuild the graph
+node src/cli.js build .
+
+# 2. Check for regressions
+node src/cli.js cycles # Should be 0 file-level cycles
+node src/cli.js map --limit 10 # Verify no new coupling hotspots
+
+# 3. Check impact of your changes
+node src/cli.js diff-impact main
+
+# 4. Run tests
+npm test
+```
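Reviewer note: the action items in the guide rely on fan-in and fan-out counts. For readers unfamiliar with the terms, both can be read as simple tallies over file-level import edges. The sketch below assumes edges are `{ source, target }` pairs of file names; `fanMetrics` is a hypothetical helper and is not how `codegraph map` is implemented.

```javascript
// Hypothetical helper for illustration: tally fan-in / fan-out per file.
// fan-out = number of distinct files this file imports
// fan-in  = number of distinct files that import this file
function fanMetrics(edges) {
  const metrics = new Map();
  const entry = (file) => {
    if (!metrics.has(file)) metrics.set(file, { fanIn: 0, fanOut: 0 });
    return metrics.get(file);
  };
  const seen = new Set();
  for (const { source, target } of edges) {
    if (source === target) continue; // ignore self-references
    const key = JSON.stringify([source, target]);
    if (seen.has(key)) continue; // count each distinct import once
    seen.add(key);
    entry(source).fanOut += 1;
    entry(target).fanIn += 1;
  }
  return metrics;
}

// Made-up edges, just to show the shape of the result.
const example = fanMetrics([
  { source: 'builder.js', target: 'db.js' },
  { source: 'builder.js', target: 'parser.js' },
  { source: 'watcher.js', target: 'parser.js' },
]);
```

A file with high fan-in is risky to change because many files depend on it. A file with high fan-out is sensitive to changes in the many files it depends on.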

docs/recommended-practices.md

Lines changed: 8 additions & 2 deletions
@@ -132,10 +132,16 @@ Speed up CI by caching `.codegraph/`:
 Start the MCP server so AI assistants can query your graph:
 
 ```bash
-codegraph mcp
+codegraph mcp # Single-repo mode (default) — only local project
+codegraph mcp --multi-repo # Multi-repo — all registered repos accessible
+codegraph mcp --repos a,b # Multi-repo with allowlist
 ```
 
-The server exposes tools for `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, and `module_map`.
+By default, the MCP server runs in **single-repo mode** — the AI agent can only query the current project's graph. The `repo` parameter and `list_repos` tool are not exposed, preventing agents from silently accessing other codebases.
+
+Enable `--multi-repo` to let the agent query any registered repository, or use `--repos` to restrict access to a specific set of repos.
+
+The server exposes tools for `query_function`, `file_deps`, `impact_analysis`, `find_cycles`, `module_map`, `fn_deps`, `fn_impact`, `diff_impact`, `semantic_search`, `export_graph`, `list_functions`, `structure`, and `hotspots`.
 
 ### CLAUDE.md for your project
 

src/builder.js

Lines changed: 44 additions & 7 deletions
@@ -10,18 +10,20 @@ import { computeConfidence, resolveImportPath, resolveImportsBatch } from './res
 
 export { resolveImportPath } from './resolve.js';
 
-export function collectFiles(dir, files = [], config = {}) {
+export function collectFiles(dir, files = [], config = {}, directories = null) {
+  const trackDirs = directories !== null;
   let entries;
   try {
     entries = fs.readdirSync(dir, { withFileTypes: true });
   } catch (err) {
     warn(`Cannot read directory ${dir}: ${err.message}`);
-    return files;
+    return trackDirs ? { files, directories } : files;
   }
 
   // Merge config ignoreDirs with defaults
   const extraIgnore = config.ignoreDirs ? new Set(config.ignoreDirs) : null;
 
+  let hasFiles = false;
   for (const entry of entries) {
     if (entry.name.startsWith('.') && entry.name !== '.') {
       if (IGNORE_DIRS.has(entry.name)) continue;
@@ -32,12 +34,16 @@ export function collectFiles(dir, files = [], config = {}) {
 
     const full = path.join(dir, entry.name);
     if (entry.isDirectory()) {
-      collectFiles(full, files, config);
+      collectFiles(full, files, config, directories);
     } else if (EXTENSIONS.has(path.extname(entry.name))) {
       files.push(full);
+      hasFiles = true;
     }
   }
-  return files;
+  if (trackDirs && hasFiles) {
+    directories.add(dir);
+  }
+  return trackDirs ? { files, directories } : files;
 }
 
 export function loadPathAliases(rootDir) {
@@ -163,7 +169,9 @@ export async function buildGraph(rootDir, opts = {}) {
     );
   }
 
-  const files = collectFiles(rootDir, [], config);
+  const collected = collectFiles(rootDir, [], config, new Set());
+  const files = collected.files;
+  const discoveredDirs = collected.directories;
   console.log(`Found ${files.length} files to parse`);
 
   // Check for incremental build
@@ -179,23 +187,28 @@ export async function buildGraph(rootDir, opts = {}) {
 
   if (isFullBuild) {
     db.exec(
-      'PRAGMA foreign_keys = OFF; DELETE FROM edges; DELETE FROM nodes; PRAGMA foreign_keys = ON;',
+      'PRAGMA foreign_keys = OFF; DELETE FROM node_metrics; DELETE FROM edges; DELETE FROM nodes; PRAGMA foreign_keys = ON;',
     );
   } else {
     console.log(`Incremental: ${changed.length} changed, ${removed.length} removed`);
-    // Remove nodes/edges for changed and removed files
+    // Remove metrics/edges/nodes for changed and removed files
     const deleteNodesForFile = db.prepare('DELETE FROM nodes WHERE file = ?');
     const deleteEdgesForFile = db.prepare(`
       DELETE FROM edges WHERE source_id IN (SELECT id FROM nodes WHERE file = @f)
          OR target_id IN (SELECT id FROM nodes WHERE file = @f)
     `);
+    const deleteMetricsForFile = db.prepare(
+      'DELETE FROM node_metrics WHERE node_id IN (SELECT id FROM nodes WHERE file = ?)',
+    );
     for (const relPath of removed) {
       deleteEdgesForFile.run({ f: relPath });
+      deleteMetricsForFile.run(relPath);
       deleteNodesForFile.run(relPath);
     }
     for (const item of changed) {
       const relPath = item.relPath || normalizePath(path.relative(rootDir, item.file));
       deleteEdgesForFile.run({ f: relPath });
+      deleteMetricsForFile.run(relPath);
       deleteNodesForFile.run(relPath);
     }
   }
@@ -539,6 +552,30 @@ export async function buildGraph(rootDir, opts = {}) {
   });
   buildEdges();
 
+  // Build line count map for structure metrics
+  const lineCountMap = new Map();
+  for (const [relPath] of fileSymbols) {
+    const absPath = path.join(rootDir, relPath);
+    try {
+      const content = fs.readFileSync(absPath, 'utf-8');
+      lineCountMap.set(relPath, content.split('\n').length);
+    } catch {
+      lineCountMap.set(relPath, 0);
+    }
+  }
+
+  // Build directory structure, containment edges, and metrics
+  const relDirs = new Set();
+  for (const absDir of discoveredDirs) {
+    relDirs.add(normalizePath(path.relative(rootDir, absDir)));
+  }
+  try {
+    const { buildStructure } = await import('./structure.js');
+    buildStructure(db, fileSymbols, rootDir, lineCountMap, relDirs);
+  } catch (err) {
+    debug(`Structure analysis failed: ${err.message}`);
+  }
+
   const nodeCount = db.prepare('SELECT COUNT(*) as c FROM nodes').get().c;
   console.log(`Graph built: ${nodeCount} nodes, ${edgeCount} edges`);
   console.log(`Stored in ${dbPath}`);
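Reviewer note: the line-count loop added in this hunk uses `content.split('\n').length`. That counts a trailing newline as one extra (empty) line and reports 1 for an empty file. Whether that matters for the structure metrics is a judgment call. If conventional line counts are wanted, a small helper along these lines might do; `countLines` is hypothetical and not part of the commit.

```javascript
// Hypothetical helper, not part of the commit: count lines the way most
// editors and `wc -l`-style tools report them for newline-terminated files.
function countLines(content) {
  if (content.length === 0) return 0; // an empty file has no lines
  const parts = content.split('\n');
  // A trailing newline produces a final empty element; do not count it.
  return content.endsWith('\n') ? parts.length - 1 : parts.length;
}

// Comparison with the expression used in the diff:
const sample = 'a\nb\n';
const viaSplit = sample.split('\n').length; // 3
const viaHelper = countLines(sample); // 2
```

The difference is a constant offset of at most one line per file, so relative rankings between files would barely change.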
