CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Hooks enforce code quality. This project uses Claude Code hooks (.claude/hooks/) to automatically inject file-level dependency context on reads, rebuild the graph after edits, block commits with cycles or dead exports, run lint on staged files, and show diff-impact before commits. If codegraph reports an error or produces wrong results when analyzing itself, that's a real bug — don't work around it or ignore it. Flag it to the user and, if it's blocking the current task, fix it.

Never fabricate facts. Do not state licenses, version numbers, feature claims, or any factual information without first verifying it (read the file, run the command, check the source). If you don't know, say so — do not guess.

Never document bugs as expected behavior. If two engines (native vs WASM) produce different results, that is a bug in the less-accurate engine — not an acceptable "parity gap." Adding comments or tests that frame wrong output as "expected" blocks future agents from ever fixing it. Instead: identify the root cause, file an issue, and fix the extraction/resolution layer that produces incorrect results. The correct response to "engine A reports 8 cycles, engine B reports 11" is to fix the 3 false cycles in engine B, not to document why the difference is okay.

Never silently skip verification. If tests, builds, or any verification step cannot run or fails for any reason (compilation errors, platform issues, missing dependencies), STOP and report the issue to the user immediately. Never silently proceed with unverified changes. Let the user decide whether to proceed — do not make that decision yourself.

Codegraph Workflow

Hooks handle: file-level deps on reads, graph rebuild after edits, commit-time checks (cycles, dead exports, diff-impact, lint). Use the commands below for function-level understanding when modifying source code:

Before modifying code:

Pick the commands that fit the situation — don't run all four mechanically:

codegraph where <name> — find where the symbol lives
codegraph audit --quick <target> — understand the structure
codegraph context <name> -T — get full context (source, deps, callers)
codegraph fn-impact <name> -T — check blast radius before editing

Skip the above commands for non-code files, trivial edits, or when you already have sufficient context.

After modifying code:

codegraph diff-impact --staged -T — verify impact before committing

Navigation

codegraph where --file <path> — file inventory (symbols, imports, exports)
codegraph query <name> -T — function call chain (callers + callees)
codegraph path <from> <to> -T — shortest call path between two symbols
codegraph exports <file> -T — per-symbol export consumers
codegraph children <name> -T — sub-declarations (parameters, properties, constants)
codegraph search "<query>" — semantic search (requires codegraph embed)
codegraph ast --kind call <name> -T — find all call sites of a function

Impact & analysis

codegraph diff-impact main -T — impact of branch vs main
codegraph audit <target> -T — structural summary + impact + health in one report
codegraph triage -T — ranked audit priority queue
codegraph complexity -T — per-function complexity metrics
codegraph batch t1 t2 t3 -T --json — batch query multiple targets

Overview & health

codegraph map — module overview (most-connected files)
codegraph stats — graph health and quality score
codegraph structure --depth 2 — directory tree with cohesion scores
codegraph roles --role dead -T — find dead code (unreferenced symbols)
codegraph roles --role core -T — find core symbols (high fan-in)
codegraph branch-compare main HEAD -T — structural diff between refs

Flags

-T — exclude test files (use by default) · -j — JSON output
-f, --file <path> — scope to file · -k, --kind <kind> — filter kind

Project Overview

Codegraph (@optave/codegraph) is a local code dependency graph CLI. It parses codebases with tree-sitter (native Rust engine, with automatic WASM fallback), builds function-level dependency graphs stored in SQLite, and supports semantic search with local embeddings. No cloud services required.

Languages supported (23): JavaScript, TypeScript, TSX, Python, Go, Rust, Java, C#, Ruby, PHP, C, C++, Kotlin, Swift, Scala, Bash, Elixir, Lua, Dart, Zig, Haskell, OCaml, Terraform/HCL. LANGUAGE_REGISTRY in domain/parser.ts is the single source of truth — check there for the current list.

Commands

npm install                      # Install dependencies
npm test                         # Run all tests (vitest)
npm run test:watch               # Watch mode
npm run test:coverage            # Coverage report
npx vitest run tests/parsers/javascript.test.ts   # Single test file
npx vitest run -t "finds cycles"                  # Single test by name
npm run build:wasm               # Rebuild WASM grammars from devDeps (built automatically on npm install)

Linter/Formatter: Biome — config in biome.json, scoped to src/ and tests/.

npm run lint                     # Check for lint + format issues
npm run lint:fix                 # Auto-fix lint + format issues
npm run format                   # Auto-format only
npm run release                  # Bump version, update CHANGELOG, create tag (auto-detects semver from commits)
npm run release:dry-run          # Preview what release would do without writing anything

Architecture

Pipeline: Source files → tree-sitter parse → extract symbols → resolve imports → SQLite DB → query/search

Source is TypeScript in src/, compiled via tsup. The Rust native engine lives in crates/codegraph-core/.

Path	Role
`cli.ts`	Commander CLI entry point (`bin.codegraph`)
`index.ts`	Programmatic API exports
`shared/`	Cross-cutting constants and utilities
`shared/constants.ts`	`EXTENSIONS` (derived from parser registry) and `IGNORE_DIRS` constants
`shared/errors.ts`	Domain error hierarchy (`CodegraphError`, `ConfigError`, `ParseError`, etc.)
`shared/kinds.ts`	Symbol and edge kind constants (`CORE_SYMBOL_KINDS`, `EVERY_SYMBOL_KIND`, `VALID_ROLES`)
`shared/paginate.ts`	Pagination helpers for bounded query results
`infrastructure/`	Platform and I/O plumbing
`infrastructure/config.ts`	`.codegraphrc.json` loading, env overrides, `apiKeyCommand` secret resolution
`infrastructure/logger.ts`	Structured logging (`warn`, `debug`, `info`, `error`)
`infrastructure/native.ts`	Native napi-rs addon loader with WASM fallback
`infrastructure/registry.ts`	Global repo registry (`~/.codegraph/registry.json`) for multi-repo MCP
`infrastructure/update-check.ts`	npm update availability check
`db/`	Database layer
`db/index.ts`	SQLite schema and operations (`better-sqlite3`)
`domain/`	Core domain logic
`domain/parser.ts`	tree-sitter WASM wrapper; `LANGUAGE_REGISTRY` + per-language extractors for functions, classes, methods, imports, exports, call sites
`domain/queries.ts`	Query functions: symbol search, file deps, impact analysis, diff-impact
`domain/graph/builder.ts`	Graph building: file collection, parsing, import resolution, incremental hashing
`domain/graph/cycles.ts`	Circular dependency detection (delegates to `graph/` subsystem)
`domain/graph/resolve.ts`	Import resolution (supports native batch mode)
`domain/graph/watcher.ts`	Watch mode for incremental rebuilds
`domain/graph/journal.ts`	Change journal for incremental builds
`domain/graph/change-journal.ts`	Change event tracking (NDJSON)
`domain/analysis/`	Query-layer analysis: context, dependencies, exports, impact, module-map, roles, symbol-lookup
`domain/search/`	Embedding subsystem: model management, vector generation, semantic/keyword/hybrid search, CLI formatting
`features/`	Composable feature modules
`features/audit.ts`	Composite audit command: explain + impact + health in one call
`features/batch.ts`	Batch querying for multi-agent dispatch
`features/boundaries.ts`	Architecture boundary rules with onion architecture preset
`features/cfg.ts`	Control-flow graph generation
`features/check.ts`	CI validation predicates (cycles, complexity, blast radius, boundaries)
`features/communities.ts`	Louvain community detection, drift analysis (delegates to `graph/` subsystem)
`features/complexity.ts`	Cognitive, cyclomatic, Halstead, MI computation from AST
`features/dataflow.ts`	Dataflow analysis
`features/export.ts`	Graph export orchestration: loads data from DB, delegates to `presentation/` serializers
`features/manifesto.ts`	Configurable rule engine with warn/fail thresholds; CI gate
`features/owners.ts`	CODEOWNERS integration for ownership queries
`features/sequence.ts`	Sequence diagram data generation (BFS traversal)
`features/snapshot.ts`	SQLite DB backup and restore
`features/structure.ts`	Codebase structure analysis
`features/triage.ts`	Risk-ranked audit priority queue (delegates scoring to `graph/classifiers/`)
`features/graph-enrichment.ts`	Data enrichment for HTML viewer (complexity, communities, fan-in/out)
`presentation/`	Pure output formatting + CLI command wrappers
`presentation/viewer.ts`	Interactive HTML renderer with vis-network
`presentation/queries-cli/`	CLI display wrappers for query functions, split by concern: `path.ts`, `overview.ts`, `inspect.ts`, `impact.ts`, `exports.ts`
`presentation/*.ts`	Command formatters (audit, batch, check, communities, complexity, etc.) — call `features/*.ts`, format output, set exit codes
`presentation/export.ts`	DOT/Mermaid/GraphML/Neo4j serializers
`presentation/sequence-renderer.ts`	Mermaid sequence diagram rendering
`presentation/table.ts`, `result-formatter.ts`, `colors.ts`	CLI table formatting, JSON/NDJSON output, color constants
`graph/`	Unified graph model: `CodeGraph` class (`model.ts`), algorithms (Tarjan SCC, Louvain, BFS, shortest path, centrality), classifiers (role, risk), builders (dependency, structure, temporal)
`mcp/`	MCP server exposing graph queries to AI agents; single-repo by default, `--multi-repo` to enable cross-repo access
`ast-analysis/`	Unified AST analysis framework: shared DFS walker (`visitor.ts`), engine orchestrator (`engine.ts`), extracted metrics (`metrics.ts`), and pluggable visitors for complexity, dataflow, and AST-store
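
The `graph/` row above lists Tarjan SCC among its algorithms, and `domain/graph/cycles.ts` delegates cycle detection to that subsystem. As a rough illustration of the technique only (not the project's implementation), the sketch below finds strongly connected components in a plain adjacency map and keeps those that form a cycle. The `findCycles` name and the `Map<string, string[]>` input shape are assumptions made for this example.

```typescript
// Hypothetical helper: returns every strongly connected component that forms a cycle.
// Input is an adjacency map (file -> files it imports).
function findCycles(graph: Map<string, string[]>): string[][] {
  let nextIndex = 0;
  const index = new Map<string, number>();   // discovery order of each node
  const lowlink = new Map<string, number>(); // lowest index reachable from each node
  const stack: string[] = [];
  const onStack = new Set<string>();
  const cycles: string[][] = [];

  function visit(node: string): void {
    index.set(node, nextIndex);
    lowlink.set(node, nextIndex);
    nextIndex++;
    stack.push(node);
    onStack.add(node);

    for (const next of graph.get(node) ?? []) {
      if (!index.has(next)) {
        visit(next);
        lowlink.set(node, Math.min(lowlink.get(node)!, lowlink.get(next)!));
      } else if (onStack.has(next)) {
        lowlink.set(node, Math.min(lowlink.get(node)!, index.get(next)!));
      }
    }

    // `node` is the root of a component: pop its members off the stack.
    if (lowlink.get(node) === index.get(node)) {
      const component: string[] = [];
      let member = "";
      do {
        member = stack.pop()!;
        onStack.delete(member);
        component.push(member);
      } while (member !== node);
      const selfLoop = (graph.get(node) ?? []).includes(node);
      if (component.length > 1 || selfLoop) cycles.push(component.reverse());
    }
  }

  for (const node of graph.keys()) {
    if (!index.has(node)) visit(node);
  }
  return cycles;
}

// Made-up file names: a.ts -> b.ts -> c.ts -> a.ts is a cycle, d.ts is not part of it.
const demoGraph = new Map<string, string[]>([
  ["a.ts", ["b.ts"]],
  ["b.ts", ["c.ts"]],
  ["c.ts", ["a.ts"]],
  ["d.ts", ["a.ts"]],
]);
console.log(findCycles(demoGraph)); // [ [ 'a.ts', 'b.ts', 'c.ts' ] ]
```

A single-node component only counts as a cycle when the node imports itself, which is why the self-loop check is separate from the size check.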

Key design decisions:

Dual-engine architecture: Native Rust parsing via napi-rs (crates/codegraph-core/) with automatic fallback to WASM. Controlled by --engine native|wasm|auto (default: auto). Both engines must produce identical results. If they diverge, the less-accurate engine has a bug — fix it, don't document the gap
Platform-specific prebuilt binaries published as optional npm packages (@optave/codegraph-{platform}-{arch})
WASM grammars are built from devDeps on npm install (via prepare script) and not committed to git — used as fallback when native addon is unavailable
Language parser registry: LANGUAGE_REGISTRY in domain/parser.ts is the single source of truth for all supported languages — maps each language to { id, extensions, grammarFile, extractor, required }. EXTENSIONS in shared/constants.ts is derived from the registry. Adding a new language requires one registry entry + extractor function
Node kinds: SYMBOL_KINDS in domain/queries.ts lists all valid kinds: function, method, class, interface, type, struct, enum, trait, record, module. Language-specific types use their native kind (e.g. Go structs → struct, Rust traits → trait, Ruby modules → module) rather than mapping everything to class/interface
@huggingface/transformers and @modelcontextprotocol/sdk are optional dependencies, lazy-loaded
Non-required parsers (all except JS/TS/TSX) fail gracefully if their WASM grammar is unavailable
Import resolution uses a 6-level priority system with confidence scoring (import-aware → same-file → directory → parent → global → method hierarchy)
Incremental builds track file hashes in the DB to skip unchanged files
MCP single-repo isolation: startMCPServer defaults to single-repo mode — tools have no repo property and list_repos is not exposed. Passing --multi-repo or --repos to the CLI (or options.multiRepo / options.allowedRepos programmatically) enables multi-repo access. buildToolList(multiRepo) builds the tool list dynamically; the backward-compatible TOOLS export equals buildToolList(true)
Credential resolution: loadConfig pipeline is mergeConfig → applyEnvOverrides → resolveSecrets. The apiKeyCommand config field shells out to an external secret manager via execFileSync (no shell). Priority: command output > env var > file config > defaults. On failure, warns and falls back gracefully
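
The credential priority described in the last item (command output > env var > file config > defaults, with a warning and fallback on failure) can be sketched as follows. This is a simplified model, not the code in `infrastructure/config.ts`: `resolveApiKey`, `ApiKeySources`, the `{ file, args }` command shape, and the `secret-tool` name are all hypothetical, and the injected `run` function stands in for `execFileSync` so the example stays free of side effects.

```typescript
// Stand-in for execFileSync(file, args): returns stdout, throws on failure.
type CommandRunner = (file: string, args: string[]) => string;

interface ApiKeySources {
  command?: { file: string; args: string[] }; // hypothetical shape for apiKeyCommand
  envValue?: string;     // value taken from an environment variable
  fileValue?: string;    // value taken from .codegraphrc.json
  defaultValue?: string; // value taken from the built-in defaults
}

function resolveApiKey(
  sources: ApiKeySources,
  run: CommandRunner,
  warn: (message: string) => void = console.warn,
): string | undefined {
  if (sources.command) {
    try {
      const output = run(sources.command.file, sources.command.args).trim();
      if (output) return output; // highest priority: command output
    } catch (err) {
      // Warn and fall through to the lower-priority sources.
      warn(`apiKeyCommand failed: ${String(err)}`);
    }
  }
  return sources.envValue ?? sources.fileValue ?? sources.defaultValue;
}

const exampleSources: ApiKeySources = {
  command: { file: "secret-tool", args: ["get"] }, // hypothetical command
  envValue: "from-env",
  fileValue: "from-file",
};
const workingRunner: CommandRunner = () => "from-command\n";
const failingRunner: CommandRunner = () => {
  throw new Error("secret manager unavailable");
};

console.log(resolveApiKey(exampleSources, workingRunner));           // from-command
console.log(resolveApiKey(exampleSources, failingRunner, () => {})); // from-env
```

Passing the command as a file plus an argument array mirrors the "no shell" property: nothing in the config value is interpreted by a shell.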

Configuration: All tunable behavioral constants live in DEFAULTS in src/infrastructure/config.ts, grouped by concern (analysis, risk, search, display, community, structure, mcp, check, coChange, manifesto). Users override via .codegraphrc.json — mergeConfig deep-merges recursively so partial overrides preserve sibling keys. Env vars override LLM settings (CODEGRAPH_LLM_*). When adding new behavioral constants, always add them to DEFAULTS and wire them through config — never introduce new hardcoded magic numbers in individual modules. Category F values (safety boundaries, standard formulas, platform concerns) are the only exception.
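
To show why a recursive merge matters for partial overrides, here is a minimal sketch of a deep merge, assuming plain JSON-like objects. It is not the project's `mergeConfig`; the `deepMerge` helper and the keys and values in `exampleDefaults` are invented for illustration.

```typescript
type PlainObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merges `overrides` into a copy of `defaults`.
// Nested objects are merged key by key; arrays and scalars are replaced wholesale.
function deepMerge(defaults: PlainObject, overrides: PlainObject): PlainObject {
  const result: PlainObject = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    const base = result[key];
    result[key] = isPlainObject(base) && isPlainObject(value) ? deepMerge(base, value) : value;
  }
  return result;
}

// Hypothetical defaults; the real keys and values live in DEFAULTS.
const exampleDefaults = { search: { topK: 10, minScore: 0.2 }, display: { maxRows: 50 } };
// A partial override such as a user might place in .codegraphrc.json.
const userOverrides = { search: { topK: 25 } };

const mergedConfig = deepMerge(exampleDefaults, userOverrides);
console.log(JSON.stringify(mergedConfig));
// {"search":{"topK":25,"minScore":0.2},"display":{"maxRows":50}}
```

A shallow merge (`{ ...defaults, ...overrides }`) would replace the whole `search` object and silently drop `minScore`, which is the failure mode the recursive merge avoids.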

Database: SQLite at .codegraph/graph.db with tables: nodes, edges, metadata, embeddings, function_complexity

Test Structure

Tests use vitest with a 30s timeout and globals enabled.

tests/
├── integration/          # buildGraph + all query commands
├── graph/                # Cycle detection, DOT/Mermaid export
├── parsers/              # Language parser extraction
├── search/               # Semantic search + embeddings
├── fixtures/sample-project/  # ES module fixture (math.js, utils.js, index.js)
└── benchmarks/resolution/
    ├── resolution-benchmark.test.ts   # Static resolution precision/recall vs expected-edges manifests
    ├── fixtures/<lang>/       # Hand-annotated fixture projects (34 languages) with expected-edges.json
    └── tracer/                # Dynamic call tracers — per-language runtime instrumentation
        ├── loader-hook.mjs    #   ESM loader hook (JS/TS): @babel/parser AST → position-based instrumentation
        ├── run-tracer.mjs     #   CLI: imports entry module, calls exports, dumps edges as JSON
        └── <lang>-tracer.*    #   Language-specific tracers (python-tracer.py, ruby-tracer.rb, etc.)

Integration tests create a temp copy of the fixture project for isolation.
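
The resolution benchmark listed above compares statically resolved edges against hand-annotated expected edges and reports precision and recall. A minimal sketch of that calculation follows. The `Edge` shape and the `scoreResolution` helper are assumptions for illustration, and the actual `expected-edges.json` format may differ.

```typescript
// Hypothetical edge shape; the real manifest format may carry more fields.
interface Edge {
  from: string;
  to: string;
}

function scoreResolution(expected: Edge[], resolved: Edge[]): { precision: number; recall: number } {
  const key = (edge: Edge): string => `${edge.from}->${edge.to}`;
  const expectedKeys = new Set(expected.map(key));
  const resolvedKeys = new Set(resolved.map(key));

  // True positives: resolved edges that also appear in the expected manifest.
  let truePositives = 0;
  resolvedKeys.forEach((k) => {
    if (expectedKeys.has(k)) truePositives++;
  });

  // Precision: share of resolved edges that are correct.
  // Recall: share of expected edges that were found.
  const precision = resolvedKeys.size === 0 ? 1 : truePositives / resolvedKeys.size;
  const recall = expectedKeys.size === 0 ? 1 : truePositives / expectedKeys.size;
  return { precision, recall };
}

// Made-up edges: 2 of 3 resolved edges are correct, 2 of 4 expected edges are found.
const expectedEdges: Edge[] = [
  { from: "main", to: "add" },
  { from: "main", to: "sub" },
  { from: "add", to: "log" },
  { from: "sub", to: "log" },
];
const resolvedEdges: Edge[] = [
  { from: "main", to: "add" },
  { from: "main", to: "sub" },
  { from: "main", to: "log" },
];
console.log(scoreResolution(expectedEdges, resolvedEdges));
```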

Release Process

Releases are triggered via the publish.yml workflow (workflow_dispatch). By default, commit-and-tag-version auto-detects the semver bump from commit history since the last tag:

BREAKING CHANGE footer or type!: → major
feat: → minor
everything else → patch

The workflow can be overridden with a specific version via the version-override input. Locally, npm run release:dry-run previews the bump and changelog.
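
The bump rules listed above can be modelled in a few lines. This is a simplified sketch of the mapping from commit messages to a bump level, not the logic commit-and-tag-version actually runs; the `detectBump` helper and its regular expressions are assumptions made for the example.

```typescript
type Bump = "major" | "minor" | "patch";

// Hypothetical helper: picks the highest bump implied by a list of commit messages.
function detectBump(commitMessages: string[]): Bump {
  let bump: Bump = "patch"; // everything else -> patch
  for (const message of commitMessages) {
    const [header = ""] = message.split("\n");
    // `type!:` or `type(scope)!:` in the header marks a breaking change.
    const breakingHeader = /^[a-z]+(\([^)]*\))?!:/.test(header);
    // A `BREAKING CHANGE:` footer anywhere in the message also does.
    const breakingFooter = /^BREAKING[ -]CHANGE:/m.test(message);
    if (breakingHeader || breakingFooter) return "major";
    if (/^feat(\([^)]*\))?:/.test(header)) bump = "minor";
  }
  return bump;
}

console.log(detectBump(["fix: handle empty graph", "docs: update README"])); // patch
console.log(detectBump(["fix: handle empty graph", "feat(cli): add flag"])); // minor
console.log(detectBump(["feat!: change output format"]));                    // major
```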

Hooks

Codegraph is our own tool — hooks in .claude/hooks/ use it to enforce quality automatically:

Hook	What it does
`enrich-context.sh`	Injects file deps on every Read/Grep (passive context)
`pre-commit.sh`	Blocks commits with cycles or dead exports; warns on signature changes; shows diff-impact
`lint-staged.sh`	Blocks commits with lint errors in session-edited files
`guard-git.sh`	Blocks dangerous git commands; validates commits against edit log
`update-graph.sh`	Rebuilds graph after edits

See docs/examples/claude-code-hooks/README.md for details.

Parallel Sessions

Multiple Claude Code instances may run concurrently in this repo. Before making any code changes, run /worktree to get an isolated copy. This prevents cross-session interference. Skip the worktree for read-only tasks (reviews, analysis, questions).

Safety hooks enforce the rules below automatically; see the Hooks section above.

Rules:

Run /worktree before making changes (not needed for read-only tasks)
Sync with origin/main before starting new feature work. Run git fetch origin && git log --oneline origin/main -10 to check recent merges. If starting fresh and the branch is behind main, create a new branch from origin/main. When continuing existing PR work, sync only if needed.
Stage only files you explicitly changed
Commit with specific file paths: git commit <files> -m "msg"
Ignore unexpected dirty files — they belong to another session
Do not clean up lint/format issues in files you aren't working on

Git Conventions

Never add AI co-authorship lines (Co-Authored-By or similar) to commit messages.
Never add "Built with Claude Code", "Generated with Claude Code", or any variation referencing Claude Code or Anthropic to commit messages, PR descriptions, code comments, or any other output.
One PR = one concern. Each pull request should address a single feature, fix, or refactor. Do not pile unrelated changes into an existing PR — open a new branch and PR instead. If scope grows during implementation, split the work into separate PRs before pushing.

PR Reviews (Greptile)

This repo uses Greptile for automated PR reviews. After pushing fixes that address review feedback, trigger a re-review by commenting @greptileai on the PR. Do not use the GitHub "re-request review" API — Greptile only responds to the comment trigger.

Node Version

Requires Node >= 22.6.
