
feat: support .codegraphignore for excluding files from index #360

Open
Krislu1221 wants to merge 1 commit into colbymchenry:main from Krislu1221:feat/add-codegraphignore-support

@Krislu1221

Motivation

CodeGraph currently indexes every file in the project (aside from what .gitignore excludes). In many real-world projects this means significant noise:

  • Monorepos with .sandbox-home/, dist/, build/ directories
  • Python projects with .venv/, venv/, __pycache__/
  • Node projects with node_modules/
  • Large test fixtures or generated assets that should not be in the code graph

Users do not want to add these to .gitignore because they are either git-tracked or serve a different purpose. They want CodeGraph to skip them independently.

In my case, 97% of the 7175 indexed files were third-party dependencies under .sandbox-home/, so the results of codegraph context and codegraph query were dominated by noise.

Solution

Add .codegraphignore support using the same gitignore syntax (the ignore npm package is already a dependency). Patterns from .gitignore and .codegraphignore are OR-ed together: a file is excluded if it matches a pattern in either file.
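To illustrate the either-file semantics, here is a minimal sketch. It is not the code in this PR: it replaces the ignore package with a deliberately simplified matcher that understands only comments, blank lines, and directory patterns such as `dist/`, and the names `parsePatterns`, `makeDirMatcher`, and `mergeIgnoreSources` are hypothetical.

```typescript
// Simplified stand-in for a gitignore matcher (assumption: directory patterns only).
type Matcher = (relPath: string) => boolean;

// Drop blank lines and `#` comments; keep everything else as a pattern.
function parsePatterns(text: string): string[] {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "" && !line.startsWith("#"));
}

// Build a matcher from `name/` patterns; other pattern kinds are ignored here.
function makeDirMatcher(patterns: string[]): Matcher {
  const dirNames = new Set(
    patterns.filter((p) => p.endsWith("/")).map((p) => p.slice(0, -1))
  );
  // A path is ignored when any of its parent directory segments is listed.
  return (relPath: string): boolean =>
    relPath.split("/").slice(0, -1).some((segment) => dirNames.has(segment));
}

// Concatenating both pattern lists gives OR semantics for this simple matcher.
function mergeIgnoreSources(gitignoreText: string, codegraphignoreText: string): Matcher {
  return makeDirMatcher([
    ...parsePatterns(gitignoreText),
    ...parsePatterns(codegraphignoreText),
  ]);
}

const isIgnored = mergeIgnoreSources("node_modules/\n", "# sandbox\n.sandbox-home/\n");
console.log(isIgnored("node_modules/pkg/index.js")); // true (from .gitignore)
console.log(isIgnored(".sandbox-home/lib/x.py"));    // true (from .codegraphignore)
console.log(isIgnored("src/index.ts"));              // false
```

This sketch does not model `!` negation patterns. With full gitignore syntax, whether a negation in one file can re-include a path excluded by the other depends on how the combined matcher is built, so that case is worth checking separately.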

Changes

  1. scanDirectoryWalk (filesystem fallback path): loadIgnore() now reads both .gitignore and .codegraphignore from each directory level. When both exist, patterns from both are merged into one matcher.

  2. scanDirectory (git fast path): Applies root .codegraphignore as a second-pass filter after git ls-files. Since the git path already respects .gitignore via git ls-files, only .codegraphignore patterns need to be checked additionally.

  3. scanDirectoryAsync: Same treatment as scanDirectory.
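The second-pass filter described for the git fast path can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `parseLsFiles` and `applySecondPass` are hypothetical names, the file list is a hard-coded string rather than real `git ls-files` output, and the ignore check is passed in as a plain predicate.

```typescript
// Predicate answering "does .codegraphignore exclude this path?" (assumed shape).
type IsIgnored = (relPath: string) => boolean;

// Split a newline- or NUL-separated file listing into relative paths.
function parseLsFiles(output: string): string[] {
  return output.split(/\0|\r?\n/).filter((p) => p.length > 0);
}

// The listing already reflects .gitignore, so only the extra predicate is applied.
function applySecondPass(trackedFiles: string[], codegraphIgnored: IsIgnored | null): string[] {
  if (codegraphIgnored === null) {
    return trackedFiles; // no root .codegraphignore: nothing more to filter
  }
  return trackedFiles.filter((p) => !codegraphIgnored(p));
}

// Example with a stand-in predicate for a `.sandbox-home/` pattern.
const listing = "src/a.ts\n.sandbox-home/lib/x.py\nREADME.md\n";
const kept = applySecondPass(parseLsFiles(listing), (p) => p.startsWith(".sandbox-home/"));
console.log(kept); // [ 'src/a.ts', 'README.md' ]
```

Keeping the predicate nullable makes the no-file case a cheap pass-through, so projects without a .codegraphignore would see the same file list as before.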

Usage

Create a .codegraphignore file in the project root. Files in subdirectories are honored only by the filesystem walk path, since the git fast path applies the root file alone:

# .codegraphignore — CodeGraph-specific ignores
# Uses standard .gitignore syntax

# Python virtual environments
.venv/
venv/
*.venv/*

# Node.js dependencies
node_modules/

# Build artifacts
dist/
build/
__pycache__/

# Sandbox directories
.sandbox-home/

Then run codegraph index --force to rebuild the index with the new exclusion rules.
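For the walk path, where each directory level can contribute its own ignore files, collecting the patterns for one directory might look like the sketch below. The function name `readIgnorePatterns` is hypothetical, and trimming every line is a simplification of real gitignore parsing; the actual loadIgnore() in this PR may differ.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// File names taken from the PR description; read in this order.
const IGNORE_FILES = [".gitignore", ".codegraphignore"];

// Collect non-comment, non-blank pattern lines from both files in `dir`.
function readIgnorePatterns(dir: string): string[] {
  const patterns: string[] = [];
  for (const name of IGNORE_FILES) {
    const file = path.join(dir, name);
    if (!fs.existsSync(file)) continue; // either file may be absent
    for (const line of fs.readFileSync(file, "utf8").split(/\r?\n/)) {
      const trimmed = line.trim(); // simplification: gitignore treats spaces more carefully
      if (trimmed !== "" && !trimmed.startsWith("#")) {
        patterns.push(trimmed);
      }
    }
  }
  return patterns;
}
```

A walker would call this once per directory it enters and hand the resulting list to its matcher, so a subdirectory's .codegraphignore only affects paths beneath that directory.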

Testing

  • Builds successfully (tsc + copy-assets)
  • Existing test suite: 31/35 test files pass (710/778 tests). The 6 failures are pre-existing flaky tests unrelated to this change (MCP roots timeout, framework integration tests).
  • Manual verification: created a .codegraphignore containing the pattern .sandbox-home/ in a test project and confirmed that the directory no longer appears in codegraph files --json output.

Adds .codegraphignore support alongside .gitignore for fine-grained
control over which files CodeGraph indexes.

Motivation:
  - .gitignore affects git tracking, not just CodeGraph
  - Users often want to exclude build artifacts, sandboxes, or large
    dependency directories from the code graph without changing their
    .gitignore
  - Example: monorepo with .sandbox-home/, node_modules/ that should
    not be indexed

Changes:
  - scanDirectoryWalk (fs fallback): loadIgnore() now reads both
    .gitignore and .codegraphignore, OR-ing patterns from both
  - scanDirectory (git fast path): applies root .codegraphignore as
    a second-pass filter after git ls-files
  - scanDirectoryAsync: same treatment as scanDirectory

.gitignore syntax rules are used for .codegraphignore, consistent
with how the tool already parses .gitignore files.
