This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
codetree is a Python MCP (Model Context Protocol) server that gives coding agents structured code understanding via tree-sitter. Instead of reading entire files, an agent can ask "what classes are in this file?" or "what does this function call?" and get precise, structured answers.
It exposes 23 tools over MCP:
| Tool | Purpose | Returns |
|---|---|---|
| `get_file_skeleton(file_path, format?)` | Classes, functions, methods with line numbers + doc comments; `format="compact"` omits doc lines | `class Foo → line 5`, `"A calculator."`, `def bar(x, y) (in Foo) → line 7` |
| `get_symbol(file_path, symbol_name)` | Full source of a function/class | `# path:line\n<source code>` |
| `find_references(symbol_name)` | All usages across the repo | `file.py:12`, `other.py:34` |
| `get_call_graph(file_path, function_name)` | What it calls + what calls it | `→ callee`, `← file.py:20` |
| `get_imports(file_path)` | Import/use statements with line numbers | `1: import os`, `2: from pathlib import Path` |
| `get_skeletons(file_paths, format?)` | Skeletons for multiple files in one call; `format="compact"` omits doc lines | `=== calc.py ===\nclass Foo → line 1` |
| `get_symbols(symbols)` | Full source of multiple symbols | `# calc.py:1\nclass Foo:` |
| `get_complexity(file_path, function_name)` | Cyclomatic complexity of a function | `Complexity of foo() in calc.py: 5\n if: 2, for: 1` |
| `find_dead_code(file_path?)` | Symbols defined but never referenced | `Dead code in calc.py:\n function unused() → line 15` |
| `get_blast_radius(file_path, symbol_name)` | Transitive impact analysis | `Direct callers:\n main.py: run() → line 4` |
| `detect_clones(file_path?, min_lines?)` | Duplicate/near-duplicate functions | `Clone group 1 (2 functions, 12 lines each):` |
| `search_symbols(query?, type?, parent?, ..., format?)` | Flexible symbol search; `format="compact"` omits doc lines | `calc.py: class Calculator → line 1` |
| `find_tests(file_path, symbol_name)` | Find test functions for a symbol | `test_calc.py: test_add() → line 3 (name match)` |

| Tool | Purpose | Returns |
|---|---|---|
| `index_status()` | Graph index freshness and stats | `{graph_exists, files, symbols, edges, last_indexed_at}` |
| `get_repository_map(max_items?)` | Compact repo overview for agent onboarding | `{languages, entry_points, hotspots, start_here, test_roots, stats}` |
| `resolve_symbol(query, kind?, path_hint?)` | Disambiguate short symbol names into qualified matches | `{matches: [{qualified_name, name, kind, file, line}]}` |
| `search_graph(query?, kind?, file_pattern?, ...)` | Structured graph search with pagination and degree filtering | `{total, results: [{qualified_name, kind, in_degree, out_degree}]}` |
| `get_change_impact(symbol_query?, diff_scope?)` | Git-aware change impact with risk classification | `{changed_symbols, impact: {CRITICAL, HIGH, MEDIUM}, affected_tests}` |
| `analyze_dataflow(file_path, function_name, mode?, depth?)` | Variable dataflow (`mode="flow"`), taint analysis (`"taint"`), or cross-function taint (`"cross_taint"`) | `{variables, sinks}` or `{paths: [{verdict, risk}]}` |
| `find_hot_paths(top_n?)` | High-complexity × high-call-count optimization targets | `file:line — name (complexity=N, callers=M, score=S)` |
| `get_dependency_graph(file_path?, format?)` | File-level dependency graph as Mermaid or list | `graph LR\n main.py --> calc.py` |
| `git_history(mode?, file_path?, top_n?, since?, min_commits?)` | Git blame (`mode="blame"`), file churn (`"churn"`), or change coupling (`"coupling"`) | Author summary, churn list, or coupled file pairs |
| `suggest_docs(file_path?, symbol_name?)` | Find undocumented functions with context for doc generation | `file:line — name(params), calls: [...], callers: [...]` |

get_file_skeleton also warns about syntax errors (WARNING: File has syntax errors — skeleton may be incomplete).
All file_path arguments are relative to the repo root (e.g., "src/main.py").
| Extension(s) | Plugin | Key node types |
|---|---|---|
| `.py` | `PythonPlugin` | `function_definition`, `class_definition`, `decorated_definition` |
| `.js`, `.jsx` | `JavaScriptPlugin` | `function_declaration`, `class_declaration`, `arrow_function`, `generator_function_declaration` |
| `.ts` | `TypeScriptPlugin` | `function_declaration`, `class_declaration`, `abstract_class_declaration`, `interface_declaration`, `type_alias_declaration` |
| `.tsx` | `TSXPlugin` | Same as TS, uses `tsts.language_tsx()` |
| `.go` | `GoPlugin` | `function_declaration`, `method_declaration`, `struct_type`, `interface_type` |
| `.rs` | `RustPlugin` | `function_item`, `struct_item`, `enum_item`, `trait_item`, `impl_item` |
| `.java` | `JavaPlugin` | `class_declaration`, `interface_declaration`, `enum_declaration`, `method_declaration`, `constructor_declaration` |
| `.c`, `.h` | `CPlugin` | `function_definition`, `struct_specifier`, `type_definition` |
| `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh` | `CppPlugin` | `class_specifier`, `function_definition`, `struct_specifier`, `namespace_definition` |
| `.rb` | `RubyPlugin` | `class`, `module`, `method`, `singleton_method` |

# Activate venv (required before all commands)
source .venv/bin/activate
# Run all tests (~1058 tests, ~35s)
pytest
# Run a single test file
pytest tests/languages/test_python.py -v
# Run a single test
pytest tests/languages/test_python.py::test_skeleton_finds_class -v
# Run only comprehensive tests for one language
pytest tests/languages/test_rust_comprehensive.py -v
# Run the MCP server
codetree --root /path/to/repo
# Install in dev mode
pip install -e .

No linter or formatter is configured.
MCP tool call → server.py → indexer.py → FileEntry.plugin → tree-sitter parse → structured result
→ graph/ package → SQLite .codetree/graph.db → onboarding, search, impact, dataflow
| File | Responsibility |
|---|---|
| `server.py` | FastMCP 3.1.0 server: defines the 23 tools, wires cache + indexer + graph at startup. Language-unaware. |
| `indexer.py` | Discovers files, stores a `FileEntry` per file (with its plugin + `has_errors` flag), routes all queries through the stored plugin. Builds a definition index and lazy call graph for dead code, blast radius, and clone detection. Skips `.venv`, `node_modules`, `__pycache__`, `.git`, etc. |
| `cache.py` | `.codetree/index.json`: stores pre-computed skeletons with mtime-based invalidation. Language-unaware. |
| `registry.py` | Maps file extensions → plugin instances. The only place languages are registered. |

| File | Responsibility |
|---|---|
| `models.py` | `SymbolNode`, `Edge` dataclasses and `make_qualified_name()`: the data model for the persistent graph. |
| `store.py` | `GraphStore`: SQLite CRUD for symbols, edges, files, meta tables. Stores graph at `.codetree/graph.db`. |
| `builder.py` | `GraphBuilder`: incremental graph build from indexer data. Uses sha256 content hashing to skip unchanged files. Creates CALLS (with import-aware weight), CONTAINS, and IMPORTS edges. |
| `queries.py` | `GraphQueries`: `repository_map()`, `resolve_symbol()`, `search_graph()`, `change_impact()`, `find_hot_paths()`, `get_dependency_graph()`, `suggest_docs()`. Powers the onboarding/search/impact/visualization tools. |
| `dataflow.py` | `extract_dataflow()`, `extract_taint_paths()`, `extract_cross_function_taint()`: intra- and cross-function variable flow tracking and security taint analysis using tree-sitter AST. |
| `git_analysis.py` | `get_blame()`, `get_churn()`, `get_change_coupling()`: Git history analysis tools. |

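To make the storage shape in the table above more concrete, here is a minimal sketch of a symbols/edges store in SQLite. The column names follow the `SymbolNode` and `Edge` fields named elsewhere in this file, but the exact DDL, the index names, the `open_store()` and `callers_of()` helpers, and the sample rows are all assumptions for illustration, not the actual `GraphStore` code.

```python
import sqlite3

# Hypothetical schema: the real GraphStore DDL may differ.
SCHEMA = """
CREATE TABLE symbols (
    qualified_name TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,
    file_path TEXT NOT NULL,
    start_line INTEGER
);
CREATE TABLE edges (
    source_qn TEXT NOT NULL,
    target_qn TEXT NOT NULL,
    type TEXT NOT NULL,
    weight REAL DEFAULT 1.0
);
CREATE INDEX idx_symbols_name ON symbols(name);
CREATE INDEX idx_edges_target ON edges(target_qn);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a store and install the illustrative schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def callers_of(conn: sqlite3.Connection, qualified_name: str) -> list:
    """Return the qualified names of symbols with a CALLS edge to the target."""
    rows = conn.execute(
        "SELECT source_qn FROM edges "
        "WHERE target_qn = ? AND type = 'CALLS' ORDER BY source_qn",
        (qualified_name,),
    ).fetchall()
    return [row[0] for row in rows]


conn = open_store()
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?, ?)",
    [
        ("calc.py::add", "add", "function", "calc.py", 1),
        ("main.py::run", "run", "function", "main.py", 4),
    ],
)
conn.execute(
    "INSERT INTO edges VALUES (?, ?, ?, ?)",
    ("main.py::run", "calc.py::add", "CALLS", 1.0),
)
```

Keeping edges keyed by qualified name means a caller lookup is a single indexed query and does not require re-parsing any source file.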

| File | Responsibility |
|---|---|
| `base.py` | `LanguagePlugin` ABC with 5 abstract methods + `check_syntax()`, `compute_complexity()`, `normalize_source_for_clones()`, `get_ast_sexp()` defaults + shared `_matches()`, `_clean_doc()`, `_fill_docs_from_siblings()` helpers |
| `python.py` | `PythonPlugin` |
| `javascript.py` | `JavaScriptPlugin` (also provides `_arrow_params()` used by TS) |
| `typescript.py` | `TypeScriptPlugin` + `TSXPlugin` |
| `go.py` | `GoPlugin` |
| `rust.py` | `RustPlugin` |
| `java.py` | `JavaPlugin` |
| `c.py` | `CPlugin` |
| `cpp.py` | `CppPlugin` (inherits `CPlugin`) |
| `ruby.py` | `RubyPlugin` |
| `_template.py` | Boilerplate for adding new languages |

Each plugin implements:
- `extract_skeleton(source: bytes) -> list[dict]`: top-level classes/functions/methods with `{type, name, line, parent, params, doc}`
- `extract_symbol_source(source: bytes, name: str) -> tuple[str, int] | None`: full source text + start line
- `extract_calls_in_function(source: bytes, fn_name: str) -> list[str]`: sorted callee names
- `extract_symbol_usages(source: bytes, name: str) -> list[dict]`: all occurrences with `{line, col}`
- `extract_imports(source: bytes) -> list[dict]`: import statements with `{line, text}`
- `check_syntax(source: bytes) -> bool`: True if file has syntax errors (non-abstract, default False)
- `compute_complexity(source: bytes, fn_name: str) -> dict | None`: cyclomatic complexity `{total, breakdown}` (non-abstract, default None)
- `normalize_source_for_clones(source: bytes) -> str`: AST-normalized source for clone detection (non-abstract, default in base; requires `_get_parser()`)
- `get_ast_sexp(source: bytes, symbol_name?, max_depth?) -> str | None`: S-expression AST output (non-abstract, default in base; requires `_get_parser()`)
- `extract_variables(source: bytes, fn_name: str) -> list[dict]`: local variables with `{name, line, type, kind}` (non-abstract, default `[]`)
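The interface above can be sketched as an abstract base class. This is an illustrative stand-in, not the real `LanguagePlugin` in `languages/base.py`: the class names `LanguagePluginSketch` and `NullPlugin` are hypothetical, and only the method signatures and documented defaults are taken from the list above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class LanguagePluginSketch(ABC):
    """Illustrative stand-in for the plugin interface (hypothetical name)."""

    # The five abstract methods every plugin must provide.
    @abstractmethod
    def extract_skeleton(self, source: bytes) -> list[dict]: ...

    @abstractmethod
    def extract_symbol_source(self, source: bytes, name: str) -> tuple[str, int] | None: ...

    @abstractmethod
    def extract_calls_in_function(self, source: bytes, fn_name: str) -> list[str]: ...

    @abstractmethod
    def extract_symbol_usages(self, source: bytes, name: str) -> list[dict]: ...

    @abstractmethod
    def extract_imports(self, source: bytes) -> list[dict]: ...

    # Non-abstract methods with the defaults documented above.
    def check_syntax(self, source: bytes) -> bool:
        return False

    def compute_complexity(self, source: bytes, fn_name: str) -> dict | None:
        return None

    def extract_variables(self, source: bytes, fn_name: str) -> list[dict]:
        return []


class NullPlugin(LanguagePluginSketch):
    """Hypothetical plugin that recognizes nothing; shows the minimum surface."""

    def extract_skeleton(self, source: bytes) -> list[dict]:
        return []

    def extract_symbol_source(self, source: bytes, name: str) -> tuple[str, int] | None:
        return None

    def extract_calls_in_function(self, source: bytes, fn_name: str) -> list[str]:
        return []

    def extract_symbol_usages(self, source: bytes, name: str) -> list[dict]:
        return []

    def extract_imports(self, source: bytes) -> list[dict]:
        return []
```

A subclass that omits any of the five abstract methods cannot be instantiated, which is the property the "implement 5 abstract methods" rule relies on.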

| File | What it tests |
|---|---|
| `test_server.py` | Original 4 MCP tools via FastMCP, output format, line accuracy, cross-language |
| `test_indexer.py` | Build, skip-dirs, skeleton/symbol/refs/callgraph through indexer layer |
| `test_cache.py` | Cache load/save/invalidation |
| `tests/languages/test_<lang>.py` | Per-language core tests |
| `tests/languages/test_<lang>_comprehensive.py` | Exhaustive code pattern coverage per language |
| `test_new_features.py` | Method extraction, Rust traits/enums, Java enums, TS type aliases, indexer fixes |
| `test_edge_cases.py` | Empty files, syntax errors, comment-only files, unicode, nested code |
| `test_server_new_types.py` | MCP server formatting of trait/enum/type skeleton types |
| `test_imports.py` | Import extraction per-language + `get_imports` MCP tool |
| `test_docstrings.py` | Doc comment extraction per-language + skeleton doc display |
| `test_syntax_errors.py` | Syntax error detection per-language + skeleton warning |
| `test_dead_code.py` | Definition index, `find_dead_code` indexer method + MCP tool |
| `test_blast_radius.py` | Lazy call graph, `get_blast_radius` indexer method + MCP tool |
| `test_clones.py` | Clone normalization, `detect_clones` indexer method + MCP tool |
| `test_ast.py` | `get_ast_sexp` plugin method + `get_ast` indexer method |
| `test_search.py` | `search_symbols` indexer method + MCP tool |
| `test_token_opt.py` | Compact format for skeleton/search tools |
| `test_importance.py` | PageRank symbol importance (indexer-level) |
| `test_discovery.py` | Test function discovery |
| `test_variables.py` | Variable extraction per-language + indexer integration |
| `test_graph_store.py` | SQLite graph store CRUD: symbols, edges, files, meta, stats |
| `test_graph_builder.py` | Incremental graph builder: full build, incremental, changed/deleted files, test detection |
| `test_graph_queries.py` | Graph queries: repository map, resolve symbol, search graph |
| `test_onboarding_tools.py` | MCP tools: `index_status`, `get_repository_map`, `resolve_symbol`, `search_graph` |
| `test_change_impact.py` | Change impact: symbol-based, git-diff-based, transitive callers, risk classification |
| `test_dataflow.py` | Dataflow engine: variable tracking, dependency edges, taint sources/sinks, cross-function taint |
| `test_dataflow_tools.py` | MCP tool: `analyze_dataflow` (flow, taint, cross_taint modes) |
| `test_git_analysis.py` | Git blame, churn, change coupling: analysis functions + `git_history` MCP tool |
| `test_doc_suggestions.py` | Auto-documentation suggestions: undocumented function detection + context assembly |

Fixtures in conftest.py: sample_repo (Python-only), rich_py_repo (decorators/dataclasses), multi_lang_repo (5 languages).
The tree-sitter Python bindings have breaking changes from older docs:
- Use `Query(LANGUAGE, "...")` not `LANGUAGE.query(...)`
- Use `QueryCursor(query).matches(node)` not `query.matches(node)`
- Match captures are `list[Node]`: unwrap with `nodes[0]` or use the shared `_matches()` helper from `languages/base.py`
- All `.decode()` calls must use `errors="replace"`
- Python: Decorated functions/classes are wrapped in `decorated_definition`; query both decorated and plain patterns. Check `decorated_definition` FIRST in `extract_symbol_source` to include decorator lines. `from __future__` is `future_import_statement` (not `import_from_statement`). Docstrings are in the function body (`expression_statement > string`), unlike all other languages where doc comments are `prev_named_sibling`.
- JavaScript: `new Foo()` is a `new_expression`, not `call_expression`. Exported classes wrap in `export_statement`. Generators use `generator_function_declaration`. Arrow functions: `lexical_declaration → arrow_function`. JSDoc `/** */` is a `comment` node (same type as `//`).
- TypeScript: Class names use `type_identifier` (not `identifier`). Grammar API: `tsts.language_typescript()` / `tsts.language_tsx()`. Abstract classes use `abstract_class_declaration` (separate from `class_declaration`). Same `export_statement` wrapping as JS.
- Go/Rust/Java: Struct/type names are `type_identifier`; search both `identifier` and `type_identifier` in usage queries.
- Go: Each `//` line is a separate `comment` node. Multi-line doc comments require walking back through consecutive `comment` siblings.
- Rust: Associated function calls (`Server::new()`) use `scoped_identifier name: (identifier)` in `call_expression`. Trait method signatures use `function_signature_item` (not `function_item`). Enums use `enum_item`. Each `///` line is a separate `line_comment` node (same walk-back needed as Go).
- Java: Constructors use `constructor_declaration` (not `method_declaration`). Interfaces use `interface_declaration` with methods in `interface_body`. Enum methods live under `enum_body > enum_body_declarations > method_declaration` (not directly in `enum_body`). Javadoc `/** */` is `block_comment` as `prev_named_sibling`.
- C: Root node is `translation_unit`. Functions use `function_definition → function_declarator → identifier`. Structs use `struct_specifier → type_identifier`. Typedef structs use `type_definition → type_identifier`. Includes use `preproc_include`. Doc comments are `///` comments as `prev_named_sibling`.
- C++: Inherits from `CPlugin`. Classes use `class_specifier → type_identifier`. Methods inside classes use `field_identifier` (not `identifier`). Namespaces use `namespace_definition → namespace_identifier`. Also searches `using_declaration` for imports.
- Ruby: Root node is `program`. Classes/modules use `constant` for names (not `identifier`). Methods without params don't have a `method_parameters` child. Singleton methods (`def self.foo`) use the `singleton_method` node. Imports are `call` nodes where the method is `require`/`require_relative`.
- `pip install tree-sitter-LANG` and add to `pyproject.toml`
- Copy `src/codetree/languages/_template.py` → `languages/LANG.py`, implement 5 abstract methods + `check_syntax`
- Register in `registry.py`
- Add tests (use existing `tests/languages/test_python.py` as reference)
- Plugin classes: `{Lang}Plugin` (e.g., `PythonPlugin`, `GoPlugin`)
- Module-level parser/language globals: `_PARSER`, `_LANGUAGE`
- Skeleton results are deduplicated by `(name, line)` and sorted by line number
- Indexer `SKIP_DIRS` includes `.venv`, `node_modules`, `__pycache__`, `.git`; without this, crawling `.venv` causes Claude Code timeout
- FastMCP tool access in tests: `mcp.local_provider._components[f"tool:{name}@"].fn`
- Doc sync rule: When tools are added, removed, or changed, update all 5 doc files: `README.md`, `TOOLS_GUIDE.md`, `LANDING_PAGE.md`, `CLAUDE.md`, `AGENTS.md`
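The skeleton convention above (deduplicate by `(name, line)`, then sort by line) can be sketched as follows. The helper name `dedupe_skeleton` is hypothetical; the real plugins may implement this step differently.

```python
def dedupe_skeleton(items: list) -> list:
    """Drop entries that repeat a (name, line) pair, then sort by line number."""
    seen = set()
    unique = []
    for item in items:
        key = (item["name"], item["line"])
        if key in seen:
            continue  # same symbol matched by more than one query pattern
        seen.add(key)
        unique.append(item)
    return sorted(unique, key=lambda item: item["line"])


raw = [
    {"type": "function", "name": "bar", "line": 7},
    {"type": "class", "name": "Foo", "line": 5},
    {"type": "function", "name": "bar", "line": 7},
]
skeleton = dedupe_skeleton(raw)
```

Duplicates typically arise when two query patterns match the same node, for example a decorated and a plain function pattern in Python.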
codetree Production Hardening
codetree is a Python MCP server that gives coding agents structured code understanding via tree-sitter. This hardening effort fixes critical and high-priority bugs discovered during a codebase audit — issues that cause agents to receive wrong data, crash the server, or expose security holes.
Core Value: Every MCP tool call returns correct, trustworthy data — agents can rely on codetree without worrying about stale state, silent failures, or wrong results.
- Testing: All fixes must have tests; existing 1070 tests must continue passing
- Backward compat: No changes to MCP tool signatures (agents already use them)
- Performance: Server startup must stay under ~2s for typical repos
- Python 3.10+ - Core application language; MCP server and all indexing/analysis logic
- Python (`.py`)
- JavaScript/JSX (`.js`, `.jsx`)
- TypeScript (`.ts`)
- TypeScript JSX (`.tsx`)
- Go (`.go`)
- Rust (`.rs`)
- Java (`.java`)
- C (`.c`, `.h`)
- C++ (`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hh`)
- Ruby (`.rb`)
- Python 3.10+ as base interpreter
- Standard library modules: `pathlib`, `json`, `subprocess`, `sqlite3`, `argparse`, `atexit`, `hashlib`, `re`, `dataclasses`
- pip (standard Python package manager)
- Optional: `uv` for faster installation (recommended in README for Quick Start)
- Lockfile: `.venv/` contains installed packages; no `requirements.txt` or `pyproject.lock` committed
- FastMCP 3.1.0 (or later; `>=2.0.0` required) - MCP (Model Context Protocol) server framework
- tree-sitter 0.23.0+ - AST parsing library (language-agnostic)
- pytest (via GitHub Actions workflow, not explicitly in pyproject.toml dependencies but installed in CI)
- hatchling (build backend)
- tree-sitter (0.23.0+) - Core AST parsing; blocks everything else
- fastmcp (2.0.0+) - MCP server registration and tool transport
- tree-sitter-python, tree-sitter-javascript, tree-sitter-typescript, tree-sitter-go, tree-sitter-rust, tree-sitter-java, tree-sitter-c, tree-sitter-cpp, tree-sitter-ruby
- No explicit environment variables required for normal operation
- `.codetree/` directory created in repository root for persistent data
- Startup: Command-line argument `--root /path/to/repo` specifies target codebase (default: current directory)
- `pyproject.toml` - Single source of truth for dependencies, project metadata, build config
- Python wheel built via hatchling (not in committed dist/)
- CLI entrypoint: `codetree = "codetree.__main__:main"`
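A minimal sketch of what such an entry point could look like, assuming `argparse` with a single `--root` option that defaults to the current directory. The `parse_args()` helper, the `run` parameter, and the `"."` default value are illustrative choices; the real `__main__.py` may differ.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the CLI arguments; --root defaults to the current directory."""
    parser = argparse.ArgumentParser(prog="codetree")
    parser.add_argument(
        "--root",
        default=".",
        help="path to the repository to index (default: current directory)",
    )
    return parser.parse_args(argv)


def main(argv: list[str] | None = None, run=print) -> None:
    """Resolve the root and hand it to the server runner.

    `run` stands in for the server's run(root) function so the sketch
    stays self-contained; it is a hypothetical injection point.
    """
    args = parse_args(argv)
    run(args.root)
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`.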
- Python 3.10+ interpreter
- pip or uv for package installation
- `.venv/` virtual environment (activated via `source .venv/bin/activate`)
- Git for accessing codebase metadata (used by `git_analysis.py` module)
- ~150MB disk for installed dependencies (tree-sitter + language grammars)
- Python 3.10+ on target system
- No external services or databases required — SQLite is embedded
- Runs as stdio-based MCP server in agent/IDE contexts (Claude Code, Cursor, VS Code, Windsurf)
- Network: Optional; git history analysis uses the local `git` command, with no outbound network calls
- Memory: ~50-100MB for typical codebases; scales with repository size
- CPU: Single-threaded analysis; no async I/O (subprocess calls are blocking)
- `.codetree/index.json` - JSON text file in target repo (human-readable, git-ignored)
- `.codetree/graph.db` - SQLite 3 database (binary, git-ignored)
- No cloud storage, no S3, no vector DB
- Cache invalidated on file modification time (mtime) changes
- Graph rebuilt incrementally on changes (sha256 content hashing)
- Both are .gitignore'd to avoid committing analysis artifacts
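The incremental rebuild rule above (skip files whose sha256 content hash is unchanged) can be sketched like this. The function names and the shape of the inputs are assumptions for illustration; the actual `GraphBuilder` stores its hashes in SQLite and is not reproduced here.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex sha256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_rebuild(previous: dict, current_files: dict):
    """Compare stored hashes against current contents.

    previous:      rel_path -> hash recorded by the last build
    current_files: rel_path -> current file bytes
    Returns (changed, deleted, new_hashes).
    """
    new_hashes = {path: content_hash(data) for path, data in current_files.items()}
    changed = sorted(p for p, h in new_hashes.items() if previous.get(p) != h)
    deleted = sorted(p for p in previous if p not in new_hashes)
    return changed, deleted, new_hashes


before = {
    "calc.py": content_hash(b"def add(a, b): return a + b\n"),
    "old.py": content_hash(b"pass\n"),
}
now = {
    "calc.py": b"def add(a, b): return a + b\n",  # unchanged
    "main.py": b"import calc\n",                   # new file
}
changed, deleted, hashes = plan_rebuild(before, now)
```

Unlike an mtime check, a content hash is unaffected by a `touch` or a fresh checkout, which is one plausible reason to use it for the persistent graph while the skeleton cache uses mtime.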
- Modules use lowercase with underscores: `indexer.py`, `registry.py`, `cache.py`
- Plugin modules follow pattern: `{language}.py` (e.g., `python.py`, `javascript.py`, `base.py`)
- Test files follow pattern: `test_{module}.py` (e.g., `test_indexer.py`, `test_server.py`)
- Template file for new code: `_template.py` as boilerplate

- Lowercase with underscores: `get_plugin()`, `extract_skeleton()`, `extract_symbol_source()`
- Private/internal functions prefixed with underscore: `_matches()`, `_clean_doc()`, `_should_skip()`
- Boolean predicates start with `is_` or `check_`: `check_syntax()`, `is_valid()`
- Getter functions use `get_` prefix: `get_skeleton()`, `get_symbol()`, `get_imports()`
- Setter functions use `set_` prefix: `set()` for simple assignment
- Extraction functions use `extract_` prefix: `extract_skeleton()`, `extract_calls_in_function()`
- MCP tool functions decorated with `@mcp.tool()` use clear action verbs: `get_file_skeleton()`, `find_references()`

- Lowercase with underscores: `file_entry`, `rel_path`, `source`, `skeleton`
- Class instances: `indexer`, `plugin`, `cache`, `store`
- Dictionaries/collections singular or plural as appropriate: `results`, `definitions`, `call_graph`
- Constants: UPPERCASE with underscores: `SKIP_DIRS`, `_EXCLUDED_NAMES`
- Module-level parser/language globals: `_PARSER`, `_LANGUAGE`
- Private instance variables: `_index`, `_definitions`, `_call_graph`, `_root`
- Loop counters use full names not `i`: `for rel_path, entry in ...` or `for item in skeleton:`

- Classes use PascalCase: `LanguagePlugin`, `FileEntry`, `Calculator`
- Plugin classes follow pattern: `{Language}Plugin` (e.g., `PythonPlugin`, `JavaScriptPlugin`, `GoPlugin`)
- Abstract base class: `LanguagePlugin(ABC)`
- Dataclass fields documented inline with type hints and brief purpose
- Type unions use modern syntax: `str | Path` not `Union[str, Path]`
- No automatic linter or formatter configured (`.eslintrc`, `.prettierrc`, `biome.json` not present)
- Implicit convention: 4-space indentation (Python standard)
- Line length: no strict limit enforced, but code is reasonably sized
- Imports grouped: standard library, third-party, local
- Blank lines: 2 between top-level definitions, 1 between methods
- No linting tool configured in `pyproject.toml`
- Code quality maintained through convention and testing
- Type hints are used throughout: `extract_skeleton(source: bytes) -> list[dict]`
- No path aliases configured (no `jsconfig.json`, `tsconfig.json` paths)
- Relative imports used throughout: `from .indexer import ...`, `from ..graph.store import ...`
- All paths are relative to package root: `src/codetree/`
- Graceful degradation: functions return `None` or empty list on error, not exceptions
- `extract_symbol_source(source, name) -> tuple[str, int] | None` returns None if symbol not found
- `get_plugin(path) -> LanguagePlugin | None` returns None for unsupported extensions
- `Cache.load()` catches `json.JSONDecodeError` and `OSError`, silently returns empty dict
- Skeleton parsing catches no exceptions; invalid syntax is captured via the `plugin.check_syntax()` flag
- String methods use `.decode("utf-8", errors="replace")` for safe UTF-8 handling across all languages
- File not found cases return user-friendly strings: `f"File not found: {file_path}"`, `f"Symbol '{symbol_name}' not found in {file_path}"`
- `_should_skip(path: Path) -> bool` checks directory names against `SKIP_DIRS` set
- `is_valid(rel_path, current_mtime) -> bool` verifies cache freshness by mtime matching
- Skeleton results deduplicated by `(name, line)` before returning
- All paths validated as relative using pattern matching: no absolute paths in results
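As an illustration of the skip-directory check listed above, here is a minimal sketch. The set contains only the directory names this file mentions (the real `SKIP_DIRS` includes more), and matching on any path component is an assumption about how `_should_skip()` behaves.

```python
from pathlib import Path

# Only the names mentioned in this document; the real set is longer.
SKIP_DIRS = {".venv", "node_modules", "__pycache__", ".git"}


def should_skip(path: Path) -> bool:
    """Return True if any component of the path is a skipped directory name."""
    return any(part in SKIP_DIRS for part in path.parts)


candidates = [
    Path("src/main.py"),
    Path(".venv/lib/python3.12/site-packages/pkg/mod.py"),
    Path("web/node_modules/lib/index.js"),
    Path("src/codetree/indexer.py"),
]
kept = [p.as_posix() for p in candidates if not should_skip(p)]
```

Filtering by component rather than by prefix also catches nested cases such as a `node_modules` directory several levels below the repo root.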
- Print-based debugging in utility functions
- No structured logging
- Docstrings used for user-facing documentation of tool behavior
- Class docstrings describe purpose and key responsibilities
- Method docstrings describe what it does, args, return value, and any side effects
- Inline comments rare — code is self-documenting via clear naming
- Section separators used in large files: `# ── Section Name ──────────────────────────`
- Not used (Python codebase)
- Docstrings use triple quotes: `"""description."""`
- Parameter and return documentation in docstring body
- Functions are small and focused: 10-50 lines typical
- Extract helpers for complex operations: `_matches()`, `_fill_docs_from_siblings()`, `_clean_doc()`
- Core extraction methods in plugins are 50-150 lines (complex query logic)
- Main orchestration methods: `build()`, `create_server()` in 30-60 lines
- Positional parameters for required inputs: `extract_skeleton(source: bytes)`
- Keyword arguments with defaults for optional behavior: `format: str = "full"`
- Path parameters as `str | Path` for flexibility, converted to `Path` internally
- Multiple related params grouped: `extract_calls_in_function(source, fn_name)` not spread across calls
- Return meaningful types: `list[dict]`, `tuple[str, int] | None`, `dict[str, Any]`
- Return early on error/not-found: `if entry is None: return None`
- Return collections always (not None): `extract_calls_in_function() -> list[str]` (empty list if none)
- Tuples for related values: `extract_symbol_source() -> tuple[str, int] | None` (source + line)
- No explicit `__all__` lists; modules export all public (non-`_`) names
- Plugin classes instantiated at module level: `PythonPlugin()`, shared via registry
- Plugin registry: `PLUGINS: dict[str, LanguagePlugin]` in `registry.py`
- No barrel/index files (`__init__.py` is minimal)
- `src/codetree/__init__.py` is empty
- Language plugins imported individually: `from .languages.python import PythonPlugin`
- `indexer.py` - file discovery, parsing, skeleton building, cross-file analysis
- `languages/*.py` - language-specific AST parsing and extraction
- `server.py` - MCP tool registration and output formatting
- `cache.py` - skeleton caching with mtime invalidation
- `registry.py` - extension → plugin mapping
- `graph/*.py` - persistent graph building, queries, analysis
- FastMCP 3.1.0 framework - MCP tools exposed as JSON-RPC endpoints over stdio
- Multi-language plugin system - Tree-sitter-based parsers provided by 10 plugins (Python, JavaScript, TypeScript, TSX, Go, Rust, Java, C, C++, Ruby)
- Three-tier indexing - File discovery → skeleton extraction → graph construction
- Persistent SQLite graph - `.codetree/graph.db` for cross-session analysis without re-parsing
- Cache optimization - `.codetree/index.json` with mtime-based invalidation to skip unchanged files
- Purpose: Parse command-line arguments and invoke the server
- Location: `src/codetree/__main__.py`
- Contains: argparse setup, root directory resolution
- Triggers: Called by `codetree --root /path/to/repo` command
- Responsibilities: Accept `--root` argument, invoke `server.run()`

- Purpose: Expose 23 MCP tools over FastMCP protocol
- Location: `src/codetree/server.py`
- Contains: Tool definitions, result formatting, caching/indexing/graph initialization
- Depends on: Indexer, Cache, GraphStore, GraphQueries
- Used by: Claude Code via stdio MCP transport
- Key function: `create_server(root: str) → FastMCP`, `run(root: str)` entry point

- Purpose: Discover all supported files, extract symbol skeletons, build definition/call graphs
- Location: `src/codetree/indexer.py`
- Contains: `Indexer` class with methods for skeleton, symbol source, references, call graphs, dead code, blast radius, clones, search
- Depends on: Language plugins, registry
- Used by: Server, GraphBuilder
- Key dataclass: `FileEntry(path, source, skeleton, mtime, language, plugin, has_errors)`

- Purpose: Store pre-computed skeletons with modification time checks to skip unchanged files
- Location: `src/codetree/cache.py`
- Contains: `Cache` class (load, save, get, set, is_valid methods)
- Stores: `.codetree/index.json` (JSON dict of `rel_path → {mtime, skeleton}`)
- Used by: Server on startup to inject cached entries into indexer

- Purpose: Abstract language-specific AST parsing behind common interface
- Location: `src/codetree/languages/`
- Contains: 10 plugin classes inheriting from `LanguagePlugin` base
- Each plugin implements: the 5 abstract methods (skeleton, symbol source, calls, usages, imports)
- Plugins: `PythonPlugin`, `JavaScriptPlugin`, `TypeScriptPlugin`, `TSXPlugin`, `GoPlugin`, `RustPlugin`, `JavaPlugin`, `CPlugin`, `CppPlugin`, `RubyPlugin`

- Purpose: Route files to correct language plugin
- Location: `src/codetree/registry.py`
- Contains: `PLUGINS` dict (extension → singleton plugin instance), `get_plugin(path) → LanguagePlugin | None`
- Used by: Indexer during file discovery
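The registry described above is small enough to sketch in full. The two plugin classes here are empty stand-ins so the example runs on its own; only the `PLUGINS` name, the `get_plugin()` name, and the "None for unsupported extensions" behaviour come from this document, and the suffix lookup is an assumption about how routing is done.

```python
from __future__ import annotations

from pathlib import Path


class PythonPlugin:
    """Empty stand-in for the real plugin class."""


class GoPlugin:
    """Empty stand-in for the real plugin class."""


# One shared instance per extension (illustrative subset of the mapping).
PLUGINS: dict[str, object] = {
    ".py": PythonPlugin(),
    ".go": GoPlugin(),
}


def get_plugin(path: str | Path) -> object | None:
    """Return the plugin registered for the file's extension, or None."""
    return PLUGINS.get(Path(path).suffix)
```

Because callers receive `None` instead of an exception for unsupported files, file discovery can simply skip anything the registry does not recognize.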

- Purpose: Build and query a persistent SQLite symbol graph for cross-session analysis
- Location: `src/codetree/graph/`
- Components: `models.py`, `store.py`, `builder.py`, `queries.py`, `dataflow.py`, `git_analysis.py`

- `_index: dict[rel_path → FileEntry]` - all indexed files (held in memory)
- `_definitions: dict[name → list[(file, line)]]` - definition locations for all symbols
- `_call_graph`, `_reverse_graph` - lazy-built, invalidated when files change
- `_call_graph_built: bool` - flag to defer call graph construction until first use

- Persistent SQLite database: `.codetree/graph.db`
- Tables: `meta`, `files`, `symbols`, `edges`, `file_symbols_index`
- Indices on: `symbols.name`, `symbols.file`, `symbols.kind`, `edges.source_qn`, `edges.target_qn`, `edges.type`
- Schema version tracked in `meta` table

- JSON file: `.codetree/index.json`
- Structure: `{rel_path → {mtime: float, skeleton: list[dict]}}`
- Invalidation: re-parse file if `stat().st_mtime` differs from cached mtime
- Purpose: Define language-agnostic interface for code analysis
- Methods: 5 abstract (skeleton, symbol_source, calls, usages, imports) + 5 optional (syntax, complexity, variables, ast, normalize_for_clones)
- Example: `PythonPlugin` queries `function_definition` and `class_definition` tree-sitter nodes
- Pattern: Tree-sitter 0.25.x API with `Query()`, `QueryCursor()`, `_matches()` unwrapper

- Purpose: Hold all parsed information for a single file
- Fields: path, source (bytes), skeleton, mtime, language, plugin, has_errors
- Lifetime: Created during indexing, reused for all lookups without re-parsing

- Purpose: Define persistent graph schema
- SymbolNode: qualified_name, name, kind, file_path, start_line, end_line, parent_qn, doc, params, is_test, is_entry_point
- Edge: source_qn, target_qn, type (CALLS, IMPORTS, CONTAINS), weight
- Qualified names: `file_path::ClassName.method_name` or `file_path::function_name`

- Purpose: Execute AST queries via S-expression patterns
- Pattern: Define queries as strings: `(function_definition name: (identifier) @name)`
- Matches returned as dicts with capture names unwrapped to nodes
- Helper: `_matches(query, node)` in `languages/base.py` for convenient capture unwrapping
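The qualified-name format and the `Edge` fields listed above can be illustrated as follows. The field names and the two name formats come from this document; the `make_qualified_name()` signature, the default weight, and the frozen dataclass are assumptions, so treat this as a sketch and check `graph/models.py` for the real definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Illustrative edge record; type is "CALLS", "IMPORTS", or "CONTAINS"."""

    source_qn: str
    target_qn: str
    type: str
    weight: float = 1.0  # assumed default, not confirmed by the source


def make_qualified_name(file_path: str, name: str, parent: str | None = None) -> str:
    """Build file_path::Parent.name for members, file_path::name otherwise.

    The parameter list is hypothetical; only the output format is documented.
    """
    if parent:
        return f"{file_path}::{parent}.{name}"
    return f"{file_path}::{name}"


method_qn = make_qualified_name("calc.py", "add", parent="Calculator")
func_qn = make_qualified_name("main.py", "run")
call = Edge(source_qn=func_qn, target_qn=method_qn, type="CALLS")
```

Prefixing the file path keeps same-named symbols in different files distinct, which is the case `resolve_symbol` has to disambiguate.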
- Location: `src/codetree/__main__.py::main()`
- Triggers: `codetree --root /path/to/repo`
- Responsibilities: Parse args, invoke `server.run(root)`
- Structural: `get_file_skeleton`, `get_symbol`, `find_references`, `get_call_graph`, `get_imports`, `get_skeletons`, `get_symbols`, `get_complexity`, `find_dead_code`, `get_blast_radius`, `detect_clones`, `search_symbols`, `find_tests`
- Graph & Onboarding: `index_status`, `get_repository_map`, `resolve_symbol`, `search_graph`, `get_change_impact`, `analyze_dataflow`, `find_hot_paths`, `get_dependency_graph`, `git_history`, `suggest_docs`
- All registered with the `@mcp.tool()` decorator in `server.py`
- File not found: `"File not found or empty: {file_path}"`
- Symbol not found: `"Symbol '{name}' not found in {file_path}"`
- Syntax error: Skeleton includes warning header if `entry.has_errors == True`
- Empty results: `"No {X} found..."`
- File paths: Checked for existence in indexer during build
- Symbol names: Case-sensitive searches; substring matching optional in `search_symbols`
- Imports: Extracted as raw text; no semantic resolution (cross-module imports tracked by edge weights)
- Skeleton cache: mtime-based invalidation per file
- Call graph: Lazy-built once, invalidated on file change via `_call_graph_built` flag
- Graph store: Content-hashed (sha256) per file to detect changes
- Import resolution: Graph builder caches file imports in `_file_imports` dict
- Dead-code detection excludes dunder methods (`__init__`, `__str__`, etc.), test functions, `__init__.py` exports
- Dead-code detection counts external references only (same-file definitions at definition line don't count as usage)
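The lazy call graph with its `_call_graph_built` flag can be sketched as below. The attribute names mirror those listed in this file, but the `LazyCallGraph` class, its `update()` method, and the `build_count` counter are hypothetical and exist only to make the deferral visible.

```python
class LazyCallGraph:
    """Illustrative lazy call graph; built on first query, reset on change."""

    def __init__(self, calls_by_function: dict) -> None:
        self._calls = dict(calls_by_function)
        self._call_graph: dict = {}
        self._reverse_graph: dict = {}
        self._call_graph_built = False
        self.build_count = 0  # hypothetical counter, for demonstration only

    def _ensure_built(self) -> None:
        if self._call_graph_built:
            return
        self._call_graph = {fn: sorted(set(c)) for fn, c in self._calls.items()}
        reverse: dict = {}
        for fn, callees in self._call_graph.items():
            for callee in callees:
                reverse.setdefault(callee, []).append(fn)
        self._reverse_graph = {name: sorted(fns) for name, fns in reverse.items()}
        self._call_graph_built = True
        self.build_count += 1

    def callers_of(self, name: str) -> list:
        self._ensure_built()
        return self._reverse_graph.get(name, [])

    def update(self, fn: str, callees: list) -> None:
        """Record a change and invalidate; the rebuild waits for the next query."""
        self._calls[fn] = callees
        self._call_graph_built = False


graph = LazyCallGraph({"run": ["add", "log"], "main": ["run"]})
```

Deferring the build keeps startup cheap for sessions that never ask for callers, dead code, or blast radius.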
Before using Edit, Write, or other file-changing tools, start work through a GSD command so planning artifacts and execution context stay in sync.
Use these entry points:
- `/gsd:quick` for small fixes, doc updates, and ad-hoc tasks
- `/gsd:debug` for investigation and bug fixing
- `/gsd:execute-phase` for planned phase work

Do not make direct repo edits outside a GSD workflow unless the user explicitly asks to bypass it.
Profile not yet configured. Run `/gsd:profile-user` to generate your developer profile. This section is managed by `generate-claude-profile`; do not edit manually.