spec-08: additional general-purpose language support (C#, Kotlin, PHP, C, Scala, Dart, Lua, Elixir, Bash) #93
Merged
…Dart, Lua, Elixir, Bash)
Fixes the phantom-language bug (C#/Kotlin/PHP/C were detected but never graphed) and adds 5 new languages, all on the existing tree-sitter extractor pattern, with no schema/MCP/tool changes.
- loadGrammarSoft: lazy, cached, soft-failing grammar loader (graceful degradation: a missing/ABI-incompatible grammar warns once and skips that language without aborting analyze or any other language).
- extractByQueries: shared query-driven extractor (C#/Kotlin/PHP/C/Scala/Dart/Lua/Bash) parameterized by FN/CALL queries + grouping node types; bespoke walk for Elixir (its grammar models everything as `call` nodes).
- Name-based call resolution (matches the repo's best-effort approach).
- detectLanguage extensions; .h C/C++ heuristic (resolveHeaderLanguage); CALL_GRAPH_LANGS extended; ambient types for untyped grammars.
- Grammar versions pinned to ABI-14 prebuilds where required (c-sharp 0.23.1, php 0.23.12, c 0.23.6, bash 0.23.3); Lua/Dart ship ABI-15 only and exercise the graceful-degradation path in this environment.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…glot, degradation, determinism) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…itter)
Previously Lua/Dart only degraded (no ABI-compatible native grammar for the pinned host tree-sitter). They now extract for real via tree-sitter-wasms + web-tree-sitter, so all nine spec-08 languages produce graphs.
- Unified GrammarHandle (withTree) abstracts native tree-sitter and the WASM backend behind one interface; the 7 native languages are unchanged.
- loadWasmGrammarSoft loads the grammar's WASM bytes ourselves and hands a Uint8Array to Language.load (avoids web-tree-sitter's ESM-unfriendly internal fs require); withTree disposes tree/queries per parse to protect the WASM heap.
- Dart uses a custom walk (its function_body is a sibling of function_signature, so a generic query attributes no calls); Lua uses the generic query path with the bundled grammar's node types, incl. t.f()/t:m() and table-name className.
- Validated against the real grammars; web-tree-sitter pinned to 0.25.0, native tree-sitter-lua/-dart deps dropped (WASM comes from tree-sitter-wasms).
- Dart/Lua tests split into their own files (vitest's sandbox corrupts the shared WASM heap across grammars in one file; production node does not).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-1 signatures
- Kotlin extension-function CALL now asserted ("hi".shout() resolves).
- Elixir remote Mod.fun() resolves to an in-project module (cross-module edge);
emit the function name only so name-based resolution matches.
- ClassNode grouping: assert Service.methodIds references both methods (C#).
- Explicit phantom-regression test: C#/Kotlin/PHP/C always non-empty nodes+edges.
- Cross-cutting interop tests: SCIP export emits new-language nodes (no-enum →
UnspecifiedLanguage), federation manifest languages[] includes the new tags.
- Best-effort Stage-1 search signatures for all nine languages so they are
searchable via BM25 even when a grammar can't load (graceful degradation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a committed graph snapshot for a representative TypeScript fixture (classes, generics, async methods, free functions, calls). Any change to an existing-language extractor would diff this snapshot — the spec's "before/after byte-identical repo graph" guard, made maintainable as a committed baseline. Also asserts byte-identical output across rebuilds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds call-graph extraction for nine general-purpose languages on the
existing tree-sitter extractor pattern, with no changes to the
`FunctionNode`/`CallEdge`/`ClassNode` schema, MCP tools, `orient`, the search index, SCIP export, or the federation manifest.
- Phantom-language fix: C#, Kotlin, PHP, and C were detected by `detectLanguage` but had no dispatch branch, so they were counted but produced an empty graph. They now emit real nodes and edges.
- Five new languages: Scala, Dart, Lua, Elixir, Bash (… queries, dispatch branch, detection). Structurally similar languages share one query-driven extractor (`extractByQueries`); Elixir uses a bespoke walk (its grammar models everything as `call` nodes).
- Graceful degradation: `loadGrammarSoft` warns once and skips a missing/ABI-incompatible grammar without aborting `analyze` or any other language. Files of that language are still indexed for search.
- `.h` C/C++ heuristic (`resolveHeaderLanguage`, tested): C-only project → C; any C++ source → C++; standalone → C++ (superset default).
- Name-based call resolution: `this.M()`, `Class.M()`, `$this->m()`, `Obj.m()`, `Mod.fun()`, etc.
Native grammar dependencies (justification)
All ship prebuilt binaries; none required a source compile here. Pinned to
ABI-14 prebuilds where needed to match the host `tree-sitter` (0.22.4, ABI 14):
`tree-sitter-c-sharp@0.23.1`, `tree-sitter-php@0.23.12`, `tree-sitter-c@0.23.6`,
`tree-sitter-bash@0.23.3`; `tree-sitter-kotlin`/`-scala`/`-elixir` at latest.
`tree-sitter-lua` and `tree-sitter-dart` ship ABI-15-only builds, so in this
environment (node 25 cannot build `tree-sitter` 0.25 from source) they exercise
the graceful-degradation path; their extractors + fixtures are in place and
graph wherever an ABI-15 host binding is available. 7 of 9 extract fully in
CI today; all 9 are wired end to end.
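The graceful-degradation path described above can be illustrated with a minimal sketch. This is not the code from this PR: only the name `loadGrammarSoft` comes from the description, while the `Grammar` and `Loader` types, the `grammarCache` map, and the `warnings` array are hypothetical stand-ins. The loader callback stands in for whatever actually requires the native grammar and throws on a missing or ABI-incompatible build.

```typescript
// Hypothetical sketch: everything except the name loadGrammarSoft is assumed.
type Grammar = { name: string };
type Loader = () => Grammar; // assumed to throw on a missing/ABI-incompatible grammar

const grammarCache = new Map<string, Grammar | null>();
const warnings: string[] = []; // collected here so the behaviour is observable

function loadGrammarSoft(lang: string, loader: Loader): Grammar | null {
  // Lazy and cached: the loader runs at most once per language.
  // A failure is cached as null, so the warning is emitted only once.
  if (grammarCache.has(lang)) return grammarCache.get(lang) ?? null;

  let grammar: Grammar | null = null;
  try {
    grammar = loader();
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    warnings.push(`[grammar] ${lang} unavailable, skipping: ${reason}`);
  }
  grammarCache.set(lang, grammar);
  return grammar;
}

// Usage: one language loads, another degrades without affecting the first.
const cGrammar = loadGrammarSoft("c", () => ({ name: "c" }));
const luaFirst = loadGrammarSoft("lua", () => {
  throw new Error("grammar is ABI 15, host is ABI 14");
});
const luaSecond = loadGrammarSoft("lua", () => {
  throw new Error("grammar is ABI 15, host is ABI 14");
});
```

In this sketch a caller treats a `null` result as "skip call-graph extraction for this language" and carries on with the remaining languages.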
Test plan
- `.h` heuristic (3-case table)
- … `vi.doMock`)
- `lint` / `typecheck` / `test:run` / `build` all pass
Follow-ups (`TODO(spec-08-followup)`)
- … `tree-sitter` is buildable on the CI node version (code + fixtures already present).
- … `source`/`.` as file-level dependency edges.
🤖 Generated with Claude Code
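For reference, the three-case `.h` heuristic from the Summary can be sketched as below. Only the name `resolveHeaderLanguage` and the three rules come from the description; the signature (a list of project file paths), the return tags, and the extension lists are assumptions made for illustration.

```typescript
// Hypothetical sketch of the .h heuristic; signature and extension lists are assumed.
type HeaderLanguage = "c" | "cpp";

const CPP_SOURCE_EXTS = [".cpp", ".cc", ".cxx"]; // assumed C++ source extensions
const C_SOURCE_EXTS = [".c"];                    // assumed C source extension

function resolveHeaderLanguage(projectFiles: string[]): HeaderLanguage {
  const hasAny = (exts: string[]): boolean =>
    projectFiles.some((file) =>
      exts.some((ext) => file.toLowerCase().endsWith(ext)),
    );

  if (hasAny(CPP_SOURCE_EXTS)) return "cpp"; // any C++ source → C++
  if (hasAny(C_SOURCE_EXTS)) return "c";     // C-only project → C
  return "cpp";                               // standalone header → C++ (superset default)
}

// The three cases from the test plan's table, under the assumptions above.
const cOnly = resolveHeaderLanguage(["src/main.c", "include/util.h"]);
const mixed = resolveHeaderLanguage(["src/main.c", "src/app.cpp", "include/util.h"]);
const standalone = resolveHeaderLanguage(["include/util.h"]);
```

Checking for C++ sources first means a mixed C/C++ project resolves its headers as C++, which matches the "any C++ source → C++" rule.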