Fix/go error root parity#112
Conversation
…nTokenNamed params After simplifying optional_chain normalization to strip the child directly, the token symbol and named parameters became dead code. Remove them from the function signature and call site. Simplify enableOptionalChain to only check hasOptionalChainSym since the token symbol lookup is no longer needed. Addresses PR odvcencio#101 review feedback.
… check Prevent using an uninitialized importSym (zero value) when creating the leaf node for dynamic import() expressions. If the grammar lacks the anonymous import symbol, the normalization is now correctly skipped. Addresses PR odvcencio#102 review feedback.
… private field - Add defensive nil check on goTree before defer releaseGoTree to prevent panic if parseWithGo returns nil tree/root (high priority). - Remove unused fileSummary struct and results slice that were declared and appended to but never read (medium priority). - Rename #cache to _cache in javascript_sample.js: JS private fields are strictly private to the declaring class and inaccessible from subclasses. AdvancedParser extends BaseParser, so this.#cache would throw at runtime. Use _cache (protected convention) instead (medium priority). Addresses PR odvcencio#104 review feedback.
normalizeGoDotLeafChildren and normalizeGoCompatibilitySubtree both recurse through the AST without any cycle or depth protection. When the arena materialises nodes that form cycles (e.g. shared sub-trees or self-referencing sidecar entries), the recursion becomes unbounded and causes a stack overflow crash. Add a maxWalkDepth=2048 guard to both functions. This is far beyond any legitimate Go AST depth while preventing unbounded recursion. Fixes stack overflow in xgrep graph builder during multi-file scans.
There was a problem hiding this comment.
Code Review
This pull request introduces recursion depth guards to prevent stack overflows in Go AST normalization, aligns synthetic root node wrapping with C tree-sitter behavior, and cleans up unused variables and structures. The review feedback highlights critical issues with the new synthetic root wrapping logic, which could break downstream AST consumers by hiding valid top-level nodes under an ERROR container. Additionally, the feedback identifies potential resource leaks in the parity tests where early returns occur before the tree cleanup functions are deferred.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot { | ||
| rootChildren = b.wrapChildrenInError(rootChildren) | ||
| } |
There was a problem hiding this comment.
Unconditionally wrapping all rootChildren in a single ERROR node when rootHasError is true destroys the AST structure of partially valid files. For example, if a Go file has a valid package_clause but a syntax error later in the file, both the valid package_clause and the error fragment will be wrapped inside a top-level ERROR node. This makes valid nodes children of ERROR, which breaks downstream consumers (like linters, static analysis tools, or AST queries) that expect to find valid top-level nodes directly under source_file.
To match C tree-sitter and preserve AST correctness, we should only wrap the children in ERROR if there are no valid (non-ERROR, non-extra) top-level nodes.
| if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot { | |
| rootChildren = b.wrapChildrenInError(rootChildren) | |
| } | |
| if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot && b.shouldWrapSyntheticRoot(rootChildren) { | |
| rootChildren = b.wrapChildrenInError(rootChildren) | |
| } |
| func (b *resultRootBuild) wrapChildrenInError(children []*Node) []*Node { | ||
| if len(children) == 0 { | ||
| return children | ||
| } | ||
| // If there's already a single ERROR child, no wrapping needed. | ||
| if len(children) == 1 && children[0] != nil && children[0].symbol == errorSymbol { | ||
| return children | ||
| } | ||
| errNode := newParentNodeInArena(b.arena, errorSymbol, true, children, nil, 0) | ||
| errNode.setHasError(true) | ||
| return b.singleChildSlice(errNode) | ||
| } |
There was a problem hiding this comment.
To support the refined wrapping logic and avoid wrapping valid top-level nodes in ERROR, we should define a helper method shouldWrapSyntheticRoot that checks if there are any valid (non-ERROR, non-extra) top-level nodes in the synthetic root children.
func (b *resultRootBuild) wrapChildrenInError(children []*Node) []*Node {
if len(children) == 0 {
return children
}
// If there's already a single ERROR child, no wrapping needed.
if len(children) == 1 && children[0] != nil && children[0].symbol == errorSymbol {
return children
}
errNode := newParentNodeInArena(b.arena, errorSymbol, true, children, nil, 0)
errNode.setHasError(true)
return b.singleChildSlice(errNode)
}
func (b *resultRootBuild) shouldWrapSyntheticRoot(children []*Node) bool {
for _, child := range children {
if child == nil || child.isExtra() {
continue
}
if child.symbol != errorSymbol && !child.IsError() {
return false
}
}
return true
}| if goTree == nil || goTree.RootNode() == nil { | ||
| t.Error("Go parser returned nil tree") | ||
| langFail[langName]++ | ||
| return | ||
| } |
There was a problem hiding this comment.
If goTree is successfully allocated (non-nil) but goTree.RootNode() is nil, the subtest returns early on line 74. Since defer releaseGoTree(goTree) is only registered on line 76 (after this check), the allocated goTree is never released, leading to a resource/memory leak. Splitting the check and manually releasing goTree when RootNode() is nil prevents this leak.
if goTree == nil {
t.Error("Go parser returned nil tree")
langFail[langName]++
return
}
if goTree.RootNode() == nil {
t.Error("Go parser returned tree with nil root node")
releaseGoTree(goTree)
langFail[langName]++
return
}| if cTree == nil || cTree.RootNode() == nil { | ||
| msg := "C parser returned nil tree" | ||
| t.Error(msg) | ||
| results = append(results, fileSummary{filename: de.Name(), lang: langName, errMsg: msg}) | ||
| t.Error("C parser returned nil tree") | ||
| langFail[langName]++ | ||
| return | ||
| } |
There was a problem hiding this comment.
Similar to the goTree check, if cTree is non-nil but cTree.RootNode() is nil, the subtest returns early on line 102. Since defer cTree.Close() is only registered on line 104, the allocated cTree is never closed, leaking C memory. Splitting the check and manually calling cTree.Close() when RootNode() is nil prevents this leak.
| if cTree == nil || cTree.RootNode() == nil { | |
| msg := "C parser returned nil tree" | |
| t.Error(msg) | |
| results = append(results, fileSummary{filename: de.Name(), lang: langName, errMsg: msg}) | |
| t.Error("C parser returned nil tree") | |
| langFail[langName]++ | |
| return | |
| } | |
| if cTree == nil { | |
| t.Error("C parser returned nil tree") | |
| langFail[langName]++ | |
| return | |
| } | |
| if cTree.RootNode() == nil { | |
| t.Error("C parser returned tree with nil root node") | |
| cTree.Close() | |
| langFail[langName]++ | |
| return | |
| } |
C tree-sitter always uses the grammar's start symbol (e.g. source_file) as the root node, even when the parse result contains errors. Previously gotreesitter returned ERROR (65535) as the root symbol for Go error cases, making the root an ERROR node instead of source_file. This change adds Go to the language-specific override list in syntheticRootSymbol, matching the existing pattern for SQL and Swift. Note: this does NOT match the full error-recovery tree structure of C tree-sitter (e.g. fragment grouping under ERROR children). Those are deeper GLR parser differences tracked separately. This change only ensures the root node type is consistent. Also adds TestRootShapeParity in cgo_harness to empirically compare C tree-sitter vs gotreesitter tree shapes for various error scenarios.
cd69544 to
650eb2b
Compare
What does this PR do?
Returns
source_fileas the root node type for Go error-recovery trees, matching C tree-sitter behavior. Previously gotreesitter returnedERROR(65535) as the root symbol, making the entire root an ERROR node instead ofsource_file.Also adds
TestRootShapeParityincgo_harness/— an empirical test that parses 7 inputs (valid, partially valid, fully invalid) through both C tree-sitter and gotreesitter, comparing tree shapes side by side.Why this approach?
C tree-sitter always uses the grammar start symbol (
source_file) as the root, even for fully invalid input. The change follows the existing pattern already used for SQL and Swift — a one-line addition tosyntheticRootSymbol.Rejected alternative: wrapping all error-recovery children in a single ERROR node (
source_file → ERROR(children...)). Empirical testing showed C tree-sitter does NOT do this unconditionally — for partially valid files it keeps valid top-level nodes as direct children ofsource_file. The wrapping also masked a deeper parser-level difference: C tree-sitter successfully reduces expression-like patterns intoexpression_statement → call_expression(...), while gotreesitter produces fragments. That fragmentation is a GLR parser difference, not a root assembly issue.Correctness
go test -run "TestParser|TestTree|TestRootNode|TestResult|TestBuildResult" -count=1— all passTestRootShapeParity(cgo_harness, build tagtreesitter_c_parity) covering 7 error scenariosbash cgo_harness/docker/run_single_grammar_parity.sh gox.Query("..." + input)parses asexpression_statementin C but fragments in gotreesitter) is a parser-level difference, not addressed here. Downstream consumers (xgrep) handle this with a fragmentation guard.Performance
isLanguage("go")string comparison on the error-recovery path only. Normal valid-Go parses are unaffected.Maintainability
Self-review
Crux of the solution
syntheticRootSymbol(6 lines): Adds Go to the list of languages that returnexpectedRootSymbol(usuallysource_file) even whenrootHasErroris true. This follows the identical pattern already established for SQL and Swift.Empirical findings from
TestRootShapeParitysource_file → pkg, func"x"(single ident)source_file → ERROR(ident)package main\nfunc brokensource_file → pkg_clause, ERROR(func,ident)source_file → pkg_clause, ERROR(func), identx.Query("..." + input)source_file → expr_stmt(call_expr)source_file → sel_expr, (, binary_expr, )Root type is now correct for all cases. Fragment grouping differences are deeper GLR parser issues tracked separately.
Open question: generalize vs per-language
Recommendation: Keep per-language for now. Python (
syntheticRootCanDropError) and Dart (dartProgramChildrenLookComplete) have custom logic that depends on seeing specific root shapes.Due diligence