Skip to content

Fix/go error root parity#112

Open
frozen425 wants to merge 5 commits into
odvcencio:mainfrom
frozen425:fix/go-error-root-parity
Open

Fix/go error root parity#112
frozen425 wants to merge 5 commits into
odvcencio:mainfrom
frozen425:fix/go-error-root-parity

Conversation

@frozen425

@frozen425 frozen425 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Returns source_file as the root node type for Go error-recovery trees, matching C tree-sitter behavior. Previously gotreesitter returned ERROR (65535) as the root symbol, making the entire root an ERROR node instead of source_file.

Also adds TestRootShapeParity in cgo_harness/ — an empirical test that parses 7 inputs (valid, partially valid, fully invalid) through both C tree-sitter and gotreesitter, comparing tree shapes side by side.

Why this approach?

C tree-sitter always uses the grammar start symbol (source_file) as the root, even for fully invalid input. The change follows the existing pattern already used for SQL and Swift — a one-line addition to syntheticRootSymbol.

Rejected alternative: wrapping all error-recovery children in a single ERROR node (source_file → ERROR(children...)). Empirical testing showed C tree-sitter does NOT do this unconditionally — for partially valid files it keeps valid top-level nodes as direct children of source_file. The wrapping also masked a deeper parser-level difference: C tree-sitter successfully reduces expression-like patterns into expression_statement → call_expression(...), while gotreesitter produces fragments. That fragmentation is a GLR parser difference, not a root assembly issue.

Correctness

  • Focused package or unit tests ran for the touched behavior
    • go test -run "TestParser|TestTree|TestRootNode|TestResult|TestBuildResult" -count=1 — all pass
  • Added TestRootShapeParity (cgo_harness, build tag treesitter_c_parity) covering 7 error scenarios
  • Affected parity validation ran in Docker, one language or grammar at a time
    • Recommend: bash cgo_harness/docker/run_single_grammar_parity.sh go
  • Documented anything intentionally left unvalidated in this PR
    • Deeper GLR parser fragmentation (e.g. x.Query("..." + input) parses as expression_statement in C but fragments in gotreesitter) is a parser-level difference, not addressed here. Downstream consumers (xgrep) handle this with a fragmentation guard.

Performance

  • Included before and after numbers, or explicitly said perf is not the goal of this PR
    • Perf is not the goal. The change adds one isLanguage("go") string comparison on the error-recovery path only. Normal valid-Go parses are unaffected.

Maintainability

  • No unnecessary abstractions or speculative cleanup outside the PR scope
  • Generated files or harness changes are only here because they are required by the change
  • No new dependencies without justification

Self-review

  • Reviewed my own diff end to end
  • Left comments on notable sections or the crux of the solution
  • Called out anything I am unsure about or want a second opinion on

Crux of the solution

syntheticRootSymbol (6 lines): Adds Go to the list of languages that return expectedRootSymbol (usually source_file) even when rootHasError is true. This follows the identical pattern already established for SQL and Swift.

Empirical findings from TestRootShapeParity

Input C tree-sitter gotreesitter (with this fix) Parity?
valid Go source_file → pkg, func Same
"x" (single ident) source_file → ERROR(ident) Same
package main\nfunc broken source_file → pkg_clause, ERROR(func,ident) source_file → pkg_clause, ERROR(func), ident ❌ fragment grouping
x.Query("..." + input) source_file → expr_stmt(call_expr) source_file → sel_expr, (, binary_expr, ) ❌ parser-level

Root type is now correct for all cases. Fragment grouping differences are deeper GLR parser issues tracked separately.

Open question: generalize vs per-language

Generalize (always use start symbol) Per-language opt-in (current)
C parity Full Partial — only opted-in languages
New language coverage Automatic Manual
Blast radius Wide Narrow — only Go in this PR
Regression risk Python/Dart custom error handling could break None

Recommendation: Keep per-language for now. Python (syntheticRootCanDropError) and Dart (dartProgramChildrenLookComplete) have custom logic that depends on seeing specific root shapes.

Due diligence

  • Read the code before changing it
  • Checked that similar patterns exist elsewhere (SQL, Swift) and stayed consistent
  • Validated against C tree-sitter output via TestRootShapeParity

Gomez added 4 commits June 6, 2026 17:28
…nTokenNamed params

After simplifying optional_chain normalization to strip the child directly,
the token symbol and named parameters became dead code. Remove them from the
function signature and call site. Simplify enableOptionalChain to only check
hasOptionalChainSym since the token symbol lookup is no longer needed.

Addresses PR odvcencio#101 review feedback.
… check

Prevent using an uninitialized importSym (zero value) when creating the
leaf node for dynamic import() expressions. If the grammar lacks the
anonymous import symbol, the normalization is now correctly skipped.

Addresses PR odvcencio#102 review feedback.
… private field

- Add defensive nil check on goTree before defer releaseGoTree to prevent
  panic if parseWithGo returns nil tree/root (high priority).
- Remove unused fileSummary struct and results slice that were declared
  and appended to but never read (medium priority).
- Rename #cache to _cache in javascript_sample.js: JS private fields are
  strictly private to the declaring class and inaccessible from subclasses.
  AdvancedParser extends BaseParser, so this.#cache would throw at runtime.
  Use _cache (protected convention) instead (medium priority).

Addresses PR odvcencio#104 review feedback.
normalizeGoDotLeafChildren and normalizeGoCompatibilitySubtree both
recurse through the AST without any cycle or depth protection. When
the arena materialises nodes that form cycles (e.g. shared sub-trees
or self-referencing sidecar entries), the recursion becomes unbounded
and causes a stack overflow crash.

Add a maxWalkDepth=2048 guard to both functions. This is far beyond
any legitimate Go AST depth while preventing unbounded recursion.

Fixes stack overflow in xgrep graph builder during multi-file scans.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces recursion depth guards to prevent stack overflows in Go AST normalization, aligns synthetic root node wrapping with C tree-sitter behavior, and cleans up unused variables and structures. The review feedback highlights critical issues with the new synthetic root wrapping logic, which could break downstream AST consumers by hiding valid top-level nodes under an ERROR container. Additionally, the feedback identifies potential resource leaks in the parity tests where early returns occur before the tree cleanup functions are deferred.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread parser_result_root_build.go Outdated
Comment on lines +112 to +114
if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot {
rootChildren = b.wrapChildrenInError(rootChildren)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Unconditionally wrapping all rootChildren in a single ERROR node when rootHasError is true destroys the AST structure of partially valid files. For example, if a Go file has a valid package_clause but a syntax error later in the file, both the valid package_clause and the error fragment will be wrapped inside a top-level ERROR node. This makes valid nodes children of ERROR, which breaks downstream consumers (like linters, static analysis tools, or AST queries) that expect to find valid top-level nodes directly under source_file.

To match C tree-sitter and preserve AST correctness, we should only wrap the children in ERROR if there are no valid (non-ERROR, non-extra) top-level nodes.

Suggested change
if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot {
rootChildren = b.wrapChildrenInError(rootChildren)
}
if rootHasError && rootSymbol == b.expectedRootSymbol && b.hasExpectedRoot && b.shouldWrapSyntheticRoot(rootChildren) {
rootChildren = b.wrapChildrenInError(rootChildren)
}

Comment thread parser_result_root_build.go Outdated
Comment on lines +180 to +191
func (b *resultRootBuild) wrapChildrenInError(children []*Node) []*Node {
if len(children) == 0 {
return children
}
// If there's already a single ERROR child, no wrapping needed.
if len(children) == 1 && children[0] != nil && children[0].symbol == errorSymbol {
return children
}
errNode := newParentNodeInArena(b.arena, errorSymbol, true, children, nil, 0)
errNode.setHasError(true)
return b.singleChildSlice(errNode)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

To support the refined wrapping logic and avoid wrapping valid top-level nodes in ERROR, we should define a helper method shouldWrapSyntheticRoot that checks if there are any valid (non-ERROR, non-extra) top-level nodes in the synthetic root children.

func (b *resultRootBuild) wrapChildrenInError(children []*Node) []*Node {
	if len(children) == 0 {
		return children
	}
	// If there's already a single ERROR child, no wrapping needed.
	if len(children) == 1 && children[0] != nil && children[0].symbol == errorSymbol {
		return children
	}
	errNode := newParentNodeInArena(b.arena, errorSymbol, true, children, nil, 0)
	errNode.setHasError(true)
	return b.singleChildSlice(errNode)
}

func (b *resultRootBuild) shouldWrapSyntheticRoot(children []*Node) bool {
	for _, child := range children {
		if child == nil || child.isExtra() {
			continue
		}
		if child.symbol != errorSymbol && !child.IsError() {
			return false
		}
	}
	return true
}

Comment on lines +71 to 75
if goTree == nil || goTree.RootNode() == nil {
t.Error("Go parser returned nil tree")
langFail[langName]++
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If goTree is successfully allocated (non-nil) but goTree.RootNode() is nil, the subtest returns early on line 74. Since defer releaseGoTree(goTree) is only registered on line 76 (after this check), the allocated goTree is never released, leading to a resource/memory leak. Splitting the check and manually releasing goTree when RootNode() is nil prevents this leak.

			if goTree == nil {
				t.Error("Go parser returned nil tree")
				langFail[langName]++
				return
			}
			if goTree.RootNode() == nil {
				t.Error("Go parser returned tree with nil root node")
				releaseGoTree(goTree)
				langFail[langName]++
				return
			}

Comment on lines 99 to 103
if cTree == nil || cTree.RootNode() == nil {
msg := "C parser returned nil tree"
t.Error(msg)
results = append(results, fileSummary{filename: de.Name(), lang: langName, errMsg: msg})
t.Error("C parser returned nil tree")
langFail[langName]++
return
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the goTree check, if cTree is non-nil but cTree.RootNode() is nil, the subtest returns early on line 102. Since defer cTree.Close() is only registered on line 104, the allocated cTree is never closed, leaking C memory. Splitting the check and manually calling cTree.Close() when RootNode() is nil prevents this leak.

Suggested change
if cTree == nil || cTree.RootNode() == nil {
msg := "C parser returned nil tree"
t.Error(msg)
results = append(results, fileSummary{filename: de.Name(), lang: langName, errMsg: msg})
t.Error("C parser returned nil tree")
langFail[langName]++
return
}
if cTree == nil {
t.Error("C parser returned nil tree")
langFail[langName]++
return
}
if cTree.RootNode() == nil {
t.Error("C parser returned tree with nil root node")
cTree.Close()
langFail[langName]++
return
}

C tree-sitter always uses the grammar's start symbol (e.g. source_file)
as the root node, even when the parse result contains errors. Previously
gotreesitter returned ERROR (65535) as the root symbol for Go error
cases, making the root an ERROR node instead of source_file.

This change adds Go to the language-specific override list in
syntheticRootSymbol, matching the existing pattern for SQL and Swift.

Note: this does NOT match the full error-recovery tree structure of C
tree-sitter (e.g. fragment grouping under ERROR children). Those are
deeper GLR parser differences tracked separately. This change only
ensures the root node type is consistent.

Also adds TestRootShapeParity in cgo_harness to empirically compare
C tree-sitter vs gotreesitter tree shapes for various error scenarios.
@frozen425 frozen425 force-pushed the fix/go-error-root-parity branch from cd69544 to 650eb2b Compare June 8, 2026 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant