Currently, Taker.ref()'s recursion protection (CheckParser) intercepts infinite cycles at the same input position by silently returning NoMatch (backtrack).
While this prevents StackOverflowError at runtime, it introduces a silent misparsing risk.
Imagine an unsuspecting developer created a left-recursive grammar rule like EXPR = EXPR + NUM | NUM, when parsing "1+2", the mentality model is that the parser should work like any EBNF or ANTLRv4, to successfully parse the left-associative grammar and evaluate to 3.
But since left recursion is unsupported by the library, the left branch is entered at position 0, recurses immediately, triggers cycle detection, and silently returns NoMatch.
The parser combinator (like .or() or oneOf()) catches this NoMatch and silently falls back to the right branch Num.
The parse succeeds, but only consumes "1" and returns a surprising result (1), silently leaving "+2" unconsumed.
Even if the EXPR grammar is composed in a larger grammar that will consume +2 later on, the result would be wrong already (and it breaks left associativity of the EBNF rule).
Suggestion: The recursion protection should fail fast and loud rather than backtracking silently.
Test to reproduce:
@Test
public void leftRecursion_silentlyBacktracksAndMisparses() {
// left-recursive grammar: expr ::= expr '+' number | number
Taker<Integer> expr = Taker.ref();
Taker<Integer> num = Numeric.integer;
expr.set(
expr.thenSkip(chr('+')).then(num).map((l, r) -> l + r)
.or(num)
);
// For input "1+2", the developer expects it to be parsed as 3 (1+2).
// But because of silent recursion protection, the left branch `expr.thenSkip('+').then(base)`
// is called recursively at position 0, which triggers cycle detection and silently returns NoMatch.
// The `or` combinator then falls back to the right branch `base`, which successfully parses only 1 at position 0.
// Thus, the parser succeeds but only consumes 1, silently leaving "+2" unconsumed!
Result<Integer> result = expr.parse("1+2");
// The parser reports success!
assertThat(result.matches()).isTrue();
// BUT it only parsed the base case 1, NOT the full evaluation of 3!
assertThat(result.value()).isEqualTo(1);
// And it silently left "+2" unconsumed at position 1 (did not reach EOF).
assertThat(result.input().position()).isEqualTo(1);
assertThat(result.input().isEof()).isFalse();
}
Currently, Taker.ref()'s recursion protection (CheckParser) intercepts infinite cycles at the same input position by silently returning NoMatch (backtrack).
While this prevents StackOverflowError at runtime, it introduces a silent misparsing risk.
Imagine an unsuspecting developer created a left-recursive grammar rule like
EXPR = EXPR + NUM | NUM, when parsing "1+2", the mentality model is that the parser should work like any EBNF or ANTLRv4, to successfully parse the left-associative grammar and evaluate to 3.But since left recursion is unsupported by the library, the left branch is entered at position 0, recurses immediately, triggers cycle detection, and silently returns
NoMatch.The parser combinator (like
.or()oroneOf()) catches this NoMatch and silently falls back to the right branchNum.The parse succeeds, but only consumes "1" and returns a surprising result (
1), silently leaving "+2" unconsumed.Even if the
EXPRgrammar is composed in a larger grammar that will consume+2later on, the result would be wrong already (and it breaks left associativity of the EBNF rule).Suggestion: The recursion protection should fail fast and loud rather than backtracking silently.
Test to reproduce: