Skip to content

Fail Fast on Left Recursion instead of Silent Backtracking #1

Description

@fluentfuture

Currently, Taker.ref()'s recursion protection (CheckParser) intercepts infinite cycles at the same input position by silently returning NoMatch (backtrack).

While this prevents StackOverflowError at runtime, it introduces a silent misparsing risk.

Imagine an unsuspecting developer created a left-recursive grammar rule like EXPR = EXPR + NUM | NUM, when parsing "1+2", the mentality model is that the parser should work like any EBNF or ANTLRv4, to successfully parse the left-associative grammar and evaluate to 3.

But since left recursion is unsupported by the library, the left branch is entered at position 0, recurses immediately, triggers cycle detection, and silently returns NoMatch.

The parser combinator (like .or() or oneOf()) catches this NoMatch and silently falls back to the right branch Num.

The parse succeeds, but only consumes "1" and returns a surprising result (1), silently leaving "+2" unconsumed.

Even if the EXPR grammar is composed in a larger grammar that will consume +2 later on, the result would be wrong already (and it breaks left associativity of the EBNF rule).

Suggestion: The recursion protection should fail fast and loud rather than backtracking silently.

Test to reproduce:

    @Test
    public void leftRecursion_silentlyBacktracksAndMisparses() {
        // left-recursive grammar: expr ::= expr '+' number | number
        Taker<Integer> expr = Taker.ref();
        Taker<Integer> num = Numeric.integer;

        expr.set(
            expr.thenSkip(chr('+')).then(num).map((l, r) -> l + r)
            .or(num)
        );

        // For input "1+2", the developer expects it to be parsed as 3 (1+2).
        // But because of silent recursion protection, the left branch `expr.thenSkip('+').then(base)`
        // is called recursively at position 0, which triggers cycle detection and silently returns NoMatch.
        // The `or` combinator then falls back to the right branch `base`, which successfully parses only 1 at position 0.
        // Thus, the parser succeeds but only consumes 1, silently leaving "+2" unconsumed!
        Result<Integer> result = expr.parse("1+2");

        // The parser reports success!
        assertThat(result.matches()).isTrue();

        // BUT it only parsed the base case 1, NOT the full evaluation of 3!
        assertThat(result.value()).isEqualTo(1);

        // And it silently left "+2" unconsumed at position 1 (did not reach EOF).
        assertThat(result.input().position()).isEqualTo(1);
        assertThat(result.input().isEof()).isFalse();
    }

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions