Skip to content

amb007/cl-squirrel

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 

Repository files navigation

cl-squirrel

Ask DeepWiki

Squirrel Parser - A packrat parser with left recursion and error recovery, transpiled from Python to Common Lisp under supervision by OpenCode from the original implementation: Squirrel Parser.

Overview

cl-squirrel is a Common Lisp implementation of the Squirrel Parser, a packrat parser with support for:

  • Left recursion - Direct and indirect left recursion are handled correctly
  • Error recovery - Two-phase parsing with bounded error recovery
  • PEG grammars - Parsing Expression Grammars
  • Memoization - Packrat parsing ensures linear time complexity
  • Custom AST/CST - Build abstract or concrete syntax trees with factory functions

Installation

  1. Clone this repository:
git clone https://github.com/amb007/cl-squirrel.git
cd cl-squirrel
  1. Load the system in your Lisp implementation:
(push (truename ".") asdf:*central-registry*)
(asdf:load-system :cl-squirrel/test)
(fiveam:run! 'cl-squirrel/test::test-cl-squirrel)

Usage

Basic Parsing

Define a grammar and parse input:

(in-package :cl-squirrel) ;:cl-squirrel-user

(defparameter *simple-grammar*
  "S <- \"hello\" ' ' \"world\";")

(defun parse-hello-world (input)
  (squirrel-parse-ast :grammar-spec *simple-grammar*
                      :top-rule-name "S"
                      :input input))

;; Parse with no errors
;; SQUIRREL> (parse-hello-world "hello world")
;; => #<AST-Node S: pos=0 len=11>

Error Recovery

The parser has two-phase error recovery:

;; Phase 1: Try to parse without recovery
(defun parse-with-recovery (input)
  (let ((parse-result (squirrel-parse-pt :grammar-spec *simple-grammar*
                                         :top-rule-name "S"
                                         :input input)))
    (when (parse-result-has-syntax-errors parse-result)
      (format t "Syntax errors found:~%~{  ~A~%~}"
              (parse-result-get-syntax-errors parse-result)))
    parse-result))

;; SQUIRREL> (parse-with-recovery "hello world!")
;; Syntax errors found:
;;   #<SYNTAX-ERROR pos=11 len=1>
;; => #<PARSE-RESULT {701BCB0983}>

Custom CST Building

Build a Concrete Syntax Tree using factory functions:

;; Define a grammar with named rules
(defparameter *simple-grammar*
  "S <- \"hello\" ' ' \"world\";")

;; Factory for the top-level rule "S"
(defun make-s-factory (ast-node children)
  (declare (ignore ast-node))
  (list :rule "S" :children children))

;; Factory for terminals (string literals in the grammar)
(defun make-terminal-factory (ast-node children)
  (declare (ignore ast-node children))
  (list :terminal t))

(defun parse-with-cst (input)
  (let ((factories (make-hash-table :test #'equal)))
    ;; Key factories by rule names from the grammar
    (setf (gethash "S" factories) #'make-s-factory)
    ;; Use "<Terminal>" for all string/char literals
    (setf (gethash "<Terminal>" factories) #'make-terminal-factory)
    (squirrel-parse-cst :grammar-spec *simple-grammar*
                        :top-rule-name "S"
                        :factories factories
                        :input input
                        :allow-syntax-errors t)))

;; SQUIRREL> (parse-with-cst "hello world")
;; => (:rule "S" :children ((:terminal t) (:terminal t) (:terminal t)))

Transparent Rules

Prefix a rule name with ~ to make it transparent (it won't appear in the AST):

"S <- \"hello\" ~WS \"world\";
 ~WS <- [ \\t\\n\\r]*;"

Grammar Syntax

The meta-grammar for defining PEG grammars:

Rule          <- '~'? Identifier '<-' Expression ';'
Expression    <- Sequence ('/' Sequence)*
Sequence      <- Prefix+
Prefix        -> ('&' / '!' / '~')? Suffix
Suffix        -> Primary ('*' / '+' / '?')?
Primary        -> Identifier / StringLiteral / CharLiteral / CharClass / AnyChar / '(' Expression ')'
StringLiteral  <- '"' Character* '"'
CharLiteral   <- '\'' Character '\''
CharClass     <- '[' '^'? (CharRange / CharClassChar)+ ']'
CharRange     <- CharClassChar '-' CharClassChar
CharClassChar -> EscapeSequence / [^\\]\\-]
EscapeSequence <- '\\' [\"'\\\\/bnrt] / ('u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
AnyChar       <- '.'

Combinators

  • Seq - Sequence: matches all sub-clauses in order
  • First - Ordered choice: matches the first successful sub-clause
  • OneOrMore - One or more repetitions (+)
  • ZeroOrMore - Zero or more repetitions (*)
  • Optional - Optional: matches zero or one instance (?)
  • Ref - Reference to a named rule
  • NotFollowedBy - Negative lookahead: succeeds if sub-clause fails
  • FollowedBy - Positive lookahead: succeeds if sub-clause succeeds

Terminals

  • Str - Matches a literal string (e.g., "hello")
  • Char - Matches a single character (e.g., 'a')
  • CharSet - Matches a character class (e.g., [a-zA-Z])
  • AnyChar - Matches any single character (.)
  • Nothing - Matches nothing (always succeeds)

API Reference

Functions

  • squirrel-parse-pt - Parse and return raw parse tree
  • squirrel-parse-ast - Parse and return abstract syntax tree
  • squirrel-parse-cst - Parse and return concrete syntax tree
  • build-ast - Build AST from parse result
  • build-cst - Build CST from AST with factories
  • escape-string - Escape a string for display
  • unescape-string - Unescape a string literal
  • unescape-char - Unescape a single character

Classes

  • parser - The squirrel parser with bounded error recovery
  • parse-result - The result of parsing the input
  • match-result - Result of matching a clause
  • match - A successful match
  • syntax-error - A syntax error node
  • clause - Base class for all grammar clauses
  • terminal - Abstract base class for terminal clauses
  • seq - Sequence combinator
  • first - Ordered choice combinator
  • repetition - Repetition base class
  • one-or-more - One or more repetitions
  • zero-or-more - Zero or more repetitions
  • optional - Optional combinator
  • ref - Reference to a named rule
  • not-followed-by - Negative lookahead
  • followed-by - Positive lookahead
  • str - String terminal
  • char - Character terminal
  • char-set - Character set terminal
  • any-char - Any character terminal
  • nothing - Nothing terminal
  • node - Base class for AST and CST nodes
  • ast-node - An AST node
  • cst-node - Base class for CST nodes

Conditions

  • parse-error - Base parse error condition
  • grammar-error - Grammar-related error
  • rule-not-found-error - Rule not found in grammar
  • syntax-error-in-grammar - Syntax errors in grammar specification

Testing

cl-squirrel includes a comprehensive test suite with 52 test files covering all major functionality.

Running All Tests

Using SBCL (Recommended)

# Install dependencies and run tests
sbcl --noinform --non-interactive \
     --eval '(ql:quickload :cl-squirrel/test)' \
     --eval '(fiveam:run! :cl-squirrel/test::test-cl-squirrel)' \
     --quit

From REPL (Interactive)

;; 1. Load Quicklisp (adjust path if needed)
(load "~/quicklisp/setup.lisp")

;; 2. Tell ASDF where to find the system
(push #P"/path/to/cl-squirrel/" asdf:*central-registry*)
;; Or if you're already in the project directory:
(push (truename ".") asdf:*central-registry*)

;; 3. Load the test system
(ql:quickload :cl-squirrel/test)

;; 4. Run all tests
(fiveam:run! :cl-squirrel/test::test-cl-squirrel)

;; Or run a specific test suite by name
(fiveam:run! 'cl-squirrel/test::test-terminals)
(fiveam:run! 'cl-squirrel/test::test-lr)
(fiveam:run! 'cl-squirrel/test::test-stress)

;; Or run tests interactively and see results
(fiveam:explain! (fiveam:run :cl-squirrel/test::test-cl-squirrel))

Test Coverage

The test suite includes:

Category Files Description
Core Features 6 Terminals, combinators, sequences, first/alternatives, repetition, LR
Left Recursion 4 Basic LR, LR from paper figures, LR recovery, LR interactions
Meta-Grammar 9 Grammar parsing, syntax, character classes, precedence, stress
Error Recovery 11 Recovery patterns, error localization, boundary constraints
Stress Tests 3 Performance, linearity, advanced stress tests
CST/AST 3 Tree building, node creation, factory patterns
Integration 6 JSON parsing, complex interactions, cache clearing

Quick Verification

To quickly verify your setup works:

sbcl --load verify-setup.lisp

This will load Quicklisp, the system, and run a quick sanity check.

Performance

cl-squirrel implements packrat parsing with memoization, ensuring O(n·|G|) time complexity where:

  • n is the length of the input
  • |G| is the size of the grammar

The parser includes optional statistics tracking:

(cl-squirrel/internal:enable-parser-stats)
;; ... run parsing ...
(let ((stats cl-squirrel/internal:*parser-stats*))
  (format t "Total work: ~D~%" (cl-squirrel/internal:stats-total-work stats))
  (format t "Cache hits: ~D~%" (cl-squirrel/internal:stats-cache-hits stats))
  (format t "LR expansions: ~D~%" (cl-squirrel/internal:stats-lr-expansions stats))
  (format t "Recovery attempts: ~D~%" (cl-squirrel/internal:stats-recovery-attempts stats)))

Implementation Notes

This Common Lisp implementation follows idiomatic CL patterns:

  • CLOS classes and generic functions
  • Hash tables for efficient lookups
  • Proper error handling with conditions
  • Edi Weitz-style flat project structure
  • Minimal external dependencies

Acknowledgments

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

About

A packrat parser with left recursion and error recovery

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors