Squirrel Parser - A packrat parser with left recursion and error recovery, transpiled from Python to Common Lisp under supervision by OpenCode from the original implementation: Squirrel Parser.
cl-squirrel is a Common Lisp implementation of the Squirrel Parser, a packrat parser with support for:
- Left recursion - Direct and indirect left recursion are handled correctly
- Error recovery - Two-phase parsing with bounded error recovery
- PEG grammars - Parsing Expression Grammars
- Memoization - Packrat parsing ensures linear time complexity
- Custom AST/CST - Build abstract or concrete syntax trees with factory functions
- Clone this repository:
git clone https://github.com/amb007/cl-squirrel.git
cd cl-squirrel- Load the system in your Lisp implementation:
(push (truename ".") asdf:*central-registry*)
(asdf:load-system :cl-squirrel/test)
(fiveam:run! 'cl-squirrel/test::test-cl-squirrel)Define a grammar and parse input:
(in-package :cl-squirrel) ;:cl-squirrel-user
(defparameter *simple-grammar*
"S <- \"hello\" ' ' \"world\";")
(defun parse-hello-world (input)
(squirrel-parse-ast :grammar-spec *simple-grammar*
:top-rule-name "S"
:input input))
;; Parse with no errors
;; SQUIRREL> (parse-hello-world "hello world")
;; => #<AST-Node S: pos=0 len=11>The parser has two-phase error recovery:
;; Phase 1: Try to parse without recovery
(defun parse-with-recovery (input)
(let ((parse-result (squirrel-parse-pt :grammar-spec *simple-grammar*
:top-rule-name "S"
:input input)))
(when (parse-result-has-syntax-errors parse-result)
(format t "Syntax errors found:~%~{ ~A~%~}"
(parse-result-get-syntax-errors parse-result)))
parse-result))
;; SQUIRREL> (parse-with-recovery "hello world!")
;; Syntax errors found:
;; #<SYNTAX-ERROR pos=11 len=1>
;; => #<PARSE-RESULT {701BCB0983}>Build a Concrete Syntax Tree using factory functions:
;; Define a grammar with named rules
(defparameter *simple-grammar*
"S <- \"hello\" ' ' \"world\";")
;; Factory for the top-level rule "S"
(defun make-s-factory (ast-node children)
(declare (ignore ast-node))
(list :rule "S" :children children))
;; Factory for terminals (string literals in the grammar)
(defun make-terminal-factory (ast-node children)
(declare (ignore ast-node children))
(list :terminal t))
(defun parse-with-cst (input)
(let ((factories (make-hash-table :test #'equal)))
;; Key factories by rule names from the grammar
(setf (gethash "S" factories) #'make-s-factory)
;; Use "<Terminal>" for all string/char literals
(setf (gethash "<Terminal>" factories) #'make-terminal-factory)
(squirrel-parse-cst :grammar-spec *simple-grammar*
:top-rule-name "S"
:factories factories
:input input
:allow-syntax-errors t)))
;; SQUIRREL> (parse-with-cst "hello world")
;; => (:rule "S" :children ((:terminal t) (:terminal t) (:terminal t)))Prefix a rule name with ~ to make it transparent (it won't appear in the AST):
"S <- \"hello\" ~WS \"world\";
~WS <- [ \\t\\n\\r]*;"The meta-grammar for defining PEG grammars:
Rule <- '~'? Identifier '<-' Expression ';'
Expression <- Sequence ('/' Sequence)*
Sequence <- Prefix+
Prefix -> ('&' / '!' / '~')? Suffix
Suffix -> Primary ('*' / '+' / '?')?
Primary -> Identifier / StringLiteral / CharLiteral / CharClass / AnyChar / '(' Expression ')'
StringLiteral <- '"' Character* '"'
CharLiteral <- '\'' Character '\''
CharClass <- '[' '^'? (CharRange / CharClassChar)+ ']'
CharRange <- CharClassChar '-' CharClassChar
CharClassChar -> EscapeSequence / [^\\]\\-]
EscapeSequence <- '\\' [\"'\\\\/bnrt] / ('u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F])
AnyChar <- '.'
Seq- Sequence: matches all sub-clauses in orderFirst- Ordered choice: matches the first successful sub-clauseOneOrMore- One or more repetitions (+)ZeroOrMore- Zero or more repetitions (*)Optional- Optional: matches zero or one instance (?)Ref- Reference to a named ruleNotFollowedBy- Negative lookahead: succeeds if sub-clause failsFollowedBy- Positive lookahead: succeeds if sub-clause succeeds
Str- Matches a literal string (e.g.,"hello")Char- Matches a single character (e.g.,'a')CharSet- Matches a character class (e.g.,[a-zA-Z])AnyChar- Matches any single character (.)Nothing- Matches nothing (always succeeds)
squirrel-parse-pt- Parse and return raw parse treesquirrel-parse-ast- Parse and return abstract syntax treesquirrel-parse-cst- Parse and return concrete syntax treebuild-ast- Build AST from parse resultbuild-cst- Build CST from AST with factoriesescape-string- Escape a string for displayunescape-string- Unescape a string literalunescape-char- Unescape a single character
parser- The squirrel parser with bounded error recoveryparse-result- The result of parsing the inputmatch-result- Result of matching a clausematch- A successful matchsyntax-error- A syntax error nodeclause- Base class for all grammar clausesterminal- Abstract base class for terminal clausesseq- Sequence combinatorfirst- Ordered choice combinatorrepetition- Repetition base classone-or-more- One or more repetitionszero-or-more- Zero or more repetitionsoptional- Optional combinatorref- Reference to a named rulenot-followed-by- Negative lookaheadfollowed-by- Positive lookaheadstr- String terminalchar- Character terminalchar-set- Character set terminalany-char- Any character terminalnothing- Nothing terminalnode- Base class for AST and CST nodesast-node- An AST nodecst-node- Base class for CST nodes
parse-error- Base parse error conditiongrammar-error- Grammar-related errorrule-not-found-error- Rule not found in grammarsyntax-error-in-grammar- Syntax errors in grammar specification
cl-squirrel includes a comprehensive test suite with 52 test files covering all major functionality.
# Install dependencies and run tests
sbcl --noinform --non-interactive \
--eval '(ql:quickload :cl-squirrel/test)' \
--eval '(fiveam:run! :cl-squirrel/test::test-cl-squirrel)' \
--quit;; 1. Load Quicklisp (adjust path if needed)
(load "~/quicklisp/setup.lisp")
;; 2. Tell ASDF where to find the system
(push #P"/path/to/cl-squirrel/" asdf:*central-registry*)
;; Or if you're already in the project directory:
(push (truename ".") asdf:*central-registry*)
;; 3. Load the test system
(ql:quickload :cl-squirrel/test)
;; 4. Run all tests
(fiveam:run! :cl-squirrel/test::test-cl-squirrel)
;; Or run a specific test suite by name
(fiveam:run! 'cl-squirrel/test::test-terminals)
(fiveam:run! 'cl-squirrel/test::test-lr)
(fiveam:run! 'cl-squirrel/test::test-stress)
;; Or run tests interactively and see results
(fiveam:explain! (fiveam:run :cl-squirrel/test::test-cl-squirrel))The test suite includes:
| Category | Files | Description |
|---|---|---|
| Core Features | 6 | Terminals, combinators, sequences, first/alternatives, repetition, LR |
| Left Recursion | 4 | Basic LR, LR from paper figures, LR recovery, LR interactions |
| Meta-Grammar | 9 | Grammar parsing, syntax, character classes, precedence, stress |
| Error Recovery | 11 | Recovery patterns, error localization, boundary constraints |
| Stress Tests | 3 | Performance, linearity, advanced stress tests |
| CST/AST | 3 | Tree building, node creation, factory patterns |
| Integration | 6 | JSON parsing, complex interactions, cache clearing |
To quickly verify your setup works:
sbcl --load verify-setup.lispThis will load Quicklisp, the system, and run a quick sanity check.
cl-squirrel implements packrat parsing with memoization, ensuring O(n·|G|) time complexity where:
- n is the length of the input
- |G| is the size of the grammar
The parser includes optional statistics tracking:
(cl-squirrel/internal:enable-parser-stats)
;; ... run parsing ...
(let ((stats cl-squirrel/internal:*parser-stats*))
(format t "Total work: ~D~%" (cl-squirrel/internal:stats-total-work stats))
(format t "Cache hits: ~D~%" (cl-squirrel/internal:stats-cache-hits stats))
(format t "LR expansions: ~D~%" (cl-squirrel/internal:stats-lr-expansions stats))
(format t "Recovery attempts: ~D~%" (cl-squirrel/internal:stats-recovery-attempts stats)))This Common Lisp implementation follows idiomatic CL patterns:
- CLOS classes and generic functions
- Hash tables for efficient lookups
- Proper error handling with conditions
- Edi Weitz-style flat project structure
- Minimal external dependencies
- Original Python implementation: Squirrel Parser
- CL library patterns from cl-library-craft
- The bulk of the work done by OpenCode Zen (free) models Kimi K2.5, MiniMax M2.1 and others
- Packrat parsing research: Bryan Ford's "Packrat Parsing Can Support Left Recursion"
MIT License
Contributions are welcome! Please feel free to submit issues or pull requests.