Parsers and parser combinators are means of parsing a stream of some type into some other type. For example, one might start with a string whose contents are a mathematical expression such as 2 + 3. We want to get from this string to an array of tokens which can then be further processed to evaluate the expression itself or display it with syntax highlighting or for some other purpose.
I can’t do justice to this topic here. Wikipedia has the canonical article and your favorite search engine will have lots more information.
For those that already know a bit about parser combinators, a Parsimonious parser is a function with the following type signature:
typealias Parser<E, T> = Context<E>.() -> TA parser combinator, as opposed to a parser, is a function which takes zero or more parsers as input and returns another parser as output. For example, the infix function or is a combinator:
infix fun <E, T> Parser<E, T>.or(rhs: Parser<E, T>): Parser<E, T> {
val lhs = this
return {
try {
lhs(this)
} catch (e: ParseException) {
rhs(this)
}
}
}This is simpler than it looks. The or combinator returns a parser which first attempts the left-hand parser. If that fails, it tries the right-hard parser, and if that fails, parsing fails.
Parser combinators are a technique from functional programming, not object-oriented programming. As such, Parsimonious includes only one object, Context<E> and many functions declared at the top level. There are also many infix functions. This may feel strange and not very "Kotlin", but attempting to write parser combinators in an object-oriented style results in code that is difficult to read and extremely verbose.
The S suffix found on identifiers such as ParserS, countS and manyS is used to indicate that these are special variants that work on a Context<Char> and return Strings rather than Char or List<Char>.
While there are others, the three most fundamental combinators are satisfy, count and or. Almost all other combinators are built on top of these. If you understand these, you can understand everything.
The satisfy combinator takes (E) -> Boolean as its argument. If the element at the Context's index passes the test, it is returned. If not, a ParseException is thrown.
val e = satisfy<Char> { it == 'e' }
val result = Context.fromString("e") parseWith eThe count combinator takes three arguments: a minimum count, a maximum count, and a parser and attempts to match that parser a valid number of times.
val e = satisfy<Char> { it == 'e' }
val result = Context.fromString("eeeee") parseWith count(2, Int.MAX_VALUE, e)This attempts to march the e parser at least 2 times, which obviously succeeds.
The or combinator has already been described above.
In simple cases, you don't need them. Take this expression for instance:
x AND y AND NOT z
This can be parsed fairly easily using string splitting and regular expressions, although parser combinators are hundreds of times faster in most cases than regular expressions. But things fall apart quickly as soon as we allow parenthetic expressions and recursion:
x AND y AND NOT (z OR NOT (a AND b))
As soon as recursion and nesting are allowed, regular expressions and string splitting fall flat on their faces. The easiest way — really, the only reliable way — to parse the expression above is to use a formal parsing methodology such as a recursive descent parser. Really, we have two options: write a custom recursive descent parser or use a library that helps us implement a recursive descent parser. Parsimonious is one such library.
To see how to implement a parsing grammar for x AND y AND NOT (z OR NOT (a AND b)), look at ConditionTest, which implements and tests a complete grammar for expressions of this type.