Unifying Parsers, Printers & Grammars
Eitan Chatav
The Haskell programming language is well known to provide a rich environment in which to study parsing. It was in this setting that Conor McBride & Ross Paterson explained a major advance in understanding in Applicative Programming with Effects. That paper introduced the Applicative interface to programming, idiom brackets, and the Traversable interface. Its final section gives a category-theoretical perspective, identifying Applicative with lax monoidal functors. These ideas set off a bonanza of work, including on the closely related Alternative interface and "parser combinators".
Next, Tillmann Rendel & Klaus Ostermann, noticing that parsing & printing are inverse to one another, set out to unify their interfaces in Invertible Syntax Descriptions. When designing a language syntax, programmers usually have to write both a parser and a printer, violating the Don't Repeat Yourself principle. The paper reasons that if a syntax is described using parser combinators, then one should be able to use the same combinators to generate both parsers and printers. But unifying the interfaces runs into an immediate issue: while the Parser type is a covariant Functor, the Printer type is instead a Contravariant functor.
newtype Parser a = Parser (String -> [(a,String)])
instance Functor Parser where
fmap f (Parser p) = Parser $ \str ->
[(f a, str') | (a, str') <- p str]
newtype Printer a = Printer (a -> Maybe String)
instance Contravariant Printer where
contramap f (Printer p) = Printer $ p . f
Any covariant Functor which is also Contravariant is "phantom" in its argument, a.k.a. a constant functor, which is too strong a constraint for parsers. So, the paper instead drops the Functor constraint from the usual Applicative and Alternative parser combinators. At the time, another natural interface from category theory had not yet found its way to Haskell: profunctors.
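To see concretely why the combined constraint is so strong, consider Const from base: it is simultaneously a covariant and a contravariant functor precisely because its type parameter is phantom, so neither fmap nor contramap can touch the stored value. A base-only check (phantomBoth is our name):

```haskell
import Data.Functor.Const (Const (..))
import Data.Functor.Contravariant (Contravariant (..))

-- Const c is both a Functor and a Contravariant, because its
-- second type parameter is phantom: fmap and contramap both
-- leave the stored value untouched
phantomBoth :: Const String Char
phantomBoth = contramap fromEnum (fmap length (Const "payload"))

-- >>> getConst phantomBoth
-- "payload"
```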
A profunctor is a bifunctor which is contravariant in its first argument and covariant in its second argument.
class Profunctor p where
dimap :: (s -> a) -> (b -> t) -> p a b -> p s t
lmap :: (s -> a) -> p a b -> p s b
lmap f = dimap f id
rmap :: (b -> t) -> p a b -> p a t
rmap g = dimap id g
The Profunctor interface was introduced to Haskell by Ed Kmett, who noticed its utility in representing optics, more on which later. The prototypical example of a Profunctor is (->).
instance Profunctor (->) where
dimap sa bt ab = bt . ab . sa
Let's see how we can unify parsers & printers with a Profunctor interface.
newtype Parsor s f a b = Parsor {runParsor :: s -> f (b,s)}
instance Functor f => Profunctor (Parsor s f) where
dimap _ g (Parsor p) = Parsor (fmap (\(c,str) -> (g c, str)) . p)
newtype Printor s f a b = Printor {runPrintor :: a -> f (s -> s)}
instance Profunctor (Printor s f) where
dimap f _ (Printor p) = Printor (p . f)
Besides changing a vowel in the names, we added a few generalizations to the parser & printer types. First, both types get a phantom type parameter: the a in Parsor and the b in Printor. Next, String has been polymorphized to s. After all, we might want to parse & print Text or ByteStrings or any stream of some lexical token type. Next, the printer's "result" is upgraded to an endofunction s -> s. This makes the definition more fully dual to the parser type, and is a common trick used in printer types such as ShowS. Finally, the "container" for the result of a parse or print has been polymorphized to f. Usually this will be [] or Maybe, but we should take a moment to address a subtle point here.
Printing is usually conceived as an exhaustive affair: one defines a single function X -> String by pattern matching on each case of an X. Parsing, however, is usually a partial affair: one defines a bunch of partial parsers, one for each case of the X, and combines them into one with the Alternative combinator (<|>). By polymorphizing f in the definitions of Printor and Parsor, we can address either situation, restricting to only Applicative f when we want exhaustivity and allowing Alternative f when we want partiality.
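As an aside on the endofunction trick mentioned above: base's ShowS is exactly String -> String, and printers built this way compose with (.) and only materialize a String when applied to a final tail, avoiding repeatedly left-nested (++). A small base-only illustration (greeting is our name):

```haskell
-- ShowS from base is String -> String; a printed fragment is an
-- endofunction that prepends its output to whatever tail follows
greeting :: ShowS
greeting = showString "Hello, " . showString "world"

-- >>> greeting "!"
-- "Hello, world!"
```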
Combinators are great for generating new printer-parsers from existing ones, but if we want to generate examples we need an interface which generates basic printer-parsers from scratch. For that purpose, we define the Tokenized interface.
class Tokenized a b p | p -> a, p -> b where
anyToken :: p a b
data Identical a b s t where
Identical :: Identical a b a b
instance Tokenized a b (Identical a b) where
anyToken = Identical
The Tokenized interface is so abstract that it would lack any obvious motivation if not for its name. We want anyToken to sequence any single token at the head of the printer-parser's stream type s. Recall that s appears as both input and output in the definitions of Parsor & Printor. If s were String, then anyToken would correspond to the "cons" pattern (:), which conses (prints) or unconses (parses) a single Char to or from the head of the String. The lens library provides a Cons interface which generalizes the (:) pattern, letting us remain polymorphic over the stream type.
class Cons s t a b | s -> a, t -> b, s b -> t, t a -> s where
_Cons :: Prism s t (a,s) (b,t)
instance (Applicative f, Cons s s c c)
=> Tokenized c c (Printor s f) where
anyToken = Printor (pure . cons)
instance (Alternative f, Cons s s c c)
=> Tokenized c c (Parsor s f) where
anyToken = Parsor (maybe empty pure . uncons)
Now we can see our first printer-parser in action.
>>> runParsor anyToken "xyz" :: [(Char,String)]
[('x',"yz")]
>>> runParsor anyToken "" :: [(Char,String)]
[]
>>> [pr "yz" | pr <- runPrintor anyToken 'x'] :: [String]
["xyz"]
Optics are an ongoing field of advanced study in applied category theory and computer science, which makes giving an exact definition challenging. In Haskell, research into "semantic editor combinators" led to noticing the composability of combinators such as record field accessors and data constructor patterns, leading to the discovery of optics. Much can be, and has been, written about lenses, traversals & prisms that can't be written about here. The most important thing from our perspective will be the relation between profunctors and optics. Moreover, the variety of optics that we will be most interested in will represent "patterns".
Let's start with two definitions of isomorphism optics. Isomorphism optics are pairs of functions, one in a "backwards" direction and one in a "forwards" direction, like a newtype constructor pattern.
data Exchange a b s t = Exchange (s -> a) (b -> t)
type Iso s t a b = forall p f.
(Profunctor p, Functor f) => p a (f b) -> p s (f t)
The Exchange type is a "concrete representation" of the isomorphism optic and Iso is a "Kmett representation". They're equivalent, and you can convert between the representations. The lens library uses Kmett's representation as it conveniently specializes to the "van Laarhoven representation" when p is (->). However, a pure profunctorial representation (without f), which is also equivalent, can be simpler. We can observe the profunctorial representation with a combinator.
mapIso :: Profunctor p => Iso s t a b -> p a b -> p s t
mapIso pattern = withIso pattern dimap
Now, let's move on to our next optic example: prisms. Prisms encode exhaustive patterns. To encode them we first need the Choice interface.
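To make mapIso concrete without pulling in lens, here is a base-only sketch specialized to p = (->), built from the concrete Exchange representation (mapIsoFn and warmer are our names): dimap for functions is just pre- and post-composition.

```haskell
-- the concrete representation: a pair of conversion functions
data Exchange a b s t = Exchange (s -> a) (b -> t)

-- mapIso specialized to the function profunctor, where
-- dimap sa bt ab = bt . ab . sa
mapIsoFn :: Exchange a b s t -> (a -> b) -> (s -> t)
mapIsoFn (Exchange sa bt) f = bt . f . sa

-- view Fahrenheit through Celsius, bump by ten degrees Celsius
warmer :: Double -> Double
warmer = mapIsoFn (Exchange (\f -> (f - 32) / 1.8) (\c -> c * 1.8 + 32)) (+ 10)

-- >>> warmer 32
-- 50.0
```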
class Profunctor p => Choice p where
left' :: p a b -> p (Either a c) (Either b c)
right' :: p a b -> p (Either c a) (Either c b)
Now we can define a concrete prism type consisting of a pair of functions, a constructor and a destructor for a pattern match, as well as the abstract Prism type.
data Market a b s t = Market (b -> t) (s -> Either t a)
type Prism s t a b = forall p f.
(Choice p, Applicative f) => p a (f b) -> p s (f t)
type Prism' s a = Prism s s a a
The Applicative f in the definition of Prism makes every Prism automatically a Traversal. But again, for our purposes we're most interested in the pure profunctor representation, which we can observe with a new combinator.
(>?) :: Choice p => Prism s t a b -> p a b -> p s t
(>?) pattern = withPrism pattern $ \f g -> dimap g (either id f) . right'
It turns out that both Parsor and Printor have Choice instances, so that (>?) becomes both a parser combinator and a printer combinator. The lens library provides Template Haskell functions to make Prisms from the pattern constructors of algebraic datatypes. Or we can construct Prisms like only from a pair of functions using the prism' or prism smart constructors.
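To see what (>?) does, here is a base-only sketch specialized to p = (->), using the concrete Market representation (matchMap and _JustP are our names): on a match, rebuild with the transformed focus; on a mismatch, pass the untouched remainder through.

```haskell
-- concrete prism: a constructor and a matcher
data Market a b s t = Market (b -> t) (s -> Either t a)

-- (>?) specialized to the function profunctor
matchMap :: Market a b s t -> (a -> b) -> (s -> t)
matchMap (Market build match) h s = either id (build . h) (match s)

-- the prism for the Just constructor, concretely
_JustP :: Market a b (Maybe a) (Maybe b)
_JustP = Market Just (maybe (Left Nothing) Right)

-- >>> matchMap _JustP (+ 1) (Just 41)
-- Just 42
-- >>> matchMap _JustP (+ 1) Nothing
-- Nothing
```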
only :: Eq a => a -> Prism' a ()
only a = prism' (\() -> a) $ guard . (a ==)
Next, we may consider the dual to prisms: coprisms. If prisms encode pattern matching, then coprisms encode pattern filtering. To encode them we first need the Cochoice interface, which we get from Choice by reversing arrows.
class Profunctor p => Cochoice p where
unleft :: p (Either a c) (Either b c) -> p a b
unright :: p (Either c a) (Either c b) -> p a b
Actually, we don't really need to encode coprisms. Since they are dual to prisms, we can just shuffle the indices of the Prism type to encode coprisms and observe the pure profunctor representation with our next combinator.
(?<) :: Cochoice p => Prism b a t s -> p a b -> p s t
(?<) pattern = withPrism pattern $ \f g -> unright . dimap (either id f) g
token :: (Cochoice p, Eq c, Tokenized c c p) => c -> p () ()
token c = only c ?< anyToken
>>> runParsor (token 'x') "xyz" :: [((),String)]
[((),"yz")]
>>> [pr "yz" | pr <- runPrintor (token 'x') ()] :: [String]
["xyz"]
Both Parsor and Printor have Cochoice instances too. We will call a profunctor with both Choice & Cochoice instances a "partial profunctor". Partial profunctors support a combinator dimapMaybe.
dimapMaybe
:: (Choice p, Cochoice p)
=> (s -> Maybe a) -> (b -> Maybe t)
-> p a b -> p s t
dimapMaybe f g =
let
m2e h = maybe (Left ()) Right . h
fg = dimap (>>= m2e f) (>>= m2e g)
in
unright . fg . right'
We can turn the pair of partial functions into another optic: partial isomorphisms.
data PartialExchange a b s t = PartialExchange (s -> Maybe a) (b -> Maybe t)
type PartialIso s t a b = forall p f.
( Choice p, Cochoice p
, Applicative f, Filterable f
) => p a (f b) -> p s (f t)
type PartialIso' s a = PartialIso s s a a
You may not have seen the Filterable interface before. It is a very simple interface from the witherable library that generalizes the list functions catMaybes, filter and mapMaybe.
class Functor f => Filterable f where
mapMaybe :: (a -> Maybe b) -> f a -> f b
mapMaybe f = catMaybes . fmap f
catMaybes :: f (Maybe a) -> f a
catMaybes = mapMaybe id
filter :: (a -> Bool) -> f a -> f a
filter f = mapMaybe $ \a -> if f a then Just a else Nothing
Filterable is dual to the Alternative interface, which can be seen by comparing the signature of catMaybes to optional.
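For lists, the Filterable methods coincide with the familiar functions from Data.Maybe; a base-only illustration of mapMaybe filtering and mapping in one pass (evensDoubled is our name):

```haskell
import Data.Maybe (mapMaybe)

-- keep only the even numbers, doubling them as we go
evensDoubled :: [Int] -> [Int]
evensDoubled = mapMaybe (\n -> if even n then Just (2 * n) else Nothing)

-- >>> evensDoubled [1 .. 5]
-- [4,8]
```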
optional :: Alternative f => f a -> f (Maybe a)
We can now turn dimapMaybe into a combinator on PartialIsos.
(>?<) :: (Choice p, Cochoice p) => PartialIso s t a b -> p a b -> p s t
The prototypical example of a PartialIso is the subset satisfying a predicate, which we can construct from a pair of functions with the smart constructor partialIso.
satisfied :: (a -> Bool) -> PartialIso' a a
satisfied f = partialIso satiate satiate where
satiate a = if f a then Just a else Nothing
satisfy :: (Choice p, Cochoice p, Tokenized c c p) => (c -> Bool) -> p c c
satisfy f = satisfied f >?< anyToken
>>> runParsor (satisfy isLower) "xyz" :: [(Char,String)]
[('x',"yz")]
>>> runParsor (satisfy isLower) "X" :: [(Char,String)]
[]
>>> [pr "" | pr <- runPrintor (satisfy isLower) 'X'] :: [String]
[]
>>> [pr "yz" | pr <- runPrintor (satisfy isLower) 'x'] :: [String]
["xyz"]
Before profunctors found their way into Haskell, another interface, Arrows, was introduced. In Generalizing Monads to Arrows, John Hughes introduced the interface inspired by the example of parsers. In the Applicative Programming with Effects paper mentioned earlier, the authors note about Arrows:
By fixing the first argument of an arrow type, we obtain an applicative functor. ... Indeed one of Hughes's motivations was the parsers of Swierstra and Duponcheel (1996). It may turn out that applicative functors are more convenient for applications of the second class
So Arrows are already Applicative, but they're also already (Strong) Profunctors, bearing the same relation to Profunctor as Monad bears to Functor. We want to find the missing spot in this table.
| (Type -> Type) -> Constraint | (Type -> Type -> Type) -> Constraint |
|---|---|
| Functor | Profunctor |
| Applicative | ??? |
| Monad | Arrow |
This interface has variously been called a product profunctor or a (lax) monoidal profunctor. It turns out it's equivalent to a constraint synonym, hinted at in the quote.
type Monoidal p = (Profunctor p, forall x. Applicative (p x))
It's not new at all, just the combination of the two interfaces, enabled by the QuantifiedConstraints extension. Both Parsor and Printor support this interface since they have Applicative instances.
instance Monad f => Applicative (Parsor s f a) where
pure b = Parsor (\str -> return (b,str))
Parsor x <*> Parsor y = Parsor $ \str -> do
(f, str') <- x str
(a, str'') <- y str'
return (f a, str'')
instance Applicative f => Applicative (Printor s f a) where
pure _ = Printor (\_ -> pure id)
Printor p <*> Printor q = Printor (\a -> (.) <$> p a <*> q a)
We can now define sequencing combinators for Monoidal profunctors.
oneP :: Monoidal p => p () ()
oneP = pure ()
(>*<) :: Monoidal p => p a b -> p c d -> p (a,c) (b,d)
x >*< y = (,) <$> lmap fst x <*> lmap snd y
(>*) :: Monoidal p => p () c -> p a b -> p a b
x >* y = lmap (const ()) x *> y
(*<) :: Monoidal p => p a b -> p () c -> p a b
x *< y = x <* lmap (const ()) y
In Applicative parsing one uses an "idiom style" with the functor mapping combinator <$> and the sequencing combinators <*>, *> & <*. In Monoidal parsing one can use the same idiom style, with the partial profunctor mapping combinator >?< together with the sequencing combinators >*<, >* & *<.
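Since (->) is a Monoidal profunctor (the reader Applicative supplies (<*>)), the sequencing combinator >*< specializes to ordinary functions. A base-only sketch (pairUp is our name for that specialization):

```haskell
-- (>*<) at p = (->): lmap fst and lmap snd become pre-composition
-- with fst and snd, and (<*>) is the reader Applicative
pairUp :: (a -> b) -> (c -> d) -> ((a, c) -> (b, d))
pairUp x y = (,) <$> (x . fst) <*> (y . snd)

-- >>> pairUp (+ 1) show (41, True)
-- (42,"True")
```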
We can also form the tokens combinator, which we can use to give IsString instances to Parsor and Printor.
tokens :: (Cochoice p, Monoidal p, Eq c, Tokenized c c p) => [c] -> p () ()
tokens [] = oneP
tokens (c:cs) = token c *> tokens cs
instance (Alternative f, Filterable f, Monad f, Cons s s Char Char)
=> IsString (Parsor s f () ()) where
fromString = tokens
instance (Applicative f, Cons s s Char Char)
=> IsString (Printor s f () ()) where
fromString = tokens
>>> :set -XOverloadedStrings
>>> runParsor "abc" "abcxyz" :: [((),String)]
[((),"xyz")]
>>> [pr "xyz" | pr <- runPrintor "abc" ()] :: [String]
["abcxyz"]
More combinators can be added, and monoidal profunctors have been well studied, including their corresponding optics, called monocles, which are related to traversal & grate optics.
The Applicative interface is always the star of the show when it comes to parsing, while the Alternative interface by comparison gets too little attention. Let's revisit it.
class Applicative f => Alternative f where
empty :: f a
(<|>) :: f a -> f a -> f a
We can re-characterize Alternative with the methods zeroF & (<+>) instead of empty & (<|>), in order to see more clearly how the Alternative interface relates to the nilary and binary coproducts Void and Either.
zeroF :: Alternative f => f Void
zeroF = empty
(<+>) :: Alternative f => f a -> f b -> f (Either a b)
a <+> b = Left <$> a <|> Right <$> b
prop> empty = absurd <$> zeroF
prop> a <|> b = either id id <$> (a <+> b)
Unfortunately, the trick used to define Monoidal as a combination of Profunctor and Applicative does not work for the Alternative interface. So we introduce the Distributor interface, which analogizes the above re-characterization of Alternative to profunctors.
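To see the re-characterization at work, here is a base-only version of (<+>) specialized to lists, whose Alternative instance is concatenation:

```haskell
import Control.Applicative ((<|>))

-- (<+>) specialized to lists: tag which branch each result came from
(<+>) :: [a] -> [b] -> [Either a b]
a <+> b = (Left <$> a) <|> (Right <$> b)

-- >>> [1, 2] <+> "x"
-- [Left 1,Left 2,Right 'x']
```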
class Monoidal p => Distributor p where
zeroP :: p Void Void
(>+<) :: p a b -> p c d -> p (Either a c) (Either b d)
Just as Alternative has a 0-or-more combinator many and a 0-or-1 combinator optional, we can define manyP and optionalP.
optionalP :: Distributor p => p a b -> p (Maybe a) (Maybe b)
optionalP p = mapIso eotMaybe (oneP >+< p)
manyP :: Distributor p => p a b -> p [a] [b]
manyP p = mapIso listEot (oneP >+< p >*< manyP p)
eotMaybe :: Iso (Maybe a) (Maybe b) (Either () a) (Either () b)
listEot
:: (Cons s s a a, AsEmpty t, Cons t t b b)
=> Iso s t (Either () (a,s)) (Either () (b,t))
The prototypical example of a Distributor is (->). Unlike monoidal profunctors, distributors have not been very well studied. Like Alternative, this interface doesn't seem to get the attention it deserves. The term was coined by Jean Bénabou, who invented profunctors and originally called them "profunctors" before coming to prefer "distributors". Since "profunctor" became the standard nomenclature, we use "distributor", or lax distributive profunctor, for this more restrictive interface, since it uses the distributive structure on the category given by (), (,), Void & Either. Distributors have corresponding optics I dubbed "diopters", which can be thought of as positional patterns. If you transform an algebraic datatype into its eithers-of-tuples representation, you can match its algebraic structure with the algebraic combinators >*<, oneP, >+< & zeroP.
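To back up the claim that (->) is a Distributor, here is a base-only specialization of its methods to functions (zeroFn and eitherMap are our names): zeroP is absurd, and (>+<) is either with re-tagging.

```haskell
import Data.Void (Void, absurd)

-- zeroP at p = (->)
zeroFn :: Void -> Void
zeroFn = absurd

-- (>+<) at p = (->): act on each summand, preserving the tag
eitherMap :: (a -> b) -> (c -> d) -> Either a c -> Either b d
eitherMap f g = either (Left . f) (Right . g)

-- >>> eitherMap (+ 1) not (Left 41)
-- Left 42
```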
Distributors are an exhaustive analog of Alternative for profunctors, but if we want partiality and failure we need the next interface, which has an even stronger analogy to Alternative.
class (Choice p, Distributor p, forall x. Alternative (p x)) => Alternator p where
alternate :: Either (p a b) (p c d) -> p (Either a c) (Either b d)
The Alternator interface extends Choice, Distributor & Alternative with a method alternate that lets us default the Distributor methods.
prop> zeroP = empty
prop> x >+< y = alternate (Left x) <|> alternate (Right y)
In the functorial case, the two descriptions of Alternative, in terms of either zeroF & <+> or empty & <|>, are equivalent; but in the profunctorial case they are distinguished into the Distributor interface, which can be exhaustive, and the Alternator interface, which must be partial. Since (->) is not Alternative, it is an example of a Choice Distributor which is not an Alternator. However, Parsor and Printor have instances.
instance (Alternative f, Monad f) => Alternative (Parsor s f a) where
empty = Parsor (\_ -> empty)
Parsor p <|> Parsor q = Parsor (\str -> p str <|> q str)
instance (Alternative f, Monad f) => Alternator (Parsor s f) where
alternate = \case
Left (Parsor p) -> Parsor (fmap (\(b, str) -> (Left b, str)) . p)
Right (Parsor p) -> Parsor (fmap (\(b, str) -> (Right b, str)) . p)
instance (Alternative f, Monad f) => Distributor (Parsor s f)
instance Alternative f => Alternative (Printor s f a) where
empty = Printor (\_ -> empty)
Printor p <|> Printor q = Printor (\a -> p a <|> q a)
instance Applicative f => Distributor (Printor s f) where
zeroP = Printor absurd
Printor p >+< Printor q = Printor (either p q)
instance Alternative f => Alternator (Printor s f) where
alternate = \case
Left (Printor p) -> Printor (either p (\_ -> empty))
Right (Printor p) -> Printor (either (\_ -> empty) p)
Parsors are Distributors exactly when they are Alternators, using the default methods for zeroP and >+<. Notice, however, that Printor's Distributor instance works for exhaustive printers while its Alternator instance only works for partial printers. With Alternator, we can extend Alternative's partial 1-or-more combinator some profunctorially.
someP :: Alternator p => p a b -> p [a] [b]
someP p = _Cons >? p >*< manyP p
Recall that the Filterable interface is dual to the Alternative interface. We can extend Filterable profunctorially, dualizing the Distributor interface. A "partial distributor" means both Alternator & Filtrator.
class (Cochoice p, forall x. Filterable (p x)) => Filtrator p where
filtrate :: p (Either a c) (Either b d) -> (p a b, p c d)
instance Filterable f => Filtrator (Parsor s f) where
filtrate (Parsor p) =
( Parsor (mapMaybe leftMay . p)
, Parsor (mapMaybe rightMay . p)
) where
leftMay (e, str) = either (\b -> Just (b, str)) (\_ -> Nothing) e
rightMay (e, str) = either (\_ -> Nothing) (\b -> Just (b, str)) e
instance Filtrator (Printor s f) where
filtrate (Printor p) = (Printor (p . Left), Printor (p . Right))
Compare the signature of filtrate to that of uncurry (>+<) to see why Filtrators are dual to Distributors. Now that we've developed all of our basic combinators, let's write a simple printer-parser example for positive decimal integers.
>>> :{
posDecInt :: (Tokenized Char Char p, Alternator p, Filtrator p) => p Int Int
posDecInt = iso show read >?<
someP (satisfy (\c -> generalCategory c == DecimalNumber))
:}
>>> runParsor posDecInt "2001 A Space Odyssey" :: [(Int,String)]
[(2,"001 A Space Odyssey"),(20,"01 A Space Odyssey"),(200,"1 A Space Odyssey"),(2001," A Space Odyssey")]
>>> [pr " A Space Odyssey" | pr <- runPrintor posDecInt 2001] :: [String]
["2001 A Space Odyssey"]
Partial distributors have associated optics which I dubbed "bifocals".
In his blog post Showcasing Applicative, Joachim Breitner demonstrates an example of extended Backus-Naur form grammars as a constant Applicative functor. Inspired by this idea, and the similar example of regular expressions as Applicatives, we can extend partial distributors to Grammatical distributors.
class
( Alternator p, Filtrator p
, Tokenized Char Char p
, forall t. t ~ p () () => IsString t
) => Grammatical p where
inClass :: String -> p Char Char
inClass str = satisfy $ \ch -> elem ch str
notInClass :: String -> p Char Char
notInClass str = satisfy $ \ch -> notElem ch str
inCategory :: GeneralCategory -> p Char Char
inCategory cat = satisfy $ \ch -> cat == generalCategory ch
notInCategory :: GeneralCategory -> p Char Char
notInCategory cat = satisfy $ \ch -> cat /= generalCategory ch
rule :: String -> p a a -> p a a
rule _ = id
ruleRec :: String -> (p a a -> p a a) -> p a a
ruleRec name = rule name . fix
instance (Alternative f, Cons s s Char Char)
=> Grammatical (Printor s f)
instance (Monad f, Alternative f, Filterable f, Cons s s Char Char)
=> Grammatical (Parsor s f)
Notice that all the methods of Grammatical have defaults, which the Printor and Parsor instances use. We could override them for other instances. The methods inClass, notInClass, inCategory and notInCategory are "regular predicates", i.e. predicates that have special regular expressions associated to them. The methods rule and ruleRec correspond to "nonterminal rules" in an extended Backus-Naur form grammar. The defaults for these methods completely ignore the rule names and just inline the rules, using id for non-recursive and fix for recursive rules. So why have rule names if they're meaningless? A more full-featured parser type than Parsor would incorporate useful error messaging. For instance, all parsec-style parsers have a combinator <?>, or label, which can be used to override rule in a newtyped parsec Grammatical instance so that rule names show up when there is a parse failure.
Now, we can define a type Grammar which gives us a "final tagless encoding" of Backus-Naur grammars extended by regular expressions and "type-directed" by a Haskell type a.
type Grammar a = forall p. Grammatical p => p a a
genReadS :: Grammar a -> ReadS a
genReadS = runParsor
genShowS :: Alternative f => Grammar a -> a -> f ShowS
genShowS = runPrintor
In many discussions of the options in the space of parsing tools, much is made of contrasting parser combinators with parser generators. However, this is a false dichotomy. Combinators are methods (or functions derived from methods) of type classes. Generators, as we can see from genReadS and genShowS, come from instances of those type classes. This makes the final tagless embedding approach extensible: to define a new generator, for instance for a parsec-style parser, you just need to define a new type and instances for it up to Grammatical.
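The extensibility claim is easiest to see in miniature. Here is a toy final tagless encoding, far simpler than Grammatical but with the same shape (all names here are ours, base only): one polymorphic "description", and two "generators" that are nothing but instances.

```haskell
{-# LANGUAGE RankNTypes #-}

-- a tiny final tagless language of expressions
class Expr r where
  lit :: Int -> r
  add :: r -> r -> r

-- the analogue of Grammar: a description polymorphic in its interpreter
type Description = forall r. Expr r => r

-- generator #1: an evaluator
newtype Eval = Eval Int
instance Expr Eval where
  lit = Eval
  add (Eval x) (Eval y) = Eval (x + y)

-- generator #2: a pretty-printer
newtype Pretty = Pretty String
instance Expr Pretty where
  lit = Pretty . show
  add (Pretty x) (Pretty y) = Pretty (x ++ "+" ++ y)

example :: Description
example = add (lit 1) (add (lit 2) (lit 3))

-- >>> let Eval n = example in n
-- 6
-- >>> let Pretty s = example in s
-- "1+2+3"
```

Adding a third interpreter requires no change to example, just another instance, exactly as a new parser backend requires only a new Grammatical instance.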
Even with only very low level combinators, we have almost enough to give a non-trivial example of a Grammar for an abstract syntax tree. The syntax we choose for the example is a form of regular expressions. This is an ideal example because it is (hopefully) familiar; it is prototypical as an "arithmetic expression algebra"; it is complex enough to stress-test Grammars; and it matches up with the embedded language, which will let us define more generators.
data RegEx
= Terminal String -- abc123etc\.
| Sequence RegEx RegEx -- xy
| Fail -- \q
| Alternate RegEx RegEx -- x|y
| KleeneOpt RegEx -- x?
| KleeneStar RegEx -- x*
| KleenePlus RegEx -- x+
| AnyChar -- .
| InClass String -- [abc]
| NotInClass String -- [^abc]
| InCategory GeneralCategory -- \p{Lu}
| NotInCategory GeneralCategory -- \P{Ll}
| NonTerminal String -- \q{rule-name}
deriving stock (Eq, Ord, Show, Generic)
Before we can write our grammar, however, we will need a couple of slightly higher-level combinators. A very common feature of syntax is lists, which are recognized using beginning, ending and separator delimiters.
data SepBy p = SepBy
{ beginBy :: p () ()
, endBy :: p () ()
, separateBy :: p () ()
}
sepBy :: Monoidal p => p () () -> SepBy p
sepBy = SepBy oneP oneP
noSep :: Monoidal p => SepBy p
noSep = sepBy oneP
chainl1
:: (Choice p, Cochoice p, Distributor p)
=> APartialIso a b (a,a) (b,b) -- ^ binary constructor pattern
-> SepBy p -> p a b -> p a b
chainl
:: (Alternator p, Filtrator p)
=> APartialIso a b (a,a) (b,b) -- ^ binary constructor pattern
-> APartialIso a b () () -- ^ nilary constructor pattern
-> SepBy p -> p a b -> p a b
Assuming these combinators have definitions, we are now in a position to define our regular expression Grammar.
regexGrammar :: Grammar RegEx
regexGrammar = ruleRec "regex" altG
makePrisms ''RegEx
altG rex = rule "alternate" $
chainl1 _Alternate (sepBy "|") (seqG rex)
anyG = rule "any" $ _AnyChar >?< "."
atomG rex = rule "atom" $
nonterminalG
<|> failG
<|> classInG
<|> classNotInG
<|> categoryInG
<|> categoryNotInG
<|> _Terminal . _Cons >?< charG >*< pure ""
<|> anyG
<|> parenG rex
makePrisms ''GeneralCategory
categoryG = rule "category" $
_LowercaseLetter >?< "Ll"
<|> _UppercaseLetter >?< "Lu"
<|> _TitlecaseLetter >?< "Lt"
<|> _ModifierLetter >?< "Lm"
<|> _OtherLetter >?< "Lo"
<|> _NonSpacingMark >?< "Mn"
<|> _SpacingCombiningMark >?< "Mc"
<|> _EnclosingMark >?< "Me"
<|> _DecimalNumber >?< "Nd"
<|> _LetterNumber >?< "Nl"
<|> _OtherNumber >?< "No"
<|> _ConnectorPunctuation >?< "Pc"
<|> _DashPunctuation >?< "Pd"
<|> _OpenPunctuation >?< "Ps"
<|> _ClosePunctuation >?< "Pe"
<|> _InitialQuote >?< "Pi"
<|> _FinalQuote >?< "Pf"
<|> _OtherPunctuation >?< "Po"
<|> _MathSymbol >?< "Sm"
<|> _CurrencySymbol >?< "Sc"
<|> _ModifierSymbol >?< "Sk"
<|> _OtherSymbol >?< "So"
<|> _Space >?< "Zs"
<|> _LineSeparator >?< "Zl"
<|> _ParagraphSeparator >?< "Zp"
<|> _Control >?< "Cc"
<|> _Format >?< "Cf"
<|> _Surrogate >?< "Cs"
<|> _PrivateUse >?< "Co"
<|> _NotAssigned >?< "Cn"
categoryInG = rule "category-in" $
_InCategory >?< "\\p{" >* categoryG *< "}"
categoryNotInG = rule "category-not-in" $
_NotInCategory >?< "\\P{" >* categoryG *< "}"
charG = rule "char" $ charLiteralG <|> charEscapedG
charEscapedG = rule "char-escaped" $ "\\" >* inClass charsReserved
charLiteralG = rule "char-literal" $ notInClass charsReserved
charsReserved = "$()*+.?[\\]^{|}"
classInG = rule "class-in" $
_InClass >?< "[" >* manyP charG *< "]"
classNotInG = rule "class-not-in" $
_NotInClass >?< "[^" >* manyP charG *< "]"
exprG rex = rule "expression" $
terminalG
<|> kleeneOptG rex
<|> kleeneStarG rex
<|> kleenePlusG rex
<|> atomG rex
failG = rule "fail" $ _Fail >?< "\\q"
nonterminalG = rule "nonterminal" $
_NonTerminal >?< "\\q{" >* manyP charG *< "}"
parenG ex = rule "parenthesized" $ "(" >* ex *< ")"
kleeneOptG rex = rule "kleene-optional" $
_KleeneOpt >?< atomG rex *< "?"
kleeneStarG rex = rule "kleene-star" $
_KleeneStar >?< atomG rex *< "*"
kleenePlusG rex = rule "kleene-plus" $
_KleenePlus >?< atomG rex *< "+"
seqG rex = rule "sequence" $
chainl _Sequence (_Terminal . _Empty) noSep (exprG rex)
terminalG = rule "terminal" $ _Terminal >?< someP charG
It's not as beautiful as it could be, but it works, and its rough edges can be smoothed. Now, the RegEx type morally has the same "shape" as the Grammatical interface. So, to create new generators, we define a couple of constant profunctors inspired by the blog post, and I leave their instances as exercises.
newtype DiRegEx a b = DiRegEx RegEx
data DiGrammar a b = DiGrammar
{ grammarStart :: DiRegEx a b
, grammarRules :: Set (String, RegEx)
}
genRegEx :: Grammar a -> RegEx
genRegEx (DiRegEx rex) = rex
genGrammar :: Grammar a -> [(String, RegEx)]
genGrammar (DiGrammar (DiRegEx start) rules) =
("start", start) : toList rules
Putting together the regular expression grammar regexGrammar with the grammar generator genGrammar and the printer generator genShowS, we can even print a self-defining Backus-Naur form grammar extended by regular expressions (regexBNF) -- of regular expressions.
>>> printGrammar regexGrammar
start = \q{regex}
alternate = \q{sequence}(\|\q{sequence})*
any = \.
atom = \q{nonterminal}|\q{fail}|\q{class-in}|\q{class-not-in}|\q{category-in}|\q{category-not-in}|\q{char}|\q{any}|\q{parenthesized}
category = Ll|Lu|Lt|Lm|Lo|Mn|Mc|Me|Nd|Nl|No|Pc|Pd|Ps|Pe|Pi|Pf|Po|Sm|Sc|Sk|So|Zs|Zl|Zp|Cc|Cf|Cs|Co|Cn
category-in = \\p\{\q{category}\}
category-not-in = \\P\{\q{category}\}
char = \q{char-literal}|\q{char-escaped}
char-escaped = \\[\$\(\)\*\+\.\?\[\\\]\^\{\|\}]
char-literal = [^\$\(\)\*\+\.\?\[\\\]\^\{\|\}]
class-in = \[\q{char}*\]
class-not-in = \[\^\q{char}*\]
expression = \q{terminal}|\q{kleene-optional}|\q{kleene-star}|\q{kleene-plus}|\q{atom}
fail = \\q
kleene-optional = \q{atom}\?
kleene-plus = \q{atom}\+
kleene-star = \q{atom}\*
nonterminal = \\q\{\q{char}*\}
parenthesized = \(\q{regex}\)
regex = \q{alternate}
sequence = \q{expression}*
terminal = \q{char}+
This post is not literate Haskell. Instead of trying to compile the post, play with this code on GitHub, where it is hopefully kept compiling & tested. This is the result of a years-long obsession. While I am pleased that my obsession has been slaked, there are still more paths to follow, and I invite you to investigate these ideas further. Applicative & Alternative interfaces are useful for more than just parsers; what else are Monoidal, Distributor, Alternator & Filtrator useful for? Obviously we need much more than one non-trivial example of a Grammar. Want to make and push some more examples? Want to optimize, clean up or document the code somehow? Can you construct a useful free Grammatical distributor, starting with a base of Identical Char Char? Can you use that free Grammatical distributor to define a function to left-factor and do other useful transformations of grammars? What extensions of distributors can you do besides character stream grammars, like JSON grammars with aeson and TypeScript type generators? Want to investigate how grammars fit into the pattern optic hierarchy? Want to add some useful higher level grammar combinators? Want to factor exhaustive grammars or regular grammars out from grammar interfaces? Can Grammars generate parsers which aren't recursive descent, like an Earley parser, by overriding ruleRec perhaps? Maybe you'd like to pair program on one of these topics?