parser combinator
From Antoine Leblanc's blog post
1. background
A parser of flavor a has a function runParser that takes a string as input, and either parses a prefix of the string, in which case it returns:
- the parsed object, which is of type
a
or it fails to parse any prefix of the string, in which case it returns
- an error
runParser also returns the remainder of the string yet to be parsed.
This is captured by the following type:
data Parser a = Parser{
runParser :: String -> (String, Either ParseError a)
}
2. combining parsers
Suppose we have two different parsers for different patterns. A common task that we will often need to do is parse a string using the first parser, and then parse the remainder of the string using the second parser. Along the way, if we fail, we return an error.
This pattern can be abstracted into the following function:
andThen :: Parser a -> (a -> Parser b) -> Parser b
andThen p f = Parser $ \input ->
case runParser p input of
(remainder, Left error) -> (remainder, error)
(remainder, Right obj) -> runParser (f a) remainder
A couple things to note:
- remember that the Parser constructor takes as input a
runParserfunction - why should
andThentake a function that produces aparserBinstead of aparserBitself? Now we can produce special parsers that depend on what theparserAproduced.
So, we can compose parsers like this:
parseKeyVal :: Parser (String, JValue)
parseKeyVal = keyParser `andThen` \key ->
charParser ":" `andThen` \_ ->
valueParser `andThen` \value ->
constParser (key, value)
where constParser just returns its given value no matter the input.
It turns out that we can use the andThen function as the >>= function of the Monad type-class!
So then the previous block can be written using do notation:
parseKeyVal = do key <- keyParser
_ <- charParser ':'
val <- valueParser
return (key, val)
Some things to note:
- the thing that the
doblock produces is a parser. - implicitly on each line: the string input (or what remains of it) is being fed to a certain parser
3. other useful links
4. things to be careful of
- the difference between
endByandendBy1sepByvssepBy1manyvsmany1- the difference is matching 0 or more vs 1 or more