UP | HOME

parser combinator

From Antoine Leblanc's blog post

1. background

A parser of flavor a has a function runParser that takes a string as input, and either parses a prefix of the string, in which case it returns:

  • the parsed object, which is of type a

or it fails to parse any prefix of the string, in which case it returns

  • an error

runParser also returns the remainder of the string yet to be parsed.

This is captured by the following type:

data Parser a = Parser{
runParser :: String -> (String, Either ParseError a)
}

2. combining parsers

Suppose we have two different parsers for different patterns. A common task that we will often need to do is parse a string using the first parser, and then parse the remainder of the string using the second parser. Along the way, if we fail, we return an error.

This pattern can be abstracted into the following function:

andThen :: Parser a -> (a -> Parser b) -> Parser b
andThen p f = Parser $ \input -> 
                         case runParser p input of
                           (remainder, Left error) -> (remainder, error)
                           (remainder, Right obj) -> runParser (f a) remainder 

A couple things to note:

  • remember that the Parser constructor takes as input a runParser function
  • why should andThen take a function that produces a parserB instead of a parserB itself? Now we can produce special parsers that depend on what the parserA produced.

So, we can compose parsers like this:

parseKeyVal :: Parser (String, JValue)
parseKeyVal = keyParser `andThen` \key ->
              charParser ":" `andThen` \_ ->
              valueParser `andThen` \value ->
              constParser (key, value)

where constParser just returns its given value no matter the input.

It turns out that we can use the andThen function as the >>= function of the Monad type-class!

So then the previous block can be written using do notation:

parseKeyVal = do key <- keyParser
                 _   <- charParser ':'
                 val <- valueParser
                 return (key, val)

Some things to note:

  • the thing that the do block produces is a parser.
  • implicitly on each line: the string input (or what remains of it) is being fed to a certain parser

3. other useful links

4. things to be careful of

  • the difference between endBy and endBy1
    • sepBy vs sepBy1
    • many vs many1
    • the difference is matching 0 or more vs 1 or more

Created: 2024-07-15 Mon 01:27