Attoparsec Tutorial Part 2 - Parse and Return Values

2016-05-09
haskell attoparsec

Now we want to parse something a little more complicated: take a string containing a key value pair and return the value. The input will look like this “name:Mercutio”. The first rule is that the string begins with the key name:.
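A minimal sketch of a parser for that first rule. It assumes the OverloadedStrings extension, so that string literals can be used as Text, and uses the name parseNameKey that the rest of this post refers to.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)

-- Matches the literal key "name:" and returns the matched text.
-- For example, parseOnly parseNameKey "name:" evaluates to Right "name:".
parseNameKey :: Parser Text
parseNameKey = string "name:"
```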

You can try this by importing Data.Attoparsec.Text, declaring parseNameKey, and running parse parseNameKey "name:". It returns Done with the matched key and an empty string of leftover input. To parse the value following the key and the colon, we only need to collect text until the end of the line or the end of the input. takeTill takes a (Char -> Bool) predicate and consumes input until the predicate returns True for a character or it runs out of input.
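A value parser along these lines might look like the sketch below. The name parseNameValue is a hypothetical choice, and isEndOfLine (from Data.Attoparsec.Text) is assumed as the predicate; it is True for '\n' and '\r'.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, isEndOfLine, parseOnly, takeTill)
import Data.Text (Text)

-- Collects everything up to (but not including) the first '\n' or '\r',
-- or up to the end of the input if no line break appears.
parseNameValue :: Parser Text
parseNameValue = takeTill isEndOfLine
```

The examples here use parseOnly rather than parse, because takeTill can keep asking for more input and parse would then report a partial result.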

Now we can combine the two parsers into one and get the value back.
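One way to combine them, as a sketch: run the key parser, discard its result with *>, and keep the value. The names parseNameValue and parseName are assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, isEndOfLine, parseOnly, string, takeTill)
import Data.Text (Text)

parseNameKey :: Parser Text
parseNameKey = string "name:"

parseNameValue :: Parser Text
parseNameValue = takeTill isEndOfLine

-- Run the key parser, throw away "name:", and return only the value.
parseName :: Parser Text
parseName = parseNameKey *> parseNameValue
```

With these definitions, parseOnly parseName "name:Mercutio" evaluates to Right "Mercutio".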

Now we will parse a different key value pair, a phone number: phone:867-5309. Parsing the phone key works the same way as parsing the name key.
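A sketch of the phone key parser, mirroring the name key. The name parsePhoneKey is a hypothetical choice.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)

-- Matches the literal key "phone:" and returns the matched text.
parsePhoneKey :: Parser Text
parsePhoneKey = string "phone:"
```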

The rules for a phone number’s value are: it consists of the digits 0-9 and the dash character -; a dash cannot be the first or the last character; and there cannot be two dashes in sequence. These are some legal phone numbers: 489-4608, 123456789, 0937-876-321. These are some illegal phone numbers: 485-32-, -123, 12--232323. We will start with a simple parser that handles only digits.
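Such a digits-only parser might look like the following sketch. The name parsePhoneDigits is hypothetical, and requiring the end of the line or input right after the digits is an assumption about how strict this first version should be.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text (Parser, endOfInput, endOfLine, parseOnly, takeWhile1)
import Data.Char (isDigit)
import Data.Text (Text)

-- At least one digit, followed by a line break or the end of the input.
-- The digits are returned; the line break (if any) is consumed and dropped.
parsePhoneDigits :: Parser Text
parsePhoneDigits = takeWhile1 isDigit <* (endOfLine <|> endOfInput)
```

This version accepts 123456789 but rejects 867-5309, because after the digits 867 the next character is neither a line break nor the end of the input.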

takeWhile1 takes a function from a Char to a Bool: (Char -> Bool). It consumes input as long as that function returns True. It must consume at least one character or the parser will fail. isDigit matches 0-9. It is defined in Data.Char. endOfLine <|> endOfInput will succeed if the next input is a line break (\n or \r\n) or there is no more input. <|> means try the parser on the left first and, if it fails, try the parser on the right. It is part of the Alternative type class. Now we will make the parser handle dashes appropriately.

parsePhoneValue runs many1 getNums to parse one or more series of digits, each of which can optionally be followed by a dash as long as there is another digit after the dash. The T.concat <$> part concatenates all of the parsed pieces into a single Text.

getNums collects a series of one or more digits with takeWhile1 isDigit. As before, takeWhile1 consumes input until its (Char -> Bool) function returns False. (Just <$> string "-") <|> pure Nothing checks whether there is a dash and wraps it in Just if there is; otherwise, it returns Nothing. Nothing has to be wrapped with pure or return because both sides of <|> must be parsers, not plain values. (<|>) is an infix operator that tries the parser on the left. If it succeeds, its result is used. If it fails, the input is backtracked to the point before the left-hand parser ran and the parser on the right is run instead. If there is no dash, we can return the series of digits. Otherwise, we use nextIsDigit to peek at the next character without consuming it. If it is a digit, then the parser succeeds and returns the digits together with the dash. If it is not, then the getNums parser fails.

nextIsDigit peeks at the next character with peekChar. If it encounters the end of input, peekChar returns Nothing and the nextIsDigit parser fails. Otherwise, the parser succeeds if isDigit next is True and fails if it is False.
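The three parsers described above could be sketched as follows. This is a reconstruction from the prose, not a definitive implementation: the final check for the end of the line or input is an assumption carried over from the digits-only version, and the error message is made up.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  (Parser, endOfInput, endOfLine, many1, parseOnly, peekChar, string, takeWhile1)
import Data.Char (isDigit)
import Data.Text (Text)
import qualified Data.Text as T

-- One or more digit runs, glued back together into a single Text.
-- Assumption: the value must be followed by a line break or the end of input.
parsePhoneValue :: Parser Text
parsePhoneValue = T.concat <$> many1 getNums <* (endOfLine <|> endOfInput)

-- A run of digits, plus a trailing dash only when a digit follows the dash.
getNums :: Parser Text
getNums = do
  nums <- takeWhile1 isDigit
  dash <- (Just <$> string "-") <|> pure Nothing
  case dash of
    Nothing -> pure nums
    Just d  -> nextIsDigit *> pure (nums <> d)

-- Succeeds, without consuming input, only if the next character is a digit.
nextIsDigit :: Parser ()
nextIsDigit = do
  next <- peekChar
  case next of
    Just c | isDigit c -> pure ()
    _                  -> fail "expected a digit after the dash"
```

Without the end-of-input check, many1 would simply stop at the first failing getNums and report success on a prefix such as 485- for the input 485-32-, so the check is what makes the illegal examples fail outright.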

Finally, we can make a parser that combines the phone key and phone value parsers.

We can also make a product type of the two values that we can parse now and create a parser that combines the parsers of those two values.
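A sketch of the combined phone parser and of the product type. The names Person, personName, personPhone, parseName, parsePhone and parsePerson are all hypothetical, and the input layout (the name pair on one line, the phone pair on the next) is an assumption, since the post does not specify it.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  ( Parser, endOfInput, endOfLine, isEndOfLine, many1, parseOnly
  , peekChar, string, takeTill, takeWhile1 )
import Data.Char (isDigit)
import Data.Text (Text)
import qualified Data.Text as T

-- Product type holding both parsed values (hypothetical name and fields).
data Person = Person
  { personName  :: Text
  , personPhone :: Text
  } deriving (Show, Eq)

parseName :: Parser Text
parseName = string "name:" *> takeTill isEndOfLine

-- Phone key followed by the phone value; only the value is returned.
parsePhone :: Parser Text
parsePhone = string "phone:" *> parsePhoneValue

parsePhoneValue :: Parser Text
parsePhoneValue = T.concat <$> many1 getNums <* (endOfLine <|> endOfInput)

getNums :: Parser Text
getNums = do
  nums <- takeWhile1 isDigit
  dash <- (Just <$> string "-") <|> pure Nothing
  case dash of
    Nothing -> pure nums
    Just d  -> nextIsDigit *> pure (nums <> d)

nextIsDigit :: Parser ()
nextIsDigit = do
  next <- peekChar
  case next of
    Just c | isDigit c -> pure ()
    _                  -> fail "expected a digit after the dash"

-- Assumed layout: "name:..." on the first line, "phone:..." on the second.
parsePerson :: Parser Person
parsePerson = Person <$> (parseName <* endOfLine) <*> parsePhone
```

Under these assumptions, parseOnly parsePerson "name:Mercutio\nphone:867-5309" evaluates to Right (Person {personName = "Mercutio", personPhone = "867-5309"}).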

Continuing with the idea of test driven development, here are the specs to show it works against a variety of simple tests.
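As an illustration of what such specs could look like, here is a sketch that assumes the hspec library and checks the phone value parser against the legal and illegal numbers listed earlier. The parser is repeated so the example is self-contained; its end-of-input check is an assumption, as is the choice of hspec.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
  (Parser, endOfInput, endOfLine, many1, parseOnly, peekChar, string, takeWhile1)
import Data.Char (isDigit)
import Data.Either (isLeft)
import Data.Text (Text)
import qualified Data.Text as T
import Test.Hspec (Spec, describe, hspec, it, shouldBe, shouldSatisfy)

parsePhoneValue :: Parser Text
parsePhoneValue = T.concat <$> many1 getNums <* (endOfLine <|> endOfInput)

getNums :: Parser Text
getNums = do
  nums <- takeWhile1 isDigit
  dash <- (Just <$> string "-") <|> pure Nothing
  case dash of
    Nothing -> pure nums
    Just d  -> nextIsDigit *> pure (nums <> d)

nextIsDigit :: Parser ()
nextIsDigit = do
  next <- peekChar
  case next of
    Just c | isDigit c -> pure ()
    _                  -> fail "expected a digit after the dash"

-- Each legal number should parse to itself; each illegal one should fail.
spec :: Spec
spec = describe "parsePhoneValue" $ do
  it "accepts legal phone numbers" $
    mapM_ (\n -> parseOnly parsePhoneValue n `shouldBe` Right n)
          ["489-4608", "123456789", "0937-876-321"]
  it "rejects illegal phone numbers" $
    mapM_ (\n -> parseOnly parsePhoneValue n `shouldSatisfy` isLeft)
          ["485-32-", "-123", "12--232323"]
```

The spec can be run from a main function or from GHCi with hspec spec.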