Attoparsec Tutorial Part 2 - Parse and Return Values

May 9, 2016

Now we want to parse something a little more complicated. Take a string input of a key value pair and return the value. The input will look like this “name:Mercutio”. The first rule is this string begins with a key name:.

{-# LANGUAGE OverloadedStrings #-}

import           Control.Applicative 
import           Data.Attoparsec.Text
import           Data.Char
import           Data.Text (Text)
import qualified Data.Text as T
import           Prelude hiding (takeWhile)
import           Test.Hspec
import           Test.Hspec.Attoparsec

parseNameKey :: Parser ()
parseNameKey = string "name:" >> return ()

You can try importing Data.Attoparsec.Text, declare parseNameKey and running parse parseNameKey "name:". It will return Done and an empty string. To parse the value following the key and :, we only need to collect text until the end of line or input. takeTill consumes input until it parses the parser on the right or it runs out of input.

parseNameValue :: Parser Text
parseNameValue = do
  n <- takeTill isEndOfLine
  endOfLine <|> endOfInput
  return n

Now we can combine the two parsers into one and get the value back.

parseNameKeyValuePair :: Parser Text
parseNameKeyValuePair = do

Now we will parse a different key value pair, a phone number phone:867-5309. Parsing the phone key is the same as parsing the name key,

parsePhoneKey :: Parser ()
parsePhoneKey = string "phone:" >> return ()

The rules for a phone number’s values are: it consists of digits 0-9 and the dash character -, a dash cannot be the first or the last character, and there cannot be two dashes in sequence. These are some legal phone numbers: 489-4608, 123456789, 0937-876-321. These are some illegal phone numbers: 485-32-, -123, 12–232323. We will start with simple parser that handles only numbers.

parseSimplePhoneValue :: Parser Text
parseSimplePhoneValue = do
  number <- takeWhile1 isDigit
  endOfLine <|> endOfInput
  return number

takeWhile1 takes a function which takes a Char and returns a Bool: (Char -> Bool). It consumes input as long as the function it took returns True. It must consume at least once or the parser will fail. isDigit matches 0-9. It is defined in Data.Char. endOfLine <|> endOfInput will succeed if the character is \n, \r or there is no more input. <|> means try the parser on the left first and if it fails, try the parser on the right. It is part of the Alternative type class. Now we will make it handle dashes appropriately.

getNums :: Parser Text
getNums = do
  num <- takeWhile1 isDigit
  mDash <- (Just <$> string "-") <|> pure Nothing
  case mDash of
    Nothing -> return num
    Just dash -> do
      return $ T.append num dash  

nextIsDigit :: Parser ()
nextIsDigit = do
  mNext <- peekChar
  case mNext of
    Nothing -> return () -- tried to parse the end of input
    Just next ->  
      if isDigit next
        then return ()
        else fail "did not find digit"

parsePhoneValue :: Parser Text
parsePhoneValue = do
  num <- T.concat <$> many1 getNums
  return num

parsePhoneValue runs many1 getNums to parse one or more series of numbers that can be optionally followd by a dash in the case that there is another digit after the dash. The T.concat <$> part will concat all of the number strings after parsing.

getNums collects a a series of one or more digits with takeWhile1 isDigit. takeWhile1 takes a (Char -> Bool) function and consumes input until the function returns False. (Just <$> string "-") <|> pure Nothing checks if there is a dash and wraps it in Just if it succeeds, otherwise, it returns Nothing. Nothing has to be returned with pure or return because parsers are monads. (<|>) is an infix operator that tries a parser on the left, if it succeeds then it moves on, if it fails, it tracks the text input to the point before the left hand parser ran and then it performs the action on the right. If there is no dash, then we can return the series of digits. Otherwise, we use nextIsDigit to peek at the next character without consuming it. If it is a digit than the parser will succeed and return the number series and dash. If it is not, then the getNums parser will fail.

nextIsDigit peeks at the next character with peekChar. If it encounters the end of input it return Nothing and the nextIsDigit parser fails, otherwise if the the value of isDigit next is True then the parser succeeds, otherwise it will fail.

Finally, we can make a parser that combines the phone key and value parser.

parsePhoneKeyValuePair :: Parser Text
parsePhoneKeyValuePair = do

We can also make a product type of the two values that we can parse now and create a parser that combines the parsers of those two values.

data Person = Person
  { name  :: Text
  , phone :: Text
  } deriving (Eq, Read, Show)

parsePersonData :: Parser Person
parsePersonData = do -- Person <$> parseNameKeyValuePair <*> parsePhoneKeyValuePair
  personName  <- parseNameKeyValuePair
  personPhone <- parsePhoneKeyValuePair
  return $ Person personName personPhone

Continuing with the idea of test driven development, here are the specs to show it works against a variety of simple tests.

spec :: Spec
spec = do
  describe "Parser" $ do
    it "should parse \"name:\"" $
      parseNameKey `shouldSucceedOn` ("name:" :: Text)
    it "should parse \"Mercutio\"" $
      parseNameValue `shouldSucceedOn` ("Mercutio" :: Text)
    it "should parse \"Mercutio\"" $
      ("Mercutio" :: Text) ~> parseNameValue `shouldParse` ("Mercutio" :: Text)
    it "should parse \"name:Mercutio\"" $
      ("name:Mercutio" :: Text) ~> parseNameKeyValuePair `shouldParse` ("Mercutio" :: Text)

    it "should parse \"name:Marcus Brutus\"" $
      ("name:Marcus Brutus" :: Text) ~> parseNameKeyValuePair `shouldParse` ("Marcus Brutus" :: Text)

    it "should parse \"name:Petruchio\"" $
      ("name:Petruchio" :: Text) ~> parseNameKeyValuePair `shouldParse` ("Petruchio" :: Text)
    it "should parse \"phone:867-5309\"" $
      ("phone:867-5309" :: Text) ~> parsePhoneKeyValuePair `shouldParse` ("867-5309" :: Text)

    it "should parse \"phone:867-1-5309\"" $
      ("phone:867-1-5309" :: Text) ~> parsePhoneKeyValuePair `shouldParse` ("867-1-5309" :: Text)

    it "should fail to parse \"phone:867--5309\"" $
      parsePhoneKeyValuePair `shouldFailOn` ("867--5309" :: Text)
    it "should fail to parse \"phone:-867-5309\"" $
      parsePhoneKeyValuePair `shouldFailOn` ("-867-5309" :: Text)

    it "should fail to parse \"phone:867-5309-\"" $
      parsePhoneKeyValuePair `shouldFailOn` ("867-5309-" :: Text)

main :: IO ()
main = hspec spec