Attoparsec Tutorial Part 1 - Parser Combinators and Test Driven Parsing
Attoparsec is a
Haskell parser combinator library for parsing ByteString
and Text. We can use it to make small parsers and combine
these small parsers to create more complex ones. The advantage of this
style is that each parser can be tested on its own, which makes parsers
easier to build and debug and makes the individual pieces reusable.
We are going to focus on parsing Text from
Data.Text.
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T
import Test.Hspec
import Test.Hspec.Attoparsec
parseHelloWorld :: Parser ()
parseHelloWorld = do
  _ <- string "Hello World!"
  return ()

Though you cannot tell by looking at it, a Parser () consumes
Text input and returns (). Parser is
a type synonym for a more complex type that takes an input type i,
manages some state, and returns a parsed value of your choice. Running a
parser with parseOnly returns Either String a, where String
is an error message and a is whatever we decide to return.
In this case it is ().
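To make the Either result concrete, here is a minimal sketch that runs the parser directly with parseOnly from Data.Attoparsec.Text, outside of any test. The helper name runHelloWorld is an assumption made for illustration and is not part of the tutorial's file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)

-- Same parser as in the tutorial: match the literal and discard the result.
parseHelloWorld :: Parser ()
parseHelloWorld = do
  _ <- string "Hello World!"
  return ()

-- Hypothetical helper: run the parser on some input.
-- Left carries an error message, Right carries the parsed value.
runHelloWorld :: Text -> Either String ()
runHelloWorld = parseOnly parseHelloWorld
```

In GHCi, runHelloWorld "Hello World!" evaluates to Right (), while runHelloWorld "Hello World" (missing the final !) evaluates to a Left holding an error message.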
string takes a Text and returns it in a
Parser, but we are ignoring the value returned from
string because our parser returns (). In a
later lesson we will explore the Parser type declaration. It is worth
taking a look at the definition in the source
code.
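Because the result of string is thrown away, the same parser can be sketched more briefly with void from Data.Functor. This is only an alternative spelling, not a change in behaviour. The names parseHelloWorldVoid and discardedResult are hypothetical and do not appear in the tutorial's file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Functor (void)

-- Equivalent to the do-block version: run `string` and replace its
-- Text result with ().
parseHelloWorldVoid :: Parser ()
parseHelloWorldVoid = void (string "Hello World!")

-- Hypothetical check value: running the parser on matching input.
discardedResult :: Either String ()
discardedResult = parseOnly parseHelloWorldVoid "Hello World!"
```

Here discardedResult is Right (): the matched text has been consumed, but it is not returned.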
Now we will make a test to prove to ourselves that the parser works.
spec1 :: Spec
spec1 = do
  describe "parseHelloWorld" $ do
    it "should parse the phrase 'Hello World!'" $
      parseHelloWorld `shouldSucceedOn` ("Hello World!" :: Text)

shouldSucceedOn runs the parser on its left hand side
against the input on its right hand side and lets the test pass if the
parser returns Right. Do you want to see what a failed test
looks like? Change the right hand side of shouldSucceedOn
to "Hello World" and you will see a
"not enough input" error. The string function
could not find the last character !. Let's slightly change
the parseHelloWorld function so we can show off some more
functionality.
parseHelloWorld2 :: Parser Text
parseHelloWorld2 = do
  result <- string "Hello World!"
  return result

Now the parser returns the result of the string
function. We can verify this with another test.
spec2 :: Spec
spec2 = do
  describe "parseHelloWorld2" $ do
    it "should parse the phrase 'Hello World!' and return 'Hello World!'" $
      ("Hello World!" :: Text) ~> parseHelloWorld2 `shouldParse` ("Hello World!" :: Text)

The (~>) function feeds the Text value on its left hand side to
parseHelloWorld2, and
shouldParse says not only that the parser should succeed, but
that its result should match the right hand side. There are two ways this can
fail. parseHelloWorld2 could fail to parse the left hand
side, or it could parse it successfully but return something other than the
expected value on the right hand side. Try deleting something from the left
or right hand side and see what happens when you run the test.
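The two failure modes can be made explicit with a small sketch built on parseOnly. This is not how hspec-attoparsec is implemented. The Outcome type and the checkParse helper are hypothetical names introduced only to illustrate the distinction.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)

-- Hypothetical type naming the three possible outcomes.
data Outcome a
  = Matched             -- parser succeeded and the result equals the expectation
  | WrongResult a       -- parser succeeded but returned something else
  | ParseFailed String  -- parser failed; carries the error message
  deriving (Eq, Show)

-- Hypothetical helper: run a parser on an input and compare the
-- result with an expected value.
checkParse :: Eq a => Parser a -> Text -> a -> Outcome a
checkParse parser input expected =
  case parseOnly parser input of
    Left err -> ParseFailed err
    Right actual
      | actual == expected -> Matched
      | otherwise          -> WrongResult actual

-- Same behaviour as the tutorial's parseHelloWorld2.
parseHelloWorld2 :: Parser Text
parseHelloWorld2 = string "Hello World!"
```

With these definitions, checkParse parseHelloWorld2 "Hello World!" "Hello World!" gives Matched, changing only the expected value to "Hello" gives WrongResult "Hello World!", and changing only the input to "Hello" gives a ParseFailed value.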
Sometimes we want to prove that a parser will fail. For that,
use shouldFailOn.

    it "should fail to parse any other string" $ do
      parseHelloWorld `shouldFailOn` ("Goodnight Everyone!" :: Text)

For failed cases we do not need to inspect the returned result
because we only get an error message, the Left side of the
Either String a that the parser returns.

Finally, here is
what a parser combinator looks like. We will split up the original
function and glue it together in another one.
parseHello :: Parser Text
parseHello = string "Hello"

parseWorld :: Parser Text
parseWorld = string " World!"

parseHelloWorld3 :: Parser Text
parseHelloWorld3 = do
  hello <- parseHello
  world <- parseWorld
  return $ T.concat [hello, world]

Add a new test for it.
spec3 :: Spec
spec3 = do
  describe "parseHelloWorld3" $ do
    it "should parse 'Hello World!'" $
      ("Hello World!" :: Text) ~> parseHelloWorld3 `shouldParse` ("Hello World!" :: Text)
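As an aside, gluing two parsers together does not require do-notation. The sketch below combines the same two small parsers in applicative style. The names parseHelloWorldA and runCombined are hypothetical and are not part of the tutorial's file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, parseOnly, string)
import Data.Text (Text)
import qualified Data.Text as T

parseHello :: Parser Text
parseHello = string "Hello"

parseWorld :: Parser Text
parseWorld = string " World!"

-- Hypothetical variant of parseHelloWorld3: run parseHello, then
-- parseWorld, and append the two results.
parseHelloWorldA :: Parser Text
parseHelloWorldA = T.append <$> parseHello <*> parseWorld

-- Hypothetical helper: run the combined parser on some input.
runCombined :: Text -> Either String Text
runCombined = parseOnly parseHelloWorldA
```

Here runCombined "Hello World!" evaluates to Right "Hello World!". The do-notation version may be easier to read when intermediate results need names, while the applicative version is shorter when the results are simply combined.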
main :: IO ()
main = do
  hspec spec1
  hspec spec2
  hspec spec3

You can run this file with the following command:
stack --resolver lts-8.17 runghc 2016-05-09-attoparsec-tutorial-1.lhs