- seriesOrder: 2
- 1.
map
— transforming the contents of a parser - 2.
apply
andreturn
— lifting functions to the world of Parsers - 3.
sequence
— transforming a list of Parsers into a single Parser - 4.
many
andmany1
— matching a parser multiple times - 5.
opt
— matching a parser zero or one time - 6. Throwing results away
- 7. Parsing lists with separators
- What about
bind
? - Review
- Listing of the parser library so far
- Summary
layout: post
title: “Building a useful set of parser combinators”
description: “15 or so combinators that can be combined to parse almost anything”
categories: [Combinators,Patterns]
seriesId: “Understanding Parser Combinators”
seriesOrder: 2
UPDATE: Slides and video from my talk on this topic
In this series, we are looking at how applicative parsers and parser combinators work.
- In the first post, we created the foundations of a parsing library.
- In this post, we’ll build out the library with many other useful combinators.
The combinator names will be copied from those used by FParsec, so that you can easily migrate to it.
1. map
— transforming the contents of a parser
When parsing, we often we want to match a particular string, such as a reserved word like “if” or “where”. A string is just a sequence of characters,
so surely we could use the same technique that we used to define anyOf
in the first post, but using andThen
instead of orElse
?
Here’s a (failed) attempt to create a pstring
parser using that approach:
let pstring str =
str
|> Seq.map pchar // convert into parsers
|> Seq.reduce andThen
This doesn’t work, because the output of andThen
is different from the input (a tuple, not a char) and so the reduce
approach fails.
In order to solve this, we’ll need to use a different technique.
To get started, let’s try just matching a string of a specific length.
Say, for example, that we want to match a three digits in a row. Well, we can do that using andThen
:
let parseDigit =
anyOf ['0'..'9']
let parseThreeDigits =
parseDigit .>>. parseDigit .>>. parseDigit
If we run it like this:
run parseThreeDigits "123A"
then we get the result:
Success ((('1', '2'), '3'), "A")
It does work, but the result contains a tuple inside a tuple (('1', '2'), '3')
which is fugly and hard to use.
It would be so much more convenient to just have a simple string ("123"
).
But in order to turn ('1', '2'), '3')
into "123"
, we’ll need a function that can reach inside of the parser and transform the result using an arbitrary passed in function.
Of course, what we need is the functional programmer’s best friend, map
.
To understand map
and similar functions, I like to think of there being two worlds: a “Normal World”, where regular things live, and “Parser World”, where Parser
s live.
You can think of Parser World as a sort of “mirror” of Normal World because it obeys the following rules:
- Every type in Normal World (say
char
) has a corresponding type in Parser World (Parser<char>
).
And:
- Every value in Normal World (say
"ABC"
) has a corresponding value in Parser World (that is, someParser<string>
that returns"ABC"
).
And:
- Every function in Normal World (say
char -> string
) has a corresponding function in Parser World (Parser<char> -> Parser<string>
).
Using this metaphor then, map
transforms (or “lifts”) a function in Normal World into a function in Parser World.
And by the way, if you like this metaphor, I have a whole series of posts that develop it further.
So that’s what map
does; how do we implement it?
The logic is:
- Inside the
innerFn
, run the parser to get the result. - If the result was a success, apply the specified function to the success value to get a new, transformed value, and…
- …return the new, mapped, value instead of the original value.
Here’s the code (I’ve named the map function mapP
to avoid confusion with other map functions):
let mapP f parser =
let innerFn input =
// run parser with the input
let result = run parser input
// test the result for Failure/Success
match result with
| Success (value,remaining) ->
// if success, return the value transformed by f
let newValue = f value
Success (newValue, remaining)
| Failure err ->
// if failed, return the error
Failure err
// return the inner function
Parser innerFn
If we look at the signature of mapP
:
val mapP :
f:('a -> 'b) -> Parser<'a> -> Parser<'b>
we can see that it has exactly the signature we want, transforming a function 'a -> 'b
into a function Parser<'a> -> Parser<'b>
.
It’s common to define an infix version of map
as well:
let ( <!> ) = mapP
And in the context of parsing, we’ll often want to put the mapping function after the parser, with the parameters flipped.
This makes using map
with the pipeline idiom much more convenient:
let ( |>> ) x f = mapP f x
Parsing three digits with mapP
With mapP
available, we can revisit parseThreeDigits
and turn the tuple into a string.
Here’s the code:
let parseDigit = anyOf ['0'..'9']
let parseThreeDigitsAsStr =
// create a parser that returns a tuple
let tupleParser =
parseDigit .>>. parseDigit .>>. parseDigit
// create a function that turns the tuple into a string
let transformTuple ((c1, c2), c3) =
String [| c1; c2; c3 |]
// use "map" to combine them
mapP transformTuple tupleParser
Or, if you prefer a more compact implementation:
let parseThreeDigitsAsStr =
(parseDigit .>>. parseDigit .>>. parseDigit)
|>> fun ((c1, c2), c3) -> String [| c1; c2; c3 |]
And if we test it, we get a string in the result now, rather than a tuple:
run parseThreeDigitsAsStr "123A" // Success ("123", "A")
We can go further, and map the string into an int:
let parseThreeDigitsAsInt =
mapP int parseThreeDigitsAsStr
If we test this, we get an int
in the Success branch.
run parseThreeDigitsAsInt "123A" // Success (123, "A")
Let’s check the type of parseThreeDigitsAsInt
:
val parseThreeDigitsAsInt : Parser<int>
It’s a Parser<int>
now, not a Parser<char>
or Parser<string>
.
The fact that a Parser
can contain any type, not just a char or string, is a key feature that will be very valuable when we need to build more complex parsers.
2. apply
and return
— lifting functions to the world of Parsers
To achieve our goal of creating a parser that matches a list of characters, we need two more helper functions which I will call returnP
and applyP
.
returnP
simply transforms a normal value into a value in Parser WorldapplyP
transforms a Parser containing a function (Parser< 'a->'b >
) into a function in Parser World (Parser<'a> -> Parser<'b >
)
Here’s a diagram of returnP
:
And here is the implementation of returnP
:
let returnP x =
let innerFn input =
// ignore the input and return x
Success (x,input )
// return the inner function
Parser innerFn
The signature of returnP
is just as we want:
val returnP :
'a -> Parser<'a>
Now here’s a diagram of applyP
:
And here is the implementation of applyP
, which uses .>>.
and map
:
let applyP fP xP =
// create a Parser containing a pair (f,x)
(fP .>>. xP)
// map the pair by applying f to x
|> mapP (fun (f,x) -> f x)
The infix version of applyP
is written as <*>
:
let ( <*> ) = applyP
Again, the signature of applyP
is just as we want:
val applyP :
Parser<('a -> 'b)> -> Parser<'a> -> Parser<'b>
Why do we need these two functions? Well, map
will lift functions in Normal World into functions in Parser World, but only for one-parameter functions.
What’s great about returnP
and applyP
is that, together, they can lift any function in Normal World into a function in Parser World, no matter how many parameters it has.
For example, we now can define a lift2
function that will lift a two parameter function into Parser World like this:
// lift a two parameter function to Parser World
let lift2 f xP yP =
returnP f <*> xP <*> yP
The signature of lift2
is:
val lift2 :
f:('a -> 'b -> 'c) -> Parser<'a> -> Parser<'b> -> Parser<'c>
Here’s a diagram of lift2
:
If you want to know more about how this works, check out my “man page” post on lift2
or my explanation that involves the “Monadster”.
Let’s see some examples of using lift2
in practice. First, lifting integer addition to addition of Parsers:
let addP =
lift2 (+)
The signature is:
val addP :
Parser<int> -> Parser<int> -> Parser<int>
which shows that addP
does indeed take two Parser<int>
parameters and returns another Parser<int>
.
And here’s the startsWith
function being lifted to Parser World:
let startsWith (str:string) prefix =
str.StartsWith(prefix)
let startsWithP =
lift2 startsWith
Again, the signature of startsWithP
is parallel to the signature of startsWith
, but lifted to the world of Parsers.
val startsWith :
str:string -> prefix:string -> bool
val startsWithP :
Parser<string> -> Parser<string> -> Parser<bool>
3. sequence
— transforming a list of Parsers into a single Parser
We now have the tools we need to implement our sequencing combinator! The logic will be:
- Start with the list “cons” operator. This is the two-parameter function that prepends a “head” element onto a “tail” of elements to make a new list.
- Lift
cons
into the world of Parsers usinglift2
. - We now have a a function that prepends a head
Parser
to a tail list ofParser
s to make a new list ofParser
s, where:- The head Parser is the first element in the list of parsers that has been passed in.
- The tail is generated by calling the same function recursively with the next parser in the list.
- When the input list is empty, just return a
Parser
containing an empty list.
Here’s the implementation:
let rec sequence parserList =
// define the "cons" function, which is a two parameter function
let cons head tail = head::tail
// lift it to Parser World
let consP = lift2 cons
// process the list of parsers recursively
match parserList with
| [] ->
returnP []
| head::tail ->
consP head (sequence tail)
The signature of sequence
is:
val sequence :
Parser<'a> list -> Parser<'a list>
which shows that the input is a list of Parser
s and the output is a Parser
containing a list of elements.
Let’s test it by creating a list of three parsers, and then combining them into one:
let parsers = [ pchar 'A'; pchar 'B'; pchar 'C' ]
let combined = sequence parsers
run combined "ABCD"
// Success (['A'; 'B'; 'C'], "D")
As you can see, when we run it we get back a list of characters, one for each parser in the original list.
Implementing the pstring
parser
At last, we can implement the parser that matches a string, which we’ll call pstring
.
The logic is:
- Convert the string into a list of characters.
- Convert each character into a
Parser<char>
. - Use
sequence
to convert the list ofParser<char>
into a singleParser<char list>
. - And finally, use
map
to convert theParser<char list>
into aParser<string>
.
Here’s the code:
/// Helper to create a string from a list of chars
let charListToStr charList =
String(List.toArray charList)
// match a specific string
let pstring str =
str
// convert to list of char
|> List.ofSeq
// map each char to a pchar
|> List.map pchar
// convert to Parser<char list>
|> sequence
// convert Parser<char list> to Parser<string>
|> mapP charListToStr
Let’s test it:
let parseABC = pstring "ABC"
run parseABC "ABCDE" // Success ("ABC", "DE")
run parseABC "A|CDE" // Failure "Expecting 'B'. Got '|'"
run parseABC "AB|DE" // Failure "Expecting 'C'. Got '|'"
It works as expected. Phew!
4. many
and many1
— matching a parser multiple times
Another common need is to match a particular parser as many times as you can. For example:
- When matching an integer, you want to match as many digit characters as you can.
- When matching a run of whitespace, you want to match as many whitespace characters as you can.
There are slightly different requirements for these two cases.
- When matching whitespace, it is often optional, so we want a “zero or more” matcher, which we’ll call
many
. - On the other hand, when matching digits for an integer, you want to match at least one digit, so we want a “one or more” matcher, which we’ll call
many1
.
Before creating these, we’ll define a helper function which matches a parser zero or more times. The logic is:
- Run the parser.
- If the parser returns
Failure
(and this is key) just return an empty list. That is, this function can never fail! - If the parser succeeds:
- Call the function recursively to get the remaining values (which could also be an empty list).
- Then combine the first value and the remaining values.
Here’s the code:
let rec parseZeroOrMore parser input =
// run parser with the input
let firstResult = run parser input
// test the result for Failure/Success
match firstResult with
| Failure err ->
// if parse fails, return empty list
([],input)
| Success (firstValue,inputAfterFirstParse) ->
// if parse succeeds, call recursively
// to get the subsequent values
let (subsequentValues,remainingInput) =
parseZeroOrMore parser inputAfterFirstParse
let values = firstValue::subsequentValues
(values,remainingInput)
With this helper function, we can easily define many
now — it’s just a wrapper over parseZeroOrMore
:
/// match zero or more occurences of the specified parser
let many parser =
let rec innerFn input =
// parse the input -- wrap in Success as it always succeeds
Success (parseZeroOrMore parser input)
Parser innerFn
The signature of many
shows that the output is indeed a list of values wrapped in a Parser
:
val many :
Parser<'a> -> Parser<'a list>
Now let’s test many
:
let manyA = many (pchar 'A')
// test some success cases
run manyA "ABCD" // Success (['A'], "BCD")
run manyA "AACD" // Success (['A'; 'A'], "CD")
run manyA "AAAD" // Success (['A'; 'A'; 'A'], "D")
// test a case with no matches
run manyA "|BCD" // Success ([], "|BCD")
Note that in the last case, even when there is nothing to match, the function succeeds.
There’s nothing about many
that restricts its use to single characters. For example, we can use it to match repetitive string sequences too:
let manyAB = many (pstring "AB")
run manyAB "ABCD" // Success (["AB"], "CD")
run manyAB "ABABCD" // Success (["AB"; "AB"], "CD")
run manyAB "ZCD" // Success ([], "ZCD")
run manyAB "AZCD" // Success ([], "AZCD")
Finally, let’s implement the original example of matching whitespace:
let whitespaceChar = anyOf [' '; '\t'; '\n']
let whitespace = many whitespaceChar
run whitespace "ABC" // Success ([], "ABC")
run whitespace " ABC" // Success ([' '], "ABC")
run whitespace "\tABC" // Success (['\t'], "ABC")
Defining many1
We can also define the “one or more” combinator many1
, using the following logic:
- Run the parser.
- If it fails, return the failure.
- If it succeeds:
- Call the helper function
parseZeroOrMore
to get the remaining values. - Then combine the first value and the remaining values.
- Call the helper function
/// match one or more occurences of the specified parser
let many1 parser =
let rec innerFn input =
// run parser with the input
let firstResult = run parser input
// test the result for Failure/Success
match firstResult with
| Failure err ->
Failure err // failed
| Success (firstValue,inputAfterFirstParse) ->
// if first found, look for zeroOrMore now
let (subsequentValues,remainingInput) =
parseZeroOrMore parser inputAfterFirstParse
let values = firstValue::subsequentValues
Success (values,remainingInput)
Parser innerFn
Again, the signature of many1
shows that the output is indeed a list of values wrapped in a Parser
:
val many1 :
Parser<'a> -> Parser<'a list>
Now let’s test many1
:
// define parser for one digit
let digit = anyOf ['0'..'9']
// define parser for one or more digits
let digits = many1 digit
run digits "1ABC" // Success (['1'], "ABC")
run digits "12BC" // Success (['1'; '2'], "BC")
run digits "123C" // Success (['1'; '2'; '3'], "C")
run digits "1234" // Success (['1'; '2'; '3'; '4'], "")
run digits "ABC" // Failure "Expecting '9'. Got 'A'"
As we saw in an earlier example, the last case gives a misleading error. It says “Expecting ‘9’” when it really should say “Expecting a digit”.
In the next post we’ll fix this.
Parsing an integer
Using many1
, we can create a parser for an integer. The implementation logic is:
- Create a parser for a digit.
- Use
many1
to get a list of digits. - Using
map
, transform the result (a list of digits) into a string and then into an int.
Here’s the code:
let pint =
// helper
let resultToInt digitList =
// ignore int overflow for now
String(List.toArray digitList) |> int
// define parser for one digit
let digit = anyOf ['0'..'9']
// define parser for one or more digits
let digits = many1 digit
// map the digits to an int
digits
|> mapP resultToInt
And let’s test it:
run pint "1ABC" // Success (1, "ABC")
run pint "12BC" // Success (12, "BC")
run pint "123C" // Success (123, "C")
run pint "1234" // Success (1234, "")
run pint "ABC" // Failure "Expecting '9'. Got 'A'"
5. opt
— matching a parser zero or one time
Sometimes we only want to match a parser zero or one time. For example, the pint
parser above does not handle negative values.
To correct this, we need to be able to handle an optional minus sign.
We can define an opt
combinator easily:
- Change the result of a specified parser to an option by mapping the result to
Some
. - Create another parser that always returns
None
. - Use
<|>
to choose the second (“None”) parser if the first fails.
Here’s the code:
let opt p =
let some = p |>> Some
let none = returnP None
some <|> none
Here’s an example of it in use — we match a digit followed by an optional semicolon:
let digit = anyOf ['0'..'9']
let digitThenSemicolon = digit .>>. opt (pchar ';')
run digitThenSemicolon "1;" // Success (('1', Some ';'), "")
run digitThenSemicolon "1" // Success (('1', None), "")
And here is pint
rewritten to handle an optional minus sign:
let pint =
// helper
let resultToInt (sign,charList) =
let i = String(List.toArray charList) |> int
match sign with
| Some ch -> -i // negate the int
| None -> i
// define parser for one digit
let digit = anyOf ['0'..'9']
// define parser for one or more digits
let digits = many1 digit
// parse and convert
opt (pchar '-') .>>. digits
|>> resultToInt
Note that the resultToInt
helper function now needs to handle the sign option as well as the list of digits.
And here it is in action:
run pint "123C" // Success (123, "C")
run pint "-123C" // Success (-123, "C")
6. Throwing results away
We often want to match something in the input, but we don’t care about the parsed value itself. For example:
- For a quoted string, we need to parse the quotes, but we don’t need the quotes themselves.
- For a statement ending in a semicolon, we need to ensure the semicolon is there, but we don’t need the semicolon itself.
- For whitespace separators, we need to ensure the whitespace is there, but we don’t need the actual whitespace data.
To handle these requirements, we will define some new combinators that throw away the results of a parser:
p1 >>. p2
will applyp1
andp2
in sequence, just like.>>.
, but throw away the result ofp1
and keep the result ofp2
.p1 .>> p2
will applyp1
andp2
in sequence, just like.>>.
, but keep the result ofp1
and throw away the result ofp2
.
These are easy to define — just map over the result of .>>.
, which is a tuple, and keep only one element of the pair.
/// Keep only the result of the left side parser
let (.>>) p1 p2 =
// create a pair
p1 .>>. p2
// then only keep the first value
|> mapP (fun (a,b) -> a)
/// Keep only the result of the right side parser
let (>>.) p1 p2 =
// create a pair
p1 .>>. p2
// then only keep the second value
|> mapP (fun (a,b) -> b)
These combinators allow us to simplify the digitThenSemicolon
example shown earlier:
let digit = anyOf ['0'..'9']
// use .>> below
let digitThenSemicolon = digit .>> opt (pchar ';')
run digitThenSemicolon "1;" // Success ('1', "")
run digitThenSemicolon "1" // Success ('1', "")
You can see that the result now is the same, whether or not the semicolon was present.
How about an example with whitespace?
The following code creates a parser that looks for “AB” followed by one or more whitespace chars, followed by “CD”.
let whitespaceChar = anyOf [' '; '\t'; '\n']
let whitespace = many1 whitespaceChar
let ab = pstring "AB"
let cd = pstring "CD"
let ab_cd = (ab .>> whitespace) .>>. cd
run ab_cd "AB \t\nCD" // Success (("AB", "CD"), "")
The result contains “AB” and “CD” only. The whitespace between them has been discarded.
Introducing between
A particularly common requirement is to look for a parser between delimiters such as quotes or brackets.
Creating a combinator for this is trivial:
/// Keep only the result of the middle parser
let between p1 p2 p3 =
p1 >>. p2 .>> p3
And here it is in use, to parse a quoted integer:
let pdoublequote = pchar '"'
let quotedInteger = between pdoublequote pint pdoublequote
run quotedInteger "\"1234\"" // Success (1234, "")
run quotedInteger "1234" // Failure "Expecting '"'. Got '1'"
7. Parsing lists with separators
Another common requirement is parsing lists, seperated by something like commas or whitespace.
To implement a “one or more” list, we need to:
- First combine the separator and parser into one combined parser, but using
>>.
to throw away the separator value. - Next, look for a list of the separator/parser combo using
many
. - Then prefix that with the first parser and combine the results.
Here’s the code:
/// Parses one or more occurrences of p separated by sep
let sepBy1 p sep =
let sepThenP = sep >>. p
p .>>. many sepThenP
|>> fun (p,pList) -> p::pList
For the “zero or more” version, we can choose the empty list as an alternate if sepBy1
does not find any matches:
/// Parses zero or more occurrences of p separated by sep
let sepBy p sep =
sepBy1 p sep <|> returnP []
Here’s some tests for sepBy1
and sepBy
, with results shown in the comments:
let comma = pchar ','
let digit = anyOf ['0'..'9']
let zeroOrMoreDigitList = sepBy digit comma
let oneOrMoreDigitList = sepBy1 digit comma
run oneOrMoreDigitList "1;" // Success (['1'], ";")
run oneOrMoreDigitList "1,2;" // Success (['1'; '2'], ";")
run oneOrMoreDigitList "1,2,3;" // Success (['1'; '2'; '3'], ";")
run oneOrMoreDigitList "Z;" // Failure "Expecting '9'. Got 'Z'"
run zeroOrMoreDigitList "1;" // Success (['1'], ";")
run zeroOrMoreDigitList "1,2;" // Success (['1'; '2'], ";")
run zeroOrMoreDigitList "1,2,3;" // Success (['1'; '2'; '3'], ";")
run zeroOrMoreDigitList "Z;" // Success ([], "Z;")
What about bind
?
One combinator that we haven’t implemented so far is bind
(or >>=
).
If you know anything about functional programming, or have seen my talk on FP patterns, you’ll know that bind
is a powerful tool that can be used to implement many functions.
Up to this point, I thought that it would be better to show implementations for combinators such as map
and .>>.
that were explicit and thus, hopefully, easier to understand.
But now that we have have some experience, let’s implement bind
and see what we can do with it.
Here’s the implementation of bindP
(as I’ll call it)
/// "bindP" takes a parser-producing function f, and a parser p
/// and passes the output of p into f, to create a new parser
let bindP f p =
let innerFn input =
let result1 = run p input
match result1 with
| Failure err ->
// return error from parser1
Failure err
| Success (value1,remainingInput) ->
// apply f to get a new parser
let p2 = f value1
// run parser with remaining input
run p2 remainingInput
Parser innerFn
The signature of bindP
is:
val bindP :
f:('a -> Parser<'b>) -> Parser<'a> -> Parser<'b>
which conforms to a standard bind signature. The input f
is a “diagonal” function ('a -> Parser<'b>
) and the output is a “horizontal” function (Parser<'a> -> Parser<'b>
).
See this post for more details on how bind
works.
The infix version of bind
is >>=
. Note that the parameters are flipped: f
is now the second parameter which makes it more convenient for F#’s pipeline idiom.
let ( >>= ) p f = bindP f p
Reimplementing other combinators with bindP
and returnP
The combination of bindP
and returnP
can be used to re-implement many of the other combinators. Here are some examples:
let mapP f =
bindP (f >> returnP)
let andThen p1 p2 =
p1 >>= (fun p1Result ->
p2 >>= (fun p2Result ->
returnP (p1Result,p2Result) ))
let applyP fP xP =
fP >>= (fun f ->
xP >>= (fun x ->
returnP (f x) ))
// (assuming "many" is defined)
let many1 p =
p >>= (fun head ->
many p >>= (fun tail ->
returnP (head::tail) ))
Note that the combinators that check the Failure
path can not be implemented using bind
. These include orElse
and many
.
Review
We could keep building combinators for ever, but I think we have everything we need to build a JSON parser now, so let’s stop and review what we have done.
In the previous post we created these combinators:
.>>.
(andThen
) applies the two parsers in sequence and returns the results in a tuple.<|>
(orElse
) applies the first parser, and if that fails, the second parsers.choice
extendsorElse
to choose from a list of parsers.
And in this post we created the following additional combinators:
bindP
chains the result of a parser to another parser-producing function.mapP
transforms the result of a parser.returnP
lifts an normal value into the world of parsers.applyP
allows us to lift multi-parameter functions into functions that work on Parsers.lift2
usesapplyP
to lift two-parameter functions into Parser World.sequence
converts a list of Parsers into a Parser containing a list.many
matches zero or more occurences of the specified parser.many1
matches one or more occurences of the specified parser.opt
matches an optional occurrence of the specified parser..>>
keeps only the result of the left side parser.>>.
keeps only the result of the right side parser.between
keeps only the result of the middle parser.sepBy
parses zero or more occurrences of a parser with a separator.sepBy1
parses one or more occurrences of a parser with a separator.
I hope you can see why the concept of “combinators” is so powerful; given just a few basic functions, we have built up a library of useful functions quickly and concisely.
Listing of the parser library so far
Here’s the complete listing for the parsing library so far — it’s about 200 lines of code now!
The source code displayed below is also available at this gist.
open System
/// Type that represents Success/Failure in parsing
type Result<'a> =
| Success of 'a
| Failure of string
/// Type that wraps a parsing function
type Parser<'T> = Parser of (string -> Result<'T * string>)
/// Parse a single character
let pchar charToMatch =
// define a nested inner function
let innerFn str =
if String.IsNullOrEmpty(str) then
Failure "No more input"
else
let first = str.[0]
if first = charToMatch then
let remaining = str.[1..]
Success (charToMatch,remaining)
else
let msg = sprintf "Expecting '%c'. Got '%c'" charToMatch first
Failure msg
// return the "wrapped" inner function
Parser innerFn
/// Run a parser with some input
let run parser input =
// unwrap parser to get inner function
let (Parser innerFn) = parser
// call inner function with input
innerFn input
/// "bindP" takes a parser-producing function f, and a parser p
/// and passes the output of p into f, to create a new parser
let bindP f p =
let innerFn input =
let result1 = run p input
match result1 with
| Failure err ->
// return error from parser1
Failure err
| Success (value1,remainingInput) ->
// apply f to get a new parser
let p2 = f value1
// run parser with remaining input
run p2 remainingInput
Parser innerFn
/// Infix version of bindP
let ( >>= ) p f = bindP f p
/// Lift a value to a Parser
let returnP x =
let innerFn input =
// ignore the input and return x
Success (x,input)
// return the inner function
Parser innerFn
/// apply a function to the value inside a parser
let mapP f =
bindP (f >> returnP)
/// infix version of mapP
let ( <!> ) = mapP
/// "piping" version of mapP
let ( |>> ) x f = mapP f x
/// apply a wrapped function to a wrapped value
let applyP fP xP =
fP >>= (fun f ->
xP >>= (fun x ->
returnP (f x) ))
/// infix version of apply
let ( <*> ) = applyP
/// lift a two parameter function to Parser World
let lift2 f xP yP =
returnP f <*> xP <*> yP
/// Combine two parsers as "A andThen B"
let andThen p1 p2 =
p1 >>= (fun p1Result ->
p2 >>= (fun p2Result ->
returnP (p1Result,p2Result) ))
/// Infix version of andThen
let ( .>>. ) = andThen
/// Combine two parsers as "A orElse B"
let orElse p1 p2 =
let innerFn input =
// run parser1 with the input
let result1 = run p1 input
// test the result for Failure/Success
match result1 with
| Success result ->
// if success, return the original result
result1
| Failure err ->
// if failed, run parser2 with the input
let result2 = run p2 input
// return parser2's result
result2
// return the inner function
Parser innerFn
/// Infix version of orElse
let ( <|> ) = orElse
/// Choose any of a list of parsers
let choice listOfParsers =
List.reduce ( <|> ) listOfParsers
/// Choose any of a list of characters
let anyOf listOfChars =
listOfChars
|> List.map pchar // convert into parsers
|> choice
/// Convert a list of Parsers into a Parser of a list
let rec sequence parserList =
// define the "cons" function, which is a two parameter function
let cons head tail = head::tail
// lift it to Parser World
let consP = lift2 cons
// process the list of parsers recursively
match parserList with
| [] ->
returnP []
| head::tail ->
consP head (sequence tail)
/// (helper) match zero or more occurences of the specified parser
let rec parseZeroOrMore parser input =
// run parser with the input
let firstResult = run parser input
// test the result for Failure/Success
match firstResult with
| Failure err ->
// if parse fails, return empty list
([],input)
| Success (firstValue,inputAfterFirstParse) ->
// if parse succeeds, call recursively
// to get the subsequent values
let (subsequentValues,remainingInput) =
parseZeroOrMore parser inputAfterFirstParse
let values = firstValue::subsequentValues
(values,remainingInput)
/// matches zero or more occurences of the specified parser
let many parser =
let rec innerFn input =
// parse the input -- wrap in Success as it always succeeds
Success (parseZeroOrMore parser input)
Parser innerFn
/// matches one or more occurences of the specified parser
let many1 p =
p >>= (fun head ->
many p >>= (fun tail ->
returnP (head::tail) ))
/// Parses an optional occurrence of p and returns an option value.
let opt p =
let some = p |>> Some
let none = returnP None
some <|> none
/// Keep only the result of the left side parser
let (.>>) p1 p2 =
// create a pair
p1 .>>. p2
// then only keep the first value
|> mapP (fun (a,b) -> a)
/// Keep only the result of the right side parser
let (>>.) p1 p2 =
// create a pair
p1 .>>. p2
// then only keep the second value
|> mapP (fun (a,b) -> b)
/// Keep only the result of the middle parser
let between p1 p2 p3 =
p1 >>. p2 .>> p3
/// Parses one or more occurrences of p separated by sep
let sepBy1 p sep =
let sepThenP = sep >>. p
p .>>. many sepThenP
|>> fun (p,pList) -> p::pList
/// Parses zero or more occurrences of p separated by sep
let sepBy p sep =
sepBy1 p sep <|> returnP []
Summary
In this post, we have built on the basic parsing code from last time to create a library of a 15 or so combinators that can be combined to parse almost anything.
Soon, we’ll use them to build a JSON parser, but before that, let’s pause and take time to clean up the error messages.
That will be the topic of the next post.
The source code for this post is available at this gist.