The Parser
The AST for a program captures its behavior in a form that later stages of the compiler (e.g. code generation) can easily interpret. We want one object for each construct in the language, so that the AST closely models the language. In Kaleidoscope, we have expressions and a function object. When parsing with Parsec, we unpack tokens straight into our AST, which we define as the Expr algebraic data type:
module Syntax where

type Name = String

data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Expr]
  deriving (Eq, Ord, Show)

data Op
  = Plus
  | Minus
  | Times
  | Divide
  deriving (Eq, Ord, Show)
This is all (intentionally) rather straightforward: variables capture the variable name, binary operators capture their operation (e.g. Plus, Minus, …), and calls capture a function name as well as a list of any argument expressions.
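To make the mapping concrete, here is a minimal sketch of how a small piece of Kaleidoscope source could be represented with these constructors. The type definitions are repeated so the snippet stands alone. The names exampleBody and exampleDef, and the sample source lines in the comments, are illustrative assumptions rather than part of the tutorial's code.

```haskell
type Name = String

data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Expr]
  deriving (Eq, Ord, Show)

data Op
  = Plus
  | Minus
  | Times
  | Divide
  deriving (Eq, Ord, Show)

-- Hypothetical source:  x + foo(y, 2.0)
-- The operator becomes a BinOp node; the call keeps its name and arguments.
exampleBody :: Expr
exampleBody = BinOp Plus (Var "x") (Call "foo" [Var "y", Float 2.0])

-- Hypothetical source:  def bar(x y) x + foo(y, 2.0)
-- Function arguments are stored as Var expressions, matching [Expr].
exampleDef :: Expr
exampleDef = Function "bar" [Var "x", Var "y"] exampleBody
```

Because both types derive Show and Eq, values like these can be printed in GHCi or compared directly against parser output.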
We create a Parsec parser that scans an input source and unpacks it into our Expr type. The code composes smaller parsers within the Parser type to build the final parser, which is then run using the parse function.
module Parser where

import Text.Parsec
import Text.Parsec.String (Parser)

import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

import Lexer
import Syntax

-- Build an infix operator entry for the expression table.
binary s f assoc = Ex.Infix (reservedOp s >> return (BinOp f)) assoc

-- Operator table, highest precedence first.
table = [[binary "*" Times Ex.AssocLeft,
          binary "/" Divide Ex.AssocLeft]
        ,[binary "+" Plus Ex.AssocLeft,
          binary "-" Minus Ex.AssocLeft]]

-- Integer literals are converted to Double and stored as Float.
int :: Parser Expr
int = do
  n <- integer
  return $ Float (fromInteger n)

floating :: Parser Expr
floating = do
  n <- float
  return $ Float n

expr :: Parser Expr
expr = Ex.buildExpressionParser table factor

variable :: Parser Expr
variable = do
  var <- identifier
  return $ Var var

-- A definition: "def", a name, whitespace-separated arguments in
-- parentheses, then the body expression.
function :: Parser Expr
function = do
  reserved "def"
  name <- identifier
  args <- parens $ many variable
  body <- expr
  return $ Function name args body

extern :: Parser Expr
extern = do
  reserved "extern"
  name <- identifier
  args <- parens $ many variable
  return $ Extern name args

call :: Parser Expr
call = do
  name <- identifier
  args <- parens $ commaSep expr
  return $ Call name args

-- Alternatives are tried in order; try rewinds the input when an
-- alternative fails partway through.
factor :: Parser Expr
factor = try floating
      <|> try int
      <|> try extern
      <|> try function
      <|> try call
      <|> variable
      <|> parens expr

defn :: Parser Expr
defn = try extern
    <|> try function
    <|> expr

-- Skip leading whitespace, run p, and require that all input is consumed.
contents :: Parser a -> Parser a
contents p = do
  Tok.whiteSpace lexer
  r <- p
  eof
  return r

-- Zero or more definitions or expressions, each terminated by ";".
toplevel :: Parser [Expr]
toplevel = many $ do
  def <- defn
  reservedOp ";"
  return def

parseExpr :: String -> Either ParseError Expr
parseExpr s = parse (contents expr) "<stdin>" s

parseToplevel :: String -> Either ParseError [Expr]
parseToplevel s = parse (contents toplevel) "<stdin>" s
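As a quick check of how the operator table drives precedence and associativity, the following is a minimal, self-contained sketch. It does not use the tutorial's Lexer module. Instead it assumes a small stand-in lexer built from Text.Parsec.Token with emptyDef, and a trimmed Expr type with only numbers, variables, and binary operators. The names miniLexer, number, term, and parseMini are hypothetical.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Expr as Ex
import qualified Text.Parsec.Token as Tok

data Op = Plus | Minus | Times | Divide
  deriving (Eq, Show)

-- Trimmed-down AST, for illustration only.
data Expr
  = Float Double
  | Var String
  | BinOp Op Expr Expr
  deriving (Eq, Show)

-- Stand-in lexer (an assumption; the tutorial defines its own in Lexer).
miniLexer :: Tok.TokenParser ()
miniLexer = Tok.makeTokenParser emptyDef
  { Tok.reservedOpNames = ["+", "-", "*", "/"] }

binary :: String -> Op -> Ex.Assoc -> Ex.Operator String () Identity Expr
binary s op assoc =
  Ex.Infix (Tok.reservedOp miniLexer s >> return (BinOp op)) assoc

-- Rows are listed from highest to lowest precedence.
table :: Ex.OperatorTable String () Identity Expr
table =
  [ [binary "*" Times Ex.AssocLeft, binary "/" Divide Ex.AssocLeft]
  , [binary "+" Plus Ex.AssocLeft, binary "-" Minus Ex.AssocLeft]
  ]

-- Integers and floats both end up as Float.
number :: Parser Expr
number = do
  n <- Tok.naturalOrFloat miniLexer
  return $ Float (either fromInteger id n)

term :: Parser Expr
term = number
   <|> Var <$> Tok.identifier miniLexer
   <|> Tok.parens miniLexer expr

expr :: Parser Expr
expr = Ex.buildExpressionParser table term

parseMini :: String -> Either ParseError Expr
parseMini = parse (Tok.whiteSpace miniLexer *> expr <* eof) "<example>"
```

With this table, parseMini "1 + 2 * 3" groups the multiplication first, giving BinOp Plus (Float 1.0) (BinOp Times (Float 2.0) (Float 3.0)), while parseMini "8 - 3 - 2" nests to the left as BinOp Minus (BinOp Minus (Float 8.0) (Float 3.0)) (Float 2.0). Swapping the two rows of the table would reverse the precedence.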