From AST to IR
Now that we have the infrastructure in place we can begin ingest our AST from Syntax.hs
and construct a LLVM module from it. We will create a new Emit.hs
module and spread the logic across two functions. The first codegenTop
will emit toplevel constructions in modules ( functions and external definitions ) and will return a LLVM
monad. The last instruction on the stack we’ll bind into the ret
instruction to ensure and emit as the return value of the function. We’ll also sequentially assign
each of the named arguments from the function to a stack allocated value with a reference in our symbol table.
codegenTop :: S.Expr -> LLVM ()
codegenTop (S.Function name args body) = do
define double name fnargs bls
where
fnargs = toSig args
bls = createBlocks $ execCodegen $ do
entry <- addBlock entryBlockName
setBlock entry
forM args $ \a -> do
var <- alloca double
store var (local (AST.Name a))
assign a var
cgen body >>= ret
codegenTop (S.Extern name args) = do
external double name fnargs
where fnargs = toSig args
codegenTop exp = do
define double "main" [] blks
where
blks = createBlocks $ execCodegen $ do
entry <- addBlock entryBlockName
setBlock entry
cgen exp >>= ret
toSig :: [String] -> [(AST.Type, AST.Name)]
toSig = map (\x -> (double, AST.Name x))
The second is the expression level code generation (cgen
) which will recursively walk the AST pushing instructions on the stack and changing the current block as needed. The simplest AST node is constant integers and floating point values which simply return constant values in LLVM IR.
cgen :: S.Expr -> Codegen AST.Operand
cgen (S.Float n) = return $ cons $ C.Float (F.Double n)
We need to reference local variables so we’ll invoke our getvar
function in conjunction with a load
use values. The conscious reader will intuit that this might result in an excessive amount of extraneous instructions pushing temporary values on the stack, something that we’ll address later with a simple optimization pass.
cgen (S.Var x) = getvar x >>= load
For Call
we’ll first evaluate each argument and then invoke the function with the values. Since our language only has double type values, this is trivial and we don’t need to worry too much.
cgen (S.Call fn args) = do
largs <- mapM cgen args
call (externf (AST.Name fn)) largs
Finally for our operators we’ll construct a predefined association map of symbol strings to implementations of functions with the corresponding logic for the operation.
binops = Map.fromList [
("+", fadd)
, ("-", fsub)
, ("*", fmul)
, ("/", fdiv)
, ("<", lt)
]
For the comparison operator we’ll invoke the uitofp
which will convert a unsigned integer quantity to a floating point value. LLVM requires the unsigned single bit types as the values for comparison and test operations but we prefer to work entirely with doubles where possible.
lt :: AST.Operand -> AST.Operand -> Codegen AST.Operand
lt a b = do
test <- fcmp FP.ULT a b
uitofp double test
Just like the call
instruction above we simply generate the code for operands and invoke the function we just looked up for the symbol.
cgen (S.BinaryOp op a b) = do
case Map.lookup op binops of
Just f -> do
ca <- cgen a
cb <- cgen b
f ca cb
Nothing -> error "No such operator"
Putting everything together we find that we nice little minimal language that supports both function abstraction and basic arithmetic. The final step is to hook into LLVM bindings to generate a string representation of the LLVM IR which will print out the string on each action in the REPL. We’ll discuss these functions in more depth in the next chapter.
codegen :: AST.Module -> [S.Expr] -> IO AST.Module
codegen mod fns = withContext $ \context ->
liftError $ withModuleFromAST context newast $ \m -> do
llstr <- moduleLLVMAssembly m
putStrLn llstr
return newast
where
modn = mapM codegenTop fns
newast = runLLVM mod modn
Running Main.hs
we can observe our code generator in action.
ready> def foo(a b) a*a + 2*a*b + b*b;
; ModuleID = 'my cool jit'
define double @foo(double %a, double %b) {
entry:
%0 = fmul double %a, %a
%1 = fmul double %a, 2.000000e+00
%2 = fmul double %1, %b
%3 = fadd double %0, %2
%4 = fmul double %b, %b
%5 = fadd double %4, %3
ret double %5
}
ready> def bar(a) foo(a, 4.0) + bar(31337);
define double @bar(double %a) {
entry:
%0 = alloca double
store double %a, double* %0
%1 = load double* %0
%2 = call double @foo(double %1, double 4.000000e+00)
%3 = call double @bar(double 3.133700e+04)
%4 = fadd double %2, %3
ret double %4
}