Chapter 3 ( Code Generation ) - From AST to IR - 《Implementing a JIT Compiled Language with Haskell and LLVM》

From AST to IR

From AST to IR

Now that we have the infrastructure in place we can begin ingest our AST from Syntax.hs and construct a LLVM module from it. We will create a new Emit.hs module and spread the logic across two functions. The first codegenTop will emit toplevel constructions in modules ( functions and external definitions ) and will return a LLVM monad. The last instruction on the stack we’ll bind into the ret instruction to ensure and emit as the return value of the function. We’ll also sequentially assign each of the named arguments from the function to a stack allocated value with a reference in our symbol table.

codegenTop :: S.Expr -> LLVM ()
codegenTop (S.Function name args body) = do
  define double name fnargs bls
  where
    fnargs = toSig args
    bls = createBlocks $ execCodegen $ do
      entry <- addBlock entryBlockName
      setBlock entry
      forM args $ \a -> do
        var <- alloca double
        store var (local (AST.Name a))
        assign a var
      cgen body >>= ret
codegenTop (S.Extern name args) = do
  external double name fnargs
  where fnargs = toSig args
codegenTop exp = do
  define double "main" [] blks
  where
    blks = createBlocks $ execCodegen $ do
      entry <- addBlock entryBlockName
      setBlock entry
      cgen exp >>= ret
toSig :: [String] -> [(AST.Type, AST.Name)]
toSig = map (\x -> (double, AST.Name x))

The second is the expression level code generation (cgen) which will recursively walk the AST pushing instructions on the stack and changing the current block as needed. The simplest AST node is constant integers and floating point values which simply return constant values in LLVM IR.

cgen :: S.Expr -> Codegen AST.Operand
cgen (S.Float n) = return $ cons $ C.Float (F.Double n)

We need to reference local variables so we’ll invoke our getvar function in conjunction with a load use values. The conscious reader will intuit that this might result in an excessive amount of extraneous instructions pushing temporary values on the stack, something that we’ll address later with a simple optimization pass.

cgen (S.Var x) = getvar x >>= load

For Call we’ll first evaluate each argument and then invoke the function with the values. Since our language only has double type values, this is trivial and we don’t need to worry too much.

cgen (S.Call fn args) = do
  largs <- mapM cgen args
  call (externf (AST.Name fn)) largs

Finally for our operators we’ll construct a predefined association map of symbol strings to implementations of functions with the corresponding logic for the operation.

binops = Map.fromList [
      ("+", fadd)
    , ("-", fsub)
    , ("*", fmul)
    , ("/", fdiv)
    , ("<", lt)
  ]

For the comparison operator we’ll invoke the uitofp which will convert a unsigned integer quantity to a floating point value. LLVM requires the unsigned single bit types as the values for comparison and test operations but we prefer to work entirely with doubles where possible.

lt :: AST.Operand -> AST.Operand -> Codegen AST.Operand
lt a b = do
  test <- fcmp FP.ULT a b
  uitofp double test

Just like the call instruction above we simply generate the code for operands and invoke the function we just looked up for the symbol.

cgen (S.BinaryOp op a b) = do
  case Map.lookup op binops of
    Just f -> do
      ca <- cgen a
      cb <- cgen b
      f ca cb
    Nothing -> error "No such operator"

Putting everything together we find that we nice little minimal language that supports both function abstraction and basic arithmetic. The final step is to hook into LLVM bindings to generate a string representation of the LLVM IR which will print out the string on each action in the REPL. We’ll discuss these functions in more depth in the next chapter.

codegen :: AST.Module -> [S.Expr] -> IO AST.Module
codegen mod fns = withContext $ \context ->
  liftError $ withModuleFromAST context newast $ \m -> do
    llstr <- moduleLLVMAssembly m
    putStrLn llstr
    return newast
  where
    modn = mapM codegenTop fns
    newast = runLLVM mod modn

Running Main.hs we can observe our code generator in action.

ready> def foo(a b) a*a + 2*a*b + b*b;
; ModuleID = 'my cool jit'
define double @foo(double %a, double %b)  {
entry:
  %0 = fmul double %a, %a
  %1 = fmul double %a, 2.000000e+00
  %2 = fmul double %1, %b
  %3 = fadd double %0, %2
  %4 = fmul double %b, %b
  %5 = fadd double %4, %3
  ret double %5
}
ready> def bar(a) foo(a, 4.0) + bar(31337);
define double @bar(double %a) {
entry:
  %0 = alloca double
  store double %a, double* %0
  %1 = load double* %0
  %2 = call double @foo(double %1, double 4.000000e+00)
  %3 = call double @bar(double 3.133700e+04)
  %4 = fadd double %2, %3
  ret double %4
}