Core language

A program in Agda consists of a number of declarations written in an *.agda file. A declaration introduces a new identifier and gives its type and definition. It is possible to declare:

Declarations have a signature part and a definition part. These can appear separately in the program. Names must be declared before they are used, but by separating the signature from the definition it is possible to define things in mutual recursion.

Grammar

At its core, Agda is a dependently typed lambda calculus. The grammar of terms where a represents a generic term is:

  1. a ::= x -- variable
  2. | λ x a -- lambda abstraction
  3. | f -- defined function
  4. | (x : a) a -- function space
  5. | F -- data/record type
  6. | c a -- data/record constructor
  7. | s -- sort Seti, Setω+i

Syntax overview

The syntax of an Agda program is defined in terms of three key components:

  • Expressions write function bodies and types.

  • Declarations declare types, data-types, postulates, records, functions etc.

  • Pragmas define program options.

There are also three main levels of syntax, corresponding to different levels of interpretation:

  • Concrete is the high-level sugared syntax, it representing exactly what the user wrote (Agda.Syntax.Concrete).

  • Abstract, before typechecking (Agda.Syntax.Abstract)

  • Internal, the full-intepreted core Agda terms, typechecked; roughly corresponding to (Agda.Syntax.Internal).

The process of translating an *.agda file into an executable has several stages:

  1. *.agda file
  2. ==[ parser (Lexer.x + Parser.y) ]==>
  3. Concrete syntax
  4. ==[ nicifier (Syntax.Concrete.Definitions) ]==>
  5. 'Nice' concrete syntax
  6. ==[ scope checking (Syntax.Translation.ConcreteToAbstract) ]==>
  7. Abstract syntax
  8. ==[ type checking (TypeChecking.Rules.*) ]==>
  9. Internal syntax
  10. ==[ Agda.Compiler.ToTreeless ]==>
  11. Treeless syntax
  12. ==[ different backends (Compiler.MAlonzo.*, Compiler.JS.*, ...) ]==>
  13. Source code
  14. ==[ different compilers (GHC compiler, ...) ]==>
  15. Executable

The following sections describe these stages in more detail:

Lexer

Lexical analysis (aka tokenization) is the process of converting a sequence of characters (the raw *.agda file) into a sequence of tokens (strings with a meaning).

The lexer in Agda is generated by Alex, and is an adaptation of GHC’s lexer. The main lexing function lexer is called by the Agda.Syntax.Parser.Parser to get the next token from the input.

Parser

The parser is the component that takes the output of the lexer and builds a data structure that we will call Concrete Syntax, while checking for correct syntax.

The parser is generated by Happy.

Example: when a name is a sequence of parts, the lexer just sees it as a string, the parser does the translation in this step.

Concrete Syntax

The concrete syntax is a raw representation of the program text without any desugaring at all. This is what the parser produces. The idea is that if we figure out how to keep the concrete syntax around, it can be printed exactly as the user wrote it.

Nice Concrete Syntax

The Nice Concrete Syntax is a slightly reorganized version of the Concrete Syntax that is easier to deal with internally. Among other things, it:

  • detects mutual blocks

  • assembles definitions from their isolated parts

  • collects fixity information of mixfix operators and attaches it to definitions

  • emits warnings for possibly unintended but still valid declarations, which essentially is dead code such as empty instance blocks and misplaced pragmas

Abstract Syntax

The translation from Agda.Syntax.Concrete to Agda.Syntax.Abstract involves scope analysis, figuring out infix operator precedences and tidying up definitions.

The abstract syntax Agda.Syntax.Abstract is the result after desugaring and scope analysis of the concrete syntax. The type checker works on abstract syntax, producing internal syntax.

Internal Syntax

This is the final stage of syntax before being handed off to one of the backends. Terms are well-scoped and well-typed.

While producing the Internal Syntax, terms are checked for safety. This safety check means termination check and coverage check for functions, and positivity check for datatypes.

Type-directed operations such as instance resolution and disambiguation of overloaded constructors (different constructors with the same name) also happen here.

The internal syntax Agda.Syntax.Internal uses the following haskell datatype to represent the grammar of a Term presented above.

  1. data Term = Var {-# UNPACK #-} !Int Elims -- ^ @x es@ neutral
  2. | Lam ArgInfo (Abs Term) -- ^ Terms are beta normal. Relevance is ignored
  3. | Lit Literal
  4. | Def QName Elims -- ^ @f es@, possibly a delta/iota-redex
  5. | Con ConHead ConInfo Elims
  6. -- ^ @c es@ or @record { fs = es }@
  7. -- @es@ allows only Apply and IApply eliminations,
  8. -- and IApply only for data constructors.
  9. | Pi (Dom Type) (Abs Type) -- ^ dependent or non-dependent function space
  10. | Sort Sort
  11. | Level Level
  12. | MetaV {-# UNPACK #-} !MetaId Elims

Treeless Syntax

The treeless syntax is intended to be used as input for the compiler backends. It is more low-level than the internal syntax and is not used for type checking. Some of the features of the treeless syntax are:

  • case expressions instead of case trees

  • no instantiated datatypes / constructors

For instance, the Glasgow Haskell Compiler (GHC) backend translates the treeless syntax into a proper GHC Haskell program.

Another backend that may be used is the JavaScript backend, which translates the treeless syntax to JavaScript code.

The treeless representation of the program has A-normal form (ANF). That means that all the case expressions are targeting a single variable, and all alternatives may only peel off one constructor.

The backends can handle an ANF syntax easier than a syntax of a language where one may case arbitrary expressions and use deep patterns.