Indentation

Nim’s standard grammar describes an indentation-sensitive language. This means that all control structures are recognized by their indentation. Indentation consists only of spaces; tab characters are not allowed.

Indentation handling is implemented as follows: the lexer annotates the token that follows the indentation with the number of spaces preceding it; indentation is not a separate token. This trick allows Nim to be parsed with only one token of lookahead.
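A minimal sketch of this lexer idea, written in Python for illustration only: each token carries an `indent` field instead of the lexer emitting separate INDENT tokens. The `Token` class, the `tokenize` function, the use of -1 for "not at the start of a line", and the simple regex-based splitting into words and punctuation are all hypothetical assumptions, not the actual Nim compiler's lexer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    # Hypothetical convention: number of leading spaces if this token is the
    # first one on its line, otherwise -1 (no indentation information).
    indent: int


def tokenize(source: str) -> list[Token]:
    tokens: list[Token] = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines carry no indentation information
        if stripped.startswith("\t"):
            # Indentation consists only of spaces; tabs are rejected.
            raise SyntaxError("tabs are not allowed for indentation")
        indent = len(line) - len(stripped)
        # Assumed token shape: identifiers/numbers or single punctuation chars.
        for i, text in enumerate(re.findall(r"\w+|[^\w\s]", stripped)):
            # Only the first token on a line is annotated with its indentation.
            tokens.append(Token(text, indent if i == 0 else -1))
    return tokens
```

For `"if a:\n  b"` this yields `if` with indent 0, `a` and `:` with -1, and `b` with indent 2, so the parser can inspect indentation through the current token alone without extra lookahead.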

The parser uses a stack of indentation levels; each entry is an integer counting spaces. The indentation information is queried at strategic places in the parser and ignored elsewhere. The pseudo-terminal IND{>} denotes an indentation that consists of more spaces than the entry at the top of the stack; IND{=} denotes an indentation with the same number of spaces. DED is another pseudo-terminal that describes the action of popping a value from the stack; IND{>} correspondingly implies pushing a value onto the stack.
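The stack and its three pseudo-terminals can be sketched as a small Python class. This is an illustrative assumption, not Nim's implementation: the class name `IndentStack` and the methods `ind_gt`, `ind_eq`, `push` and `ded` are hypothetical names chosen to mirror IND{>}, IND{=} and DED, and the stack is assumed to start at column 0.

```python
class IndentStack:
    """Hypothetical stack of indentation levels (counts of spaces)."""

    def __init__(self) -> None:
        self.levels = [0]  # assumed starting level: top-level code at column 0

    def top(self) -> int:
        return self.levels[-1]

    def ind_gt(self, indent: int) -> bool:
        # IND{>}: more spaces than the entry at the top of the stack.
        return indent > self.top()

    def ind_eq(self, indent: int) -> bool:
        # IND{=}: the same number of spaces as the entry at the top.
        return indent == self.top()

    def push(self, indent: int) -> None:
        # A matched IND{>} opens a block, so its level is pushed.
        self.levels.append(indent)

    def ded(self) -> None:
        # DED: leave the current block by popping its level.
        self.levels.pop()
```

For example, a line indented by 2 spaces satisfies `ind_gt(2)` on a fresh stack; after `push(2)` the following lines at 2 spaces satisfy `ind_eq(2)`, and `ded()` restores the top to 0.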

With this notation we can now easily define the core of the grammar, a block of statements (simplified example):

  ifStmt = 'if' expr ':' stmt
           (IND{=} 'elif' expr ':' stmt)*
           (IND{=} 'else' ':' stmt)?

  simpleStmt = ifStmt / ...

  stmt = IND{>} stmt ^+ IND{=} DED  # list of statements
       / simpleStmt                 # or a simple statement
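The simplified grammar above maps almost rule-for-rule onto a recursive-descent parser. The following Python sketch is an illustration under stated assumptions, not Nim's parser: `expr` and the elided `...` simple statements are assumed to be bare identifiers, the tokenizer, the `Parser` class and its method names are hypothetical, and the result is a nested tuple/list structure invented for this example. One extra check not spelled out in the simplified grammar is included: a body that starts on a new line must be indented more than the enclosing block.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

KEYWORDS = {"if", "elif", "else"}


@dataclass
class Token:
    text: str
    indent: int  # leading spaces if the token starts a line, else -1


def tokenize(source: str) -> list[Token]:
    # Indentation is attached to the first token of each line, not a token itself.
    tokens: list[Token] = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue
        if stripped.startswith("\t"):
            raise SyntaxError("tabs are not allowed for indentation")
        indent = len(line) - len(stripped)
        for i, text in enumerate(re.findall(r"\w+|[^\w\s]", stripped)):
            tokens.append(Token(text, indent if i == 0 else -1))
    return tokens


class Parser:
    def __init__(self, source: str) -> None:
        self.tokens = tokenize(source)
        self.pos = 0
        self.levels = [0]  # the indentation stack

    def peek(self) -> Token | None:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def ind(self, op: str) -> bool:
        # IND{>} (op == ">") or IND{=} (op == "=") against the top of the stack;
        # only tokens that start a line carry indentation information.
        tok = self.peek()
        if tok is None or tok.indent < 0:
            return False
        top = self.levels[-1]
        return tok.indent > top if op == ">" else tok.indent == top

    def expect(self, text: str) -> None:
        tok = self.peek()
        if tok is None or tok.text != text:
            raise SyntaxError(f"expected {text!r}")
        self.pos += 1

    def ident(self) -> str:
        tok = self.peek()
        if tok is None or not tok.text.isidentifier() or tok.text in KEYWORDS:
            raise SyntaxError("expected an identifier")
        self.pos += 1
        return tok.text

    def at_keyword(self, kw: str) -> bool:
        return self.ind("=") and self.peek().text == kw

    def colon_body(self):
        self.expect(":")
        tok = self.peek()
        # Extra check: a body on a new line must be indented (IND{>}).
        if tok is not None and tok.indent >= 0 and not self.ind(">"):
            raise SyntaxError("expected an indented block")
        return self.stmt()

    def stmt(self):
        # stmt = IND{>} stmt ^+ IND{=} DED / simpleStmt
        if self.ind(">"):
            self.levels.append(self.peek().indent)  # IND{>} pushes
            body = [self.stmt()]
            while self.ind("="):
                body.append(self.stmt())
            self.levels.pop()  # DED pops
            return body
        return self.simple_stmt()

    def simple_stmt(self):
        # simpleStmt = ifStmt / ... (here '...' is assumed to be an identifier)
        tok = self.peek()
        if tok is not None and tok.text == "if":
            return self.if_stmt()
        return self.ident()

    def if_stmt(self):
        self.expect("if")
        branches = [(self.ident(), self.colon_body())]
        while self.at_keyword("elif"):  # (IND{=} 'elif' expr ':' stmt)*
            self.expect("elif")
            branches.append((self.ident(), self.colon_body()))
        else_body = None
        if self.at_keyword("else"):  # (IND{=} 'else' ':' stmt)?
            self.expect("else")
            else_body = self.colon_body()
        return ("if", branches, else_body)

    def module(self):
        # Top level: statements separated by IND{=} at column 0.
        stmts = [self.stmt()]
        while self.ind("="):
            stmts.append(self.stmt())
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek().text!r}")
        return stmts
```

Note how `elif` and `else` are only accepted after IND{=}: once the body's DED has popped its level, the top of the stack is again the level of the `if` line, so a dangling `else` binds to the `if` at the same indentation. An indented block parses to a list and a same-line body such as `elif d: e` parses to a single identifier.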