Lexical Analysis - Numeric literals - 《Nim v2.0 Manual》

Numeric literals

Numeric literals have the form:

hexdigit = digit | 'A'..'F' | 'a'..'f'
octdigit = '0'..'7'
bindigit = '0'..'1'
unary_minus = '-' # See the section about unary minus
HEX_LIT = unary_minus? '0' ('x' | 'X' ) hexdigit ( ['_'] hexdigit )*
DEC_LIT = unary_minus? digit ( ['_'] digit )*
OCT_LIT = unary_minus? '0' 'o' octdigit ( ['_'] octdigit )*
BIN_LIT = unary_minus? '0' ('b' | 'B' ) bindigit ( ['_'] bindigit )*
INT_LIT = HEX_LIT
        | DEC_LIT
        | OCT_LIT
        | BIN_LIT
INT8_LIT = INT_LIT ['\''] ('i' | 'I') '8'
INT16_LIT = INT_LIT ['\''] ('i' | 'I') '16'
INT32_LIT = INT_LIT ['\''] ('i' | 'I') '32'
INT64_LIT = INT_LIT ['\''] ('i' | 'I') '64'
UINT_LIT = INT_LIT ['\''] ('u' | 'U')
UINT8_LIT = INT_LIT ['\''] ('u' | 'U') '8'
UINT16_LIT = INT_LIT ['\''] ('u' | 'U') '16'
UINT32_LIT = INT_LIT ['\''] ('u' | 'U') '32'
UINT64_LIT = INT_LIT ['\''] ('u' | 'U') '64'
exponent = ('e' | 'E' ) ['+' | '-'] digit ( ['_'] digit )*
FLOAT_LIT = unary_minus? digit (['_'] digit)* (('.' digit (['_'] digit)* [exponent]) |exponent)
FLOAT32_SUFFIX = ('f' | 'F') ['32']
FLOAT32_LIT = HEX_LIT '\'' FLOAT32_SUFFIX
            | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT32_SUFFIX
FLOAT64_SUFFIX = ( ('f' | 'F') '64' ) | 'd' | 'D'
FLOAT64_LIT = HEX_LIT '\'' FLOAT64_SUFFIX
            | (FLOAT_LIT | DEC_LIT | OCT_LIT | BIN_LIT) ['\''] FLOAT64_SUFFIX
CUSTOM_NUMERIC_LIT = (FLOAT_LIT | INT_LIT) '\'' CUSTOM_NUMERIC_SUFFIX
# CUSTOM_NUMERIC_SUFFIX is any Nim identifier that is not
# a pre-defined type suffix.

As can be seen in the productions, numeric literals can contain underscores for readability. Integer and floating-point literals may be given in decimal (no prefix), binary (prefix 0b), octal (prefix 0o), and hexadecimal (prefix 0x) notation.
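As a small illustration of the paragraph above, the following sketch writes the same kinds of values with underscores and with each base prefix. The variable names are chosen for this example only.

```nim
# Underscores are visual separators and do not change the value.
let million = 1_000_000
doAssert million == 1000000

# The same notation rules apply to every base prefix.
let fromBin = 0b1010        # binary
let fromOct = 0o17          # octal
let fromHex = 0xFF          # hexadecimal
doAssert fromBin == 10
doAssert fromOct == 15
doAssert fromHex == 255

# Underscores and letter case do not matter inside hex digits.
doAssert 0xbe_ef == 0xBEEF

# A decimal literal with an exponent is a floating-point literal.
let thousandAndAHalf = 1_000.5
doAssert thousandAndAHalf == 1000.5
doAssert 1.5e3 == 1500.0
```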

Treating the unary minus - in a number literal like -1 as part of the literal is a late addition to the language. The rationale is that the expression -128'i8 should be valid. Without this special case it would be impossible, because 128 is not a valid int8 value, only -128 is.

For the unary_minus rule there are further restrictions that are not covered in the formal grammar. For - to be part of the number literal, the immediately preceding character has to be in the set {' ', '\t', '\n', '\r', ',', ';', '(', '[', '{'}. This set was designed to cover most cases in a natural manner.

In the following examples, -1 is a single token:

echo -1
echo(-1)
echo [-1]
echo 3,-1
"abc";-1

In the following examples, -1 is parsed as two separate tokens (as - 1):

echo x-1
echo (int)-1
echo [a]-1
"abc"-1
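The two groups of examples can be checked with a short sketch such as the one below. The variable names are illustrative, and the comments describe the tokenization according to the rule stated above.

```nim
# '-' follows a space, so -128'i8 is one literal token.
# Without that rule, 128'i8 on its own would be out of range for int8.
let lowest = -128'i8
doAssert lowest == low(int8)

# '-' follows an identifier character, so this is x minus 1 (two tokens).
let x = 5
let diff = x-1
doAssert diff == 4

# '-' after '[' or ',' starts a negative literal.
let items = [-1, 2,-3]
doAssert items == [-1, 2, -3]
```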

The suffix starting with an apostrophe (') is called a type suffix. Literals without a type suffix are of an integer type unless the literal contains a dot or E|e, in which case it is of type float. This integer type is int if the literal is in the range low(int32)..high(int32), otherwise it is int64. For notational convenience, the apostrophe of a type suffix is optional if it is not ambiguous (only hexadecimal floating-point literals with a type suffix can be ambiguous).
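A minimal sketch of these typing rules, using the `is` operator to check the type of each literal. The specific values are chosen only for illustration.

```nim
# No suffix, no dot, no exponent: an integer literal.
doAssert(42 is int)
doAssert(2_147_483_647 is int)      # high(int32), still int

# Outside the int32 range the literal is typed as int64.
doAssert(2_147_483_648 is int64)

# A dot or an exponent makes the literal a float.
doAssert(4.2 is float)
doAssert(1e3 is float)

# The apostrophe before a type suffix is optional when unambiguous.
doAssert(42'i16 is int16)
doAssert(42i16 is int16)
doAssert(1.5f32 is float32)
```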

The pre-defined type suffixes are:

Type Suffix	Resulting type of literal
'i8	int8
'i16	int16
'i32	int32
'i64	int64
'u	uint
'u8	uint8
'u16	uint16
'u32	uint32
'u64	uint64
'f	float32
'd	float64
'f32	float32
'f64	float64
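The table can be spot-checked with a sketch like the following, which uses sizeof for the fixed-width integer suffixes and `is` for the remaining ones.

```nim
# Fixed-width integer suffixes: the size in bytes follows the suffix.
doAssert sizeof(1'i8) == 1
doAssert sizeof(1'i16) == 2
doAssert sizeof(1'u32) == 4
doAssert sizeof(1'u64) == 8

# 'u yields the platform-sized unsigned integer.
doAssert(1'u is uint)

# Float suffixes also apply to literals without a dot.
doAssert(1'f is float32)
doAssert(1'd is float64)
doAssert(1.0'f32 is float32)
doAssert(1.0'f64 is float64)
```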

Floating-point literals may also be in binary, octal or hexadecimal notation: 0B0_10001110100_0000101001000111101011101111111011000101001101001001'f64 is approximately 1.72826e35 according to the IEEE floating-point standard.
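In these notations the digits are read as the IEEE bit pattern of the float, not as an integer value to convert. The sketch below uses well-known bit patterns for 1.0 and 2.0, and the variable names are illustrative.

```nim
# 0x3F800000 is the IEEE 754 single-precision bit pattern of 1.0.
let one32 = 0x3F80_0000'f32
doAssert one32 == 1.0'f32

# 0x3FF0000000000000 is the double-precision bit pattern of 1.0.
let one64 = 0x3FF0_0000_0000_0000'f64
doAssert one64 == 1.0

# 0x4000000000000000 is the double-precision bit pattern of 2.0.
let two64 = 0x4000_0000_0000_0000'f64
doAssert two64 == 2.0
```

Note that the apostrophe is required here: a hexadecimal literal followed directly by f32 would be ambiguous, since f is itself a hex digit.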

Literals must match the datatype, for example, 333'i8 is an invalid literal. Non-base-10 literals are used mainly for flags and bit pattern representations, therefore the checking is done on bit width and not on value range. Hence: 0b10000000'u8 == 0x80'u8 == 128, but 0b10000000'i8 == 0x80'i8 == -128 (the two's complement reading of that bit pattern) instead of causing an overflow error.
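The bit-width rule can be sketched as follows. The same eight bits are read as unsigned or as two's complement signed, depending on the suffix. The variable names are illustrative.

```nim
# Eight bits with only the top bit set.
let asUnsigned = 0b1000_0000'u8
let asSigned = 0x80'i8

doAssert asUnsigned == 128'u8
doAssert asSigned == low(int8)      # -128 in two's complement

# All eight bits set: 255 unsigned, -1 signed.
doAssert 0xFF'u8 == 255'u8
doAssert 0xFF'i8 == -1'i8

# By contrast, the decimal literals 128'i8 and 333'i8 are checked by
# value range and are rejected at compile time.
```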

Custom numeric literals

If the suffix is not predefined, then the suffix is assumed to be a call to a proc, template, macro or other callable identifier that is passed the string containing the literal. The callable identifier needs to be declared with a special ' prefix:

import strutils
type u4 = distinct uint8 # a 4-bit unsigned integer aka "nibble"
proc `'u4`(n: string): u4 =
  # The leading ' is required.
  result = (parseInt(n) and 0x0F).u4
var x = 5'u4

More formally, a custom numeric literal 123'custom is transformed to r"123".`'custom` in the parsing step. There is no AST node kind that corresponds to this transformation. The transformation naturally handles the case that additional parameters are passed to the callee:

import strutils
type u4 = distinct uint8 # a 4-bit unsigned integer aka "nibble"
proc `'u4`(n: string; moreData: int): u4 =
  result = (parseInt(n) and 0x0F).u4
var x = 5'u4(123)

Custom numeric literals are covered by the grammar rule named CUSTOM_NUMERIC_LIT. A custom numeric literal is a single token.
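To make the transformation concrete, here is a minimal sketch that compares the literal form with the explicit call form it is rewritten to. The Meters type and the 'm suffix are hypothetical names introduced for this example and are not part of the standard library.

```nim
import std/strutils

# Hypothetical unit type, used only for illustration.
type Meters = distinct float

# The callable receives the text of the literal, for example "2.5".
proc `'m`(s: string): Meters =
  Meters(parseFloat(s))

# Literal form and the explicit call it is rewritten to.
let viaLiteral = 2.5'm
let viaCall = "2.5".`'m`
doAssert float(viaLiteral) == float(viaCall)

# After '(' the minus sign belongs to the literal, so the callable
# is handed the string "-1.5".
doAssert float(-1.5'm) == -1.5
```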