Source Encoding
Zig source code is encoded in UTF-8. An invalid UTF-8 byte sequence results in a compile error.
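As a rough illustration of the encoding rule, a tool could reject a file before tokenizing it if its bytes are not valid UTF-8. The sketch below is not how the compiler does it; `isValidUtf8Source` is a hypothetical helper name, and it relies only on `std.unicode.utf8ValidateSlice` from the standard library.

```zig
const std = @import("std");

/// Hypothetical pre-check (the name is an assumption, not a compiler API):
/// returns true when `source` is a well-formed UTF-8 byte sequence.
fn isValidUtf8Source(source: []const u8) bool {
    return std.unicode.utf8ValidateSlice(source);
}

test "plain ASCII source is valid UTF-8" {
    try std.testing.expect(isValidUtf8Source("const x = 1;\n"));
}
```

A check like this only covers the encoding itself; the forbidden code points are a separate rule.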
Throughout all Zig source code (including in comments), some code points are never allowed:
- ASCII control characters, except for U+000a (LF), U+000d (CR), and U+0009 (HT): U+0000 - U+0008, U+000b - U+000c, U+000e - U+001f, U+007f.
- Non-ASCII Unicode line endings: U+0085 (NEL), U+2028 (LS), U+2029 (PS).
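The two rules above can be written as a single predicate over decoded code points. This is a minimal sketch: `isForbiddenCodePoint` is a hypothetical name, and decoding the source into code points is assumed to happen elsewhere.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the compiler or std): reports whether
/// `cp` is a code point that may never appear in Zig source.
fn isForbiddenCodePoint(cp: u21) bool {
    return switch (cp) {
        // ASCII control characters other than HT (U+0009), LF (U+000a), CR (U+000d).
        0x00...0x08, 0x0b...0x0c, 0x0e...0x1f, 0x7f => true,
        // Non-ASCII Unicode line endings: NEL, LS, PS.
        0x85, 0x2028, 0x2029 => true,
        else => false,
    };
}

test "HT, LF and CR pass this check" {
    try std.testing.expect(!isForbiddenCodePoint('\t'));
    try std.testing.expect(!isForbiddenCodePoint('\n'));
    try std.testing.expect(!isForbiddenCodePoint('\r'));
}
```

Note that CR passes this check but is still restricted by position, since it is only allowed immediately before an LF.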
LF (byte value 0x0a, code point U+000a, '\n') is the line terminator in Zig source code. This byte value terminates every line of Zig source code except the last line of the file. It is recommended that non-empty source files end with an empty line, which means the last byte would be 0x0a (LF).
Each LF may be immediately preceded by a single CR (byte value 0x0d, code point U+000d, '\r') to form a Windows-style line ending, but this is discouraged. A CR in any other context is not allowed.
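The CR rule is positional, so it needs a look at the following byte. The sketch below scans raw bytes, which is safe because 0x0d and 0x0a never occur inside a multi-byte UTF-8 sequence. `findStrayCarriageReturn` is a hypothetical helper name chosen for illustration.

```zig
const std = @import("std");

/// Hypothetical helper: returns the byte offset of the first CR that is not
/// immediately followed by LF, or null if every CR is part of a CRLF pair.
fn findStrayCarriageReturn(source: []const u8) ?usize {
    var i: usize = 0;
    while (i < source.len) : (i += 1) {
        if (source[i] != '\r') continue;
        const next_is_lf = i + 1 < source.len and source[i + 1] == '\n';
        if (!next_is_lf) return i;
    }
    return null;
}

test "CRLF is accepted, a lone CR is reported" {
    try std.testing.expect(findStrayCarriageReturn("a\r\nb\r\n") == null);
    try std.testing.expect(findStrayCarriageReturn("a\rb\n").? == 1);
}
```

Under this reading, the first CR in a sequence such as CR CR LF is reported, since only a single CR may precede each LF.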
HT hard tabs (byte value 0x09, code point U+0009, '\t') are interchangeable with SP spaces (byte value 0x20, code point U+0020, ' ') as a token separator, but use of hard tabs is discouraged. See Grammar.
Note that running zig fmt on a source file will implement all recommendations mentioned here. Note also that the stage1 compiler does not yet support CR or HT control characters.
Note that a tool reading Zig source code can make assumptions if the source code is assumed to be correct Zig code. For example, when identifying the ends of lines, a tool can use a naive search such as /\n/, or an advanced search such as /\r\n?|[\n\u0085\u2028\u2029]/, and in either case line endings will be correctly identified. For another example, when identifying the whitespace before the first token on a line, a tool can either use a naive search such as /[ \t]/, or an advanced search such as /\s/, and in either case whitespace will be correctly identified.
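To make the naive approach concrete, here is a minimal sketch of a line lookup and an indentation measure that assume the input is already valid Zig source. Both `lineAt` and `leadingWhitespaceLen` are hypothetical helpers written for this example, not standard library functions.

```zig
const std = @import("std");

/// Hypothetical helper: returns the `index`-th line (0-based) without its
/// terminator, or null if the source has fewer lines. Assumes valid Zig
/// source, so searching for LF alone finds every line ending.
fn lineAt(source: []const u8, index: usize) ?[]const u8 {
    var start: usize = 0;
    var current: usize = 0;
    while (start < source.len) {
        // Naive search for the next LF (or the end of the file).
        var end = start;
        while (end < source.len and source[end] != '\n') : (end += 1) {}
        if (current == index) {
            var line = source[start..end];
            // Drop the CR of a Windows-style ending, if present.
            if (line.len > 0 and line[line.len - 1] == '\r') {
                line = line[0 .. line.len - 1];
            }
            return line;
        }
        current += 1;
        start = end + 1;
    }
    return null;
}

/// Hypothetical helper: number of leading SP/HT bytes on a line, the
/// equivalent of the naive /[ \t]/ search.
fn leadingWhitespaceLen(line: []const u8) usize {
    var n: usize = 0;
    while (n < line.len and (line[n] == ' ' or line[n] == '\t')) : (n += 1) {}
    return n;
}

test "second line and its indentation" {
    const src = "fn f() void {\r\n\t  return;\r\n}\n";
    const line = lineAt(src, 1).?;
    try std.testing.expectEqualStrings("\t  return;", line);
    try std.testing.expect(leadingWhitespaceLen(line) == 3);
}
```

On input that is not valid Zig source these shortcuts no longer hold, so a tool that must handle arbitrary text would need the more general searches instead.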