Unicode Identifiers
ECMAScript 6 offers better Unicode support than previous versions of JavaScript, and it also changes what characters may be used as identifiers. In ECMAScript 5, it was already possible to use Unicode escape sequences for identifiers. For example:
// Valid in ECMAScript 5 and 6
var \u0061 = "abc";
console.log(\u0061); // "abc"
// equivalent to:
console.log(a); // "abc"
After the var statement in this example, you can use either \u0061 or a to access the variable. In ECMAScript 6, you can also use Unicode code point escape sequences as identifiers, like this:
// Valid in ECMAScript 6 only
var \u{61} = "abc";
console.log(\u{61}); // "abc"
// equivalent to:
console.log(a); // "abc"
This example just replaces \u0061 with its code point equivalent. Otherwise, it does exactly the same thing as the previous example.
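As a small sketch of how the two escape forms interact, the snippet below mixes them with literal characters inside a single identifier. The variable name abc is chosen only for illustration, and the code assumes an engine that supports ECMAScript 6 code point escapes.

```javascript
// The name `abc` is a hypothetical example, not taken from the text above.
var \u0061bc = 1;               // four-digit escape: declares abc

\u{61}bc += 1;                  // code point escape: refers to the same abc

console.log(abc);               // 2
console.log(a\u{62}c === abc);  // true
```

All three spellings resolve to the same binding because the escapes are decoded before the identifier name is compared.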
Additionally, ECMAScript 6 formally specifies valid identifiers in terms of Unicode Standard Annex #31: Unicode Identifier and Pattern Syntax, which gives the following rules:
- The first character must be $, _, or any Unicode symbol with a derived core property of ID_Start.
- Each subsequent character must be $, _, \u200c (a zero-width non-joiner), \u200d (a zero-width joiner), or any Unicode symbol with a derived core property of ID_Continue.
The ID_Start and ID_Continue derived core properties are defined in Unicode Identifier and Pattern Syntax as a way to identify symbols that are appropriate for use in identifiers such as variables and domain names. The specification is not specific to JavaScript.
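The two identifier rules can be approximated with a regular expression. The following is a minimal sketch, not a full implementation: it assumes an engine that supports Unicode property escapes (\p{...} with the u flag, added in ECMAScript 2018, which is later than ECMAScript 6), and the names IDENTIFIER_PATTERN and looksLikeIdentifier are hypothetical. It does not reject reserved words such as var, and it does not decode escape sequences in the input.

```javascript
// Hypothetical helper: tests a string against the two rules.
// First character: $, _, or ID_Start.
// Later characters: $, _, U+200C, U+200D, or ID_Continue.
const IDENTIFIER_PATTERN =
  /^[$_\p{ID_Start}][$_\u200c\u200d\p{ID_Continue}]*$/u;

function looksLikeIdentifier(name) {
  return IDENTIFIER_PATTERN.test(name);
}

console.log(looksLikeIdentifier("$total"));   // true
console.log(looksLikeIdentifier("_count1"));  // true
console.log(looksLikeIdentifier("café"));     // true
console.log(looksLikeIdentifier("1abc"));     // false (digits are not ID_Start)
console.log(looksLikeIdentifier("a-b"));      // false (hyphen is not ID_Continue)
```

The u flag matters here: it enables the property escapes and makes the pattern match whole code points, so characters outside the Basic Multilingual Plane are tested as single symbols.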