Other Regular Expression Changes
Regular expressions are an important part of working with strings in JavaScript, and like many parts of the language, they haven’t changed much in recent versions. ECMAScript 6, however, makes several improvements to regular expressions to go along with the updates to strings.
The Regular Expression y Flag
ECMAScript 6 standardized the y flag after it was implemented in Firefox as a proprietary extension to regular expressions. The y flag affects a regular expression search’s sticky property, and it tells the search to start matching characters in a string at the position specified by the regular expression’s lastIndex property. If there is no match at that location, then the regular expression stops matching. To see how this works, consider the following code:
var text = "hello1 hello2 hello3",
    pattern = /hello\d\s?/,
    result = pattern.exec(text),
    globalPattern = /hello\d\s?/g,
    globalResult = globalPattern.exec(text),
    stickyPattern = /hello\d\s?/y,
    stickyResult = stickyPattern.exec(text);
console.log(result[0]); // "hello1 "
console.log(globalResult[0]); // "hello1 "
console.log(stickyResult[0]); // "hello1 "
pattern.lastIndex = 1;
globalPattern.lastIndex = 1;
stickyPattern.lastIndex = 1;
result = pattern.exec(text);
globalResult = globalPattern.exec(text);
stickyResult = stickyPattern.exec(text);
console.log(result[0]); // "hello1 "
console.log(globalResult[0]); // "hello2 "
console.log(stickyResult[0]); // Error! stickyResult is null
This example has three regular expressions. The expression in pattern has no flags, the one in globalPattern uses the g flag, and the one in stickyPattern uses the y flag. In the first trio of console.log() calls, all three regular expressions should return "hello1 " with a space at the end.
After that, the lastIndex property is changed to 1 on all three patterns, meaning that the regular expression should start matching from the second character on all of them. The regular expression with no flags completely ignores the change to lastIndex and still matches "hello1 " without incident. The regular expression with the g flag goes on to match "hello2 " because it is searching forward from the second character of the string ("e"). The sticky regular expression doesn’t match anything beginning at the second character so stickyResult is null.
The sticky flag saves the index of the next character after the last match in lastIndex whenever an operation is performed. If an operation results in no match, then lastIndex is set back to 0. The global flag behaves the same way, as demonstrated here:
var text = "hello1 hello2 hello3",
    pattern = /hello\d\s?/,
    result = pattern.exec(text),
    globalPattern = /hello\d\s?/g,
    globalResult = globalPattern.exec(text),
    stickyPattern = /hello\d\s?/y,
    stickyResult = stickyPattern.exec(text);
console.log(result[0]); // "hello1 "
console.log(globalResult[0]); // "hello1 "
console.log(stickyResult[0]); // "hello1 "
console.log(pattern.lastIndex); // 0
console.log(globalPattern.lastIndex); // 7
console.log(stickyPattern.lastIndex); // 7
result = pattern.exec(text);
globalResult = globalPattern.exec(text);
stickyResult = stickyPattern.exec(text);
console.log(result[0]); // "hello1 "
console.log(globalResult[0]); // "hello2 "
console.log(stickyResult[0]); // "hello2 "
console.log(pattern.lastIndex); // 0
console.log(globalPattern.lastIndex); // 14
console.log(stickyPattern.lastIndex); // 14
The value of lastIndex changes to 7 after the first call to exec() and to 14 after the second call, for both the stickyPattern and globalPattern variables.
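Because lastIndex advances past each match and a sticky pattern refuses to skip ahead, the two together suit stepping through a string one piece at a time. The following is only a minimal sketch of that idea: the rules array, the token type names, and the tokenize() function are hypothetical and exist purely for illustration.

```javascript
// Hypothetical token rules; the names and patterns are illustrative only.
var rules = [
    { type: "number", pattern: /\d+/y },
    { type: "word", pattern: /[a-z]+/y },
    { type: "space", pattern: /\s+/y }
];

function tokenize(input) {
    var tokens = [],
        position = 0;

    while (position < input.length) {
        var matched = false;

        for (var i = 0; i < rules.length && !matched; i++) {
            var rule = rules[i];

            // a sticky pattern only matches exactly at lastIndex
            rule.pattern.lastIndex = position;
            var result = rule.pattern.exec(input);

            if (result) {
                tokens.push({ type: rule.type, value: result[0] });
                // after a match, lastIndex points just past the matched text
                position = rule.pattern.lastIndex;
                matched = true;
            }
        }

        if (!matched) {
            throw new SyntaxError("Unexpected character at index " + position);
        }
    }

    return tokens;
}

var tokens = tokenize("abc 42");
console.log(tokens.length);    // 3
console.log(tokens[0].value);  // "abc"
console.log(tokens[2].value);  // "42"
```

With the g flag instead of y, each pattern would search forward and could silently skip over characters that no rule recognizes, so the error case would go unnoticed.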
There are two more subtle details about the sticky flag to keep in mind:
- The lastIndex property is honored when calling methods that exist on the regular expression object, like the exec() and test() methods. In engines that follow the final ECMAScript 6 specification, the string method match() delegates to the regular expression and so also starts a sticky match at lastIndex, while search() always starts matching at position 0 regardless of lastIndex.
- When using the ^ character to match the start of a string, sticky regular expressions only match from the start of the string (or the start of the line in multiline mode). While lastIndex is 0, the ^ makes a sticky regular expression no different from a non-sticky one. If lastIndex doesn’t correspond to the beginning of the string in single-line mode or the beginning of a line in multiline mode, the sticky regular expression will never match.
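The interaction between ^ and lastIndex can be seen in a short sketch. The sample text and variable names here are assumptions chosen for illustration; index 6 is the first character after the line break.

```javascript
var text = "first\nsecond";
var singleLine = /^\w+/y;
var multiLine = /^\w+/my;

// index 6 is the start of the second line, but not the start of the string
singleLine.lastIndex = 6;
multiLine.lastIndex = 6;

var singleResult = singleLine.exec(text);
var multiResult = multiLine.exec(text);

console.log(singleResult);      // null
console.log(multiResult[0]);    // "second"

// index 7 is in the middle of a line, so even multiline mode cannot match
multiLine.lastIndex = 7;
var midLineResult = multiLine.exec(text);

console.log(midLineResult);     // null
```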
As with other regular expression flags, you can detect the presence of y by using a property. In this case, you’d check the sticky property, as follows:
var pattern = /hello\d/y;
console.log(pattern.sticky); // true
The sticky property is set to true if the sticky flag is present, and the property is false if not. The sticky property is read-only based on the presence of the flag and cannot be changed in code.
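As a small sketch of what read-only means in practice, the snippet below attempts an assignment inside a strict mode function. The expectation that a TypeError is thrown assumes an engine that implements sticky as a getter-only accessor, as ECMAScript 6 specifies; in nonstrict code the assignment would be silently ignored instead.

```javascript
var pattern = /hello\d/y;
var assignmentThrew = false;

try {
    (function() {
        "use strict";
        // sticky has no setter, so strict mode rejects the assignment
        pattern.sticky = false;
    })();
} catch (ex) {
    assignmentThrew = ex instanceof TypeError;
}

console.log(assignmentThrew);   // true
console.log(pattern.sticky);    // true
```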
Similar to the u flag, the y flag is a syntax change, so it will cause a syntax error in older JavaScript engines. You can use the following approach to detect support:
function hasRegExpY() {
    try {
        var pattern = new RegExp(".", "y");
        return true;
    } catch (ex) {
        return false;
    }
}
Just like the u check, this returns false if it’s unable to create a regular expression with the y flag. In one final similarity to u, if you need to use y in code that runs in older JavaScript engines, be sure to use the RegExp constructor when defining those regular expressions to avoid a syntax error.
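One way to combine the detection function with the RegExp constructor is sketched below. The matchAt() helper is hypothetical and not part of any standard API; it assumes that when y is unavailable, a g search followed by a check of the match position is an acceptable, if slower, substitute.

```javascript
function hasRegExpY() {
    try {
        var pattern = new RegExp(".", "y");
        return true;
    } catch (ex) {
        return false;
    }
}

// Hypothetical helper: match `source` only if it begins exactly at `position`.
function matchAt(source, text, position) {
    // build the pattern with the constructor so older engines can parse this file
    var pattern = new RegExp(source, hasRegExpY() ? "y" : "g");

    pattern.lastIndex = position;
    var result = pattern.exec(text);

    // with the g fallback the search may skip ahead, so reject any match
    // that doesn't start at the requested position
    if (result && result.index !== position) {
        return null;
    }

    return result;
}

var text = "hello1 hello2 hello3";
var atSeven = matchAt("hello\\d\\s?", text, 7);
var atOne = matchAt("hello\\d\\s?", text, 1);

console.log(atSeven[0]);    // "hello2 "
console.log(atOne);         // null
```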
Duplicating Regular Expressions
In ECMAScript 5, you can duplicate regular expressions by passing them into the RegExp constructor like this:
var re1 = /ab/i,
    re2 = new RegExp(re1);
The re2 variable is just a copy of the re1 variable. But if you provide the second argument to the RegExp constructor, which specifies the flags for the regular expression, your code won’t work, as in this example:
var re1 = /ab/i,
    // throws an error in ES5, okay in ES6
    re2 = new RegExp(re1, "g");
If you execute this code in an ECMAScript 5 environment, you’ll get an error stating that the second argument cannot be used when the first argument is a regular expression. ECMAScript 6 changed this behavior such that the second argument is allowed and overrides any flags present on the first argument. For example:
var re1 = /ab/i,
    // throws an error in ES5, okay in ES6
    re2 = new RegExp(re1, "g");
console.log(re1.toString()); // "/ab/i"
console.log(re2.toString()); // "/ab/g"
console.log(re1.test("ab")); // true
console.log(re2.test("ab")); // true
console.log(re1.test("AB")); // true
console.log(re2.test("AB")); // false
In this code, re1 has the case-insensitive i flag while re2 has only the global g flag. The RegExp constructor duplicated the pattern from re1 and substituted the g flag for the i flag. Without the second argument, re2 would have the same flags as re1.
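Because the second argument replaces the flags rather than adding to them, keeping an existing flag means listing it again. The sketch below wraps that in a hypothetical cloneWithFlags() helper, which also assumes that falling back to the source property is acceptable in an engine that still enforces the ECMAScript 5 restriction.

```javascript
// Hypothetical helper: copy a regular expression's pattern with new flags.
function cloneWithFlags(re, flags) {
    try {
        // ECMAScript 6 behavior: flags override those on the original
        return new RegExp(re, flags);
    } catch (ex) {
        // ECMAScript 5 behavior: rebuild from the pattern text instead
        return new RegExp(re.source, flags);
    }
}

var re1 = /ab/i,
    re2 = cloneWithFlags(re1, "gi");

console.log(re1.global);        // false
console.log(re2.global);        // true
console.log(re2.ignoreCase);    // true
console.log(re2.test("AB"));    // true
```

The original regular expression is left untouched, so re1 still has only the i flag.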
The flags Property
Along with adding a new flag and changing how you can work with flags, ECMAScript 6 added a property associated with them. In ECMAScript 5, you could get the text of a regular expression by using the source property, but to get the flag string, you’d have to parse the output of the toString() method as shown below:
function getFlags(re) {
    var text = re.toString();
    return text.substring(text.lastIndexOf("/") + 1, text.length);
}
// toString() is "/ab/g"
var re = /ab/g;
console.log(getFlags(re)); // "g"
This converts a regular expression into a string and then returns the characters found after the last /. Those characters are the flags.
ECMAScript 6 makes fetching flags easier by adding a flags property to go along with the source property. Both properties are prototype accessor properties with only a getter assigned, making them read-only. The flags property makes inspecting regular expressions easier for both debugging and inheritance purposes.
A late addition to ECMAScript 6, the flags property returns the string representation of any flags applied to a regular expression. For example:
var re = /ab/g;
console.log(re.source); // "ab"
console.log(re.flags); // "g"
This fetches all flags on re and prints them to the console with far less code than the toString() technique requires. Using source and flags together allows you to extract the pieces of the regular expression that you need without parsing the regular expression string directly.
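As one illustration of using the two properties together, the sketch below builds a sticky version of an existing regular expression. The toSticky() helper is a hypothetical name, and the code assumes an engine that supports both the flags property and the y flag.

```javascript
// Hypothetical helper: return a copy of `re` with the y flag added.
function toSticky(re) {
    var flags = re.flags;

    // avoid passing a duplicate flag, which would be a syntax error
    if (flags.indexOf("y") === -1) {
        flags += "y";
    }

    return new RegExp(re.source, flags);
}

var re = /ab/gi,
    stickyRe = toSticky(re);

console.log(stickyRe.source);   // "ab"
console.log(stickyRe.flags);    // "giy"
console.log(stickyRe.sticky);   // true
console.log(re.sticky);         // false
```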
The changes to strings and regular expressions that this chapter has covered so far are definitely powerful, but ECMAScript 6 improves your power over strings in a much bigger way. It brings a type of literal to the table that makes strings more flexible.