Chapter 12. Strings
Strings are immutable sequences of JavaScript characters. Each such character is a 16-bit UTF-16 code unit. That means that a single Unicode character is represented by either one or two JavaScript characters. You mainly need to worry about the two-character case whenever you are counting characters or splitting strings (see Chapter 24).
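The one-or-two-characters distinction can be seen directly in the length property. The following is a small sketch; the variable names are illustrative only, and \uD83D\uDE00 is the surrogate pair that encodes the code point U+1F600:

```javascript
// U+03C0 (the Greek letter pi) fits into a single UTF-16 code unit.
var pi = '\u03C0';

// U+1F600 lies outside the 16-bit range and is stored as
// two code units (a surrogate pair).
var smiley = '\uD83D\uDE00';

console.log(pi.length);     // 1
console.log(smiley.length); // 2

// Splitting into JavaScript characters tears the pair apart.
var parts = smiley.split('');
console.log(parts.length);  // 2
```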
String Literals
Both single and double quotes can be used to delimit string literals:
'He said: "Hello"'
"He said: \"Hello\""
'Everyone\'s a winner'
"Everyone's a winner"
Thus, you are free to use either kind of quote. There are several considerations, though:
- The most common style in the community is to use double quotes for HTML and single quotes for JavaScript.
- On the other hand, double quotes are used exclusively for strings in some languages (e.g., C and Java). Therefore, it may make sense to use them in a multilanguage code base.
- For JSON (discussed in Chapter 22), you must use double quotes.
Your code will look cleaner if you quote consistently. But sometimes, a different quote means that you don’t have to escape, which can justify your being less consistent (e.g., you may normally use single quotes, but temporarily switch to double quotes to write the last one of the preceding examples).
Escaping in String Literals
Most characters in string literals simply represent themselves. The backslash is used for escaping and enables several special features:
- Line continuations
- You can spread a string over multiple lines by escaping the end of the line (the line-terminating character, the line terminator) with a backslash:
var str = 'written \
over \
multiple \
lines';
console.log(str === 'written over multiple lines'); // true
An alternative is to use the plus operator to concatenate:
var str = 'written ' +
    'over ' +
    'multiple ' +
    'lines';
- Character escape sequences
- These sequences start with a backslash:
- Control characters: \b is a backspace, \f is a form feed, \n is a line feed (newline), \r is a carriage return, \t is a horizontal tab, and \v is a vertical tab.
- Escaped characters that represent themselves: \' is a single quote, \" is a double quote, and \\ is a backslash. All characters except b f n r t v x u and decimal digits represent themselves, too. Here are two examples:
- > '\"'
- '"'
- > '\q'
- 'q'
- NUL character (Unicode code point 0)
- This character is represented by \0.
- Hexadecimal escape sequences
- \xHH (HH are two hexadecimal digits) specifies a character via its code in the range 0x00 to 0xFF (which coincides with ASCII for codes up to 0x7F). For example:
- > '\x4D'
- 'M'
- Unicode escape sequences
- \uHHHH (HHHH are four hexadecimal digits) specifies a UTF-16 code unit (see Chapter 24). Here are two examples:
- > '\u004D'
- 'M'
- > '\u03C0'
- 'π'
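The escape rules above can be checked with a few comparisons. This is only a sketch; the sample strings are made up for illustration:

```javascript
// \n is a single character, not a backslash followed by an n.
var twoLines = 'a\nb';
console.log(twoLines.length); // 3

// The same character written three ways.
var sameChar = ('M' === '\x4D') && ('M' === '\u004D');
console.log(sameChar); // true

// A literal backslash must itself be escaped.
var path = 'C:\\temp';
console.log(path.length); // 7

// \0 is the NUL character (code 0).
var nulCode = '\0'.charCodeAt(0);
console.log(nulCode); // 0
```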
Character Access
There are two operations that return the _n_th character of a string.[16] Note that JavaScript does not have a special data type for characters; these operations return strings:
- > 'abc'.charAt(1)
- 'b'
- > 'abc'[1]
- 'b'
Some older browsers don’t support the array-like access to characters via square brackets.
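The two operations agree for positions inside the string, but they differ once the position is out of range. A minimal sketch:

```javascript
var str = 'abc';

// Within the string, both forms return the same one-character string.
console.log(str.charAt(1) === str[1]); // true

// Past the end, charAt() returns the empty string...
var viaCharAt = str.charAt(5);
console.log(viaCharAt === ''); // true

// ...while bracket access returns undefined.
var viaBrackets = str[5];
console.log(viaBrackets === undefined); // true
```

If later code expects a string in every case, charAt() is the safer of the two.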
Converting to String
Values are converted to a string as follows:
Value | Result |
undefined | 'undefined' |
null | 'null' |
A boolean | false → 'false', true → 'true' |
A number | The number as a string (e.g., 3.141 → '3.141') |
A string | Same as input (nothing to convert) |
An object | Call ToPrimitive(value, String) (see Algorithm: ToPrimitive()—Converting a Value to a Primitive) and convert the resulting primitive. |
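For objects, the last row can be illustrated with a short sketch. With String as the preferred type, ToPrimitive tries the object's toString() first and falls back to valueOf() if the result is not a primitive. The two objects below are hypothetical and exist only for this example:

```javascript
// toString() returns a primitive, so its result is used.
var person = {
    toString: function () { return 'John Doe'; }
};
console.log(String(person)); // John Doe

// toString() returns an object here, so valueOf() is tried next.
// Its result (the number 42) is then converted to a string.
var fallback = {
    toString: function () { return {}; },
    valueOf: function () { return 42; }
};
console.log(String(fallback) === '42'); // true
```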
Manually Converting to String
The three most common ways to convert any value to a string are:
String(value) | (Invoked as a function, not as a constructor) |
''+value | |
value.toString() | (Does not work for undefined and null!) |
I prefer String(), because it is more descriptive. Here are some examples:
- > String(false)
- 'false'
- > String(7.35)
- '7.35'
- > String({ first: 'John', last: 'Doe' })
- '[object Object]'
- > String([ 'a', 'b', 'c' ])
- 'a,b,c'
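The three approaches are not fully interchangeable. The sketch below shows where they diverge; the object withBoth is hypothetical and only serves to make the difference visible:

```javascript
// String() and ''+ both handle null and undefined.
console.log(String(null));   // null
console.log('' + undefined); // undefined

// There is no method to call on null, so toString() throws.
var threw = false;
try {
    null.toString();
} catch (e) {
    threw = e instanceof TypeError;
}
console.log(threw); // true

// For objects, String() prefers toString(), whereas the plus
// operator prefers valueOf().
var withBoth = {
    toString: function () { return 'S'; },
    valueOf: function () { return 'V'; }
};
console.log(String(withBoth)); // S
console.log('' + withBoth);    // V
```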
Note that for displaying data, JSON.stringify() (see JSON.stringify(value, replacer?, space?)) often works better than the canonical conversion to string:
- > console.log(JSON.stringify({ first: 'John', last: 'Doe' }))
- {"first":"John","last":"Doe"}
- > console.log(JSON.stringify([ 'a', 'b', 'c' ]))
- ["a","b","c"]
Naturally, you have to be aware of the limitations of JSON.stringify(): it doesn’t always show everything. For example, it hides properties whose values it can’t handle (functions and more!). On the plus side, its output can be parsed again (via JSON.parse(), or via eval() if the text is wrapped in parentheses) and it can display deeply nested data as nicely formatted trees.
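Both the limitation and the tree formatting can be seen in a small sketch. The object data is invented for this example; the third argument of JSON.stringify() controls the indentation:

```javascript
var data = {
    name: 'John',
    greet: function () { return 'hi'; }, // functions are skipped
    nickname: undefined,                 // so are undefined values
    address: { city: 'Berlin' }
};

var compact = JSON.stringify(data);
console.log(compact); // {"name":"John","address":{"city":"Berlin"}}

// Indent nested data by two spaces per level.
var pretty = JSON.stringify(data, null, 2);
console.log(pretty);

// The output can be turned back into data with JSON.parse().
var roundTrip = JSON.parse(compact);
console.log(roundTrip.address.city); // Berlin
```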
Pitfall: conversion is not invertible
- > String(false)
- 'false'
- > Boolean('false')
- true
For undefined and null, we face similar problems.
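The reason is that Boolean() only checks whether a string is empty. If you need the inverse of String() for booleans, you have to write it yourself. A minimal sketch, in which parseBooleanString is a hypothetical helper and not a built-in function:

```javascript
// Any nonempty string is truthy, including 'false'.
console.log(Boolean('false')); // true
console.log(Boolean(''));      // false

// String(undefined) and String(null) produce ordinary strings.
console.log(String(undefined) === 'undefined'); // true
console.log(String(null) === 'null');           // true

// Hypothetical helper: explicitly invert String() for booleans.
function parseBooleanString(str) {
    if (str === 'true') return true;
    if (str === 'false') return false;
    throw new Error('Not a boolean string: ' + str);
}
console.log(parseBooleanString(String(false))); // false
```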
Comparing Strings
There are two ways of comparing strings. First, you can use the comparison operators: <, >, ===, <=, >=. They have the following drawbacks:
- They’re case-sensitive:
- > 'B' > 'A' // ok
- true
- > 'B' > 'a' // should be true
- false
- They don’t handle umlauts and accents well:
- > 'ä' < 'b' // should be true
- false
- > 'é' < 'f' // should be true
- false
Second, you can use String.prototype.localeCompare(other), which tends to fare better, but isn’t always supported (consult Search and Compare for details). The following is an interaction in Firefox’s console:
- > 'B'.localeCompare('A')
- 2
- > 'B'.localeCompare('a')
- 2
- > 'ä'.localeCompare('b')
- -2
- > 'é'.localeCompare('f')
- -2
A result less than zero means that the receiver is “smaller” than the argument. A result greater than zero means that the receiver is “larger” than the argument.
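The difference matters most when sorting. The following sketch contrasts the default sort order with one based on localeCompare(). The exact numbers that localeCompare() returns (2 and -2 above) vary between engines, and the locale-aware order depends on the current locale, so the code relies only on the sign and shows no output for the locale-aware sort:

```javascript
var words = ['b', '\u00E4', 'a', 'B']; // \u00E4 is 'ä'

// The default sort compares UTF-16 code units: uppercase letters
// come before lowercase ones, and 'ä' comes after all ASCII letters.
var byCodeUnit = words.slice().sort();
console.log(byCodeUnit); // [ 'B', 'a', 'b', 'ä' ]

// Locale-aware sort; the result depends on engine and locale.
var byLocale = words.slice().sort(function (x, y) {
    return x.localeCompare(y);
});
console.log(byLocale);

// Only rely on the sign of the result, not on its exact value.
var aBeforeB = 'a'.localeCompare('b') < 0;
console.log(aBeforeB); // true
```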
Concatenating Strings
There are two main approaches for concatenating strings.
Concatenation: The Plus (+) Operator
The operator + does string concatenation as soon as one of its operands is a string. If you want to collect string pieces in a variable, the compound assignment operator += is useful:
- > var str = '';
- > str += 'Say hello ';
- > str += 7;
- > str += ' times fast!';
- > str
- 'Say hello 7 times fast!'
Concatenation: Joining an Array of String Fragments
It may seem that the previous approach creates a new string whenever a piece is added to str. Older JavaScript engines do it that way, which means that you can improve the performance of string concatenation by collecting all the pieces in an array first and joining them as a last step:
- > var arr = [];
- > arr.push('Say hello ');
- > arr.push(7);
- > arr.push(' times fast');
- > arr.join('')
- 'Say hello 7 times fast'
However, newer engines optimize string concatenation via + and use a similar method internally. Therefore, the plus operator is faster on those engines.
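Whichever approach you pick, the result is the same. The sketch below runs both on the same (made-up) pieces and also shows how evaluation order affects the plus operator when numbers are involved:

```javascript
var pieces = ['Say hello ', 7, ' times fast!'];

// Approach 1: accumulate with +=
var viaPlus = '';
for (var i = 0; i < pieces.length; i++) {
    viaPlus += pieces[i]; // the number 7 is converted to '7'
}

// Approach 2: join the array in one step
var viaJoin = pieces.join('');

console.log(viaPlus);             // Say hello 7 times fast!
console.log(viaPlus === viaJoin); // true

// + is evaluated left to right, so numbers are added until
// the first string operand shows up.
console.log(1 + 2 + 'a'); // 3a
console.log('a' + 1 + 2); // a12
```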
The Function String
The function String can be invoked in two ways:
String(value)
- As a normal function, it converts value to a primitive string (see Converting to String):
- > String(123)
- '123'
- > typeof String('abc') // no change
- 'string'
new String(str)
- As a constructor, it creates a new instance of String (see Wrapper Objects for Primitives), an object that wraps str (nonstrings are coerced to string). For example:
- > typeof new String('abc')
- 'object'
The former invocation is the common one.
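One reason to avoid the constructor is that a wrapper object behaves differently from the primitive it wraps. A short sketch of the most common surprises:

```javascript
var prim = 'abc';
var wrapped = new String('abc');

console.log(typeof prim);    // string
console.log(typeof wrapped); // object

// Strict equality never considers an object equal to a primitive.
console.log(wrapped === prim); // false

// valueOf() unwraps the object again.
console.log(wrapped.valueOf() === prim); // true

// Every object is truthy, even one that wraps the empty string.
console.log(Boolean(new String(''))); // true
console.log(Boolean(''));             // false
```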
String Constructor Method
String.fromCharCode(codeUnit1, codeUnit2, …) produces a string whose characters are the UTF-16 code units specified by the 16-bit unsigned integers codeUnit1, codeUnit2, and so on. For example:
- > String.fromCharCode(97, 98, 99)
- 'abc'
If you want to turn an array of numbers into a string, you can do so via apply() (see func.apply(thisValue, argArray)):
- > String.fromCharCode.apply(null, [97, 98, 99])
- 'abc'
The inverse of String.fromCharCode() is String.prototype.charCodeAt().
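The two functions can be combined into a round trip. In the sketch below, toCodeUnits and fromCodeUnits are hypothetical helper names chosen for this example. The last line shows that arguments are truncated to 16 bits:

```javascript
// Hypothetical helper: string -> array of UTF-16 code units.
function toCodeUnits(str) {
    var result = [];
    for (var i = 0; i < str.length; i++) {
        result.push(str.charCodeAt(i));
    }
    return result;
}

// Hypothetical helper: array of UTF-16 code units -> string.
function fromCodeUnits(codeUnits) {
    return String.fromCharCode.apply(null, codeUnits);
}

var units = toCodeUnits('abc');
console.log(units);                // [ 97, 98, 99 ]
console.log(fromCodeUnits(units)); // abc

// Only the lower 16 bits are used: 0x10061 becomes 0x0061 ('a').
var truncated = String.fromCharCode(0x10061);
console.log(truncated); // a
```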
String Instance Property length
The length property indicates the number of JavaScript characters in the string and is immutable:
- > 'abc'.length
- 3
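What “immutable” means in practice depends on the mode. In sloppy mode, an assignment to length is silently ignored; in strict mode, it throws a TypeError. The sketch below demonstrates the strict-mode case; tryToShorten is a hypothetical function written for this example:

```javascript
function tryToShorten(str) {
    'use strict';
    try {
        str.length = 1; // not allowed: length is read-only
        return 'assignment accepted';
    } catch (e) {
        return e instanceof TypeError ? 'TypeError' : 'other error';
    }
}

var outcome = tryToShorten('abc');
console.log(outcome);      // TypeError
console.log('abc'.length); // 3
```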
String Prototype Methods
All methods of primitive strings are stored in String.prototype (refer back to Primitives Borrow Their Methods from Wrappers). Next, I describe how they work for primitive strings, not for instances of String.
Extract Substrings
The following methods extract substrings from the receiver:
String.prototype.charAt(pos)
- Returns a string with the character at position pos. For example:
- > 'abc'.charAt(1)
- 'b'
The following two expressions return the same result, but some older JavaScript engines support only charAt() for accessing characters:
str.charAt(n)
str[n]
String.prototype.charCodeAt(pos)
- Returns the code (a 16-bit unsigned integer) of the JavaScript character (a UTF-16 code unit; see Chapter 24) at position pos.
This is how you create an array of character codes:
- > 'abc'.split('').map(function (x) { return x.charCodeAt(0) })
- [ 97, 98, 99 ]
The inverse of charCodeAt() is String.fromCharCode().
String.prototype.slice(start, end?)
- Returns the substring starting at position start up to and excluding position end. Both of the two parameters can be negative, and then the length of the string is added to them:
- > 'abc'.slice(2)
- 'c'
- > 'abc'.slice(1, 2)
- 'b'
- > 'abc'.slice(-2)
- 'bc'
String.prototype.substring(start, end?)
- Should be avoided in favor of slice(), which is similar, but can handle negative positions and is implemented more consistently across browsers.
String.prototype.split(separator?, limit?)
- Extracts the substrings of the receiver that are delimited by separator and returns them in an array. The method has two parameters:
- separator: Either a string or a regular expression. If missing, the complete string is returned, wrapped in an array.
- limit: If given, the returned array contains at most limit elements.
Here are some examples:
- > 'a, b,c, d'.split(',') // string
- [ 'a', ' b', 'c', ' d' ]
- > 'a, b,c, d'.split(/,/) // simple regular expression
- [ 'a', ' b', 'c', ' d' ]
- > 'a, b,c, d'.split(/, */) // more complex regular expression
- [ 'a', 'b', 'c', 'd' ]
- > 'a, b,c, d'.split(/, */, 2) // setting a limit
- [ 'a', 'b' ]
- > 'test'.split() // no separator provided
- [ 'test' ]
If there is a group, then the matches are also returned as array elements:
- > 'a, b , '.split(/(,)/)
- [ 'a', ',', ' b ', ',', ' ' ]
- > 'a, b , '.split(/ *(,) */)
- [ 'a', ',', 'b', ',', '' ]
Use '' (empty string) as a separator to produce an array with the characters of a string:
- > 'abc'.split('')
- [ 'a', 'b', 'c' ]
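To see why slice() is the better choice compared with substring(), it helps to run both on the same input. The following is a sketch; lastChars is a hypothetical helper that shows a typical use of negative positions:

```javascript
var str = 'abcdef';

// Negative positions count from the end with slice()...
console.log(str.slice(-2));    // ef
console.log(str.slice(1, -1)); // bcde

// ...but substring() treats them as 0.
console.log(str.substring(-2)); // abcdef

// If start is greater than end, slice() returns the empty string,
// whereas substring() swaps the two arguments.
console.log(str.slice(4, 1) === ''); // true
console.log(str.substring(4, 1));    // bcd

// Hypothetical helper: the last n characters of a string.
// The guard is needed because slice(-0) is the same as slice(0).
function lastChars(s, n) {
    return n > 0 ? s.slice(-n) : '';
}
console.log(lastChars(str, 3)); // def
```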
Transform
While the previous section was about extracting substrings, this section is about transforming a given string into a new one. These methods are typically used as follows:
var str = str.trim();
In other words, the original string is discarded after it has been (nondestructively) transformed:
String.prototype.trim()
- Removes all whitespace from the beginning and the end of the string:
- > '\r\nabc \t'.trim()
- 'abc'
String.prototype.concat(str1?, str2?, …)
- Returns the concatenation of the receiver and str1, str2, etc.:
- > 'hello'.concat(' ', 'world', '!')
- 'hello world!'
String.prototype.toLowerCase()
- Creates a new string with all of the original string’s characters converted to lowercase:
- > 'MJÖLNIR'.toLowerCase()
- 'mjölnir'
String.prototype.toLocaleLowerCase()
- Works the same as toLowerCase(), but respects the rules of the current locale. According to the ECMAScript spec: “There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.”
String.prototype.toUpperCase()
- Creates a new string with all of the original string’s characters converted to uppercase:
- > 'mjölnir'.toUpperCase()
- 'MJÖLNIR'
String.prototype.toLocaleUpperCase()
- Works the same as toUpperCase(), but respects the rules of the current locale.
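Because each of these methods returns a new string, calls can be chained. The sketch below normalizes a made-up input and confirms that the original is left untouched; equalsIgnoreCase is a hypothetical helper, not a built-in method:

```javascript
var input = '  MJ\u00D6LNIR \n'; // \u00D6 is 'Ö'

// trim() and toLowerCase() each produce a new string.
var normalized = input.trim().toLowerCase();
console.log(normalized); // mjölnir

// The original string is unchanged.
console.log(input.length);      // 11
console.log(normalized.length); // 7

// Hypothetical helper: case-insensitive equality via normalization.
function equalsIgnoreCase(a, b) {
    return a.toLowerCase() === b.toLowerCase();
}
console.log(equalsIgnoreCase('Hello', 'hELLO')); // true
```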
Search and Compare
The following methods are used for searching and comparing strings:
String.prototype.indexOf(searchString, position?)
- Searches for searchString starting at position (the default is 0). It returns the position where searchString has been found or -1 if it can’t be found:
- > 'aXaX'.indexOf('X')
- 1
- > 'aXaX'.indexOf('X', 2)
- 3
Note that when it comes to finding text inside a string, a regular expression works just as well. For example, the following two expressions are equivalent:
str.indexOf('abc') >= 0
/abc/.test(str)
String.prototype.lastIndexOf(searchString, position?)
- Searches for searchString, starting at position (the default is the end), backward. It returns the position where searchString has been found or -1 if it can’t be found:
- > 'aXaX'.lastIndexOf('X')
- 3
- > 'aXaX'.lastIndexOf('X', 2)
- 1
String.prototype.localeCompare(other)
- Performs a locale-sensitive comparison of the string with other. It returns a number:
- < 0 if the string comes before other
- = 0 if the string is equivalent to other
- > 0 if the string comes after other
For example:
- > 'apple'.localeCompare('banana')
- -2
- > 'apple'.localeCompare('apple')
- 0
Warning
Not all JavaScript engines implement this method properly. Some just base it on the comparison operators. However, the ECMAScript Internationalization API (see The ECMAScript Internationalization API) does provide a Unicode-aware implementation. That is, if that API is available in an engine, localeCompare() will work.
If it is supported, localeCompare() is a better choice for comparing strings than the comparison operators. Consult Comparing Strings for more information.
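The position parameter of indexOf() makes it possible to find every occurrence of a search string, not just the first one. A minimal sketch, in which indexesOf is a hypothetical helper written for this example:

```javascript
// Hypothetical helper: all positions at which searchString occurs.
function indexesOf(str, searchString) {
    var result = [];
    if (searchString.length === 0) {
        return result; // avoid matching the empty string everywhere
    }
    var pos = str.indexOf(searchString);
    while (pos !== -1) {
        result.push(pos);
        // Continue searching right after the previous hit.
        pos = str.indexOf(searchString, pos + 1);
    }
    return result;
}

console.log(indexesOf('aXaX', 'X')); // [ 1, 3 ]
console.log(indexesOf('aXaX', 'b')); // []
```

Starting the next search at pos + 1 means overlapping occurrences are reported too; use pos + searchString.length instead if they should be skipped.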
Test, Match, and Replace with Regular Expressions
The following methods work with regular expressions:
String.prototype.search(regexp) (more thoroughly explained in String.prototype.search: At What Index Is There a Match?)
- Returns the first index at which regexp matches in the receiver (or -1 if it doesn’t):
- > '-yy-xxx-y-'.search(/x+/)
- 4
String.prototype.match(regexp) (more thoroughly explained in String.prototype.match: Capture Groups or Return All Matching Substrings)
- Matches the given regular expression against the receiver. It returns a match object for the first match if the flag /g of regexp is not set:
- > '-abb--aaab-'.match(/(a+)b/)
- [ 'ab',
- 'a',
- index: 1,
- input: '-abb--aaab-' ]
If the flag /g is set, then all complete matches (group 0) are returned in an array:
- > '-abb--aaab-'.match(/(a+)b/g)
- [ 'ab', 'aaab' ]
String.prototype.replace(search, replacement) (more thoroughly explained in String.prototype.replace: Search and Replace)
- Searches for search and replaces it with replacement. search can be a string or a regular expression, and replacement can be a string or a function. Unless you use a regular expression as search whose flag /g is set, only the first occurrence will be replaced:
- > 'iixxxixx'.replace('i', 'o')
- 'oixxxixx'
- > 'iixxxixx'.replace(/i/, 'o')
- 'oixxxixx'
- > 'iixxxixx'.replace(/i/g, 'o')
- 'ooxxxoxx'
A dollar sign ($) in a replacement string allows you to refer to the complete match or a captured group:
- > 'iixxxixx'.replace(/i+/g, '($&)') // complete match
- '(ii)xxx(i)xx'
- > 'iixxxixx'.replace(/(i+)/g, '($1)') // group 1
- '(ii)xxx(i)xx'
You can also compute a replacement via a function:
- > function repl(all) { return '('+all.toUpperCase()+')' }
- > 'axbbyyxaa'.replace(/a+|b+/g, repl)
- '(A)x(BB)yyx(AA)'
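Since a string as search replaces only the first occurrence, replacing every occurrence of a plain string needs either a regular expression with /g or a workaround. One possible workaround, sketched below, is to split and rejoin; replaceAllPlain is a hypothetical helper and assumes a nonempty search string. The last example shows that a replacement function also receives the captured groups as additional parameters:

```javascript
// Hypothetical helper: split at the search string, then join
// the pieces with the replacement.
function replaceAllPlain(str, search, replacement) {
    return str.split(search).join(replacement);
}
console.log(replaceAllPlain('iixxxixx', 'i', 'o')); // ooxxxoxx

// Characters such as '.' and '$' need no escaping here.
console.log(replaceAllPlain('1.5 + 2.5', '.', ',')); // 1,5 + 2,5
console.log(replaceAllPlain('a-b', '-', '$&'));      // a$&b

// Parameters of a replacement function:
// complete match, then one parameter per group.
var capitalized = 'john smith'.replace(/(\w)(\w*)/g,
    function (all, first, rest) {
        return first.toUpperCase() + rest;
    });
console.log(capitalized); // John Smith
```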
[16] Strictly speaking, a JavaScript string consists of a sequence of UTF-16 code units. That is, JavaScript characters are Unicode code units (see Chapter 24).