Chapter 12. Strings

Strings are immutable sequences of JavaScript characters. Each such character is a 16-bit UTF-16 code unit. That means that a single Unicode character is represented by either one or two JavaScript characters. You mainly need to worry about the two-character case whenever you are counting characters or splitting strings (see Chapter 24).
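
To see why the two-character case matters, consider a code point outside the Basic Multilingual Plane. The following is a minimal sketch; countCodePoints is a hypothetical helper (not a built-in) that treats each surrogate pair as one character and counts any lone surrogate as one character, too.

```javascript
// U+1F600 is stored as two UTF-16 code units (a surrogate pair)
var smiley = '\uD83D\uDE00';
var codeUnitCount = smiley.length;

// Hypothetical helper: counts each surrogate pair as a single character
function countCodePoints(str) {
    var surrogatePair = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
    // Collapse every pair into one placeholder character, then measure
    return str.replace(surrogatePair, '_').length;
}

var codePointCount = countCodePoints('a' + smiley + 'b');
```

Here, codeUnitCount is 2, while codePointCount is 3 (the string 'a' + smiley + 'b' has a length of 4).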

String Literals

Both single and double quotes can be used to delimit string literals:

    'He said: "Hello"'
    "He said: \"Hello\""

    'Everyone\'s a winner'
    "Everyone's a winner"

Thus, you are free to use either kind of quote. There are several considerations, though:

  • The most common style in the community is to use double quotes for HTML and single quotes for JavaScript.
  • On the other hand, double quotes are used exclusively for strings in some languages (e.g., C and Java). Therefore, it may make sense to use them in a multilanguage code base.
  • For JSON (discussed in Chapter 22), you must use double quotes.

Your code will look cleaner if you quote consistently. But sometimes, a different quote means that you don’t have to escape, which can justify your being less consistent (e.g., you may normally use single quotes, but temporarily switch to double quotes to write the last one of the preceding examples).

Escaping in String Literals

Most characters in string literals simply represent themselves. The backslash is used for escaping and enables several special features:

  • Line continuations
  • You can spread a string over multiple lines by escaping the end of the line (the line-terminating character, the line terminator) with a backslash:
    var str = 'written \
    over \
    multiple \
    lines';
    console.log(str === 'written over multiple lines'); // true

An alternative is to use the plus operator to concatenate:

    var str = 'written ' +
        'over ' +
        'multiple ' +
        'lines';
  • Character escape sequences
  • These sequences start with a backslash:
  • Control characters: \b is a backspace, \f is a form feed, \n is a line feed (newline), \r is a carriage return, \t is a horizontal tab, and \v is a vertical tab.
  • Escaped characters that represent themselves: \' is a single quote, \" is a double quote, and \\ is a backslash. All characters except b f n r t v x u and decimal digits represent themselves, too. Here are two examples:
    > '\"'
    '"'
    > '\q'
    'q'
  • NUL character (Unicode code point 0)
  • This character is represented by \0.
  • Hexadecimal escape sequences
  • \xHH (HH are two hexadecimal digits) specifies a character via its code in the range 0x00 to 0xFF (for codes up to 0x7F, that is the ASCII code). For example:
    > '\x4D'
    'M'
  • Unicode escape sequences
  • \uHHHH (HHHH are four hexadecimal digits) specifies a UTF-16 code unit (see Chapter 24). Here are two examples:
    > '\u004D'
    'M'
    > '\u03C0'
    'π'
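
The escape sequences listed above can be checked with a few comparisons. This is a small sketch; the variable names are only illustrative.

```javascript
// Each escape sequence produces exactly one JavaScript character
var newlineLength = '\n'.length;
var backslashLength = '\\'.length;

// \0 is the character whose code is 0
var nulCode = '\0'.charCodeAt(0);

// Different escapes can denote the same character
var sameLetter = ('\x4D' === 'M') && ('\u004D' === 'M');

// \u03C0 is the Greek letter pi, a single code unit
var piCode = '\u03C0'.charCodeAt(0);
```

Both lengths are 1, nulCode is 0, sameLetter is true, and piCode is 0x3C0 (960).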

Character Access

There are two operations that return the nth character of a string.[16] Note that JavaScript does not have a special data type for characters; these operations return strings:

    > 'abc'.charAt(1)
    'b'
    > 'abc'[1]
    'b'

Some older browsers don’t support the array-like access to characters via square brackets.
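
The two operations agree for positions inside the string, but they differ once the position is out of range. A brief sketch:

```javascript
var str = 'abc';

// Inside the string, both operations return the same one-character string
var viaMethod = str.charAt(1);     // 'b'
var viaBrackets = str[1];          // 'b'

// Past the end, the results differ
var methodPastEnd = str.charAt(5); // '' (the empty string)
var bracketsPastEnd = str[5];      // undefined
```

If later code expects a string, charAt() is therefore the more forgiving choice.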

Converting to String

Values are converted to a string as follows:

Value         Result
undefined     'undefined'
null          'null'
A boolean     false → 'false'
              true → 'true'
A number      The number as a string (e.g., 3.141 → '3.141')
A string      Same as input (nothing to convert)
An object     Call ToPrimitive(value, String) (see Algorithm: ToPrimitive()—Converting a Value to a Primitive) and convert the resulting primitive.
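
For objects, the outcome depends on which conversion method is consulted first. The following sketch uses a hypothetical object that implements both toString() and valueOf() so that the difference becomes visible:

```javascript
// Hypothetical object that implements both conversion methods
var obj = {
    toString: function () { return 'via toString'; },
    valueOf: function () { return 'via valueOf'; }
};

// String() asks for a string, so toString() is tried first
var explicit = String(obj);  // 'via toString'

// The plus operator converts without a preference for strings,
// so valueOf() is tried first
var implicit = '' + obj;     // 'via valueOf'
```

For most objects the two results are the same, because the default valueOf() returns the object itself and the conversion then falls back to toString().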

Manually Converting to String

The three most common ways to convert any value to a string are:

  • String(value) (invoked as a function, not as a constructor)
  • '' + value
  • value.toString() (does not work for undefined and null!)

I prefer String(), because it is more descriptive. Here are some examples:

    > String(false)
    'false'
    > String(7.35)
    '7.35'
    > String({ first: 'John', last: 'Doe' })
    '[object Object]'
    > String([ 'a', 'b', 'c' ])
    'a,b,c'
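
The three approaches differ in how they treat undefined and null. In this sketch, canCallToString is a hypothetical helper that only reports whether the method call succeeds:

```javascript
// Hypothetical helper: reports whether value.toString() can be called
function canCallToString(value) {
    try {
        value.toString();
        return true;
    } catch (e) {
        return false; // a TypeError is thrown for undefined and null
    }
}

var viaFunction = String(null);           // 'null'
var viaPlus = '' + undefined;             // 'undefined'
var nullWorks = canCallToString(null);    // false
var numberWorks = canCallToString(7.35);  // true
```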

Note that for displaying data, JSON.stringify() (JSON.stringify(value, replacer?, space?)) often works better than the canonical conversion to string:

    > console.log(JSON.stringify({ first: 'John', last: 'Doe' }))
    {"first":"John","last":"Doe"}
    > console.log(JSON.stringify([ 'a', 'b', 'c' ]))
    ["a","b","c"]

Naturally, you have to be aware of the limitations of JSON.stringify()—it doesn’t always show everything. For example, it hides properties whose values it can’t handle (functions and more!). On the plus side, its output can be parsed by eval() and it can display deeply nested data as nicely formatted trees.
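
Both the limitation and the tree formatting can be demonstrated briefly. The object data is made up for this sketch:

```javascript
var data = {
    name: 'John',
    greet: function () { return 'hi'; }, // a function: omitted
    nickname: undefined,                 // undefined: omitted
    tags: ['a', 'b']
};

var compact = JSON.stringify(data);
// '{"name":"John","tags":["a","b"]}'

// The third parameter (space) indents nested data
var pretty = JSON.stringify({ tags: ['a', 'b'] }, null, 2);
// '{\n  "tags": [\n    "a",\n    "b"\n  ]\n}'
```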

Pitfall: conversion is not invertible

Given how often JavaScript automatically converts, it is a shame that the conversion isn’t always invertible, especially with regard to booleans:

    > String(false)
    'false'
    > Boolean('false')
    true

For undefined and null, we face similar problems.
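
The round trip fails because every nonempty string is truthy. If you do need to invert the conversion for booleans, you have to do it by hand; parseBoolean is a hypothetical helper, sketched here under the assumption that only 'true' and 'false' are valid inputs:

```javascript
// Converting to string and back does not restore the original value
var backFromFalse = Boolean(String(false));         // true
var backFromUndefined = Boolean(String(undefined)); // true, although undefined is falsy
var backFromNull = Boolean(String(null));           // true, although null is falsy

// Hypothetical helper: inverts String() for booleans only
function parseBoolean(str) {
    if (str === 'true') return true;
    if (str === 'false') return false;
    throw new Error('Not a boolean: ' + str);
}

var restored = parseBoolean(String(false));         // false
```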

Comparing Strings

There are two ways of comparing strings. First, you can use the comparison operators: <, >, ===, <=, >=. They have the following drawbacks:

  • They’re case-sensitive:
    > 'B' > 'A' // ok
    true
    > 'B' > 'a' // should be true
    false
  • They don’t handle umlauts and accents well:
    > 'ä' < 'b' // should be true
    false
    > 'é' < 'f' // should be true
    false

Second, you can use String.prototype.localeCompare(other), which tends to fare better, but isn’t always supported (consult Search and Compare for details). The following is an interaction in Firefox’s console:

    > 'B'.localeCompare('A')
    2
    > 'B'.localeCompare('a')
    2

    > 'ä'.localeCompare('b')
    -2
    > 'é'.localeCompare('f')
    -2

A result less than zero means that the receiver is “smaller” than the argument. A result greater than zero means that the receiver is “larger” than the argument.
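
Both kinds of comparison are most often used for sorting. The following sketch contrasts the default sort order with two alternatives; compareIgnoringCase is a hypothetical helper, and the order produced via localeCompare() depends on the engine and its locale, which is why no result is shown for it:

```javascript
var words = ['b', 'B', 'a'];

// Default sorting compares code units, so uppercase letters come first
var byCodeUnits = words.slice().sort(); // [ 'B', 'a', 'b' ]

// Hypothetical helper: compares via the operators, but ignores case
function compareIgnoringCase(x, y) {
    var a = x.toLowerCase();
    var b = y.toLowerCase();
    return a < b ? -1 : (a > b ? 1 : 0);
}
var ignoringCase = compareIgnoringCase('B', 'a'); // 1

// Locale-aware sorting; the resulting order is engine-dependent
var byLocale = words.slice().sort(function (x, y) {
    return x.localeCompare(y);
});
```

Only the sign of a localeCompare() result should be relied upon; the magnitude (2 in the Firefox interaction above) varies between engines.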

Concatenating Strings

There are two main approaches for concatenating strings.

Concatenation: The Plus (+) Operator

The operator + does string concatenation as soon as one of its operands is a string. If you want to collect string pieces in a variable, the compound assignment operator += is useful:

    > var str = '';
    > str += 'Say hello ';
    > str += 7;
    > str += ' times fast!';
    > str
    'Say hello 7 times fast!'

Concatenation: Joining an Array of String Fragments

It may seem that the previous approach creates a new string whenever a piece is added to str. Older JavaScript engines do it that way, which means that you can improve the performance of string concatenation by collecting all the pieces in an array first and joining them as a last step:

    > var arr = [];

    > arr.push('Say hello ');
    > arr.push(7);
    > arr.push(' times fast');

    > arr.join('')
    'Say hello 7 times fast'

However, newer engines optimize string concatenation via + and use a similar method internally. Therefore, the plus operator is faster on those engines.
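
Whichever approach is faster on a given engine, both produce the same string. A small sketch with made-up pieces:

```javascript
var pieces = ['Say hello ', 7, ' times fast!'];

// Approach 1: append each piece via +=
var viaPlus = '';
pieces.forEach(function (piece) {
    viaPlus += piece; // the number 7 is converted to a string
});

// Approach 2: join all pieces in a single step
var viaJoin = pieces.join('');
```

Both viaPlus and viaJoin are 'Say hello 7 times fast!'. If performance matters, it is best measured on the engines you actually target.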

The Function String

The function String can be invoked in two ways:

  • String(value)
  • As a normal function, it converts value to a primitive string (see Converting to String):
    > String(123)
    '123'
    > typeof String('abc') // no change
    'string'
  • new String(str)
  • As a constructor, it creates a new instance of String (see Wrapper Objects for Primitives), an object that wraps str (nonstrings are coerced to string). For example:
    > typeof new String('abc')
    'object'

The former invocation is the common one.
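
One reason to prefer the function invocation is that wrapper objects compare by identity rather than by value. A brief sketch:

```javascript
var primitive = String('abc');
var wrapper = new String('abc');

// Primitives are compared by value
var primitiveIsEqual = (primitive === 'abc');             // true

// An object is never strictly equal to a primitive
var wrapperIsEqual = (wrapper === 'abc');                 // false

// Two wrappers are two different objects
var wrappersAreEqual = (new String('abc') === wrapper);   // false

// valueOf() retrieves the wrapped primitive
var unwrapped = wrapper.valueOf();                        // 'abc'
```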

String Constructor Method

String.fromCharCode(codeUnit1, codeUnit2, …) produces a string whose characters are the UTF-16 code units specified by the 16-bit unsigned integers codeUnit1, codeUnit2, and so on. For example:

    > String.fromCharCode(97, 98, 99)
    'abc'

If you want to turn an array of numbers into a string, you can do so via apply() (see func.apply(thisValue, argArray)):

    > String.fromCharCode.apply(null, [97, 98, 99])
    'abc'

The inverse of String.fromCharCode() is String.prototype.charCodeAt().
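
The two methods can be combined into a round trip. toCodeUnits and fromCodeUnits are hypothetical helper names chosen for this sketch:

```javascript
// Hypothetical helper: string -> array of UTF-16 code units
function toCodeUnits(str) {
    var result = [];
    for (var i = 0; i < str.length; i++) {
        result.push(str.charCodeAt(i));
    }
    return result;
}

// Hypothetical helper: array of UTF-16 code units -> string
function fromCodeUnits(codeUnits) {
    // Very long arrays may exceed an engine's limit on the number of arguments
    return String.fromCharCode.apply(null, codeUnits);
}

var codeUnits = toCodeUnits('abc');       // [ 97, 98, 99 ]
var roundTrip = fromCodeUnits(codeUnits); // 'abc'
```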

String Instance Property length

The length property indicates the number of JavaScript characters in the string and is immutable:

    > 'abc'.length
    3
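
What "immutable" means in practice can be observed by trying to assign to length. The sketch below assumes strict mode, where the attempt throws a TypeError; in sloppy mode, the assignment is silently ignored. tryToChangeLength is a hypothetical helper:

```javascript
// Hypothetical helper: reports what happens when length is assigned to
function tryToChangeLength(str) {
    'use strict';
    try {
        str.length = 1;
        return 'changed';
    } catch (e) {
        return e instanceof TypeError ? 'TypeError' : 'other error';
    }
}

var outcome = tryToChangeLength('abc'); // 'TypeError'
var stillThree = 'abc'.length;          // 3
```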

String Prototype Methods

All methods of primitive strings are stored in String.prototype (refer back to Primitives Borrow Their Methods from Wrappers). Next, I describe how they work for primitive strings, not for instances of String.

Extract Substrings

The following methods extract substrings from the receiver:

  • String.prototype.charAt(pos)
  • Returns a string with the character at position pos. For example:
    > 'abc'.charAt(1)
    'b'

The following two expressions return the same result, but some older JavaScript engines support only charAt() for accessing characters:

    str.charAt(n)
    str[n]
  • String.prototype.charCodeAt(pos)
  • Returns the code (a 16-bit unsigned integer) of the JavaScript character (a UTF-16 code unit; see Chapter 24) at position pos.

This is how you create an array of character codes:

    > 'abc'.split('').map(function (x) { return x.charCodeAt(0) })
    [ 97, 98, 99 ]

The inverse of charCodeAt() is String.fromCharCode().

  • String.prototype.slice(start, end?)
  • Returns the substring starting at position start up to and excluding position end. Both of the two parameters can be negative, and then the length of the string is added to them:
    > 'abc'.slice(2)
    'c'
    > 'abc'.slice(1, 2)
    'b'
    > 'abc'.slice(-2)
    'bc'
  • String.prototype.substring(start, end?)
  • Should be avoided in favor of slice(), which is similar, but can handle negative positions and is implemented more consistently across browsers.
  • String.prototype.split(separator?, limit?)
  • Extracts the substrings of the receiver that are delimited by separator and returns them in an array. The method has two parameters:
  • separator: Either a string or a regular expression. If missing, the complete string is returned, wrapped in an array.
  • limit: If given, the returned array contains at most limit elements.

Here are some examples:

    > 'a, b,c, d'.split(',') // string
    [ 'a', ' b', 'c', ' d' ]
    > 'a, b,c, d'.split(/,/) // simple regular expression
    [ 'a', ' b', 'c', ' d' ]
    > 'a, b,c, d'.split(/, */) // more complex regular expression
    [ 'a', 'b', 'c', 'd' ]
    > 'a, b,c, d'.split(/, */, 2) // setting a limit
    [ 'a', 'b' ]
    > 'test'.split() // no separator provided
    [ 'test' ]

If there is a group, then the matches are also returned as array elements:

    > 'a, b , '.split(/(,)/)
    [ 'a', ',', ' b ', ',', ' ' ]
    > 'a, b , '.split(/ *(,) */)
    [ 'a', ',', 'b', ',', '' ]

Use '' (empty string) as a separator to produce an array with the characters of a string:

    > 'abc'.split('')
    [ 'a', 'b', 'c' ]
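
The differences between slice() and substring() show up with negative positions and with swapped arguments. The sketch below also combines split() with trim() in a hypothetical helper, parseFields, that assumes a simple comma-separated line without quoting:

```javascript
// slice() counts negative positions from the end; substring() treats them as 0
var sliced = 'abcdef'.slice(-2);          // 'ef'
var substringed = 'abcdef'.substring(-2); // 'abcdef'

// substring() swaps its arguments if start > end; slice() returns ''
var swapped = 'abcdef'.substring(4, 1);   // 'bcd'
var notSwapped = 'abcdef'.slice(4, 1);    // ''

// Hypothetical helper: splits a comma-separated line and trims each field
function parseFields(line) {
    return line.split(',').map(function (field) {
        return field.trim();
    });
}
var fields = parseFields(' a, b ,c ');    // [ 'a', 'b', 'c' ]
```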

Transform

While the previous section was about extracting substrings, this section is about transforming a given string into a new one. These methods are typically used as follows:

    var str = str.trim();

In other words, the original string is discarded after it has been (nondestructively) transformed:

  • String.prototype.trim()
  • Removes all whitespace from the beginning and the end of the string:
    > '\r\nabc \t'.trim()
    'abc'
  • String.prototype.concat(str1?, str2?, …)
  • Returns the concatenation of the receiver and str1, str2, etc.:
    > 'hello'.concat(' ', 'world', '!')
    'hello world!'
  • String.prototype.toLowerCase()
  • Creates a new string with all of the original string’s characters converted to lowercase:
    > 'MJÖLNIR'.toLowerCase()
    'mjölnir'
  • String.prototype.toLocaleLowerCase()
  • Works the same as toLowerCase(), but respects the rules of the current locale. According to the ECMAScript spec: “There will only be a difference in the few cases (such as Turkish) where the rules for that language conflict with the regular Unicode case mappings.”
  • String.prototype.toUpperCase()
  • Creates a new string with all of the original string’s characters converted to uppercase:
    > 'mjölnir'.toUpperCase()
    'MJÖLNIR'
  • String.prototype.toLocaleUpperCase()
  • Works the same as toUpperCase(), but respects the rules of the current locale.
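
Because each of these methods returns a new string, calls can be chained, and the receiver is never modified. A short sketch with a made-up input:

```javascript
var input = '  Hello World \n';

// Each method returns a new string, so the calls can be chained
var normalized = input.trim().toLowerCase();                 // 'hello world'

// The original string is unchanged
var inputIsUnchanged = (input === '  Hello World \n');       // true

// concat() produces the same result as the plus operator
var sameResult = ('a'.concat('b', 'c') === 'a' + 'b' + 'c'); // true
```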

Search and Compare

The following methods are used for searching and comparing strings:

  • String.prototype.indexOf(searchString, position?)
  • Searches for searchString starting at position (the default is 0). It returns the position where searchString has been found or -1 if it can’t be found:
    > 'aXaX'.indexOf('X')
    1
    > 'aXaX'.indexOf('X', 2)
    3

Note that when it comes to finding text inside a string, a regular expression works just as well. For example, the following two expressions are equivalent:

    str.indexOf('abc') >= 0
    /abc/.test(str)
  • String.prototype.lastIndexOf(searchString, position?)
  • Searches for searchString, starting at position (the default is the end), backward. It returns the position where searchString has been found or -1 if it can’t be found:
    > 'aXaX'.lastIndexOf('X')
    3
    > 'aXaX'.lastIndexOf('X', 2)
    1
  • String.prototype.localeCompare(other)
  • Performs a locale-sensitive comparison of the string with other. It returns a number:
  • < 0 if the string comes before other
  • = 0 if the string is equivalent to other
  • > 0 if the string comes after other

For example:

    > 'apple'.localeCompare('banana')
    -2
    > 'apple'.localeCompare('apple')
    0

Warning

Not all JavaScript engines implement this method properly. Some just base it on the comparison operators. However, the ECMAScript Internationalization API (see The ECMAScript Internationalization API) does provide a Unicode-aware implementation. That is, if that API is available in an engine, localeCompare() will work.

If it is supported, localeCompare() is a better choice for comparing strings than the comparison operators. Consult Comparing Strings for more information.
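
The position parameter of indexOf() makes it possible to find every occurrence of a string, not just the first one. indexesOf is a hypothetical helper, sketched here so that overlapping occurrences are reported, too:

```javascript
// Hypothetical helper: returns the positions of all occurrences of searchString
function indexesOf(str, searchString) {
    var result = [];
    if (searchString.length === 0) {
        return result; // avoid an endless loop for the empty string
    }
    var pos = str.indexOf(searchString);
    while (pos >= 0) {
        result.push(pos);
        // Continue one position after the start of the previous hit
        pos = str.indexOf(searchString, pos + 1);
    }
    return result;
}

var hits = indexesOf('aXaX', 'X');   // [ 1, 3 ]
var noHits = indexesOf('aXaX', 'Y'); // []
```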

Test, Match, and Replace with Regular Expressions

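The following methods work with regular expressions:

  • String.prototype.search(regexp)
  • Returns the index of the first substring in the receiver that matches regexp (or -1 if there is no match):
    > '-yy-xxx-y-'.search(/x+/)
    4
  • String.prototype.match(regexp)
  • Matches regexp against the receiver. If the flag /g is not set, it returns a match object for the first match (or null if there is no match):
    > '-abb--aaab-'.match(/(a+)b/)
    [ 'ab',
      'a',
      index: 1,
      input: '-abb--aaab-' ]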

If the flag /g is set, then all complete matches (group 0) are returned in an array:

    > '-abb--aaab-'.match(/(a+)b/g)
    [ 'ab', 'aaab' ]
  • String.prototype.replace(search, replacement) (more thoroughly explained in String.prototype.replace: Search and Replace)
  • Searches for search and replaces it with replacement. search can be a string or a regular expression, and replacement can be a string or a function. Unless you use a regular expression as search whose flag /g is set, only the first occurrence will be replaced:
    > 'iixxxixx'.replace('i', 'o')
    'oixxxixx'
    > 'iixxxixx'.replace(/i/, 'o')
    'oixxxixx'
    > 'iixxxixx'.replace(/i/g, 'o')
    'ooxxxoxx'

A dollar sign ($) in a replacement string allows you to refer to the complete match or a captured group:

    > 'iixxxixx'.replace(/i+/g, '($&)') // complete match
    '(ii)xxx(i)xx'
    > 'iixxxixx'.replace(/(i+)/g, '($1)') // group 1
    '(ii)xxx(i)xx'

You can also compute a replacement via a function:

    > function repl(all) { return '('+all.toUpperCase()+')' }
    > 'axbbyyxaa'.replace(/a+|b+/g, repl)
    '(A)x(BB)yyx(AA)'
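
A replacement function also receives the captured groups as additional arguments, after the complete match. The sketch below uses that in a hypothetical helper, toCamelCase, and adds a second hypothetical helper, replaceAll, which replaces every occurrence of a plain string without a regular expression:

```javascript
// The replacement function receives the complete match, followed by the groups
function toCamelCase(str) {
    return str.replace(/-([a-z])/g, function (all, letter) {
        return letter.toUpperCase();
    });
}
var camel = toCamelCase('background-color');     // 'backgroundColor'

// Hypothetical helper: replaces every occurrence of a plain string
// by splitting at the search string and joining with the replacement
function replaceAll(str, search, replacement) {
    return str.split(search).join(replacement);
}
var replaced = replaceAll('iixxxixx', 'i', 'o'); // 'ooxxxoxx'
```

The split-and-join approach avoids having to escape characters in search that are special in regular expressions.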


[16] Strictly speaking, a JavaScript string consists of a sequence of UTF-16 code units. That is, JavaScript characters are Unicode code units (see Chapter 24).