IV. Primitive values - 18. Strings - 《JavaScript for impatient programmers (beta)》

18. Strings

Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.

18.1. Plain string literals

Plain string literals are delimited by either single quotes or double quotes:

const str1 = 'abc';
const str2 = "abc";
assert.equal(str1, str2);

Single quotes are used more often because they make it easier to mention HTML, which typically uses double quotes.

The next chapter covers template literals, which give you:

String interpolation
Multiple lines
Raw string literals (backslash has no special meaning)

18.1.1. Escaping

The backslash lets you create special characters:

Unix line break: '\n'
Windows line break: '\r\n'
Tab: '\t'
Backslash: '\\'

The backslash also lets you use the delimiter of a string literal inside that literal:

'She said: "Let\'s go!"'
"She said: \"Let's go!\""

18.2. Accessing characters and code points

18.2.1. Accessing JavaScript characters

JavaScript has no extra data type for characters; characters are always represented as strings.

const str = 'abc';
// Reading a character at a given index
assert.equal(str[1], 'b');
// Counting the characters in a string:
assert.equal(str.length, 3);

18.2.2. Accessing Unicode code points via for-of and spreading

Iterating over strings via for-of or spreading (...) visits Unicode code points. Each code point is represented by 1–2 JavaScript characters. For more information, see the section on the atoms of text in this chapter.

This is how you iterate over the code points of a string via for-of:

for (const ch of 'abc') {
  console.log(ch);
}
// Output:
// 'a'
// 'b'
// 'c'

And this is how you convert a string into an Array of code points via spreading:

assert.deepEqual([...'abc'], ['a', 'b', 'c']);

18.3. String concatenation via +

If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the results:

assert.equal(3 + ' times ' + 4, '3 times 4');

The assignment operator += is useful if you want to assemble a string, piece by piece:

let str = ''; // must be `let`!
str += 'Say it';
str += ' one more';
str += ' time';
assert.equal(str, 'Say it one more time');

As an aside, this way of assembling strings is quite efficient, because most JavaScript engines internally optimize it.

18.4. Converting to string

These are three ways of converting a value x to a string:

String(x)
''+x
x.toString() (does not work for undefined and null)

Recommendation: use the descriptive and safe String().

Examples:

assert.equal(String(undefined), 'undefined');
assert.equal(String(null), 'null');
assert.equal(String(false), 'false');
assert.equal(String(true), 'true');
assert.equal(String(123.45), '123.45');
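The following sketch illustrates why String() is the safe choice. The helper errorName() is hypothetical and only exists for this example. The observation that ''+x throws for symbols is an addition that is not covered in the text above.

```javascript
// Hypothetical helper: runs `fn` and reports the name of any error it throws.
function errorName(fn) {
  try {
    fn();
    return 'no error';
  } catch (err) {
    return err.name;
  }
}

// .toString() fails for null (and undefined):
const viaToString = errorName(() => null.toString()); // 'TypeError'

// ''+x fails for symbols:
const viaPlus = errorName(() => '' + Symbol('id')); // 'TypeError'

// String() handles both cases:
const nullStr = String(null); // 'null'
const symbolStr = String(Symbol('id')); // 'Symbol(id)'

console.log(viaToString, viaPlus, nullStr, symbolStr);
```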

Pitfall for booleans: If you convert a boolean to a string via String(), you can’t convert it back via Boolean().

> String(false)
'false'
> Boolean('false')
true
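If you do need to convert such a string back, one option is to compare it explicitly. The following is a minimal sketch; parseBoolean() is a hypothetical helper, not a built-in function.

```javascript
// Hypothetical helper: converts the output of String(bool) back to a boolean.
function parseBoolean(str) {
  if (str === 'true') return true;
  if (str === 'false') return false;
  throw new TypeError('Not a stringified boolean: ' + str);
}

const backToFalse = parseBoolean(String(false)); // false
const backToTrue = parseBoolean(String(true)); // true

console.log(backToFalse, backToTrue);
```

Throwing for all other inputs is a design choice of this sketch. It avoids silently treating strings such as 'no' as true, which is what Boolean() would do.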

18.4.1. Stringifying objects

Plain objects have a default representation that is not very useful:

> String({a: 1})
'[object Object]'

Arrays have a better string representation, but it still hides much information:

> String(['a', 'b'])
'a,b'
> String(['a', ['b']])
'a,b'
> String([1, 2])
'1,2'
> String(['1', '2'])
'1,2'
> String([true])
'true'
> String(['true'])
'true'
> String(true)
'true'

Stringifying functions returns their source code:

> String(function f() {return 4})
'function f() {return 4}'

18.4.2. Customizing the stringification of objects

You can override the built-in way of stringifying objects by implementing the method toString():

const obj = {
  toString() {
    return 'hello';
  }
};
assert.equal(String(obj), 'hello');

18.4.3. An alternate way of stringifying values

The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to stringify data:

> JSON.stringify({a: 1})
'{"a":1}'
> JSON.stringify(['a', ['b']])
'["a",["b"]]'

The caveat is that JSON only supports null, booleans, numbers, strings, Arrays and objects (which it always treats as if they were created by object literals).
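The following sketch shows what this caveat can look like in practice. The class Point is hypothetical and only serves as an example of an object that was not created by an object literal.

```javascript
// At the top level, undefined is not stringified at all:
const topLevel = JSON.stringify(undefined); // undefined (not a string)

// Properties whose values are undefined or functions are omitted:
const omitted = JSON.stringify({a: undefined, b() {}, c: 1}); // '{"c":1}'

// Inside Arrays, unsupported values become null:
const inArray = JSON.stringify([undefined, 1]); // '[null,1]'

// Hypothetical class: its instances are stringified like plain objects.
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }
}
const instance = JSON.stringify(new Point(1, 2)); // '{"x":1,"y":2}'

console.log(topLevel, omitted, inArray, instance);
```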

Tip: The third parameter lets you switch on multi-line output and specify how much to indent. For example:

console.log(JSON.stringify({first: 'Jane', last: 'Doe'}, null, 2));

This statement produces the following output.

{
  "first": "Jane",
  "last": "Doe"
}

18.5. Comparing strings

Strings can be compared via the following operators:

< <= > >=

There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:

> 'A' < 'B' // ok
true
> 'a' < 'B' // not ok
false
> 'ä' < 'b' // not ok
false

Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
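As a brief, hedged illustration of what that API offers: the following sketch uses Intl.Collator and assumes that the engine ships locale data for 'en'. The exact order it produces depends on the locale and on the engine's data.

```javascript
// Assumption: the engine supports Intl with data for the locale 'en'.
const collator = new Intl.Collator('en');

// .compare() returns a negative number if its first argument sorts first.
const aBeforeB = collator.compare('a', 'B') < 0; // true
const umlautBeforeB = collator.compare('ä', 'b') < 0; // true

// .compare is a bound function, so it can be passed to .sort() directly:
const sorted = ['B', 'ä', 'b', 'a'].sort(collator.compare);

console.log(aBeforeB, umlautBeforeB, sorted);
```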

18.6. Atoms of text: JavaScript characters, code points, grapheme clusters

Quick recap of the chapter on Unicode:

Code points: Unicode characters, with a range of 21 bits.
UTF-16 code units: JavaScript characters, with a range of 16 bits. Code points are encoded as 1–2 UTF-16 code units.
Grapheme clusters: Graphemes are written symbols, as displayed on screen or paper. A grapheme cluster is a sequence of 1 or more code points that encodes a grapheme.

To represent code points in JavaScript strings, one or two JavaScript characters are used. You can see that when counting characters via .length:

// 3 Unicode code points, 3 JavaScript characters:
assert.equal('abc'.length, 3);
// 1 Unicode code point, 2 JavaScript characters:
assert.equal('🙂'.length, 2);

18.6.1. Working with code points

Let’s explore JavaScript’s tools for working with code points.

Code point escapes let you specify code points hexadecimally. They expand to one or two JavaScript characters.

> '\u{1F642}'
'🙂'

Converting from code points:

> String.fromCodePoint(0x1F642)
'🙂'

Converting to code points:

> '🙂'.codePointAt(0).toString(16)
'1f642'

Iteration honors code points. For example, the iteration-based for-of loop:

const str = '🙂a';
assert.equal(str.length, 3);
for (const codePoint of str) {
  console.log(codePoint);
}
// Output:
// '🙂'
// 'a'

Or iteration-based spreading (...):

> [...'🙂a']
[ '🙂', 'a' ]

Spreading is therefore a good tool for counting code points:

> [...'🙂a'].length
2
> '🙂a'.length
3

18.6.2. Working with code units

Indices and lengths of strings are based on JavaScript characters (i.e., code units).

To specify code units numerically, you can use code unit escapes:

> '\uD83D\uDE42'
'🙂'

And you can use so-called char codes:

> String.fromCharCode(0xD83D) + String.fromCharCode(0xDE42)
'🙂'

To get the char code of a character, use .charCodeAt():

> '🙂'.charCodeAt(0).toString(16)
'd83d'
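To make the relationship between code units and code points more concrete, the following sketch recombines the two code units of a surrogate pair by hand. The helper surrogatesToCodePoint() is hypothetical; in practice, .codePointAt() does this work for you.

```javascript
// Hypothetical helper: combines a lead and a trail surrogate
// (two UTF-16 code units) into a single code point.
function surrogatesToCodePoint(lead, trail) {
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}

const smiley = '\u{1F642}';
const lead = smiley.charCodeAt(0); // 0xD83D
const trail = smiley.charCodeAt(1); // 0xDE42
const codePoint = surrogatesToCodePoint(lead, trail); // 0x1F642

// The manual computation agrees with the built-in method:
console.log(codePoint === smiley.codePointAt(0)); // true
```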

18.6.3. Caveat: grapheme clusters

When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of code units.

Intl.Segmenter, part of the ECMAScript Internationalization API, supports Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).

On engines that don't support it, you can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
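On engines that ship Intl.Segmenter, splitting along grapheme cluster boundaries might look like the following sketch. It assumes that the API is available; splitGraphemes() is a hypothetical helper.

```javascript
// Assumption: the engine provides Intl.Segmenter.
// Hypothetical helper: splits a string into grapheme clusters.
function splitGraphemes(str, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, {granularity: 'grapheme'});
  return Array.from(segmenter.segment(str), (s) => s.segment);
}

// 'e' followed by a combining acute accent: 2 code points, 1 grapheme
const accented = 'e\u0301';
const codePointCount = [...accented].length; // 2
const graphemeCount = splitGraphemes(accented).length; // 1

console.log(codePointCount, graphemeCount);
```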

18.7. Quick reference: Strings

Strings are immutable; none of the string methods ever modify their strings.

18.7.1. Converting to string

Tbl. 16 describes how various values are converted to strings.

Table 16: Converting values to strings.
`x`	`String(x)`
`undefined`	`'undefined'`
`null`	`'null'`
Boolean value	`false` → `'false'`, `true` → `'true'`
Number value	Example: `123` → `'123'`
String value	`x` (input, unchanged)
An object	Configurable via, e.g., `toString()`

18.7.2. Numeric values of characters and code points

Char codes: Unicode UTF-16 code units as numbers
- String.fromCharCode(), String.prototype.charCodeAt()
- Precision: 16 bits, unsigned
Code points: Unicode code points as numbers
- String.fromCodePoint(), String.prototype.codePointAt()
- Precision: 21 bits, unsigned (17 planes, 16 bits each)

18.7.3. String operators

// Access characters via []
const str = 'abc';
assert.equal(str[1], 'b');
// Concatenate strings via +
assert.equal('a' + 'b' + 'c', 'abc');
assert.equal('take ' + 3 + ' oranges', 'take 3 oranges');

18.7.4. String.prototype: finding and matching

.endsWith(searchString: string, endPos=this.length): boolean [ES6]

Returns true if the string would end with searchString if its length were endPos. Returns false, otherwise.

> 'foo.txt'.endsWith('.txt')
true
> 'abcde'.endsWith('cd', 4)
true

.includes(searchString: string, startPos=0): boolean [ES6]

Returns true if the string contains the searchString and false, otherwise. The search starts at startPos.

> 'abc'.includes('b')
true
> 'abc'.includes('b', 2)
false

.indexOf(searchString: string, minIndex=0): number [ES1]

Returns the lowest index at which searchString appears within the string, or -1, otherwise. Any returned index will be minIndex or higher.

> 'abab'.indexOf('a')
0
> 'abab'.indexOf('a', 1)
2
> 'abab'.indexOf('c')
-1

.lastIndexOf(searchString: string, maxIndex=Infinity): number [ES1]

Returns the highest index at which searchString appears within the string, or -1, otherwise. Any returned index will be maxIndex or lower.

> 'abab'.lastIndexOf('ab', 2)
2
> 'abab'.lastIndexOf('ab', 1)
0
> 'abab'.lastIndexOf('ab')
2

.match(regExp: string | RegExp): RegExpMatchArray | null [ES3]

If regExp is a regular expression with flag /g not set, then .match() returns the first match for regExp within the string, or null if there is no match. If regExp is a string, it is used to create a regular expression before performing the previous steps.

The result has the following type:

interface RegExpMatchArray extends Array<string> {
  index: number;
  input: string;
  groups: undefined | {
    [key: string]: string
  };
}

Numbered capture groups become Array indices. Named capture groups (ES2018) become properties of .groups. In this mode, .match() works like RegExp.prototype.exec().

Examples:

> 'ababb'.match(/a(b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: undefined }
> 'ababb'.match(/a(?<foo>b+)/)
{ 0: 'ab', 1: 'b', index: 0, input: 'ababb', groups: { foo: 'b' } }
> 'abab'.match(/x/)
null

.match(regExp: RegExp): string[] | null [ES3]

If flag /g of regExp is set, .match() returns either an Array with all matches or null if there was no match.

> 'ababb'.match(/a(b+)/g)
[ 'ab', 'abb' ]
> 'ababb'.match(/a(?<foo>b+)/g)
[ 'ab', 'abb' ]
> 'abab'.match(/x/g)
null

.search(regExp: string | RegExp): number [ES3]

Returns the index at which regExp first matches within the string, or -1 if there is no match. If regExp is a string, it is used to create a regular expression.

> 'a2b'.search(/[0-9]/)
1
> 'a2b'.search('[0-9]')
1

.startsWith(searchString: string, startPos=0): boolean [ES6]

Returns true if searchString occurs in the string at index startPos. Returns false, otherwise.

> '.gitignore'.startsWith('.')
true
> 'abcde'.startsWith('bc', 1)
true

18.7.5. String.prototype: extracting

.slice(start=0, end=this.length): string [ES3]

Returns the substring of the string that starts at (including) index start and ends at (excluding) index end. You can use negative indices where -1 means this.length-1 (etc.).

> 'abc'.slice(1, 3)
'bc'
> 'abc'.slice(1)
'bc'
> 'abc'.slice(-2)
'bc'

.split(separator: string | RegExp, limit?: number): string[] [ES3]

Splits the string into an Array of substrings – the strings that occur between the separators. The separator can either be a string or a regular expression. Captures made by groups in the regular expression are included in the result.

> 'abc'.split('')
[ 'a', 'b', 'c' ]
> 'a | b | c'.split('|')
[ 'a ', ' b ', ' c' ]
> 'a : b : c'.split(/ *: */)
[ 'a', 'b', 'c' ]
> 'a : b : c'.split(/( *):( *)/)
[ 'a', ' ', ' ', 'b', ' ', ' ', 'c' ]
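The examples above do not use the optional parameter limit. As a small sketch of its effect: it caps the number of elements in the result, and the remaining substrings are discarded.

```javascript
// Without limit, all substrings are returned:
const all = 'a,b,c,d'.split(','); // ['a', 'b', 'c', 'd']

// With limit, the result has at most that many elements:
const limited = 'a,b,c,d'.split(',', 2); // ['a', 'b']

console.log(all, limited);
```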

.substring(start: number, end=this.length): string [ES1]

Use .slice() instead of this method. .substring() wasn’t implemented consistently in older engines and doesn’t support negative indices.
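The following sketch contrasts the two methods. That .substring() swaps its arguments if start is greater than end is an addition that is not mentioned in the text above.

```javascript
// Negative indices: .slice() counts from the end,
// .substring() treats them as 0.
const sliceNegative = 'abc'.slice(-2); // 'bc'
const substringNegative = 'abc'.substring(-2); // 'abc'

// start > end: .slice() returns an empty string,
// .substring() swaps the two arguments.
const sliceSwapped = 'abc'.slice(2, 0); // ''
const substringSwapped = 'abc'.substring(2, 0); // 'ab'

console.log(sliceNegative, substringNegative, sliceSwapped, substringSwapped);
```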

18.7.6. String.prototype: combining

.concat(...strings: string[]): string [ES3]

Returns the concatenation of the string and strings. 'a'+'b' is equivalent to 'a'.concat('b') and more concise.

> 'ab'.concat('cd', 'ef', 'gh')
'abcdefgh'

.padEnd(len: number, fillString=' '): string [ES2017]

Appends fillString to the string until it has the desired length len.

> '#'.padEnd(2)
'# '
> 'abc'.padEnd(2)
'abc'
> '#'.padEnd(5, 'abc')
'#abca'

.padStart(len: number, fillString=' '): string [ES2017]

Prepends fillString to the string until it has the desired length len.

> '#'.padStart(2)
' #'
> 'abc'.padStart(2)
'abc'
> '#'.padStart(5, 'abc')
'abca#'
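A common use of .padStart() is adding leading zeros to numbers. The following is a minimal sketch; formatTime() is a hypothetical helper.

```javascript
// Hypothetical helper: formats hours and minutes as HH:MM.
function formatTime(hours, minutes) {
  const hh = String(hours).padStart(2, '0');
  const mm = String(minutes).padStart(2, '0');
  return hh + ':' + mm;
}

const morning = formatTime(7, 5); // '07:05'
const noon = formatTime(12, 30); // '12:30'

console.log(morning, noon);
```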

.repeat(count=0): string [ES6]

Returns a string that is the string, repeated count times.

> '*'.repeat()
''
> '*'.repeat(3)
'***'

18.7.7. String.prototype: transforming

.normalize(form: 'NFC'|'NFD'|'NFKC'|'NFKD' = 'NFC'): string [ES6]

Normalizes the string according to the Unicode Normalization Forms.
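As a sketch of why normalization matters: the same grapheme can be encoded by different sequences of code points, and such strings only compare as equal after they have been normalized to the same form.

```javascript
const composed = '\u00E9'; // é as a single code point
const decomposed = 'e\u0301'; // e followed by a combining acute accent

// The two strings look the same but are different sequences of code points:
const equalBefore = composed === decomposed; // false

// After normalizing both (default form: NFC), they are equal:
const equalAfter = composed.normalize() === decomposed.normalize(); // true

// NFC composes, NFD decomposes:
const nfcLength = decomposed.normalize('NFC').length; // 1
const nfdLength = composed.normalize('NFD').length; // 2

console.log(equalBefore, equalAfter, nfcLength, nfdLength);
```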

.replace(searchValue: string | RegExp, replaceValue: string): string [ES3]

Replace matches of searchValue with replaceValue. If searchValue is a string, only the first verbatim occurrence is replaced. If searchValue is a regular expression without flag /g, only the first match is replaced. If searchValue is a regular expression with /g then all matches are replaced.

> 'x.x.'.replace('.', '#')
'x#x.'
> 'x.x.'.replace(/./, '#')
'#.x.'
> 'x.x.'.replace(/./g, '#')
'####'

Special characters in replaceValue are:

$$: becomes $
$n: becomes the capture of numbered group n (alas, $0 does not work)
$&: becomes the complete match
$`: becomes everything before the match
$': becomes everything after the match

Examples:

> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$2|')
'a |04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$&|')
'a |2020-04| b'
> 'a 2020-04 b'.replace(/([0-9]{4})-([0-9]{2})/, '|$`|')
'a |a | b'

Named capture groups (ES2018) are supported, too:

$<name> becomes the capture of named group name

Example:

> 'a 2020-04 b'.replace(/(?<year>[0-9]{4})-(?<month>[0-9]{2})/, '|$<month>|')
'a |04| b'
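The previous examples do not cover $$ and $'. The following sketch shows them, plus a replacement that reorders two numbered captures. The input strings are made up for this example.

```javascript
const dateRegExp = /([0-9]{4})-([0-9]{2})/;

// $' inserts everything after the match (here: ' b')
const after = 'a 2020-04 b'.replace(dateRegExp, "|$'|"); // 'a | b| b'

// $$ inserts a single dollar sign
const dollar = 'price: 5'.replace('5', '$$5'); // 'price: $5'

// $1 and $2 can be combined to reorder captures
const swapped = 'a 2020-04 b'.replace(dateRegExp, '$2/$1'); // 'a 04/2020 b'

console.log(after, dollar, swapped);
```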

.replace(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES3]

If the second parameter is a function, occurrences are replaced with the strings it returns. Its parameters args are:

matched: string: the complete match
g1: string|undefined: the capture of numbered group 1
g2: string|undefined: the capture of numbered group 2
(Etc.)
offset: number: where was the match found in the input string?
input: string: the whole input string

const regexp = /([0-9]{4})-([0-9]{2})/;
const replacer = (all, year, month) => '|' + all + '|';
assert.equal(
  'a 2020-04 b'.replace(regexp, replacer),
  'a |2020-04| b');

Named capture groups (ES2018) are supported, too. If there are any, a last parameter contains an object whose properties contain the captures:

const regexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
const replacer = (...args) => {
  const groups = args.pop();
  return '|' + groups.month + '|';
};
assert.equal(
  'a 2020-04 b'.replace(regexp, replacer),
  'a |04| b');

.toUpperCase(): string [ES1]

Returns a copy of the string in which all lowercase alphabetic characters are converted to uppercase. How well that works for various alphabets depends on the JavaScript engine.

> '-a2b-'.toUpperCase()
'-A2B-'
> 'αβγ'.toUpperCase()
'ΑΒΓ'

.toLowerCase(): string [ES1]

Returns a copy of the string in which all uppercase alphabetic characters are converted to lowercase. How well that works for various alphabets depends on the JavaScript engine.

> '-A2B-'.toLowerCase()
'-a2b-'
> 'ΑΒΓ'.toLowerCase()
'αβγ'

.trim(): string [ES5]

Returns a copy of the string in which all leading and trailing whitespace (spaces, tabs, line terminators, etc.) is gone.

> '\r\n#\t  '.trim()
'#'

.trimEnd(): string [ES2019]

Similar to .trim(), but only the end of the string is trimmed:

> '  abc  '.trimEnd()
'  abc'

.trimStart(): string [ES2019]

Similar to .trim(), but only the beginning of the string is trimmed:

> '  abc  '.trimStart()
'abc  '

18.7.8. String.prototype: chars, char codes, code points

.charAt(pos: number): string [ES1]

Returns the character at index pos, as a string (JavaScript does not have a datatype for characters). str[i] is equivalent to str.charAt(i) and more concise (caveat: may not work on old engines).

> 'abc'.charAt(1)
'b'

.charCodeAt(pos: number): number [ES1]

Returns the 16-bit number (0–65535) of the UTF-16 code unit (character) at index pos.

> 'abc'.charCodeAt(1)
98

.codePointAt(pos: number): number | undefined [ES6]

Returns the 21-bit number of the Unicode code point of the 1–2 characters at index pos. If there is no such index, it returns undefined.
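The following sketch shows .codePointAt() at each index of a string that starts with a code point beyond 16 bits. Note that pos is an index of JavaScript characters: if it points into the middle of a surrogate pair, only the trail surrogate is returned.

```javascript
const smileyA = '\u{1F642}a'; // 2 code points, 3 JavaScript characters

const first = smileyA.codePointAt(0); // 0x1F642 (the complete code point)
const middle = smileyA.codePointAt(1); // 0xDE42 (trail surrogate only)
const last = smileyA.codePointAt(2); // 0x61 (the letter a)
const outOfRange = smileyA.codePointAt(3); // undefined

console.log(first.toString(16), middle.toString(16), last.toString(16), outOfRange);
```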
