- 18. Strings
- 18.1. Plain string literals
- 18.2. Accessing characters and code points
- 18.3. String concatenation via +
- 18.4. Converting to string
- 18.5. Comparing strings
- 18.6. Atoms of text: JavaScript characters, code points, grapheme clusters
- 18.7. Quick reference: Strings
- 18.7.1. Converting to string
- 18.7.2. Numeric values of characters and code points
- 18.7.3. String operators
- 18.7.4. String.prototype: finding and matching
- 18.7.5. String.prototype: extracting
- 18.7.6. String.prototype: combining
- 18.7.7. String.prototype: transforming
- 18.7.8. String.prototype: chars, char codes, code points
- 18.7.9. Sources
18. Strings
Strings are primitive values in JavaScript and immutable. That is, string-related operations always produce new strings and never change existing strings.
18.1. Plain string literals
Plain string literals are delimited by either single quotes or double quotes:
Single quotes are used more often, because that makes it easier to mention HTML, which uses double quotes for attribute values.
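As a small illustration (the example strings are made up), both delimiters produce the same value, and single quotes leave HTML attribute quotes unescaped:

```javascript
const single = 'abc';
const double = "abc";
console.log(single === double); // true

// Double quotes inside a single-quoted literal need no escaping
const html = '<a href="https://example.com">Example</a>';
console.log(html.includes('"')); // true
```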
The next chapter covers template literals, which give you:
- String interpolation
- Multiple lines
- Raw string literals (backslash has no special meaning)
18.1.1. Escaping
The backslash lets you create special characters:
- Unix line break:
'\n'
- Windows line break:
'\r\n'
- Tab:
'\t'
- Backslash:
'\\'
The backslash also lets you use the delimiter of a string literal inside that literal:
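A short sketch of escaped delimiters; the variable names are only illustrative:

```javascript
// The same text, written once per delimiter style
const viaSingle = 'She said: "Let\'s go!"';
const viaDouble = "She said: \"Let's go!\"";
console.log(viaSingle === viaDouble); // true

// An escaped backslash is a single character
console.log('\\'.length); // 1
```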
18.2. Accessing characters and code points
18.2.1. Accessing JavaScript characters
JavaScript has no extra data type for characters – characters are always transported as strings.
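For example, indexing into a string yields another (one-character) string. A minimal sketch:

```javascript
const str = 'abc';
const second = str[1];

console.log(second); // b
console.log(typeof second); // string
console.log(str.length); // 3
```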
18.2.2. Accessing Unicode code points via for-of and spreading
Iterating over strings via for-of or spreading (...) visits Unicode code points. Each code point is represented by 1–2 JavaScript characters. For more information, see the section on the atoms of text in this chapter.
This is how you iterate over the code points of a string via for-of:
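A minimal sketch; the sample string contains the code point U+1F642, which occupies two JavaScript characters:

```javascript
const visited = [];
for (const codePoint of 'x\u{1F642}y') {
  visited.push(codePoint);
}

// Three code points are visited, even though the string has four characters
console.log(visited.length); // 3
console.log(visited[1] === '\u{1F642}'); // true
```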
And this is how you convert a string into an Array of code points via spreading:
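A minimal sketch with the same kind of sample string (the names are illustrative):

```javascript
const input = 'x\u{1F642}y';
const codePointArray = [...input];

console.log(codePointArray.length); // 3
console.log(input.length); // 4
```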
18.3. String concatenation via +
If at least one operand is a string, the plus operator (+) converts any non-strings to strings and concatenates the result:
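A few illustrative cases. Note that + is evaluated left to right, so the position of the first string matters:

```javascript
const r1 = 3 + ' times ' + 4; // number, string, number
const r2 = 'done: ' + true; // string, boolean
const r3 = 1 + 2 + 'x'; // numbers are added first
const r4 = 'x' + 1 + 2; // everything is concatenated

console.log(r1); // 3 times 4
console.log(r2); // done: true
console.log(r3); // 3x
console.log(r4); // x12
```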
The assignment operator += is useful if you want to assemble a string, piece by piece:
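A minimal sketch (the text is made up):

```javascript
let sentence = '';
sentence += 'Say it';
sentence += ' one more';
sentence += ' time';

console.log(sentence); // Say it one more time
```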
As an aside, this way of assembling strings is quite efficient, because most JavaScript engines internally optimize it.
18.4. Converting to string
These are three ways of converting a value x to a string:
- String(x)
- ''+x
- x.toString() (does not work for undefined and null)
Recommendation: use the descriptive and safe String().
Examples:
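The following sketch exercises the three approaches; the variable names are illustrative:

```javascript
const converted = [
  String(undefined),
  String(null),
  String(false),
  String(123.45),
  '' + null,
  (123).toString(),
];
console.log(converted);
// [ 'undefined', 'null', 'false', '123.45', 'null', '123' ]

// .toString() fails for undefined and null
let threwTypeError = false;
try {
  const missing = undefined;
  missing.toString();
} catch (err) {
  threwTypeError = err instanceof TypeError;
}
console.log(threwTypeError); // true
```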
Pitfall for booleans: If you convert a boolean to a string via String(), you can’t convert it back via Boolean().
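The reason is that every non-empty string is truthy. A minimal sketch:

```javascript
const asString = String(false);
const backAgain = Boolean(asString);

console.log(asString); // false (a string)
console.log(backAgain); // true, because 'false' is a non-empty string
```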
18.4.1. Stringifying objects
Plain objects have a default representation that is not very useful:
Arrays have a better string representation, but it still hides much information:
Stringifying functions returns their source code:
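The three cases above (plain object, Array, function) can be checked with a sketch like this one; the sample values are made up:

```javascript
// Plain object: generic default representation
const fromObject = String({ a: 1 });
console.log(fromObject); // [object Object]

// Arrays: elements joined by commas; nesting, null and undefined are hidden
const fromArray = String(['a', 'b']);
const fromNested = String(['a', ['b']]);
const fromHoles = String([1, null, undefined]);
console.log(fromArray); // a,b
console.log(fromNested); // a,b
console.log(fromHoles); // 1,,

// Function: source code
const fromFunction = String(function f() { return 4; });
console.log(fromFunction.includes('return 4')); // true
```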
18.4.2. Customizing the stringification of objects
You can override the built-in way of stringifying objects by implementing the method toString():
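A minimal sketch; the object point and its properties are hypothetical:

```javascript
const point = {
  x: 1,
  y: 2,
  // Called by String() and by the + operator
  toString() {
    return '(' + this.x + ', ' + this.y + ')';
  },
};

console.log(String(point)); // (1, 2)
console.log('Point: ' + point); // Point: (1, 2)
```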
18.4.3. An alternate way of stringifying values
The JSON data format is a text representation of JavaScript values. Therefore, JSON.stringify() can also be used to stringify data:
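For instance (the sample data is illustrative):

```javascript
const jsonObject = JSON.stringify({ first: 'Jane', last: 'Doe' });
const jsonArray = JSON.stringify([1, 'a', null]);
const jsonString = JSON.stringify('abc');

console.log(jsonObject); // {"first":"Jane","last":"Doe"}
console.log(jsonArray); // [1,"a",null]
console.log(jsonString); // "abc" (including the double quotes)
```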
The caveat is that JSON only supports null, booleans, numbers, strings, Arrays and objects (which it always treats as if they were created by object literals).
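A brief sketch of what this caveat means in practice: values that JSON cannot represent are dropped, and a Map is treated like an object literal without properties.

```javascript
// undefined and methods are omitted from objects
const dropped = JSON.stringify({ u: undefined, f() {} });
console.log(dropped); // {}

// A Map has no own enumerable properties
const fromMap = JSON.stringify(new Map([['a', 1]]));
console.log(fromMap); // {}

// A top-level undefined has no JSON representation at all
const nothing = JSON.stringify(undefined);
console.log(nothing); // undefined
```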
Tip: The third parameter lets you switch on multi-line output and specify how much to indent. For example:
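The statement itself is not shown here; a call of the following shape (the object and the indentation of 2 are assumptions) produces multi-line output:

```javascript
const person = { first: 'Jane', last: 'Doe' };

// Second parameter: no replacer; third parameter: indent by 2 spaces
const pretty = JSON.stringify(person, null, 2);
console.log(pretty);
```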
This statement produces the following output.
{
  "first": "Jane",
  "last": "Doe"
}
18.5. Comparing strings
Strings can be compared via the following operators:
< <= > >=
There is one important caveat to consider: These operators compare based on the numeric values of JavaScript characters. That means that the order that JavaScript uses for strings is different from the one used in dictionaries and phone books:
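A few comparisons that show the difference (U+00E4 is the letter ä):

```javascript
// Uppercase letters have lower char codes than lowercase letters
const upperFirst = 'A' < 'B';
const mixedCase = 'a' < 'B';

// Accented letters have higher char codes than all unaccented ones
const accented = '\u00E4' < 'b';

console.log(upperFirst); // true
console.log(mixedCase); // false
console.log(accented); // false
```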
Properly comparing text is beyond the scope of this book. It is supported via the ECMAScript Internationalization API (Intl).
18.6. Atoms of text: JavaScript characters, code points, grapheme clusters
Quick recap of the chapter on Unicode:
- Code points: Unicode characters, with a range of 21 bits.
- UTF-16 code units: JavaScript characters, with a range of 16 bits. Code points are encoded as 1–2 UTF-16 code units.
- Grapheme clusters: Graphemes are written symbols, as displayed on screen or paper. A grapheme cluster is a sequence of 1 or more code points that encodes a grapheme.
To represent code points in JavaScript strings, one or two JavaScript characters are used. You can see that when counting characters via .length:
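A minimal sketch, using the code point U+1F642 as an example of a code point outside the 16-bit range:

```javascript
const oneUnit = 'A'.length;
const twoUnits = '\u{1F642}'.length;

console.log(oneUnit); // 1
console.log(twoUnits); // 2
```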
18.6.1. Working with code points
Let’s explore JavaScript’s tools for working with code points.
Code point escapes let you specify code points hexadecimally. They expand to one or two JavaScript characters.
Converting from code points:
Converting to code points:
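The three tools just mentioned (escapes, converting from code points, converting to code points) can be sketched together; the variable names are illustrative:

```javascript
// Code point escape: expands to two JavaScript characters here
const smiley = '\u{1F642}';
console.log(smiley === '\uD83D\uDE42'); // true

// Converting from a code point
const fromNumber = String.fromCodePoint(0x1F642);
console.log(fromNumber === smiley); // true

// Converting to a code point
const asHex = smiley.codePointAt(0).toString(16);
console.log(asHex); // 1f642
```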
Iteration honors code points. For example, the iteration-based for-of loop:
Or iteration-based spreading (...):
Spreading is therefore a good tool for counting code points:
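A sketch that puts the for-of loop, spreading, and counting side by side (the sample string is made up):

```javascript
const sample = 'a\u{1F642}b';

// for-of visits code points
const seen = [];
for (const cp of sample) {
  seen.push(cp);
}

// Spreading produces the same elements
const spread = [...sample];

console.log(seen.length); // 3
console.log(spread.length); // 3 (code points)
console.log(sample.length); // 4 (code units)
```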
18.6.2. Working with code units
Indices and lengths of strings are based on JavaScript characters (i.e., code units).
To specify code units numerically, you can use code unit escapes:
And you can use so-called char codes:
To get the char code of a character, use .charCodeAt():
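The code unit tools of this section in one sketch, again based on U+1F642, whose two code units are 0xD83D and 0xDE42:

```javascript
// Code unit escapes
const pair = '\uD83D\uDE42';
console.log(pair === '\u{1F642}'); // true

// Char codes to string
const fromCodes = String.fromCharCode(0xD83D, 0xDE42);
console.log(fromCodes === pair); // true

// String to char codes
const firstUnit = pair.charCodeAt(0).toString(16);
const secondUnit = pair.charCodeAt(1).toString(16);
console.log(firstUnit); // d83d
console.log(secondUnit); // de42
```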
18.6.3. Caveat: grapheme clusters
When working with text that may be written in any human language, it’s best to split at the boundaries of grapheme clusters, not at the boundaries of code units.
Intl.Segmenter, part of the ECMAScript Internationalization API, supports Unicode segmentation (along grapheme cluster boundaries, word boundaries, sentence boundaries, etc.).
On engines that do not provide Intl.Segmenter, you can use one of several libraries that are available (do a web search for “JavaScript grapheme”).
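A minimal sketch of splitting along grapheme cluster boundaries, assuming the engine provides Intl.Segmenter (recent browsers and Node.js versions do). The sample text is "e" followed by a combining acute accent, which is displayed as a single symbol:

```javascript
const text = 'e\u0301';

// Two code points...
const codePointCount = [...text].length;
console.log(codePointCount); // 2

// ...but only one grapheme cluster
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(text)].map((s) => s.segment);
console.log(graphemes.length); // 1
```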
18.7. Quick reference: Strings
Strings are immutable; none of the string methods ever modify their strings.
18.7.1. Converting to string
Tbl. 16 describes how various values are converted to strings.
x | String(x) |
---|---|
undefined | 'undefined' |
null | 'null' |
Boolean value | false → 'false' , true → 'true' |
Number value | Example: 123 → '123' |
String value | x (input, unchanged) |
An object | Configurable via, e.g., toString() |
18.7.2. Numeric values of characters and code points
- Char codes: Unicode UTF-16 code units as numbers
  - String.fromCharCode(), String.prototype.charCodeAt()
  - Precision: 16 bits, unsigned
- Code points: Unicode code points as numbers
  - String.fromCodePoint(), String.prototype.codePointAt()
  - Precision: 21 bits, unsigned (17 planes, 16 bits each)
18.7.3. String operators
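No operators are listed under this heading here. As a sketch, these are the operators covered earlier in this chapter (character access via [], concatenation via + and +=, and the comparison operators):

```javascript
// Access characters via []
const word = 'abc';
console.log(word[1]); // b

// Concatenate via +
const combined = 'take ' + 3 + ' oranges';
console.log(combined); // take 3 oranges

// Append via +=
let grown = 'a';
grown += 'b';
console.log(grown); // ab

// Compare via < <= > >= (based on char codes)
const ordered = 'a' < 'b';
console.log(ordered); // true
```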
18.7.4. String.prototype: finding and matching
.endsWith(searchString: string, endPos=this.length): boolean [ES6]
Returns true if the string would end with searchString if its length were endPos. Returns false otherwise.
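For illustration (the sample strings are made up):

```javascript
const endsPlain = 'foo.txt'.endsWith('.txt');
// With endPos 4, the string is treated as 'abcd'
const endsAtPos = 'abcde'.endsWith('cd', 4);

console.log(endsPlain); // true
console.log(endsAtPos); // true
```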
.includes(searchString: string, startPos=0): boolean [ES6]
Returns true if the string contains the searchString and false otherwise. The search starts at startPos.
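A quick sketch:

```javascript
const found = 'abc'.includes('b');
// Searching from index 2 only sees 'c'
const foundLater = 'abc'.includes('b', 2);

console.log(found); // true
console.log(foundLater); // false
```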
.indexOf(searchString: string, minIndex=0): number [ES1]
Returns the lowest index at which searchString appears within the string, or -1 if it does not appear. Any returned index will be minIndex or higher.
.lastIndexOf(searchString: string, maxIndex=Infinity): number [ES1]
Returns the highest index at which searchString appears within the string, or -1 if it does not appear. Any returned index will be maxIndex or lower.
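A sketch covering both .indexOf() and .lastIndexOf() on the same made-up string:

```javascript
const firstA = 'abab'.indexOf('a');
const nextA = 'abab'.indexOf('a', 1);
const missing = 'abab'.indexOf('c');
console.log(firstA, nextA, missing); // 0 2 -1

const lastAb = 'abab'.lastIndexOf('ab');
const cappedAb = 'abab'.lastIndexOf('ab', 1);
console.log(lastAb, cappedAb); // 2 0
```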
.match(regExp: string | RegExp): RegExpMatchArray | null [ES3]
If regExp is a regular expression with flag /g not set, then .match() returns the first match for regExp within the string, or null if there is no match. If regExp is a string, it is used to create a regular expression before performing the previous steps.
The result is an Array of strings with the additional properties .index (where the match starts), .input (the complete input string), and .groups.
Numbered capture groups become Array indices. Named capture groups (ES2018) become properties of .groups. In this mode, .match() works like RegExp.prototype.exec().
Examples:
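A sketch with a numbered group, a named group, and a failed match (the input string is made up):

```javascript
// Numbered capture group
const numbered = 'ababb'.match(/a(b+)/);
console.log(numbered[0], numbered[1]); // ab b
console.log(numbered.index); // 0
console.log(numbered.input); // ababb
console.log(numbered.groups); // undefined

// Named capture group
const named = 'ababb'.match(/a(?<foo>b+)/);
console.log(named.groups.foo); // b

// No match
const noMatch = 'abab'.match(/x/);
console.log(noMatch); // null
```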
.match(regExp: RegExp): string[] | null [ES3]
If flag /g of regExp is set, .match() returns either an Array with all matches or null if there was no match.
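With /g, the captures are not part of the result. A minimal sketch:

```javascript
const allMatches = 'ababb'.match(/a(b+)/g);
const noMatches = 'ababb'.match(/x/g);

console.log(allMatches); // [ 'ab', 'abb' ]
console.log(noMatches); // null
```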
.search(regExp: string | RegExp): number [ES3]
Returns the index at which regExp first occurs within the string, or -1 if there is no match. If regExp is a string, it is used to create a regular expression.
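For example:

```javascript
const viaRegExp = 'a2b'.search(/[0-9]/);
// A string argument is turned into a regular expression
const viaString = 'a2b'.search('[0-9]');
const notFound = 'abc'.search(/[0-9]/);

console.log(viaRegExp, viaString, notFound); // 1 1 -1
```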
.startsWith(searchString: string, startPos=0): boolean [ES6]
Returns true if searchString occurs in the string at index startPos. Returns false otherwise.
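A short sketch:

```javascript
const startsPlain = '.gitignore'.startsWith('.');
// Checks whether 'bc' occurs at index 1
const startsAtPos = 'abcde'.startsWith('bc', 1);

console.log(startsPlain); // true
console.log(startsAtPos); // true
```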
18.7.5. String.prototype: extracting
.slice(start=0, end=this.length): string [ES3]
Returns the substring of the string that starts at (including) index start and ends at (excluding) index end. You can use negative indices where -1 means this.length-1 (etc.).
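The three calls below all extract the same substring:

```javascript
const withBoth = 'abc'.slice(1, 3);
const withStart = 'abc'.slice(1);
// -2 means this.length - 2, i.e. index 1
const withNegative = 'abc'.slice(-2);

console.log(withBoth, withStart, withNegative); // bc bc bc
```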
.split(separator: string | RegExp, limit?: number): string[] [ES3]
Splits the string into an Array of substrings – the strings that occur between the separators. The separator can either be a string or a regular expression. Captures made by groups in the regular expression are included in the result.
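A sketch of the separator variants and of limit (the sample strings are made up):

```javascript
// String separator: surrounding spaces are kept
const byString = 'a | b | c'.split('|');
console.log(byString); // [ 'a ', ' b ', ' c' ]

// Regular expression separator: spaces are part of the separator
const byRegExp = 'a : b : c'.split(/ *: */);
console.log(byRegExp); // [ 'a', 'b', 'c' ]

// Captures made by groups are included in the result
const withCaptures = 'a : b : c'.split(/( *):( *)/);
console.log(withCaptures); // [ 'a', ' ', ' ', 'b', ' ', ' ', 'c' ]

// limit caps the number of returned substrings
const limited = 'a,b,c'.split(',', 2);
console.log(limited); // [ 'a', 'b' ]
```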
.substring(start: number, end=this.length): string [ES1]
Use .slice() instead of this method. .substring() wasn’t implemented consistently in older engines and doesn’t support negative indices.
18.7.6. String.prototype: combining
.concat(...strings: string[]): string [ES3]
Returns the concatenation of the string and strings. 'a'+'b' is equivalent to 'a'.concat('b') and more concise.
.padEnd(len: number, fillString=' '): string [ES2017]
Appends fillString to the string until it has the desired length len.
.padStart(len: number, fillString=' '): string [ES2017]
Prepends fillString to the string until it has the desired length len.
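A sketch covering both padding methods. Note that fillString is truncated if necessary and that strings that are already long enough are returned unchanged:

```javascript
const endDefault = '#'.padEnd(2);
const endCustom = '#'.padEnd(5, 'abc');
const endTooLong = 'abc'.padEnd(2);
console.log([endDefault, endCustom, endTooLong]); // [ '# ', '#abca', 'abc' ]

const startCustom = '#'.padStart(5, 'abc');
const zeroPadded = '7'.padStart(3, '0');
console.log([startCustom, zeroPadded]); // [ 'abca#', '007' ]
```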
.repeat(count=0): string [ES6]
Returns a string that is the string, repeated count times.
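For example:

```javascript
const none = '*'.repeat(0);
const three = '*'.repeat(3);

console.log(none === ''); // true
console.log(three); // ***
```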
18.7.7. String.prototype: transforming
.normalize(form: 'NFC'|'NFD'|'NFKC'|'NFKD' = 'NFC'): string [ES6]
Normalizes the string according to the Unicode Normalization Forms.
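A minimal sketch, using the letter ñ (U+00F1), which can also be written as n followed by a combining tilde (U+0303):

```javascript
// NFD decomposes into base letter plus combining mark
const decomposed = '\u00F1'.normalize('NFD');
console.log(decomposed.length); // 2
console.log(decomposed === 'n\u0303'); // true

// NFC (the default) composes again
const composed = 'n\u0303'.normalize();
console.log(composed.length); // 1
console.log(composed === '\u00F1'); // true
```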
.replace(searchValue: string | RegExp, replaceValue: string): string [ES3]
Replaces matches of searchValue with replaceValue. If searchValue is a string, only the first verbatim occurrence is replaced. If searchValue is a regular expression without flag /g, only the first match is replaced. If searchValue is a regular expression with /g, then all matches are replaced.
Special characters in replaceValue are:
- $$: becomes $
- $n: becomes the capture of numbered group n (alas, $0 does not work)
- $&: becomes the complete match
- $`: becomes everything before the match
- $': becomes everything after the match
Examples:
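A sketch of the three kinds of searchValue and of the special characters in replaceValue (the input strings are made up):

```javascript
// String, RegExp without /g, RegExp with /g
const firstVerbatim = 'x.x.'.replace('.', '#');
const firstMatch = 'x.x.'.replace(/./, '#');
const everyMatch = 'x.x.'.replace(/./g, '#');
console.log([firstVerbatim, firstMatch, everyMatch]); // [ 'x#x.', '#.x.', '####' ]

// Special characters in replaceValue
const dateRe = /([0-9]{4})-([0-9]{2})/;
const group2 = 'a 2020-04 b'.replace(dateRe, '|$2|');
const whole = 'a 2020-04 b'.replace(dateRe, '|$&|');
const before = 'a 2020-04 b'.replace(dateRe, '|$`|');
const after = 'a 2020-04 b'.replace(dateRe, "|$'|");
console.log(group2); // a |04| b
console.log(whole); // a |2020-04| b
console.log(before); // a |a | b
console.log(after); // a | b| b
```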
Named capture groups (ES2018) are supported, too:
- $<name> becomes the capture of named group name
Example:
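A minimal sketch; the group names year and month are chosen for this example:

```javascript
const namedDateRe = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
const monthOnly = 'a 2020-04 b'.replace(namedDateRe, '|$<month>|');

console.log(monthOnly); // a |04| b
```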
.replace(searchValue: string | RegExp, replacer: (...args: any[]) => string): string [ES3]
If the second parameter is a function, occurrences are replaced with the strings it returns. Its parameters args are:
- matched: string: the complete match
- g1: string|undefined: the capture of numbered group 1
- g2: string|undefined: the capture of numbered group 2
- (Etc.)
- offset: number: where was the match found in the input string?
- input: string: the whole input string
Named capture groups (ES2018) are supported, too. If there are any, a last parameter contains an object whose properties contain the captures:
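A sketch of a replacer function, first with numbered groups and then with named groups. The regular expressions and names are illustrative:

```javascript
// Numbered groups: record the arguments the replacer receives
const plainRe = /([0-9]{4})-([0-9]{2})/;
const seenArgs = [];
const wrapped = 'a 2020-04 b'.replace(plainRe, (...args) => {
  seenArgs.push(args);
  return '|' + args[0] + '|';
});
console.log(wrapped); // a |2020-04| b
console.log(seenArgs[0]); // [ '2020-04', '2020', '04', 2, 'a 2020-04 b' ]

// Named groups: the last argument is an object with the captures
const namedRe = /(?<year>[0-9]{4})-(?<month>[0-9]{2})/;
const viaGroups = 'a 2020-04 b'.replace(namedRe, (...args) => {
  const groups = args[args.length - 1];
  return '|' + groups.month + '|';
});
console.log(viaGroups); // a |04| b
```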
.toUpperCase(): string [ES1]
Returns a copy of the string in which all lowercase alphabetic characters are converted to uppercase. How well that works for various alphabets depends on the JavaScript engine.
.toLowerCase(): string [ES1]
Returns a copy of the string in which all uppercase alphabetic characters are converted to lowercase. How well that works for various alphabets depends on the JavaScript engine.
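A quick sketch of both case conversions; the original string is left untouched:

```javascript
const mixed = 'Hello';
const upper = mixed.toUpperCase();
const lower = mixed.toLowerCase();

console.log(upper); // HELLO
console.log(lower); // hello
console.log(mixed); // Hello
```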
.trim(): string [ES5]
Returns a copy of the string in which all leading and trailing whitespace (spaces, tabs, line terminators, etc.) is gone.
.trimEnd(): string [ES2019]
Similar to .trim(), but only the end of the string is trimmed:
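A minimal sketch, with .trim() alongside for comparison:

```javascript
const trimmedBoth = '\r\n#\t  '.trim();
const trimmedEnd = '  abc  '.trimEnd();

console.log(trimmedBoth === '#'); // true
console.log(trimmedEnd === '  abc'); // true
```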
.trimStart(): string [ES2019]
Similar to .trim(), but only the beginning of the string is trimmed:
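A minimal sketch:

```javascript
const trimmedStart = '  abc  '.trimStart();

console.log(trimmedStart === 'abc  '); // true
```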
18.7.8. String.prototype: chars, char codes, code points
.charAt(pos: number): string [ES1]
Returns the character at index pos, as a string (JavaScript does not have a datatype for characters). For valid indices, str[i] is equivalent to str.charAt(i) and more concise (caveat: may not work on old engines).
.charCodeAt(pos: number): number [ES1]
Returns the 16-bit number (0–65535) of the UTF-16 code unit (character) at index pos.
.codePointAt(pos: number): number | undefined [ES6]
Returns the 21-bit number of the Unicode code point of the 1–2 characters at index pos. If there is no such index, it returns undefined.
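The three methods of this section side by side, as a sketch (U+1F642 serves as an example of a code point that needs two characters):

```javascript
const charAtOne = 'abc'.charAt(1);
const codeOfA = 'abc'.charCodeAt(0);
console.log(charAtOne); // b
console.log(codeOfA); // 97

const face = '\u{1F642}';
const fullPoint = face.codePointAt(0);
// Index 1 is in the middle of the pair: only the second code unit is returned
const halfPoint = face.codePointAt(1);
const outOfRange = 'abc'.codePointAt(5);
console.log(fullPoint.toString(16)); // 1f642
console.log(halfPoint.toString(16)); // de42
console.log(outOfRange); // undefined
```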