Chapter 3. Lexical Analysis (Tokenization)
The Esprima tokenizer takes a string as its input and produces an array of tokens, a list of objects representing categorized input characters. This process is known as lexical analysis.
The interface of the tokenize function is as follows:
- esprima.tokenize(input, config)
where input is a string representing the program to be tokenized and config is an object used to customize the parsing behavior (optional).
The input argument is mandatory. Its type must be a string; otherwise, the tokenization behavior is undefined.
The description of the various properties of config is summarized in the following table:
Name | Type | Default | Description |
---|---|---|---|
range | Boolean | false | Annotate each token with its zero-based start and end location |
loc | Boolean | false | Annotate each token with its line and column-based location |
comment | Boolean | false | Include every line and block comment in the output |
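As an illustrative sketch (the exact shape of the output, including the names of the comment token types, may differ slightly between Esprima versions), enabling range annotates each token with its start and end character offsets, while comment adds the collected comments to the token list:
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('answer = 42', { range: true })
- [ { type: 'Identifier', value: 'answer', range: [ 0, 6 ] },
- { type: 'Punctuator', value: '=', range: [ 7, 8 ] },
- { type: 'Numeric', value: '42', range: [ 9, 11 ] } ]
- > esprima.tokenize('/* the answer */ 42', { comment: true })
- [ { type: 'BlockComment', value: ' the answer ' },
- { type: 'Numeric', value: '42' } ]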
An example Node.js REPL session that demonstrates the use of the Esprima tokenizer is:
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('answer = 42')
- [ { type: 'Identifier', value: 'answer' },
- { type: 'Punctuator', value: '=' },
- { type: 'Numeric', value: '42' } ]
In the above example, the input string is tokenized into 3 tokens: an identifier, a punctuator, and a number. For each token, the type property is a string indicating the type of the token and the value property stores the corresponding lexeme, i.e. the string of characters that forms a syntactic unit.
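Because the result is a plain array of objects, it can be processed with the usual JavaScript array methods. As a small sketch, collecting only the identifier tokens could look like this:
- $ node
- > var esprima = require('esprima')
- > var tokens = esprima.tokenize('answer = 42')
- > tokens.filter(function (token) { return token.type === 'Identifier' })
- [ { type: 'Identifier', value: 'answer' } ]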
Unlike the parse function, the tokenize function can work with an input string that does not represent a valid JavaScript program. This is because lexical analysis, as the name implies, does not involve the process of understanding the syntactic structure of the input.
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('42 = answer')
- [ { type: 'Numeric', value: '42' },
- { type: 'Punctuator', value: '=' },
- { type: 'Identifier', value: 'answer' } ]
- > esprima.tokenize('while (if {}')
- [ { type: 'Keyword', value: 'while' },
- { type: 'Punctuator', value: '(' },
- { type: 'Keyword', value: 'if' },
- { type: 'Punctuator', value: '{' },
- { type: 'Punctuator', value: '}' } ]