Chapter 3. Lexical Analysis (Tokenization)
The Esprima tokenizer takes a string as its input and produces an array of tokens, a list of objects representing categorized input characters. This process is known as lexical analysis.
The interface of the tokenize function is as follows:
- esprima.tokenize(input, config)
where input is a string representing the program to be tokenized and config is an object used to customize the parsing behavior (optional).
The input argument is mandatory. Its type must be a string; otherwise, the tokenization behavior is undefined.
The description of the various properties of config is summarized in the following table:
Name | Type | Default | Description |
---|---|---|---|
range | Boolean | false | Annotate each token with its zero-based start and end location |
loc | Boolean | false | Annotate each token with its line and column-based location |
comment | Boolean | false | Include every line and block comment in the output |
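As an illustrative sketch (the exact shape of the output, including the names of the comment token types, may differ slightly between Esprima versions), enabling range annotates each token with its start and end character offsets, while comment adds the collected comments to the token list:
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('answer = 42', { range: true })
- [ { type: 'Identifier', value: 'answer', range: [ 0, 6 ] },
- { type: 'Punctuator', value: '=', range: [ 7, 8 ] },
- { type: 'Numeric', value: '42', range: [ 9, 11 ] } ]
- > esprima.tokenize('/* the answer */ 42', { comment: true })
- [ { type: 'BlockComment', value: ' the answer ' },
- { type: 'Numeric', value: '42' } ]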
An example Node.js REPL session that demonstrates the use of the Esprima tokenizer is:
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('answer = 42')
- [ { type: 'Identifier', value: 'answer' },
- { type: 'Punctuator', value: '=' },
- { type: 'Numeric', value: '42' } ]
In the above example, the input string is tokenized into 3 tokens: an identifier, a punctuator, and a number. For each token, the type property is a string indicating the type of the token and the value property stores the corresponding lexeme, i.e. the string of characters that forms a syntactic unit.
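Because the result is a plain array of objects, it can be processed with the usual JavaScript array methods. As a small sketch, collecting only the identifier tokens could look like this:
- $ node
- > var esprima = require('esprima')
- > var tokens = esprima.tokenize('answer = 42')
- > tokens.filter(function (token) { return token.type === 'Identifier' })
- [ { type: 'Identifier', value: 'answer' } ]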
Unlike the parse function, the tokenize function can work with an input string that does not represent a valid JavaScript program. This is because lexical analysis, as the name implies, does not involve the process of understanding the syntactic structure of the input.
- $ node
- > var esprima = require('esprima')
- > esprima.tokenize('42 = answer')
- [ { type: 'Numeric', value: '42' },
- { type: 'Punctuator', value: '=' },
- { type: 'Identifier', value: 'answer' } ]
- > esprima.tokenize('while (if {}')
- [ { type: 'Keyword', value: 'while' },
- { type: 'Punctuator', value: '(' },
- { type: 'Keyword', value: 'if' },
- { type: 'Punctuator', value: '{' },
- { type: 'Punctuator', value: '}' } ]