Token Collection

When the Esprima parser performs syntactic analysis, it first needs to break the source down into a series of tokens. By default, these tokens are not stored as part of the parsing result. To keep the tokens found during parsing, set the tokens flag in the configuration object to true. Take a look at this example:

  $ node
  > var esprima = require('esprima')
  > esprima.parseScript('const answer = 42', { tokens: true })
  Script {
    type: 'Program',
    body:
     [ VariableDeclaration {
         type: 'VariableDeclaration',
         declarations: [Object],
         kind: 'const' } ],
    sourceType: 'script',
    tokens:
     [ { type: 'Keyword', value: 'const' },
       { type: 'Identifier', value: 'answer' },
       { type: 'Punctuator', value: '=' },
       { type: 'Numeric', value: '42' } ] }

The output of the parser now contains an additional property, an array named tokens. Every element in this array is a token found during the parsing process. For each token, the type property is a string indicating the type of the token and the value property stores the corresponding lexeme, i.e. the string of characters that forms a syntactic unit.
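Because each token records both its type and its lexeme, the collected array is convenient for simple post-processing. As a minimal sketch (the input string here is just an illustrative example), the following snippet extracts every identifier name from a script:

  var esprima = require('esprima');

  // Parse with token collection enabled, then keep only the identifier tokens.
  var result = esprima.parseScript('const answer = 42; let question = answer;',
    { tokens: true });
  var identifiers = result.tokens
    .filter(function (token) { return token.type === 'Identifier'; })
    .map(function (token) { return token.value; });

  console.log(identifiers); // [ 'answer', 'question', 'answer' ]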

Each token also carries its location if the parsing configuration sets the range or loc flag (or both), as shown in the following example:

  $ node
  > var esprima = require('esprima')
  > var output = esprima.parseScript('const answer = 42', { tokens: true, range: true })
  > output.tokens
  [ { type: 'Keyword', value: 'const', range: [ 0, 5 ] },
    { type: 'Identifier', value: 'answer', range: [ 6, 12 ] },
    { type: 'Punctuator', value: '=', range: [ 13, 14 ] },
    { type: 'Numeric', value: '42', range: [ 15, 17 ] } ]
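With loc instead of range, each token carries line and column positions rather than character offsets. A sketch of what to expect for the first token (the exact REPL formatting may differ slightly):

  > var output = esprima.parseScript('const answer = 42', { tokens: true, loc: true })
  > output.tokens[0]
  { type: 'Keyword',
    value: 'const',
    loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 5 } } }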

To tokenize a program without parsing it at all, refer to Chapter 3. Lexical Analysis (Tokenization).
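As a quick preview of that chapter, the standalone esprima.tokenize function returns the token array directly, without building a syntax tree:

  $ node
  > var esprima = require('esprima')
  > esprima.tokenize('const answer = 42')
  [ { type: 'Keyword', value: 'const' },
    { type: 'Identifier', value: 'answer' },
    { type: 'Punctuator', value: '=' },
    { type: 'Numeric', value: '42' } ]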