Basics
- ASTs
- Stages of Babel
  - Parse
    - Lexical Analysis
    - Syntactic Analysis
  - Transform
  - Generate
- Traversal
  - Visitors
  - Paths
    - Paths in Visitors
  - State
  - Scopes
    - Bindings

Basics

Babel is a JavaScript compiler, specifically a source-to-source compiler, often called a “transpiler”. This means that you give Babel some JavaScript code, Babel modifies the code, and then generates the new code back out.

ASTs

Each of these steps involves creating or working with an Abstract Syntax Tree, or AST.

Babel uses an AST modified from ESTree, with the core spec located here.

function square(n) {
  return n * n;
}

Check out AST Explorer to get a better sense of the AST nodes. Here is a link to it with the example code above pasted in.

This same program can be represented as a tree like this:

- FunctionDeclaration:
  - id:
    - Identifier:
      - name: square
  - params [1]
    - Identifier
      - name: n
  - body:
    - BlockStatement
      - body [1]
        - ReturnStatement
          - argument
            - BinaryExpression
              - operator: *
              - left
                - Identifier
                  - name: n
              - right
                - Identifier
                  - name: n

Or as a JavaScript Object like this:

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  params: [{
    type: "Identifier",
    name: "n"
  }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: {
          type: "Identifier",
          name: "n"
        },
        right: {
          type: "Identifier",
          name: "n"
        }
      }
    }]
  }
}

You’ll notice that each level of the AST has a similar structure:

{
  type: "FunctionDeclaration",
  id: {...},
  params: [...],
  body: {...}
}

{
  type: "Identifier",
  name: ...
}

{
  type: "BinaryExpression",
  operator: ...,
  left: {...},
  right: {...}
}

Note: Some properties have been removed for simplicity.

Each of these is known as a Node. An AST can be made up of a single Node, or hundreds if not thousands of Nodes. Together they are able to describe the syntax of a program that can be used for static analysis.

Every Node has this interface:

interface Node {
  type: string;
}

The type field is a string representing the type of Node the object is (e.g. "FunctionDeclaration", "Identifier", or "BinaryExpression"). Each type of Node defines an additional set of properties that describe that particular node type.
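Because every Node carries a string type, a generic helper can tell Nodes apart from plain values. The following is a minimal sketch, not part of Babel: isNode and countNodes are hypothetical helpers, and the AST for n * n is written by hand.

```javascript
// A hand-written AST for `n * n` (position properties omitted for brevity).
const ast = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Identifier", name: "n" },
  right: { type: "Identifier", name: "n" }
};

// Hypothetical helper: treat any object with a string `type` as a Node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Count every Node reachable from `node`, including `node` itself.
function countNodes(node) {
  let count = 1;
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) count += countNodes(child);
    }
  }
  return count;
}

console.log(countNodes(ast)); // 3
```

The operator is a plain string, so it is not counted; only the BinaryExpression and its two Identifiers are.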

There are additional properties on every Node that Babel generates which describe the position of the Node in the original source code.

{
  type: ...,
  start: 0,
  end: 38,
  loc: {
    start: {
      line: 1,
      column: 0
    },
    end: {
      line: 3,
      column: 1
    }
  },
  ...
}

These properties (start, end, and loc) appear in every single Node.
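To make the position data concrete, here is a small sketch that assumes start and end are character offsets into the source string, which is consistent with the values shown above for the square function. offsetToLoc is a hypothetical helper, not a Babel API.

```javascript
const source = "function square(n) {\n  return n * n;\n}";

// Position data as listed above for the FunctionDeclaration node.
const node = { type: "FunctionDeclaration", start: 0, end: 38 };

// Slicing the source with `start` and `end` recovers the node's text.
const text = source.slice(node.start, node.end);

// Hypothetical helper: convert an offset into a 1-based line and 0-based column.
function offsetToLoc(code, offset) {
  const before = code.slice(0, offset);
  const line = before.split("\n").length;
  const column = offset - (before.lastIndexOf("\n") + 1);
  return { line, column };
}

console.log(text === source);         // true
console.log(offsetToLoc(source, 0));  // { line: 1, column: 0 }
console.log(offsetToLoc(source, 38)); // { line: 3, column: 1 }
```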

Stages of Babel

The three primary stages of Babel are parse, transform, generate.

Parse

The parse stage takes code and outputs an AST. There are two phases of parsing in Babel: Lexical Analysis and Syntactic Analysis.

Lexical Analysis

Lexical Analysis will take a string of code and turn it into a stream of tokens.

You can think of tokens as a flat array of language syntax pieces.

n * n;

[
  { type: { ... }, value: "n", start: 0, end: 1, loc: { ... } },
  { type: { ... }, value: "*", start: 2, end: 3, loc: { ... } },
  { type: { ... }, value: "n", start: 4, end: 5, loc: { ... } },
  ...
]

Each type here has a set of properties describing the token:

{
  type: {
    label: 'name',
    keyword: undefined,
    beforeExpr: false,
    startsExpr: true,
    rightAssociative: false,
    isLoop: false,
    isAssign: false,
    prefix: false,
    postfix: false,
    binop: null,
    updateContext: null
  },
  ...
}

Like AST nodes, tokens also have a start, end, and loc.
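As a rough illustration of lexical analysis, the sketch below turns a string into a flat array of tokens. It is a hypothetical, heavily simplified tokenizer, not Babel's: it only knows names, numbers, and single-character punctuators, and it uses a plain label string for type instead of the richer type object shown above.

```javascript
// Hypothetical tokenizer: names, numbers, and single-character punctuators.
function tokenize(code) {
  const tokens = [];
  // Alternatives: whitespace | name | number | any other single character.
  const pattern = /\s+|([A-Za-z_$][\w$]*)|(\d+)|(.)/g;
  let match;
  while ((match = pattern.exec(code)) !== null) {
    const [value, name, num, punct] = match;
    // No capture group matched, so this was whitespace: skip it.
    if (name === undefined && num === undefined && punct === undefined) continue;
    const type = name !== undefined ? "name" : num !== undefined ? "num" : "punct";
    tokens.push({ type, value, start: match.index, end: match.index + value.length });
  }
  return tokens;
}

const tokens = tokenize("n * n;");
console.log(tokens.map((t) => `${t.value}@${t.start}-${t.end}`).join(" "));
// n@0-1 *@2-3 n@4-5 ;@5-6
```

The offsets line up with the start and end values in the token listing above.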

Syntactic Analysis

Syntactic Analysis will take a stream of tokens and turn it into an AST representation. Using the information in the tokens, this phase will reformat them as an AST which represents the structure of the code in a way that makes it easier to work with.
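To give a feel for this phase, here is a minimal sketch of a parser that only understands identifiers joined by *. It is a hypothetical example (parseMultiplication is not a Babel function), and the tokens are reduced to the fields the sketch needs.

```javascript
// Tokens for `n * n`, reduced to the fields this sketch needs.
const tokens = [
  { type: "name", value: "n" },
  { type: "punct", value: "*" },
  { type: "name", value: "n" }
];

// Hypothetical mini parser: handles only identifiers joined by `*`.
function parseMultiplication(tokens) {
  let pos = 0;

  function parseIdentifier() {
    const token = tokens[pos++];
    if (!token || token.type !== "name") {
      throw new SyntaxError("Expected an identifier");
    }
    return { type: "Identifier", name: token.value };
  }

  let left = parseIdentifier();
  // Each `*` wraps the tree built so far, making the operator left-associative.
  while (pos < tokens.length && tokens[pos].value === "*") {
    pos++;
    const right = parseIdentifier();
    left = { type: "BinaryExpression", operator: "*", left, right };
  }
  return left;
}

const ast = parseMultiplication(tokens);
console.log(ast.type);      // BinaryExpression
console.log(ast.left.name); // n
```

The result has the same shape as the BinaryExpression node from the square example.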

Transform

The transform stage takes an AST and traverses through it, adding, updating, and removing nodes as it goes along. This is by far the most complex part of Babel or any compiler. This is where plugins operate, so it will be the subject of most of this handbook, and we won’t dive too deep right now.

Generate

The code generation stage takes the final AST and turns it back into a string of code, also creating source maps.

Code generation is pretty simple: you traverse through the AST depth-first, building a string that represents the transformed code.
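The depth-first string building can be sketched in a few lines. This is a hypothetical generator, not Babel's: it covers only the node types used by the square example, hard-codes one level of indentation, and does not produce source maps.

```javascript
// Hypothetical generator covering only the node types used by `square`.
function generate(node) {
  switch (node.type) {
    case "FunctionDeclaration": {
      const params = node.params.map(generate).join(", ");
      return `function ${generate(node.id)}(${params}) ${generate(node.body)}`;
    }
    case "BlockStatement":
      // One fixed level of indentation is enough for this example.
      return `{\n${node.body.map((stmt) => "  " + generate(stmt)).join("\n")}\n}`;
    case "ReturnStatement":
      return `return ${generate(node.argument)};`;
    case "BinaryExpression":
      return `${generate(node.left)} ${node.operator} ${generate(node.right)}`;
    case "Identifier":
      return node.name;
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const ast = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "square" },
  params: [{ type: "Identifier", name: "n" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "n" },
        right: { type: "Identifier", name: "n" }
      }
    }]
  }
};

console.log(generate(ast));
```

Each case produces its own text only after generating the text of its children, which is what makes the walk depth-first.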

Traversal

When you want to transform an AST you have to traverse the tree recursively.

Say we have the type FunctionDeclaration. It has a few properties: id, params, and body. Each of them has nested nodes.

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  params: [{
    type: "Identifier",
    name: "n"
  }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: {
          type: "Identifier",
          name: "n"
        },
        right: {
          type: "Identifier",
          name: "n"
        }
      }
    }]
  }
}

We start at the FunctionDeclaration. We know its internal properties, so we visit each of them and their children in order.

Next we go to id which is an Identifier. Identifiers don’t have any child node properties so we move on.

After that is params which is an array of nodes so we visit each of them. In this case it’s a single node which is also an Identifier so we move on.

Then we hit body which is a BlockStatement with a property body that is an array of Nodes so we go to each of them.

The only item here is a ReturnStatement node, which has an argument. We go to the argument and find a BinaryExpression.

The BinaryExpression has an operator, a left, and a right. The operator isn’t a node, just a value, so we don’t go to it, and instead just visit left and right.

This traversal process happens throughout the Babel transform stage.
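The walk described above can be sketched as a short recursive function. As an assumption made for brevity, this hypothetical walk helper simply inspects every property value and recurses into anything that looks like a Node; it is not how Babel decides which properties to visit.

```javascript
const ast = {
  type: "FunctionDeclaration",
  id: { type: "Identifier", name: "square" },
  params: [{ type: "Identifier", name: "n" }],
  body: {
    type: "BlockStatement",
    body: [{
      type: "ReturnStatement",
      argument: {
        type: "BinaryExpression",
        operator: "*",
        left: { type: "Identifier", name: "n" },
        right: { type: "Identifier", name: "n" }
      }
    }]
  }
};

// Hypothetical helper: treat any object with a string `type` as a Node.
function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Visit `node`, then each child Node in property order.
function walk(node, visit) {
  visit(node);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (isNode(child)) walk(child, visit);
    }
  }
}

const visited = [];
walk(ast, (node) => visited.push(node.type));
console.log(visited.join(" > "));
// FunctionDeclaration > Identifier > Identifier > BlockStatement > ReturnStatement > BinaryExpression > Identifier > Identifier
```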

Visitors

When we talk about “going” to a node, we actually mean we are visiting it. We use that term because there is this concept of a visitor.

Visitors are a pattern used in AST traversal across languages. Simply put, they are an object with methods defined for accepting particular node types in a tree. That’s a bit abstract, so let’s look at an example.

const MyVisitor = {
  Identifier() {
    console.log("Called!");
  }
};
// You can also create a visitor and add methods on it later
let visitor = {};
visitor.MemberExpression = function() {};
visitor.FunctionDeclaration = function() {}

Note: Identifier() { ... } is shorthand for Identifier: { enter() { ... } }.

This is a basic visitor that when used during a traversal will call the Identifier() method for every Identifier in the tree.

So with this code the Identifier() method will be called four times, once for each Identifier (including square).

function square(n) {
  return n * n;
}

path.traverse(MyVisitor);
Called!
Called!
Called!
Called!

These calls all happen on node enter. However, it is also possible to call a visitor method on exit.

Imagine we have this tree structure:

- FunctionDeclaration
  - Identifier (id)
  - Identifier (params[0])
  - BlockStatement (body)
    - ReturnStatement (body)
      - BinaryExpression (argument)
        - Identifier (left)
        - Identifier (right)

As we traverse down each branch of the tree we eventually hit dead ends where we need to traverse back up the tree to get to the next node. Going down the tree we enter each node, then going back up we exit each node.

Let’s walk through what this process looks like for the above tree.

Enter FunctionDeclaration
- Enter Identifier (id)
  - Hit dead end
- Exit Identifier (id)
- Enter Identifier (params[0])
  - Hit dead end
- Exit Identifier (params[0])
- Enter BlockStatement (body)
  - Enter ReturnStatement (body)
    - Enter BinaryExpression (argument)
      - Enter Identifier (left)
        - Hit dead end
      - Exit Identifier (left)
      - Enter Identifier (right)
        - Hit dead end
      - Exit Identifier (right)
    - Exit BinaryExpression (argument)
  - Exit ReturnStatement (body)
- Exit BlockStatement (body)
Exit FunctionDeclaration

So when creating a visitor you have two opportunities to visit a node.

const MyVisitor = {
  Identifier: {
    enter() {
      console.log("Entered!");
    },
    exit() {
      console.log("Exited!");
    }
  }
};
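To see the enter and exit order in action, here is a minimal sketch of a traverser that supports both hooks. It is a hypothetical stand-in for Babel's traversal: it passes the raw node to the visitor rather than a path, and it finds children by inspecting property values.

```javascript
// Hand-written AST for `n * n`.
const ast = {
  type: "BinaryExpression",
  operator: "*",
  left: { type: "Identifier", name: "n" },
  right: { type: "Identifier", name: "n" }
};

function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Hypothetical traverser: calls `enter` on the way down and `exit` on the way up.
function traverse(node, visitor) {
  const handlers = visitor[node.type];
  // A bare function is treated as shorthand for { enter }.
  const { enter, exit } = typeof handlers === "function" ? { enter: handlers } : handlers || {};
  if (enter) enter(node);
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (isNode(child)) traverse(child, visitor);
    }
  }
  if (exit) exit(node);
}

const log = [];
const hooks = {
  enter(node) { log.push("enter " + node.type); },
  exit(node) { log.push("exit " + node.type); }
};
traverse(ast, { BinaryExpression: hooks, Identifier: hooks });
console.log(log.join("\n"));
```

The BinaryExpression is entered first and exited last, with each Identifier entered and exited in between.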

If necessary, you can also apply the same function for multiple visitor nodes by separating them with a | in the method name as a string like Identifier|MemberExpression.

Example usage in the flow-comments plugin

const MyVisitor = {
  "ExportNamedDeclaration|Flow"(path) {}
};

You can also use aliases as visitor nodes (as defined in babel-types).

For example, Function is an alias for FunctionDeclaration, FunctionExpression, ArrowFunctionExpression, ObjectMethod and ClassMethod.

const MyVisitor = {
  Function(path) {}
};
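One way to picture the | syntax and aliases is as a normalization step that expands each key into concrete node types before traversal. The sketch below is hypothetical: normalizeVisitor and the ALIASES table are made up for illustration and only include the Function alias described above.

```javascript
// Alias table limited to the one alias described above.
const ALIASES = {
  Function: [
    "FunctionDeclaration",
    "FunctionExpression",
    "ArrowFunctionExpression",
    "ObjectMethod",
    "ClassMethod"
  ]
};

// Hypothetical normalizer: expands "A|B" keys and aliases into one entry
// per concrete node type, and turns bare functions into { enter }.
function normalizeVisitor(visitor) {
  const result = {};
  for (const [key, value] of Object.entries(visitor)) {
    const handlers = typeof value === "function" ? { enter: value } : value;
    for (const part of key.split("|")) {
      for (const type of ALIASES[part] || [part]) {
        result[type] = { ...result[type], ...handlers };
      }
    }
  }
  return result;
}

const normalized = normalizeVisitor({
  "Identifier|MemberExpression"(path) {},
  Function(path) {}
});
console.log(Object.keys(normalized).join(", "));
// Identifier, MemberExpression, FunctionDeclaration, FunctionExpression, ArrowFunctionExpression, ObjectMethod, ClassMethod
```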

Paths

An AST generally has many Nodes, but how do Nodes relate to one another? We could have one giant mutable object that you manipulate and have full access to, or we can simplify this with Paths.

A Path is an object representation of the link between two nodes.

For example if we take the following node and its child:

{
  type: "FunctionDeclaration",
  id: {
    type: "Identifier",
    name: "square"
  },
  ...
}

And represent the child Identifier as a path, it looks something like this:

{
  "parent": {
    "type": "FunctionDeclaration",
    "id": {...},
    ....
  },
  "node": {
    "type": "Identifier",
    "name": "square"
  }
}

It also has additional metadata about the path:

{
  "parent": {...},
  "node": {...},
  "hub": {...},
  "contexts": [],
  "data": {},
  "shouldSkip": false,
  "shouldStop": false,
  "removed": false,
  "state": null,
  "opts": null,
  "skipKeys": null,
  "parentPath": null,
  "context": null,
  "container": null,
  "listKey": null,
  "inList": false,
  "parentKey": null,
  "key": null,
  "scope": null,
  "type": null,
  "typeAnnotation": null
}

As well as tons and tons of methods related to adding, updating, moving, and removing nodes, but we’ll get into those later.

In a sense, paths are a reactive representation of a node’s position in the tree and all sorts of information about the node. Whenever you call a method that modifies the tree, this information is updated. Babel manages all of this for you to make working with nodes easy and as stateless as possible.

Paths in Visitors

When you have a visitor that has an Identifier() method, you’re actually visiting the path instead of the node. This way you are mostly working with the reactive representation of a node instead of the node itself.

const MyVisitor = {
  Identifier(path) {
    console.log("Visiting: " + path.node.name);
  }
};

a + b + c;

path.traverse(MyVisitor);
Visiting: a
Visiting: b
Visiting: c

State

State is the enemy of AST transformation. State will bite you over and over again and your assumptions about state will almost always be proven wrong by some syntax that you didn’t consider.

Take the following code:

function square(n) {
  return n * n;
}

Let’s write a quick hacky visitor that will rename n to x.

let paramName;
const MyVisitor = {
  FunctionDeclaration(path) {
    const param = path.node.params[0];
    paramName = param.name;
    param.name = "x";
  },
  Identifier(path) {
    if (path.node.name === paramName) {
      path.node.name = "x";
    }
  }
};

This might work for the above code, but we can easily break that by doing this:

function square(n) {
  return n * n;
}
n;

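Here the visitor would also rename the top-level n, which has nothing to do with the function’s parameter. The better way to deal with this is recursion. So let’s make like a Christopher Nolan film and put a visitor inside of a visitor.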

const updateParamNameVisitor = {
  Identifier(path) {
    if (path.node.name === this.paramName) {
      path.node.name = "x";
    }
  }
};
const MyVisitor = {
  FunctionDeclaration(path) {
    const param = path.node.params[0];
    const paramName = param.name;
    param.name = "x";
    path.traverse(updateParamNameVisitor, { paramName });
  }
};
path.traverse(MyVisitor);

Of course, this is a contrived example but it demonstrates how to eliminate global state from your visitors.
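To check that the nested visitor really leaves the outer n alone, here is a self-contained sketch that runs the same two visitors against a hand-written AST. The traverse function is a hypothetical stand-in for Babel's: it assumes visitor methods are called with this bound to the state object, and that path.traverse() visits only the descendants of the current node.

```javascript
// AST for: function square(n) { return n * n; } n;
const program = {
  type: "Program",
  body: [
    {
      type: "FunctionDeclaration",
      id: { type: "Identifier", name: "square" },
      params: [{ type: "Identifier", name: "n" }],
      body: {
        type: "BlockStatement",
        body: [{
          type: "ReturnStatement",
          argument: {
            type: "BinaryExpression",
            operator: "*",
            left: { type: "Identifier", name: "n" },
            right: { type: "Identifier", name: "n" }
          }
        }]
      }
    },
    { type: "ExpressionStatement", expression: { type: "Identifier", name: "n" } }
  ]
};

function isNode(value) {
  return value !== null && typeof value === "object" && typeof value.type === "string";
}

// Hypothetical traverse: binds `this` to `state` and passes a minimal path.
function traverse(node, visitor, state) {
  const path = {
    node,
    // Nested traversal over this node's descendants only.
    traverse(innerVisitor, innerState) {
      traverseChildren(node, innerVisitor, innerState);
    }
  };
  const method = visitor[node.type];
  if (method) method.call(state, path);
  traverseChildren(node, visitor, state);
}

function traverseChildren(node, visitor, state) {
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (isNode(child)) traverse(child, visitor, state);
    }
  }
}

const updateParamNameVisitor = {
  Identifier(path) {
    if (path.node.name === this.paramName) path.node.name = "x";
  }
};
const MyVisitor = {
  FunctionDeclaration(path) {
    const param = path.node.params[0];
    const paramName = param.name;
    param.name = "x";
    path.traverse(updateParamNameVisitor, { paramName });
  }
};

traverse(program, MyVisitor);
const product = program.body[0].body.body[0].argument;
console.log(product.left.name, product.right.name); // x x
console.log(program.body[1].expression.name);       // n
```

The state lives only for the duration of the nested traversal, so nothing leaks out to identifiers that sit outside the function.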

Scopes

Next let’s introduce the concept of a scope. JavaScript has lexical scoping, which is a tree structure where blocks create new scope.

// global scope
function scopeOne() {
  // scope 1
  function scopeTwo() {
    // scope 2
  }
}

Whenever you create a reference in JavaScript, whether that be by a variable, function, class, param, import, label, etc., it belongs to the current scope.

var global = "I am in the global scope";
function scopeOne() {
  var one = "I am in the scope created by `scopeOne()`";
  function scopeTwo() {
    var two = "I am in the scope created by `scopeTwo()`";
  }
}

Code within a deeper scope may use a reference from a higher scope.

function scopeOne() {
  var one = "I am in the scope created by `scopeOne()`";
  function scopeTwo() {
    one = "I am updating the reference in `scopeOne` inside `scopeTwo`";
  }
}

A lower scope might also create a reference of the same name without modifying it.

function scopeOne() {
  var one = "I am in the scope created by `scopeOne()`";
  function scopeTwo() {
    var one = "I am creating a new `one` but leaving reference in `scopeOne()` alone.";
  }
}

When writing a transform, we want to be wary of scope. We need to make sure we don’t break existing code while modifying different parts of it.

We may want to add new references and make sure they don’t collide with existing ones. Or maybe we just want to find where a variable is referenced. We want to be able to track these references within a given scope.

A scope can be represented as:

{
  path: path,
  block: path.node,
  parentBlock: path.parent,
  parent: parentScope,
  bindings: [...]
}

When you create a new scope you do so by giving it a path and a parent scope. Then during the traversal process it collects all the references (“bindings”) within that scope.

Once that’s done, there are all sorts of methods you can use on scopes. We’ll get into those later though.
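The tree of scopes can be modeled with very little code. The class below is a hypothetical sketch, not Babel's Scope: it stores bindings in a Map (an assumption made for illustration), and lookup and freshName are made-up method names. freshName only checks names visible from the current scope upward, which is a simplification.

```javascript
// Hypothetical Scope: a set of declared names plus a link to the parent scope.
class Scope {
  constructor(parent = null) {
    this.parent = parent;
    this.bindings = new Map();
  }

  declare(name, kind) {
    this.bindings.set(name, { kind });
  }

  // Walk up the parent chain until a scope declaring `name` is found.
  lookup(name) {
    for (let scope = this; scope !== null; scope = scope.parent) {
      if (scope.bindings.has(name)) return scope;
    }
    return null;
  }

  // Pick a name that does not collide with anything visible from this scope.
  freshName(base) {
    let candidate = base;
    let suffix = 0;
    while (this.lookup(candidate) !== null) {
      suffix += 1;
      candidate = `${base}${suffix}`;
    }
    return candidate;
  }
}

const globalScope = new Scope();
const scopeOne = new Scope(globalScope);
const scopeTwo = new Scope(scopeOne);

globalScope.declare("global", "var");
scopeOne.declare("one", "var");
scopeTwo.declare("two", "var");

console.log(scopeTwo.lookup("one") === scopeOne); // true: found in a higher scope
console.log(scopeTwo.freshName("one"));           // one1

// Declaring `one` again in the deeper scope shadows the outer one.
scopeTwo.declare("one", "var");
console.log(scopeTwo.lookup("one") === scopeTwo); // true
console.log(scopeOne.lookup("one") === scopeOne); // true
```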

Bindings

References all belong to a particular scope; this relationship is known as a binding.

function scopeOne() {
  var ref = "This is a binding";
  ref; // This is a reference to a binding
  function scopeTwo() {
    ref; // This is a reference to a binding from a lower scope
  }
}

A single binding looks like this:

{
  identifier: node,
  scope: scope,
  path: path,
  kind: 'var',
  referenced: true,
  references: 3,
  referencePaths: [path, path, path],
  constant: false,
  constantViolations: [path]
}

With this information you can find all the references to a binding, see what type of binding it is (parameter, declaration, etc.), look up what scope it belongs to, or get a copy of its identifier. You can even tell if it’s constant and, if not, see what paths are causing it to be non-constant.

Being able to tell if a binding is constant is useful for many purposes, the largest of which is minification.

function scopeOne() {
  var ref1 = "This is a constant binding";
  becauseNothingEverChangesTheValueOf(ref1);
  function scopeTwo() {
    var ref2 = "This is *not* a constant binding";
    ref2 = "Because this changes the value";
  }
}
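As a rough sketch of how constant and constantViolations could be derived, the code below walks a hand-written AST and records every assignment to a declared variable. collectBindings is a hypothetical helper with strong simplifications: it treats the whole program as a single scope, assumes declarations come before assignments, and stores nodes rather than paths.

```javascript
// AST for: var ref1 = "a"; use(ref1); var ref2 = "b"; ref2 = "c";
const declare = (name, value) => ({
  type: "VariableDeclaration",
  kind: "var",
  declarations: [{
    type: "VariableDeclarator",
    id: { type: "Identifier", name },
    init: { type: "StringLiteral", value }
  }]
});

const program = {
  type: "Program",
  body: [
    declare("ref1", "a"),
    {
      type: "ExpressionStatement",
      expression: {
        type: "CallExpression",
        callee: { type: "Identifier", name: "use" },
        arguments: [{ type: "Identifier", name: "ref1" }]
      }
    },
    declare("ref2", "b"),
    {
      type: "ExpressionStatement",
      expression: {
        type: "AssignmentExpression",
        operator: "=",
        left: { type: "Identifier", name: "ref2" },
        right: { type: "StringLiteral", value: "c" }
      }
    }
  ]
};

// Hypothetical collector: one flat scope, declarations before assignments.
function collectBindings(node, bindings = Object.create(null)) {
  if (node.type === "VariableDeclarator") {
    bindings[node.id.name] = { kind: "var", constant: true, constantViolations: [] };
  } else if (node.type === "AssignmentExpression" && node.left.type === "Identifier") {
    const binding = bindings[node.left.name];
    if (binding) {
      // Reassignment means the binding is no longer constant.
      binding.constant = false;
      binding.constantViolations.push(node);
    }
  }
  for (const value of Object.values(node)) {
    for (const child of Array.isArray(value) ? value : [value]) {
      if (child !== null && typeof child === "object" && typeof child.type === "string") {
        collectBindings(child, bindings);
      }
    }
  }
  return bindings;
}

const bindings = collectBindings(program);
console.log(bindings.ref1.constant); // true
console.log(bindings.ref2.constant); // false
console.log(bindings.ref2.constantViolations.length); // 1
```

Passing ref1 to a function does not count as a violation here because nothing assigns to it.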