Chapter 4: Generators - Breaking Run-to-Completion - "You Don't Know JS: Async & Performance" (1st edition)

Breaking Run-to-Completion

In Chapter 1, we explained an expectation that JS developers almost universally rely on in their code: once a function starts executing, it runs until it completes, and no other code can interrupt and run in between.

As bizarre as it may seem, ES6 introduces a new type of function that does not follow run-to-completion behavior. This new type of function is called a “generator.”

To understand the implications, let’s consider this example:

var x = 1;
function foo() {
    x++;
    bar();                // <-- what about this line?
    console.log( "x:", x );
}
function bar() {
    x++;
}
foo();                    // x: 3

In this example, we know for sure that bar() runs in between x++ and console.log(x). But what if bar() wasn’t there? Obviously, the result would be 2 instead of 3.

Now let’s twist your brain. What if bar() wasn’t present, but it could still somehow run between the x++ and console.log(x) statements? How would that be possible?

In preemptive multithreaded languages, it would essentially be possible for bar() to “interrupt” and run at exactly the right moment between those two statements. But JS is not preemptive, nor is it (currently) multithreaded. And yet, a cooperative form of this “interruption” (concurrency) is possible, if foo() itself could somehow indicate a “pause” at that part in the code.

Note: I use the word “cooperative” not only because of the connection to classical concurrency terminology (see Chapter 1), but because as you’ll see in the next snippet, the ES6 syntax for indicating a pause point in code is yield — suggesting a politely cooperative yielding of control.

Here’s the ES6 code to accomplish such cooperative concurrency:

var x = 1;
function *foo() {
    x++;
    yield; // pause!
    console.log( "x:", x );
}
function bar() {
    x++;
}

Note: You will likely see most other JS documentation/code that will format a generator declaration as function* foo() { .. } instead of as I’ve done here with function *foo() { .. } — the only difference being the stylistic positioning of the *. The two forms are functionally/syntactically identical, as is a third function*foo() { .. } (no space) form. There are arguments for both styles, but I basically prefer function *foo.. because it then matches when I reference a generator in writing with *foo(). If I said only foo(), you wouldn’t know as clearly if I was talking about a generator or a regular function. It’s purely a stylistic preference.

Now, how can we run the code in that previous snippet such that bar() executes at the point of the yield inside of *foo()?

// construct an iterator `it` to control the generator
var it = foo();
// start `foo()` here!
it.next();
x;                        // 2
bar();
x;                        // 3
it.next();                // x: 3

OK, there’s quite a bit of new and potentially confusing stuff in those two code snippets, so we’ve got plenty to wade through. But before we explain the different mechanics/syntax with ES6 generators, let’s walk through the behavior flow:

1. The it = foo() operation does not execute the *foo() generator yet, but it merely constructs an iterator that will control its execution. More on iterators in a bit.
2. The first it.next() starts the *foo() generator, and runs the x++ on the first line of *foo().
3. *foo() pauses at the yield statement, at which point that first it.next() call finishes. At the moment, *foo() is still running and active, but it’s in a paused state.
4. We inspect the value of x, and it’s now 2.
5. We call bar(), which increments x again with x++.
6. We inspect the value of x again, and it’s now 3.
7. The final it.next() call resumes the *foo() generator from where it was paused, and runs the console.log(..) statement, which uses the current value of x of 3.

Clearly, *foo() started, but did not run-to-completion — it paused at the yield. We resumed *foo() later, and let it finish, but that wasn’t even required.

So, a generator is a special kind of function that can start and stop one or more times, and doesn’t necessarily ever have to finish. While it won’t be terribly obvious yet why that’s so powerful, as we go through the rest of this chapter, that will be one of the fundamental building blocks we use to construct generators-as-async-flow-control as a pattern for our code.
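To make the "start and stop, and maybe never finish" behavior concrete, here is a minimal sketch. The *task() generator and the log array are hypothetical names used only for illustration; the point is simply the order in which things happen.

```javascript
var log = [];

function *task() {
    log.push( "step 1" );
    yield;                      // first pause
    log.push( "step 2" );
    yield;                      // second pause
    log.push( "step 3" );       // only runs if resumed a third time
}

var it = task();                // nothing inside `*task()` has run yet
log.push( "created" );

it.next();                      // runs up to the first `yield`
log.push( "paused once" );

it.next();                      // runs up to the second `yield`
log.push( "paused twice" );

// we never call `it.next()` again, so "step 3" never runs

console.log( log.join( ", " ) );
// created, step 1, paused once, step 2, paused twice
```

Nothing forces the third it.next() call. The generator instance just stays paused at its second yield.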

Input and Output

A generator function is a special function with the new processing model we just alluded to. But it’s still a function, which means it still has some basic tenets that haven’t changed — namely, that it still accepts arguments (aka “input”), and that it can still return a value (aka “output”):

function *foo(x,y) {
    return x * y;
}
var it = foo( 6, 7 );
var res = it.next();
res.value;        // 42

We pass in the arguments 6 and 7 to *foo(..) as the parameters x and y, respectively. And *foo(..) returns the value 42 back to the calling code.

We now see a difference with how the generator is invoked compared to a normal function. foo(6,7) obviously looks familiar. But subtly, the *foo(..) generator hasn’t actually run yet as it would have with a normal function.

Instead, we’re just creating an iterator object, which we assign to the variable it, to control the *foo(..) generator. Then we call it.next(), which instructs the *foo(..) generator to advance from its current location, stopping either at the next yield or end of the generator.

The result of that next(..) call is an object with a value property on it holding whatever value (if anything) was returned from *foo(..). A yield with a value attached works the same way: it sends a value out from the generator during the middle of its execution, kind of like an intermediate return.

Again, it won’t be obvious yet why we need this whole indirect iterator object to control the generator. We’ll get there, I promise.
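As a rough sketch of that "intermediate return" idea, the variation below adds one yield before the return. This version of *foo(..) is an illustrative assumption, not the snippet from above. Alongside value, each result object also carries a standard done property that says whether the generator has completed.

```javascript
function *foo(x,y) {
    yield x + y;        // intermediate value; the generator pauses here
    return x * y;       // final value; the generator completes
}

var it = foo( 6, 7 );

var first = it.next();
first.value;            // 13
first.done;             // false

var second = it.next();
second.value;           // 42
second.done;            // true
```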

Iteration Messaging

In addition to generators accepting arguments and having return values, there’s even more powerful and compelling input/output messaging capability built into them, via yield and next(..).

Consider:

function *foo(x) {
    var y = x * (yield);
    return y;
}
var it = foo( 6 );
// start `foo(..)`
it.next();
var res = it.next( 7 );
res.value;        // 42

First, we pass in 6 as the parameter x. Then we call it.next(), and it starts up *foo(..).

Inside *foo(..), the var y = x .. statement starts to be processed, but then it runs across a yield expression. At that point, it pauses *foo(..) (in the middle of the assignment statement!), and essentially requests the calling code to provide a result value for the yield expression. Next, we call it.next( 7 ), which is passing the 7 value back in to be that result of the paused yield expression.

So, at this point, the assignment statement is essentially var y = 6 * 7. Now, return y returns that 42 value back as the result of the it.next( 7 ) call.

Notice something very important but also easily confusing, even to seasoned JS developers: depending on your perspective, there’s a mismatch between the yield and the next(..) call. In general, you’re going to have one more next(..) call than you have yield statements — the preceding snippet has one yield and two next(..) calls.

Why the mismatch?

Because the first next(..) always starts a generator, and runs to the first yield. But it’s the second next(..) call that fulfills the first paused yield expression, and the third next(..) would fulfill the second yield, and so on.
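Here's a small sketch of that counting rule with two yield expressions and therefore three next(..) calls. The second yield and the z variable are additions for illustration only.

```javascript
function *foo(x) {
    var y = x * (yield);    // first `yield`
    var z = y + (yield);    // second `yield`
    return z;
}

var it = foo( 6 );

it.next();                  // 1st `next()`: starts, pauses at the first `yield`
it.next( 7 );               // 2nd `next(..)`: fulfills the first `yield`, y is 42
var res = it.next( 8 );     // 3rd `next(..)`: fulfills the second `yield`, z is 50

res.value;                  // 50
```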

Tale of Two Questions

Actually, which code you’re thinking about primarily will affect whether there’s a perceived mismatch or not.

Consider only the generator code:

var y = x * (yield);
return y;

This first yield is basically asking a question: “What value should I insert here?”

Who’s going to answer that question? Well, the first next() has already run to get the generator up to this point, so obviously it can’t answer the question. So, the second next(..) call must answer the question posed by the first yield.

See the mismatch — second-to-first?

But let’s flip our perspective. Let’s look at it not from the generator’s point of view, but from the iterator’s point of view.

To properly illustrate this perspective, we also need to explain that messages can go in both directions — yield .. as an expression can send out messages in response to next(..) calls, and next(..) can send values to a paused yield expression. Consider this slightly adjusted code:

function *foo(x) {
    var y = x * (yield "Hello");    // <-- yield a value!
    return y;
}
var it = foo( 6 );
var res = it.next();    // first `next()`, don't pass anything
res.value;                // "Hello"
res = it.next( 7 );        // pass `7` to waiting `yield`
res.value;                // 42

yield .. and next(..) pair together as a two-way message passing system during the execution of the generator.

So, looking only at the iterator code:

var res = it.next();    // first `next()`, don't pass anything
res.value;                // "Hello"
res = it.next( 7 );        // pass `7` to waiting `yield`
res.value;                // 42

Note: We don’t pass a value to the first next() call, and that’s on purpose. Only a paused yield could accept such a value passed by a next(..), and at the beginning of the generator when we call the first next(), there is no paused yield to accept such a value. The specification and all compliant browsers just silently discard anything passed to the first next(). It’s still a bad idea to pass a value, as you’re just creating silently “failing” code that’s confusing. So, always start a generator with an argument-free next().
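Just to see the discarding in action, this sketch deliberately breaks that advice. It's an illustration only (the received variable and the strings are made up), not a pattern to copy.

```javascript
function *foo() {
    var received = yield "ready";
    return received;
}

var it = foo();

// no `yield` is paused yet, so "ignored" has nowhere to go
var res1 = it.next( "ignored" );
res1.value;                 // "ready"

// now the first `yield` is waiting, so "kept" becomes its result
var res2 = it.next( "kept" );
res2.value;                 // "kept"
```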

The first next() call (with nothing passed to it) is basically asking a question: “What next value does the *foo(..) generator have to give me?” And who answers this question? The first yield "Hello" expression.

See? No mismatch there.

Depending on who you think of as asking the question, there is either a mismatch between the yield and next(..) calls, or not.

But wait! There’s still an extra next() compared to the number of yield statements. So, that final it.next(7) call is again asking the question about what next value the generator will produce. But there are no more yield statements left to answer, are there? So who answers?

The return statement answers the question!

And if there is no return in your generator — return is certainly not any more required in generators than in regular functions — there’s always an assumed/implicit return; (aka return undefined;), which serves the purpose of default answering the question posed by the final it.next(7) call.
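A quick sketch of that implicit answer, using a variation of *foo() with its return removed (an assumption for illustration):

```javascript
function *foo() {
    var x = yield "Hello";
    // no `return` here, so the generator ends with an implicit `return;`
}

var it = foo();

it.next().value;            // "Hello"

var res = it.next( 7 );     // answered by the implicit `return undefined;`
res.value;                  // undefined
res.done;                   // true
```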

These questions and answers — the two-way message passing with yield and next(..) — are quite powerful, but it’s not obvious at all how these mechanisms are connected to async flow control. We’re getting there!

Multiple Iterators

It may appear from the syntactic usage that when you use an iterator to control a generator, you’re controlling the declared generator function itself. But there’s a subtlety that’s easy to miss: each time you construct an iterator, you are implicitly constructing an instance of the generator which that iterator will control.

You can have multiple instances of the same generator running at the same time, and they can even interact:

function *foo() {
    var x = yield 2;
    z++;
    var y = yield (x * z);
    console.log( x, y, z );
}
var z = 1;
var it1 = foo();
var it2 = foo();
var val1 = it1.next().value;            // 2 <-- yield 2
var val2 = it2.next().value;            // 2 <-- yield 2
val1 = it1.next( val2 * 10 ).value;        // 40  <-- x:20,  z:2
val2 = it2.next( val1 * 5 ).value;        // 600 <-- x:200, z:3
it1.next( val2 / 2 );                    // y:300
                                        // 20 300 3
it2.next( val1 / 4 );                    // y:10
                                        // 200 10 3

Warning: The most common usage of multiple instances of the same generator running concurrently is not such interactions, but when the generator is producing its own values without input, perhaps from some independently connected resource. We’ll talk more about value production in the next section.

Let’s briefly walk through the processing:

1. Both instances of *foo() are started at the same time, and both next() calls reveal a value of 2 from the yield 2 statements, respectively.
2. val2 * 10 is 2 * 10, which is sent into the first generator instance it1, so that x gets value 20. z is incremented from 1 to 2, and then 20 * 2 is yielded out, setting val1 to 40.
3. val1 * 5 is 40 * 5, which is sent into the second generator instance it2, so that x gets value 200. z is incremented again, from 2 to 3, and then 200 * 3 is yielded out, setting val2 to 600.
4. val2 / 2 is 600 / 2, which is sent into the first generator instance it1, so that y gets value 300, then printing out 20 300 3 for its x y z values, respectively.
5. val1 / 4 is 40 / 4, which is sent into the second generator instance it2, so that y gets value 10, then printing out 200 10 3 for its x y z values, respectively.

That’s a “fun” example to run through in your mind. Did you keep it straight?
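For contrast, here's a sketch of the more common case mentioned in the warning above: instances that produce their own values without any input. The *counter() generator is a hypothetical example. Each instance gets its own copy of the local variable n, so advancing one iterator doesn't affect the other.

```javascript
function *counter() {
    var n = 0;              // local to each generator instance
    while (true) {
        n++;
        yield n;
    }
}

var it1 = counter();
var it2 = counter();

var a1 = it1.next().value;  // 1
var a2 = it1.next().value;  // 2
var b1 = it2.next().value;  // 1  <-- `it2` has its own `n`
var a3 = it1.next().value;  // 3
```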

Interleaving

Recall this scenario from the “Run-to-completion” section of Chapter 1:

var a = 1;
var b = 2;
function foo() {
    a++;
    b = b * a;
    a = b + 3;
}
function bar() {
    b--;
    a = 8 + b;
    b = a * 2;
}

With normal JS functions, of course either foo() can run completely first, or bar() can run completely first, but foo() cannot interleave its individual statements with bar(). So, there are only two possible outcomes to the preceding program.
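Those two outcomes can be checked with a small sketch. The runInOrder(..) helper is a hypothetical name introduced here; it just resets the shared variables and runs the two functions back to back.

```javascript
var a, b;

function foo() {
    a++;
    b = b * a;
    a = b + 3;
}

function bar() {
    b--;
    a = 8 + b;
    b = a * 2;
}

// hypothetical helper: reset the shared state, then run
// both functions to completion in the given order
function runInOrder(first,second) {
    a = 1;
    b = 2;
    first();
    second();
    return [ a, b ];
}

var fooFirst = runInOrder( foo, bar );  // [ 11, 22 ]
var barFirst = runInOrder( bar, foo );  // [ 183, 180 ]
```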

However, with generators, clearly interleaving (even in the middle of statements!) is possible:

var a = 1;
var b = 2;
function *foo() {
    a++;
    yield;
    b = b * a;
    a = (yield b) + 3;
}
function *bar() {
    b--;
    yield;
    a = (yield 8) + b;
    b = a * (yield 2);
}

Depending on the order in which the iterators controlling *foo() and *bar() are called, the preceding program could produce several different results. In other words, we can actually illustrate (in a sort of fake-ish way) the theoretical “threaded race conditions” circumstances discussed in Chapter 1, by interleaving the two generator iterations over the same shared variables.

First, let’s make a helper called step(..) that controls an iterator:

function step(gen) {
    var it = gen();
    var last;
    return function() {
        // whatever is `yield`ed out, just
        // send it right back in the next time!
        last = it.next( last ).value;
    };
}

step(..) initializes a generator to create its it iterator, then returns a function which, when called, advances the iterator by one step. Additionally, the previously yielded out value is sent right back in at the next step. So, yield 8 will just become 8 and yield b will just be b (whatever it was at the time of yield).
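To see the "send it right back in" behavior on its own, here's a sketch that repeats the step(..) helper and drives a throwaway generator. The *demo() generator and the seen array are hypothetical names used only to record what each yield expression evaluates to.

```javascript
function step(gen) {
    var it = gen();
    var last;
    return function() {
        // send the previously yielded value back in
        last = it.next( last ).value;
    };
}

var seen = [];

function *demo() {
    seen.push( yield 8 );       // `yield 8` evaluates to 8
    seen.push( yield "b" );     // `yield "b"` evaluates to "b"
}

var s = step( demo );

s();        // starts `*demo()`, pauses at `yield 8`
s();        // sends 8 back in, pauses at `yield "b"`
s();        // sends "b" back in, `*demo()` finishes

seen;       // [ 8, "b" ]
```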

Now, just for fun, let’s experiment to see the effects of interleaving these different chunks of *foo() and *bar(). We’ll start with the boring base case, making sure *foo() totally finishes before *bar() (just like we did in Chapter 1):

// make sure to reset `a` and `b`
a = 1;
b = 2;
var s1 = step( foo );
var s2 = step( bar );
// run `*foo()` completely first
s1();
s1();
s1();
// now run `*bar()`
s2();
s2();
s2();
s2();
console.log( a, b );    // 11 22

The end result is 11 and 22, just as it was in the Chapter 1 version. Now let’s mix up the interleaving ordering and see how it changes the final values of a and b:

// make sure to reset `a` and `b`
a = 1;
b = 2;
var s1 = step( foo );
var s2 = step( bar );
s2();        // b--;
s2();        // yield 8
s1();        // a++;
s2();        // a = 8 + b;
            // yield 2
s1();        // b = b * a;
            // yield b
s1();        // a = b + 3;
s2();        // b = a * 2;

Before I tell you the results, can you figure out what a and b are after the preceding program? No cheating!

console.log( a, b );    // 12 18

Note: As an exercise for the reader, try to see how many other combinations of results you can get back rearranging the order of the s1() and s2() calls. Don’t forget you’ll always need three s1() calls and four s2() calls. Recall the discussion earlier about matching next() with yield for the reasons why.
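If you'd rather let the machine grind through that exercise, here's one possible sketch. The orderings(..) and runOrder(..) helpers are hypothetical names introduced for illustration. An ordering is written as a string of "1"s and "2"s, where "1" means an s1() call and "2" means an s2() call.

```javascript
var a, b;

function *foo() {
    a++;
    yield;
    b = b * a;
    a = (yield b) + 3;
}

function *bar() {
    b--;
    yield;
    a = (yield 8) + b;
    b = a * (yield 2);
}

function step(gen) {
    var it = gen();
    var last;
    return function() {
        last = it.next( last ).value;
    };
}

// hypothetical helper: every ordering of `n1` "1"s and `n2` "2"s
function orderings(n1,n2) {
    if (n1 === 0) return [ "2".repeat( n2 ) ];
    if (n2 === 0) return [ "1".repeat( n1 ) ];
    return orderings( n1 - 1, n2 )
        .map( function(rest){ return "1" + rest; } )
        .concat(
            orderings( n1, n2 - 1 )
                .map( function(rest){ return "2" + rest; } )
        );
}

// hypothetical helper: reset `a` and `b`, then run one ordering
function runOrder(order) {
    a = 1;
    b = 2;
    var s1 = step( foo );
    var s2 = step( bar );
    order.split( "" ).forEach( function(ch){
        if (ch === "1") s1();
        else s2();
    } );
    return a + " " + b;
}

var all = orderings( 3, 4 );        // 35 orderings in total
var results = {};

all.forEach( function(order){
    // keep one example ordering per distinct outcome
    results[ runOrder( order ) ] = order;
} );

runOrder( "1112222" );              // "11 22"
runOrder( "2212112" );              // "12 18"

console.log( Object.keys( results ) );
```

The two orderings called out by name are the ones from the snippets above. The final console.log(..) lists whatever distinct a b pairs the 35 orderings produce.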

You almost certainly won’t want to intentionally create this level of interleaving confusion, as it produces code that is incredibly difficult to understand. But the exercise is interesting and instructive for understanding how multiple generators can run concurrently in the same shared scope, because there will be places where this capability is quite useful.

We’ll discuss generator concurrency in more detail at the end of this chapter.