Breaking Run-to-Completion
In Chapter 1, we explained an expectation that JS developers almost universally rely on in their code: once a function starts executing, it runs until it completes, and no other code can interrupt and run in between.
As bizarre as it may seem, ES6 introduces a new type of function that does not follow run-to-completion behavior. This new type of function is called a “generator.”
To understand the implications, let’s consider this example:
var x = 1;
function foo() {
    x++;
    bar(); // <-- what about this line?
    console.log( "x:", x );
}
function bar() {
    x++;
}
foo(); // x: 3
In this example, we know for sure that bar() runs in between x++ and console.log(x). But what if bar() wasn’t there? Obviously, the result would be 2 instead of 3.
Now let’s twist your brain. What if bar() wasn’t present, but it could still somehow run between the x++ and console.log(x) statements? How would that be possible?
In preemptive multithreaded languages, it would essentially be possible for bar() to “interrupt” and run at exactly the right moment between those two statements. But JS is not preemptive, nor is it (currently) multithreaded. And yet, a cooperative form of this “interruption” (concurrency) is possible, if foo() itself could somehow indicate a “pause” at that part in the code.
Note: I use the word “cooperative” not only because of the connection to classical concurrency terminology (see Chapter 1), but because, as you’ll see in the next snippet, the ES6 syntax for indicating a pause point in code is yield, suggesting a politely cooperative yielding of control.
Here’s the ES6 code to accomplish such cooperative concurrency:
var x = 1;
function *foo() {
    x++;
    yield; // pause!
    console.log( "x:", x );
}
function bar() {
    x++;
}
Note: Most other JS documentation/code you see will likely format a generator declaration as function* foo() { .. } instead of as I’ve done here with function *foo() { .. }. The only difference is the stylistic positioning of the *. The two forms are functionally/syntactically identical, as is a third function*foo() { .. } (no space) form. There are arguments for both styles, but I basically prefer function *foo.. because it then matches when I reference a generator in writing with *foo(). If I said only foo(), you wouldn’t know as clearly if I was talking about a generator or a regular function. It’s purely a stylistic preference.
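A quick way to convince yourself that the spacing of the * doesn’t matter is to declare one generator in each style and check them. This is a minimal sketch. The function names are made up for illustration. GeneratorFunction is not a global, so the sketch borrows its constructor from a throwaway generator expression.

```javascript
// Three spellings of a generator declaration (hypothetical names); only whitespace differs.
function* spacedAfter() { yield "a"; }
function *spacedBefore() { yield "b"; }
function*noSpace() { yield "c"; }

// GeneratorFunction is not a global, so grab its constructor from a throwaway generator.
var GeneratorFunction = (function* () {}).constructor;

var declared = [ spacedAfter, spacedBefore, noSpace ];
var allGenerators = declared.every( function(fn){ return fn instanceof GeneratorFunction; } ); // true
var plainIsGenerator = (function () {}) instanceof GeneratorFunction;                          // false

// each one produces a working iterator
var firstValues = declared.map( function(fn){ return fn().next().value; } ); // [ "a", "b", "c" ]
```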
Now, how can we run the code in that previous snippet such that bar() executes at the point of the yield inside of *foo()?
// construct an iterator `it` to control the generator
var it = foo();
// start `foo()` here!
it.next();
x; // 2
bar();
x; // 3
it.next(); // x: 3
OK, there’s quite a bit of new and potentially confusing stuff in those two code snippets, so we’ve got plenty to wade through. But before we explain the different mechanics/syntax with ES6 generators, let’s walk through the behavior flow:
- The it = foo() operation does not execute the *foo() generator yet, but it merely constructs an iterator that will control its execution. More on iterators in a bit.
- The first it.next() starts the *foo() generator, and runs the x++ on the first line of *foo().
- *foo() pauses at the yield statement, at which point that first it.next() call finishes. At the moment, *foo() is still running and active, but it’s in a paused state.
- We inspect the value of x, and it’s now 2.
- We call bar(), which increments x again with x++.
- We inspect the value of x again, and it’s now 3.
- The final it.next() call resumes the *foo() generator from where it was paused, and runs the console.log(..) statement, which uses the current value of x of 3.
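To make that flow visible without depending on console output, here’s a sketch that records each step into a log array. The log array is a hypothetical addition and is not part of the original snippets.

```javascript
// `log` is a hypothetical array added so the order of events can be inspected
var log = [];
var x = 1;

function *foo() {
    log.push( "foo: started" );
    x++;
    yield; // pause!
    log.push( "foo: resumed, x is " + x );
}

function bar() {
    x++;
    log.push( "bar: x is " + x );
}

var it = foo();
log.push( "iterator created" ); // the body of *foo() has not run yet
it.next();                      // runs *foo() up to the `yield`
bar();                          // runs while *foo() is paused
it.next();                      // resumes *foo() after the `yield`

// log is now:
// [ "iterator created", "foo: started", "bar: x is 3", "foo: resumed, x is 3" ]
```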
Clearly, *foo() started, but did not run-to-completion: it paused at the yield. We resumed *foo() later, and let it finish, but that wasn’t even required.
So, a generator is a special kind of function that can start and stop one or more times, and doesn’t necessarily ever have to finish. It won’t be terribly obvious yet why that’s so powerful, but as we go through the rest of this chapter, this ability will be one of the fundamental building blocks we use to construct generators-as-async-flow-control as a pattern for our code.
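Here is a minimal sketch of a generator that never has to finish, using a hypothetical *forever() generator. It loops without end, yet each next() call still returns promptly because every pass through the loop pauses at the yield.

```javascript
// hypothetical generator with no exit condition
function *forever() {
    var n = 0;
    while (true) {
        yield n++; // pause here on every pass through the loop
    }
}

var it = forever();
var first = it.next();  // { value: 0, done: false }
var second = it.next(); // { value: 1, done: false }
// We simply stop calling it.next(); the generator is left paused
// at its `yield` and never completes.
```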
Input and Output
A generator function is a special function with the new processing model we just alluded to. But it’s still a function, which means it still has some basic tenets that haven’t changed — namely, that it still accepts arguments (aka “input”), and that it can still return a value (aka “output”):
function *foo(x,y) {
    return x * y;
}
var it = foo( 6, 7 );
var res = it.next();
res.value; // 42
We pass in the arguments 6 and 7 to *foo(..) as the parameters x and y, respectively. And *foo(..) returns the value 42 back to the calling code.
We now see a difference in how the generator is invoked compared to a normal function. foo(6,7) obviously looks familiar. But subtly, the *foo(..) generator hasn’t actually run yet, as it would have if it were a normal function.
Instead, we’re just creating an iterator object, which we assign to the variable it, to control the *foo(..) generator. Then we call it.next(), which instructs the *foo(..) generator to advance from its current location, stopping either at the next yield or at the end of the generator.
The result of that next(..) call is an object with a value property on it holding whatever value (if anything) was returned from *foo(..). This example has no yield, so the value comes from the return. As we’ll see shortly, a yield can likewise send a value out of the generator during the middle of its execution, kind of like an intermediate return.
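The following sketch makes both points observable. A hypothetical calls counter shows that foo( 6, 7 ) by itself runs none of the body. Logging the full result object shows that next(..) also reports a done property alongside value.

```javascript
var calls = 0; // hypothetical counter, incremented each time the body starts running

function *foo(x,y) {
    calls++;
    return x * y;
}

var it = foo( 6, 7 );
var callsBeforeNext = calls; // 0: constructing the iterator ran nothing

var res = it.next();
// res is { value: 42, done: true }: `return` produced the value,
// and `done: true` reports that the generator has finished
var callsAfterNext = calls;  // 1

var extra = it.next();
// extra is { value: undefined, done: true }: a finished generator has nothing more to give
```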
Again, it won’t be obvious yet why we need this whole indirect iterator object to control the generator. We’ll get there, I promise.
Iteration Messaging
In addition to generators accepting arguments and having return values, there’s even more powerful and compelling input/output messaging capability built into them, via yield and next(..).
Consider:
function *foo(x) {
    var y = x * (yield);
    return y;
}
var it = foo( 6 );
// start `foo(..)`
it.next();
var res = it.next( 7 );
res.value; // 42
First, we pass in 6 as the parameter x. Then we call it.next(), and it starts up *foo(..).
Inside *foo(..), the var y = x .. statement starts to be processed, but then it runs across a yield expression. At that point, it pauses *foo(..) (in the middle of the assignment statement!), and essentially requests the calling code to provide a result value for the yield expression. Next, we call it.next( 7 ), which passes the 7 value back in to be the result of the paused yield expression.
So, at this point, the assignment statement is essentially var y = 6 * 7. Now, return y returns that 42 value back as the result of the it.next( 7 ) call.
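One detail of pausing “in the middle of the assignment statement” is easy to miss. The left operand of x * (yield) has already been evaluated by the time the generator pauses. The following sketch illustrates this. It assumes a shared variable a in place of the parameter x, so that the value can be changed while the generator is paused. The *multiply() name is hypothetical.

```javascript
var a = 6;

function *multiply() {
    var y = a * (yield); // the left operand `a` is read before the pause
    return y;
}

var it = multiply();
it.next();              // runs up to the `yield`; `a` has already been read as 6
a = 100;                // reassigning `a` now cannot affect the paused expression
var res = it.next( 7 );
// res.value is 42 (6 * 7), not 700
```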
Notice something very important but also easily confusing, even to seasoned JS developers: depending on your perspective, there’s a mismatch between the yield and the next(..) call. In general, you’re going to have one more next(..) call than you have yield statements; the preceding snippet has one yield and two next(..) calls.
Why the mismatch?
Because the first next(..) always starts a generator, and runs to the first yield. But it’s the second next(..) call that fulfills the first paused yield expression, and the third next(..) would fulfill the second yield, and so on.
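The same counting rule holds with two yields, which need three next(..) calls. Here is a sketch using a hypothetical *twoQuestions() generator:

```javascript
function *twoQuestions() {
    var first = yield "Q1";  // reached by the 1st next(); answered by the 2nd
    var second = yield "Q2"; // reached by the 2nd next(); answered by the 3rd
    return first + " " + second;
}

var it = twoQuestions();
var r1 = it.next();          // starts the generator  -> { value: "Q1", done: false }
var r2 = it.next( "hello" ); // fills the 1st `yield` -> { value: "Q2", done: false }
var r3 = it.next( "world" ); // fills the 2nd `yield` -> { value: "hello world", done: true }
```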
Tale of Two Questions
Actually, which code you’re thinking about primarily will affect whether there’s a perceived mismatch or not.
Consider only the generator code:
var y = x * (yield);
return y;
This first yield is basically asking a question: “What value should I insert here?”
Who’s going to answer that question? Well, the first next() has already run to get the generator up to this point, so obviously it can’t answer the question. So, the second next(..) call must answer the question posed by the first yield.
See the mismatch — second-to-first?
But let’s flip our perspective. Let’s look at it not from the generator’s point of view, but from the iterator’s point of view.
To properly illustrate this perspective, we also need to explain that messages can go in both directions: yield .. as an expression can send out messages in response to next(..) calls, and next(..) can send values to a paused yield expression. Consider this slightly adjusted code:
function *foo(x) {
    var y = x * (yield "Hello"); // <-- yield a value!
    return y;
}
var it = foo( 6 );
var res = it.next(); // first `next()`, don't pass anything
res.value; // "Hello"
res = it.next( 7 ); // pass `7` to waiting `yield`
res.value; // 42
yield .. and next(..) pair together as a two-way message passing system during the execution of the generator.
So, looking only at the iterator code:
var res = it.next(); // first `next()`, don't pass anything
res.value; // "Hello"
res = it.next( 7 ); // pass `7` to waiting `yield`
res.value; // 42
Note: We don’t pass a value to the first next() call, and that’s on purpose. Only a paused yield could accept such a value passed by a next(..), and at the beginning of the generator when we call the first next(), there is no paused yield to accept such a value. The specification and all compliant browsers just silently discard anything passed to the first next(). It’s still a bad idea to pass a value, as you’re just creating silently “failing” code that’s confusing. So, always start a generator with an argument-free next().
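To see the discarding behavior for yourself, here’s a sketch that passes a value to the first next(..) anyway. The received array is a hypothetical addition used to inspect what the generator actually gets. The generator never sees the first value.

```javascript
var received = []; // hypothetical: records every value the generator receives

function *foo(x) {
    var input = yield "Hello";
    received.push( input );
    return x * input;
}

var it = foo( 6 );
var r1 = it.next( "ignored" ); // no paused `yield` exists yet, so "ignored" is discarded
var r2 = it.next( 7 );         // 7 becomes the result of the paused `yield "Hello"`
// r1.value is "Hello", r2.value is 42, and received is [ 7 ]
```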
The first next() call (with nothing passed to it) is basically asking a question: “What next value does the *foo(..) generator have to give me?” And who answers this question? The first yield "Hello" expression.
See? No mismatch there.
Depending on who you think is asking the question, there is either a mismatch between the yield and next(..) calls, or not.
But wait! There’s still an extra next() compared to the number of yield statements. So, that final it.next(7) call is again asking the question about what next value the generator will produce. But there are no more yield statements left to answer, are there? So who answers?
The return statement answers the question!
And if there is no return in your generator (return is certainly not any more required in generators than in regular functions), there’s always an assumed/implicit return; (aka return undefined;), which serves the purpose of answering, by default, the question posed by the final it.next(7) call.
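A generator with no return statement at all shows the implicit return; answering that final question. Here is a minimal sketch with a hypothetical *noReturn() generator:

```javascript
function *noReturn() {
    yield "only yield";
    // no `return` here: an implicit `return;` follows
}

var it = noReturn();
var r1 = it.next(); // answered by the `yield` -> { value: "only yield", done: false }
var r2 = it.next(); // answered by the implicit `return;` -> { value: undefined, done: true }
```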
These questions and answers (the two-way message passing with yield and next(..)) are quite powerful, but it’s not obvious at all how these mechanisms are connected to async flow control. We’re getting there!
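As a small illustration of yield and next(..) acting as a two-way channel, here’s a sketch of a hypothetical *runningTotal() generator. Each next(..) sends a number in, and each yield sends the updated total back out.

```javascript
function *runningTotal() {
    var total = 0;
    while (true) {
        // send the current total out; receive the next number in
        var n = yield total;
        total += n;
    }
}

var it = runningTotal();
it.next();                          // start: pauses at the first `yield` with value 0
var afterFive = it.next( 5 ).value; // 5
var afterTen = it.next( 10 ).value; // 15
```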
Multiple Iterators
It may appear from the syntactic usage that when you use an iterator to control a generator, you’re controlling the declared generator function itself. But there’s a subtlety that’s easy to miss: each time you construct an iterator, you are implicitly constructing an instance of the generator which that iterator will control.
You can have multiple instances of the same generator running at the same time, and they can even interact:
function *foo() {
    var x = yield 2;
    z++;
    var y = yield (x * z);
    console.log( x, y, z );
}
var z = 1;
var it1 = foo();
var it2 = foo();
var val1 = it1.next().value; // 2 <-- yield 2
var val2 = it2.next().value; // 2 <-- yield 2
val1 = it1.next( val2 * 10 ).value; // 40 <-- x:20, z:2
val2 = it2.next( val1 * 5 ).value; // 600 <-- x:200, z:3
it1.next( val2 / 2 ); // y:300
// 20 300 3
it2.next( val1 / 4 ); // y:10
// 200 10 3
Warning: The most common usage of multiple instances of the same generator running concurrently is not such interactions, but when the generator is producing its own values without input, perhaps from some independently connected resource. We’ll talk more about value production in the next section.
Let’s briefly walk through the processing:
- Both instances of *foo() are started at the same time, and both next() calls reveal a value of 2 from the yield 2 statements, respectively.
- val2 * 10 is 2 * 10, which is sent into the first generator instance it1, so that x gets value 20. z is incremented from 1 to 2, and then 20 * 2 is yielded out, setting val1 to 40.
- val1 * 5 is 40 * 5, which is sent into the second generator instance it2, so that x gets value 200. z is incremented again, from 2 to 3, and then 200 * 3 is yielded out, setting val2 to 600.
- val2 / 2 is 600 / 2, which is sent into the first generator instance it1, so that y gets value 300, then printing out 20 300 3 for its x y z values, respectively.
- val1 / 4 is 40 / 4, which is sent into the second generator instance it2, so that y gets value 10, then printing out 200 10 3 for its x y z values, respectively.
That’s a “fun” example to run through in your mind. Did you keep it straight?
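To isolate the point that each iterator gets its own generator instance, here’s a simpler sketch using a hypothetical *counter() generator. The local variable is separate for each instance, while the outer shared variable is common to both, just as z is in the preceding example.

```javascript
var shared = 0;

function *counter() {
    var local = 0;      // one copy per generator instance
    while (true) {
        local++;
        shared++;       // one copy, visible to every instance
        yield [ local, shared ];
    }
}

var it1 = counter();
var it2 = counter();
var a1 = it1.next().value; // [ 1, 1 ]
var a2 = it2.next().value; // [ 1, 2 ]: it2 has its own `local`
var a3 = it1.next().value; // [ 2, 3 ]: it1 kept its own `local` while paused
```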
Interleaving
Recall this scenario from the “Run-to-completion” section of Chapter 1:
var a = 1;
var b = 2;
function foo() {
    a++;
    b = b * a;
    a = b + 3;
}
function bar() {
    b--;
    a = 8 + b;
    b = a * 2;
}
With normal JS functions, of course either foo() can run completely first, or bar() can run completely first, but foo() cannot interleave its individual statements with bar(). So, there are only two possible outcomes to the preceding program.
However, with generators, clearly interleaving (even in the middle of statements!) is possible:
var a = 1;
var b = 2;
function *foo() {
    a++;
    yield;
    b = b * a;
    a = (yield b) + 3;
}
function *bar() {
    b--;
    yield;
    a = (yield 8) + b;
    b = a * (yield 2);
}
Depending on the order in which the iterators controlling *foo() and *bar() are called, the preceding program could produce several different results. In other words, we can actually illustrate (in a sort of fake-ish way) the theoretical “threaded race conditions” circumstances discussed in Chapter 1, by interleaving the two generator iterations over the same shared variables.
First, let’s make a helper called step(..) that controls an iterator:
function step(gen) {
    var it = gen();
    var last;
    return function() {
        // whatever is `yield`ed out, just
        // send it right back in the next time!
        last = it.next( last ).value;
    };
}
step(..) initializes a generator to create its it iterator, then returns a function which, when called, advances the iterator by one step. Additionally, the previously yielded-out value is sent right back in at the next step. So, yield 8 will just become 8 and yield b will just be b (whatever it was at the time of yield).
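To see step(..) feeding values back in, here’s a sketch that drives a small hypothetical *echo() generator with it and records what each yield expression evaluates to. The step(..) helper is copied from above, and the seen array is an added assumption for inspection.

```javascript
function step(gen) {
    var it = gen();
    var last;
    return function() {
        // whatever is `yield`ed out, just
        // send it right back in the next time!
        last = it.next( last ).value;
    };
}

var seen = []; // hypothetical: records what each `yield` evaluated to

function *echo() {
    var got = yield 8;  // under step(..), `yield 8` evaluates to 8
    seen.push( got );
    var b = 5;
    got = yield b;      // and `yield b` evaluates to b's value (5) at the time of the yield
    seen.push( got );
}

var s = step( echo );
s(); // runs to `yield 8`
s(); // sends 8 back in, runs to `yield b`
s(); // sends 5 back in; the generator finishes
// seen is [ 8, 5 ]
```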
Now, just for fun, let’s experiment to see the effects of interleaving these different chunks of *foo() and *bar(). We’ll start with the boring base case, making sure *foo() totally finishes before *bar() (just like we did in Chapter 1):
// make sure to reset `a` and `b`
a = 1;
b = 2;
var s1 = step( foo );
var s2 = step( bar );
// run `*foo()` completely first
s1();
s1();
s1();
// now run `*bar()`
s2();
s2();
s2();
s2();
console.log( a, b ); // 11 22
The end result is 11 and 22, just as it was in the Chapter 1 version. Now let’s mix up the interleaving ordering and see how it changes the final values of a and b:
// make sure to reset `a` and `b`
a = 1;
b = 2;
var s1 = step( foo );
var s2 = step( bar );
s2(); // b--;
s2(); // yield 8
s1(); // a++;
s2(); // a = 8 + b;
// yield 2
s1(); // b = b * a;
// yield b
s1(); // a = b + 3;
s2(); // b = a * 2;
Before I tell you the results, can you figure out what a and b are after the preceding program? No cheating!
console.log( a, b ); // 12 18
Note: As an exercise for the reader, try to see how many other combinations of results you can get by rearranging the order of the s1() and s2() calls. Don’t forget you’ll always need three s1() calls and four s2() calls. Recall the discussion earlier about matching next() with yield for the reasons why.
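If you’d rather let the computer do the exercise, here’s a sketch that replays every ordering of three s1() calls and four s2() calls and collects the distinct a b results. The orderings(..) and run(..) helpers are hypothetical additions. *foo(), *bar(), and step(..) are copied from above, and the shared a and b are reset before each run.

```javascript
var a = 1;
var b = 2;

function *foo() {
    a++;
    yield;
    b = b * a;
    a = (yield b) + 3;
}

function *bar() {
    b--;
    yield;
    a = (yield 8) + b;
    b = a * (yield 2);
}

function step(gen) {
    var it = gen();
    var last;
    return function() {
        last = it.next( last ).value;
    };
}

// hypothetical helper: every ordering of `ones` s1() calls and `twos` s2() calls,
// encoded as strings such as "1112222"
function orderings(ones,twos) {
    if (ones === 0) return [ "2".repeat( twos ) ];
    if (twos === 0) return [ "1".repeat( ones ) ];
    return orderings( ones - 1, twos ).map( function(rest){ return "1" + rest; } )
        .concat( orderings( ones, twos - 1 ).map( function(rest){ return "2" + rest; } ) );
}

// hypothetical helper: reset the shared variables, replay one ordering, report "a b"
function run(order) {
    a = 1;
    b = 2;
    var s1 = step( foo );
    var s2 = step( bar );
    for (var i = 0; i < order.length; i++) {
        if (order[i] === "1") s1();
        else s2();
    }
    return a + " " + b;
}

var allOrders = orderings( 3, 4 );
var outcomes = {}; // "a b" -> the first ordering that produced it
allOrders.forEach( function(order){
    var result = run( order );
    if (!(result in outcomes)) outcomes[result] = order;
} );

console.log( allOrders.length + " orderings, " + Object.keys( outcomes ).length + " distinct results" );
console.log( outcomes );
```

Encoding each ordering as a string makes it easy to read back which sequence of calls produced a given result. For example, "1112222" is the base case above, and "2212112" is the mixed ordering. Three s1() calls and four s2() calls give 35 orderings in total (7 choose 3).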
You almost certainly won’t want to intentionally create this level of interleaving confusion, as it makes code incredibly difficult to understand. But the exercise is interesting and instructive for understanding how multiple generators can run concurrently in the same shared scope, because there will be places where this capability is quite useful.
We’ll discuss generator concurrency in more detail at the end of this chapter.