Generator functions — To Infinity And Beyond

Shlomit Ahituv
Wix Engineering
Published in
6 min readApr 18, 2021

Introduction

You’ve probably encountered this before — you wanted to hold a really big list in memory. After careful consideration, you’ve come to the conclusion that it is either not feasible, since memory is a precious and limited resource, or not even needed since we only need one item at a time.
This is when “Generators” should come into mind.

What are generators?

Generators are functions that create elements of a sequence, one by one.
This gives us the ability to save only one item in memory at a time. This, of course, makes our program much more space-efficient.
Generator functions exist in several popular languages, including Javascript, Python, C# and Ruby.
The implementation is based on the concept of “Lazy Lists”.

One way to represent lazy lists is with the following definition —

A Lazy List is a pair of the current element (indexed i) and a generator function that creates the rest of the list, lazily.

We can define it recursively as —

LazyList(i) = (Element(i), Function(() => LazyList(i+1)))

Let’s assume we have an infinite list of integers, let’s say the prime numbers — 2, 3, 5, 7, 11
This sequence of integers can be represented using a lazy list as —

LazyListPrime(0) = (2, function() => LazyListPrime(1))
LazyListPrime(1) = (3, function() => LazyListPrime(2))
LazyListPrime(2) = (5, function() => LazyListPrime(3))
LazyListPrime(3) = (7, function() => LazyListPrime(4))
LazyListPrime(4) = (11, function() => LazyListPrime(5))

Generally —

LazyListPrime(i) = (Prime(i), function() => LazyListPrime(i+1))

Where Prime(i) is a function that returns the i-th prime number.

Obviously, it’s impossible to create this list and store it in memory, as it is infinite.
So we could create this list using a Generator.

Let’s talk tech

Languages that natively support generators use the yield keyword to implement them.
Such languages, for example, are Javascript, Python, Ruby and C#.
The general idea behind generators is that the state of the executing generator function must be preserved between calls. Different languages implement this differently.
In this part of the article we will focus on generators in Node.JS.

Generators in Node.JS

Generators in Javascript (or Typescript) are defined using the function* keyword. Inside generators we can suspend execution of the function using the yield keyword.
Note: A generator function can also contain a return statement, which will stop the function execution just like a regular function.

Let’s continue with our prime numbers example —

function* primeGenerator() {
for (let num = 2; ; num++) {
if (isPrime(num)) {
yield num;
}
}
}

function isPrime(num) {
for (let i = 2, s = Math.sqrt(num); i <= s; i++) {
if (num % i === 0) {
return false;
}
}

return true;
}

The primeGenerator function is a basic generator. We know that since it’s declared using function*.
While a generator function doesn’t have to contain a yield statement, it’s meaningless to define one as such.
All it does is iterate integers one by one, and if the integer is a prime number it returns it to the caller.

To instantiate the generator, we simply call the function —

const gen = primeGenerator();

Calling it like this does not execute the function, merely creates the generator object. If we were to inspect the object created, we would get —

gen = primeGenerator {[[GeneratorStatus]]: "suspended"}
__proto__ = Generator
[[GeneratorLocation]] = Object
[[GeneratorStatus]] = "suspended"
[[GeneratorFunction]] = function* primeGenerator() {
[[GeneratorReceiver]] = global
[[Scopes]] = Scopes[3]
Local (primeGenerator) {}
Closure {isPrime: Function}
Global {global: global, ...}

To actually call the generator that was created, we use gen.next() .
The first time gen.next() is called, the function begins execution from its first line. Any subsequent calls to gen.next() will resume execution from the last point it was suspended.

Let’s try to run this code. If we were to run the following line:

console.log(gen.next());

The output would be {value: 2, done: false} — where value is the current prime number which was generated, and done is the state of the generator — which means that the function can resume execution in the future. The gen object would look like this -

gen = primeGenerator {[[GeneratorStatus]]: "suspended"}
__proto__ = Generator
[[GeneratorLocation]] = Object
[[GeneratorStatus]] = "suspended"
[[GeneratorFunction]] = function* primeGenerator() {
[[GeneratorReceiver]] = global
[[Scopes]] = Scopes[4]
Block (primeGenerator) {num: 2}
Local (primeGenerator) {}
Closure {isPrime: Function}
Global {global: global, ...}

If we were to run gen.next()again, we would get {value: 3, done: false} .
To retrieve, for example, the first 10 prime numbers, we can iterate the generated primes. We can use the let...of syntax since generator functions are iterable —

let i = 0;
for (let prime of primeGenerator()) {
console.log(prime);
i++;
if (i > 10) {
break;
}
}

Or we can use a basic for loop —

const gen = primeGenerator();
for (let i = 0; i < 10; i++) {
console.log(gen.next());
}

Notice how the generator defined above creates an infinite list of elements.
This doesn’t have to be the case, and generator functions can produce a finite number of elements.
Let’s look at the following example:

function* lettersGenerator() {
const letters = 'abc';

for (let letter of letters) {
yield {letter: letter};
}
}

When we run —

const gen = lettersGenerator();
console.log(gen.next());
console.log(gen.next());
console.log(gen.next());
console.log(gen.next());

The first three calls to gen.next() will produce:
{value: {letter: a}, done: false}
{value: {letter: b}, done: false}
{value: {letter: c}, done: false}

The fourth (and any subsequent call to gen.next()) will produce
{value: undefined, done: true}, which indicates that the generator has exhausted and will not produce any more values.

Under the hood

So how do generators actually work in Node.JS?
As we know, Node.JS uses the V8 Engine to run Javascript code (just like Chrome). After parsing the code, V8 uses a compiler to create byte code from the Javascript code to be run, named Ignition.

As can be seen here, in the function Parser::ParseAndRewriteGeneratorFunctionBody, when the parser encounters a function* expression, it parses it as if an addition yield expression is prepended to the start of the function, which when starts running simply returns immediately to the caller.

The code that translates the yield operation to bytecode (executable code) can be found here. The ByecodeGenerator::VisitYield is the function that is called each time the engine encounters the yield keyword in the code.

VisitYield is called with a yield expression from the AST (Abstract Syntax Tree). If this is the first yield expression that was encountered it simply suspends the function execution without returning any value. If it’s not the first yield then the yielded value is also returned. In both cases, the VisitYield in turn calls the BuildSuspendPoint which is in charge of suspending the generator and yielding the result (if any) to the caller, while saving the state for the next iteration.
The BuildSuspendPoint does this by using the SuspendGenerator function that saves the entire state (registers, context, closure, parameters, the current suspend count, etc.) on the generator object, then returns the generated value and returns to the caller.
The next time the generator is called (using gen.next()), the function BuildSuspendPoint resumes execution from after the suspend point. Then, it simply calls the ResumeGenerator function, which is in charge of restoring the state (which was saved on the generator object). Afterwards, it continues execution from the point where it stopped in the VisitYield, and the rest of the generator function from after the yield will continue running.
The next time we will again get to a yield expression, this cycle will repeat itself.

V8 flow when gen.next() is called

Extensions

One concept that builds on generators is called Async Generators.
This concept exists in JavaScript, Python and C#.
It’s very useful in cases of data streaming, as the yielded results are Promises that resolve asynchronously.

We should always try to recognize the use cases where generators are applicable. This can immensely improve the runtime and memory of our program.

--

--