Async Generators in JavaScript

Published in

Thirdfort

13 min readAug 3, 2023

Address Searching API

Recently at Thirdfort, we identified a problem with the way we were using a third party API. The API service allows us to search for addresses with a single search string. The service works very well, but it is not as simple as it sounds.

Here is how it works:

Call the API with a search string
Get back a list of matching addresses, and a list of 0 to many container ids
For each container id, perform the search again with search string and the container id (go to 2)

You can see that the service does not just return addresses, but also ids that point to further address lists. You can think of this as a filesystem, where the addresses are files, and the container ids are folders that can contain files or more folders.

In this sense, traversing this type of hierarchy is a pretty well-trodden problem. However, in this case we don’t have the entire data structure at hand, and have to discover it with asynchronous calls to the API. What this means is that you have to wait for a result to come back before you can continue. Furthermore each request can spawn further requests, so we can think of this as an asynchronous tree traversal.

The Problem

Now comes the problem. Our original implementation used Redux actions, because the project was initially set up to use a middleware to fetch from APIs. Redux is great for lots of things, and it worked well here in most cases, but we found an edge case that was harder to resolve within the Redux implementation.

First of all, let’s see how the implementation of the algorithm above worked in Redux:

Dispatch an action to search for an address
In the action reducer function, add the addresses to the store and …
Dispatch more search actions from here for each container id

This worked, filling the store with addresses, which could be read by a Redux selector and displayed in the UI as they came in.

The main problem, as with many problems, comes when this scales.

Brazil

Brazil is a large country. It has many addresses. When you search using say, only three characters such as ‘bel’, the first action fires and from there we start a chain reaction of calls, returning thousands of addresses and container ids.

There were a few other problems, but these were easily solved: firstly searching began again each time a new character was added (just add a delay so the search is only initiated once the user stops typing), and it also began when the search string was too short (just increase the minimum length of the search string required for searching to begin).

Still, even with these fixes, we had the same problem. Too many results. So an attempt was made to place a limit on the search. This was done in the reducer function that dealt with the container ids to limit the number of actions that can be triggered. Unfortunately this approach also has problems

You might miss addresses
Even if you limit the number of actions per response, you could still traverse very deeply and return too many results. What you really want to do is place your limit on the total number of searches so that you can explore all the returned container ids in order without ignoring any higher up the “tree”.
Sometimes the same container id can be returned more than once, and you need to keep track of previously searched containers that are still in the store

The last two points hint at the problem with the implementation.

How do you keep track of what is happening with this search? When a new search is initiated you have to store the search string in the action, and clear the store. Then in the reducer you would either need to store a growing list of previously searched container ids in the action, or in the store, so that it could be accessed by subsequent reducers, as well as a total count of all the actions dispatched as part of the initiating search action. This can get messy fast.

What’s the ideal?

The problem is made trickier and harder to see here because the code is not cohesive. It’s not all contained within one module or function. What we ideally want is a function that just does all this for us and returns the addresses:

const addresses = getAddresses(search)

But of course, getAddresses has to call an API and wait, it’s asynchronous, so it becomes:

const addresses = await getAddresses(string)

There is another issue with the function written this way, that can be solved like this:

getAddresses(search, appendAddresses)

appendAddresses is a function passed as a parameter that getAddresses uses to add addresses to a growing list. This is because we don’t receive all the addresses at once, but over a number of asynchronous operations (calls to the API). So, rather than wait until they are all finished and give the complete list, we would like to start displaying the list while it is filling up. With this approach we can. Note that we now no longer have to await the call either.

In fact, in React appendAddresses can be a function that sets the state in a component, like so:

const DisplayAddressList = (props) => {
	const [ addresses, setAddresses ] = useState([]);
	const appendAddresses = (addresses) =>
		setAddresses((current) => [...current, ...addresses]);
	...
	useEffect(() => { getAddresses(props.search, appendAddresses), [props.search])
	...
}

React aficionados can probably see all sorts of problems here, but this is just a minimal, illustrative snippet. You get the idea: the list builds up and we can display it as it grows.

How to implement getAddresses()?

To break this problem down we can first focus on writing an algorithm that traverses this conceptual tree. We want to search breadth first, not depth first. This means that if we get back five container ids, we want to fire off requests for each of those siblings first before we begin searching among their children.

One way to write a function to traverse a structure like this is to have the function call itself, which is called recursion. With recursive functions it can sometimes be hard to follow the flow of execution. Each call to the function has its own set of local state, and if you want to know how many times the function has called itself, for example, you need to pass that information into the function as a parameter. However, it is always possible to turn a recursive loop into an iterative loop if you are willing to keep track of some things yourself. When you do this, you gain more control and oversight.

This is an abstract function to traverse a tree. It is abstract because the knowledge of how the tree data structure is organised and stored is not known to it, it’s abstracted out into the functions that are passed to it as parameters:

// Generator for traversing a tree, yielding a node on each
// iteration. Node result should contain look ups for children.
function* traverseTree( 
		root /* look up for root of tree */,
		getNode /* retrieve a node by a look up */,
		listChildren /* list the look ups for the children of a node */) {
	const queue = [root];
	const next = () => queue.shift();
	while (queue.length > 0) {
		const value = next();
		const result = yield getNode(value);
		queue.push(...listChildren(result));
	}
}

Our real version of this function keeps track of tree depth as well, but we don’t need that for our example.

The parameters for this function are:

root - the initial lookup
getNode - a function that accepts a lookup and returns a node
listChildren - a function that accepts a node and returns 0 or more child lookups

Terminology

A lookup is a way of getting a node.

A node is a value in the tree that contains 1) data and 2) information about its children.

You can see this is how we have abstracted the problem of traversing the tree. There is nothing in traverseTree about addresses, calling an API, or container ids. The specifics are decided by the person who writes the functions getNode and listChildren, and it is up to them also to define how a lookup should be defined.

In our specific case a lookup is a URL string used to call the address API, and a node is a response from that API (with some processing). We will look at how to implement these functions soon.

Beforehand there are further things to note about traverseTree itself. First, we are storing each lookup in a queue which starts with the root lookup. Then, while the queue is not empty, we take the first lookup from the front of it, get the node using the lookup with getNode, and then look up the node’s children with listChildren , adding more lookups to the end of the queue.

From this we see the queue is a FIFO queue — First In, First Out. This means, for example, if the root node had five children, all those children would be processed first before any of their children are reached, and so on. This ensures a breadth-first tree traversal. Anyone with a large family knows how much trouble can be caused if a sibling gets left out.

Generator Functions

traverseTree is a generator function (function*). We can get into how and why later. For now the benefit to highlight is that from the perspective of the code calling the function it seems just like a simple loop. There is no notion of the tree structure from outside the function. The calling code just loops through nodes as they come. You might expect the calling code to look something like this:

for (const value of traverseTree()) {
	// do something with value, we don't care about trees here, we just get the values as they come
}

Tree data

Before we move on with that, this abstract tree structure might need some explaining. Let’s use an example to show some simple implementations of getNode and listChildren that can traverse this data structure:

// root node
const root = { data: 0, children: ['a', 'b'] };

// child nodes (keyed by lookup, which is a single character)
const children = {
 a: {
  data: 1,
  children: ['c', 'd'],
 },
 b: {
  data: 2,
 },
 c: { data: 3, children: ['e'] },
 d: { data: 4 },
 e: { data: 5, children: ['f'] },
};

We can see that the tree data contains a root node and some child nodes. Each node has two properties: some data, and an array of child lookups.

Now let’s write getNode and listChildren for this data structure.

const getNode = (value) => (value ? children[value] : root);
const listChildren = (result) => result?.children ?? [];

I said before the we get to decide how the lookups work. Here we define that a falsey value is a lookup to the root node, and a letter string is a key lookup to the children object.

So we call traverseTree with a null lookup to start so that the root node is returned from getNode. The root node has child keys in its children array, which listChildren gets from the data structure’s children object. If a child does not exist, we get undefined. Note that in our example data, child e specifies a child f, which does not exist.

Traversing the Tree

Now let’s traverse the tree.

function iterate(iterator) {
  const values = [];
  let result = null;
  while (true) {
    const { value, done } = iterator.next(result);
    if (done) {
      return values;
    }
    result = value;
    values.push(value?.data);
  }
}

const values = iterate(traverseTree(
    null,
    getNode,
    listChildren
));
// expect values to be [0, 1, 2, 3, 4, 5, undefined]

Wait what?

That calling code looks way more complicated than the for..of loop!

const values = [];
for (const value of traverseTree()) {
	values.push(value?.data);
}

So why can’t we do it this way? It does not look that pretty or easy to use does it? Is this really how the calling code of a generator function looks? Well not usually no. Most code that calls generators either use for..of or does something like this:

const myList = [...myGenerator()]

In our example the iterate function is doing manually what JavaScript usually does for us in those two examples — with one important difference. A generator function actually returns an object with functions on it that can be used to iterate (for simplicity we’ll just call that object an iterator from now on). Here’s a simpler example of that:

const iterator = myGenerator();

let result = iterator.next();
while (!result.done) {
    // do something with result.value
    result = iterator.next();  
}

This is basically what JavaScript is doing under the hood with the for..of loop.

So the major difference between this while example and our iterate function comes down to how we call next() on the iterator. next() also accepts a parameter. In a for..of loop, nothing is passed to next(). In iterate we pass the previous value onto the next iteration.

This is something we definitely need when traversing a tree that we get from API calls, since subsequent iterations are determined by previous results: it is not something we know up front.

Iterator Flow

The flow of execution in an out of a generator function and the code that is interacting with the iterator might be hard to follow. When I first started writing this article I tried to list a step-by-step example of how the program ran, but that proved even harder to follow. The best way to explore this might be for you to write a small example and step through it in a debugger. Let’s call that homework. To summarise as clearly as I can let’s define some terms:

Generator code — code running inside the generator function
Calling code — code running outside the generator function
Iterator — the object used by the calling code to call the generator code

So in general terms, the code runs like so:

Generator function is called by Calling code to get the Iterator
Calling code calls iterator.next() (with or without a parameter to pass into the Generator code)
Now we are running Generator code until we hit a yield
Execution returns to the Calling code (with the result of the yield)
Calling code does something with that result, and
If iterator.done it finishes
Else it calls iterator.next() again, with or without any parameter needed to control subsequent iterations

Asynchronous Generators

And finally we get to the the title of this article. Thank you for getting this far with me. Our iterate function example is actually just to explain how we need to write our calling code manually this way so that we can pass results of previous iterations onto next(). It is not suitable for our address API code because it collates the results internally in an array and returns them at the end. Our calls to getNode() are also asynchronous and return promises, not the results themselves.

To deal with this we have another helper function. This one is simple and is in our actual code.

async function* chainedIterator(iterator) {
	let result = null;
	while (true) {
		const { value, done } = iterator.next(result);
		if (done) {
			return;
		}
		result = await value;
		yield value;
	}
}

This async helper function controls the passing of results onto the next iteration, and also awaits the value (assuming it is a Promise). You can write a sync version of this easily by just removing async and await but that’s not what we need here. This lets us wrap the traverseTree (or any generator that needs the previous result passed to next() to make another generator…

const simpleTraverseTree = (params) => chainedIterator(traverseTree(params))

simpleTraverseTree isn’t the name we have in production, but let’s not get bogged down with naming concerns. I also didn’t list out the individual params of traverseTree to keep the example concise.

… And Finally

And now finally we can implement that getAddresses function.

const getAddresses = async (
	searchString,
	appendAddresses
) => {
	for await (const result of simpleTraverseTree(/*params go here*/)) {
		appendAddresses(result.addresses);
	}
};

This is a cut down version of the implementation to highlight some important points. We see that we are now able to iterate over the simpleTraverseTree without worrying about passing results onto next() — chainedIterator handles this for us.

We also note that we are looping with for..await now. This is how we can ensure we await the result in a for loop, and is just one more nifty thing that JavaScript can do for us.

The above example does not quite address (ahem) all the issues we outlined at the start, so let’s wrap up with a more complete example. (Note this is still not our actual code, but is enough to show the idea. The actual code needs to process the API response a bit since the third party response doesn’t just give us a list of addresses and a string list of container ids as shown here).

const getAddresses = async (
	searchString,
	appendAddresses
) => {
	const usedContainers = new Set();
	const root = getAddressSearchURL(searchString);
	const getNode = (url) => fetchEndpoint(url);
	const listChildren = (result) => {
		const children = result.containers
			.filter((container) => !usedContainers.has(container))
			.map((container) =>
				getAddressSearchURL(searchString, container)
			);
		for (const id of result.containers) {
			usedContainers.add(id);
		}
		return children;
	};
	let requestsRemaining = MAX_REQUESTS;
	for await (const result of simpleTraverseTree(
		root,
		getNode,
		listChildren
	)) {
		appendAddresses(result.addresses);
		--requestsRemaining;
		if (requestsRemaining === 0) {
			break;
		}
	}
};

This final code keeps track of container ids already searched, and can also easily limit the total number of requests that are triggered off one search by simply maintaining a count and breaking out of the for..await loop. We don’t care where we are in the tree. We know it’s breadth first and that’s fine. We don’t have to put guards in anywhere to break out of a complex tree traversal code.

Also, the request count variable is just a simple local variable. Nothing sticks around in any kind of global state that can be accessed outside this function.

Everything we use to keep track of this search is local (requestsRemaining and useContainers) and gets disposed of once this function returns like normal. There can be no bugs where we forget to tidy things up in global state where unique searches using different strings can get mixed up.

This function is completely self contained, making use of abstract helpers to traverse a tree structure, and to handle generator iteration. From here it is straightforward to write other async generators to traverse any kind of async structure, and simply run around them in a simple loop in a single function.