Functional Programming with PHP Generators

Some background on generators

Generators are cool. They make it easy to write iterators by defining a function instead of building an entire class that implements Iterator. They also make it easy to write lazy lists and infinite streams. The big difference between a generator function and a regular function is that while a regular function can only return once (as in, once a function returns its execution stops), a generator function can yield multiple values over the time of its execution. In between each yield, the execution of the generator function is paused and waits to resume until the next time the generator is run. This means that generators can be made that yield lists of values that are lazily generated, meaning that each element in the list isn’t computed until it’s used or needed.

A common example to differentiate between eagerness and laziness, is a range function that takes start and end parameters and returns a sequence of integers that begins with the start parameter and ends one element before the end parameter. With a regular function, you must create a new list, add each element to the list, and then return the list. The major downside to this is that the amount of memory that range uses is proportional to the size of the range. Depending on your environment, range(1, 10000000) may exhaust all of the memory that your PHP process is allocated. This problem happens because range eagerly creates the entire list of elements before returning it to the caller.

Using a generator function, you can create a lazy range that uses constant memory. This implementation works by creating a while loop that yields the value of the start parameter and then increments the value of the start argument. This loop repeats until the value of start is equal to the end parameter. Once the while loop ends, the function reaches the end of its body and returns, ending the generator. With this approach, the generator function only has to keep track of where it is in the sequence of values to return, instead of keeping all values in memory. To take this example further, you could make an infinite range generator function that only takes a start argument. The implementation in this case would have a while loop with a predicate that’s always true, so that the while loop never ends. This allows the calling function to determine how many values to read from the generator. If the infinite generator is only called 100 times, it will only do the work to yield 100 values. If it isn’t called again before the end of the calling function, the generator will continue to stay paused and eventually be cleaned up by garbage collection. That being said if you try to iterate through a generate that creates an infinite stream of values using foreach, the generator will iterate forever. The PHP docs have a pretty solid overview of how generators work in PHP and of the range function example.

A more practical example would be a program that needs to consume 200000 objects from some external API, filter out some of the objects, extract a subset of the data out of each document, and then insert each of the modified objects into some data store. To do this eagerly, you would have to first build a list of all 200000 objects, then remove elements from that list, and then modify each element before finally inserting them into the data store. If you used functional tools like array_filter and array_map for those steps, each of those operations would create an intermediate list. By composing generators together, you can build a pipeline of steps that lazily evaluate each step in the pipeline. You could use a generator that lazily grabs objects from the API and yields each object individually. That generator could be consumed by a generator that only yields objects that pass a certain validity check, and that generator could be consumed by a generator that yields a modified version of the object from the API.

Generators in PHP

So generators are pretty neat. Since the generator functions actually return an object that implements Iterator, they can be used anywhere iterators can, such as foreach. Unfortunately they can’t be dropped in anywhere an array can be used. This is true for things like array_map, array_reduce, and array_filter. That’s a bummer, because I prefer to use those functions over the more imperative alternatives like for loops and foreach loops. At iFixit, we came up with a few alternatives that perform the same functionality as array_map, array_filter, and array_reduce but on Iterators instead of just arrays.

Disclaimer: These examples use functionality from PHP 7.1. They won’t run on earlier versions of PHP.

Map

Map transforms or “maps” one list of values to another by applying a supplied mapper function on each value in the input list and adding the result to the output list. iterator_map has the same idea, but yields the result of calling the mapper function on each member of the iterable that is given as the second argument.

<?php
// assuming range_generator is a range function that returns a
// generator and not a list.
$bigList = range_generator(1, 100000000);
$bigListPlusOne = iterator_map(function($num) {
return $num + 1;
}, $bigList);
// At this point no work has been done.
// $bigListPlusOne is a generator that will lazily produce
// all numbers from 2 to 100000000
// (100000000 because the end of the range is exclusive)

Filter

Filter consumes a list and produces a new list that only includes elements from the input list that produce a truthy value when passed to a filter function or a predicate. iterator_filter has the same idea, but only yields values that produce a truthy value for the predicate. If a value does produce a truthy value, the generator will not yield that element and move on to the next element from the input iterable until one returns a truthy value from the predicate. From there it yields the value and waits until it’s called again.

Reduce

Reduce consumes a list and an optional initial value and outputs a value that is the result of calling a reducer function on each element. The reducer function has a carry parameter, which contains the result of calling the reducer function on the previous element, and a current parameter which represents the current element in the iteration. This operation is called fold in some other languages.iterator_reduce is slightly different compared to the last two functions, because it doesn’t have an iterable return type declaration. iterator_reduce can produce single values like a number, boolean, or string as well. This can be helpful when you want to take a list or generator and derive a grouped or aggregated value out of it, such as the sum of the price property on a list of Product objects.

Putting it all together

Now that we’ve gone over these utilities, it’d be helpful to put them all together in a little example.

In this example we are pulling data down from an online store service (we can call it Storeify). The goal of this program is to pull down all orders from the previous day and compute the daily revenue for a product we sell on the shop.

In this hypothetical world our store completes between 100 and 1000000000 orders per day, so we can’t just grab all orders from their API without having a redonkulous AWS bill for a machine big enough to hold all of those orders in memory at the same time. Luckily we just learned a little about generators, so we can create a generator to lazily grab orders from the Storeify API as we need them.

Using things like map, filter, and reduce, we can break up this problem in order to make the program easier to maintain and understand. Since we’re only really interested in the line items or products on each order we can use iterator_map to return only the line items of each order and a function called flatten to turn the list of lists of line items into a flat list of line items. Now that we have a nice list of products to work with, we can use iterator_filter to filter out any that aren’t the product we’re trying to analyze. The third stage is to take the stream of filtered products and use iterator_reduce to reduce their price and quantity fields into a total revenue amount for that product for the last day.

Conclusion

Generators are cool and functional programming is pretty swell too. Using generators with functional programming could probably be called dope. Unfortunately PHP is not a perfect place where everything is dope, but hopefully you have all the tools to combine generators and functional concepts like map, filter, and reduce. Thanks for reading! Feel free to let me know if anything could be explained better.

Update (2018–03–18)

A commenter from a thread in r/programming mentioned the iter library, written by the person who implemented generators in PHP, implements all of the examples in this article and more, I would definitely recommend checking it out if you’re planning on using generators in your codebase.