Perl 6 small stuff #5: Gather around, it’s time to get lazy (…but why?)

After a few years away from programming, I’m trying to get up to speed again by learning Perl 6. This series is meant to be sort of a progress report, showcasing not only what I’ve learnt but also all of my misunderstandings and errors.

One of the best things with Perl 6, in my opinion, are lazy lists. Those are lists with elements that are not calculated before you ask explicitly for them.

This means that you for instance can create lists containing 0..∞ without generating anything before accessing specific elements. That’s a good thing, because if it had, it would have taken an infinite amount of time and required an infinite amount of RAM. Instead elements 0..n aren’t populated until you actually ask for element n.

Consider this example — an example you can see lots of places on the internet. Here I create an infinite list of fibonacci numbers that also are prime numbers:

my @list = (1, 1, * + * ... *).grep({ $_.is-prime });
say @list[4]; # This generates elements 0..4.
# Elements 5..∞ haven't been generated yet
# Output:
# 89

You can implement lazy lists yourself using whatever conditions you like. For this you can use the lazy gather/take keywords [1]. To show you how, I’ll implement the functionality of the above code using lazy gather. It may seem a little redundant, but it’s meant to inspire more than be usable in and of itself.

my @list = lazy gather for (1, 1, * + * ... *) {
take $_ if $_.is-prime;
}
say @list[^10];
# Output:
# (2 3 5 13 89 233 1597 28657 514229 433494437)

With lazy gathering, the list is — explained in layman’s terms — bound to the code block. The elements of the list doesn’t exist before they’re explicitly asked for. In this case the list @list doesn’t contain anything before an element is accessed. If you access element 0, @list[0], you get the answer 2. But elements 1 through infinity has not been calculated yet. As was the case with the first example, everything here is calculated on demand.

Let’s modify the code so that it’ll visualize this behavior:

my @list = lazy gather for (1, 1, * + * ... *) {
if $_.is-prime {
say "Is prime: $_";
take $_;
}
}
say @list[0];
# Output:
# Is prime: 2 <-- output by the gather code block
# 2 <-- output by say @list[0]

So again: If you want to access element 4 in the list, the code has to calculate element 0 through 3 before it can give you element 4. You might ask why it has to calculate 0 through 3 if you’re only interested in 4. The reason is that the element in question may rely on the values of the previous, just as element 4 does in this example.

If you try to access element 4, the output would now be something like this:

say @l[4];
# Output:
# Is prime: 2
# Is prime: 3
# Is prime: 5
# Is prime: 13
# Is prime: 89
# 89 <-- output of say @l[4]

Element 5 and up hasn’t been produced yet. As before, they just don’t exist before you ask for one of them. That’s a good thing because this task gets more and more time consuming the higher the element you ask for. Let’s inspect element 23 in the list to see just how long things can take. If you start from scratch, the code obviously has to calculate element 0 through 22 to give you 23. So then you get…

say @l[23];
# Output:
# Is prime: 2
# Is prime: 3
# Is prime: 5
# Is prime: 13
# Is prime: 89
# Is prime: 233
# Is prime: 1597
# Is prime: 28657
# Is prime: 514229
# ...and 15 more of those, until...:
# 2930441286939258055400798002200365702070383356835498296052929375099603314612750189999474839964041401824000060018138590135001102587299815016150326470813100791358366141506590207652108187961822226141844888227246720852693565292520772736233258137883131633884772691595926733322344165957105221939274575774318941464860860848233258397675536343685662420797069118608961212878580658178971916344998597383060635840095569469168798433682688690011514459699339511226211249166351120903825586706980141114247391189112452169301152785673208885984700944382085177499062708435697669359679050984522704457968238198039501357594162223897683190805618618838233503289429282246135842585430668875653560981464799218782095773523505176978173330569112583486848660662068908749844158638271675670908226354975956636181462553977591227553535134946781776535903879669168739541718125413163115489438127173145671369147353184448182943977584875431449095152717023151009740993170461840098542328540055499213188472025194001595310097403036158566599062407003698164302276578800844789715519909977450325416955326392430087122738870988283653737701175138504893394815816782040327194725855833

On my MacBook Pro the above code takes 13.3 seconds to run. Not a speed daemon, perhaps, but the code runs and completes, returning this beautiful large number.

But why use gather when you just as easily can use grep or to a certain extent map? I admit I have trouble seeing why, except for a few things.

  1. Given that a grep code block has to return True or False, more complex code will at the least be more complicated to implement. The reason why is that with grep, a return is final; nothing will be executed after that. With gather your code can continue after the take if needed. It’ll also be somewhat easier to use separate subroutines or class methods, because you can call take from there.
  2. gather/take can, in a way, combine grep and map into one operation, eliminating the need of chaining functions.
  3. Grep returns lists with (normally) fewer elements than the original; map returns lists with exactly the same number of elements as the original. But with [lazy] gather there’s no limit as to how many take you can invoke at a single point of a loop trough a list. I.e. if you loop trough a list of n elements you can return more values, ending up with an array with more elements than the original. Below is an example.
  4. Apart from this, the clearest difference is that you can more sophisticatedly use next, last and redo to manipulate your control flow. Do a next in a grep code block and you can’t return an explicit return value. Just as return is final in grep code blocks, next is as well. But with gather you can take your return value and continue on with the conditions that would result in a next (or last or redo).
# Example of #3, above
(
gather for <1 2 3 4 5 6> {
take $_;
take $_ * 1.5;
}
).say;
# Output:
# (1 1.5 2 3 3 4.5 4 6 5 7.5 6 9)

My conclusion is that with gather I’ve stumbled across something that feels both important and as an improvement, but I’m not quite sure how or why that is yet.

If you can enlighten me, I’d be grateful :-)

NOTES
[1] You can of course do non-lazy gathering as well, the difference being that given a list, the loop is executed immediately. The resulting list has been calculated and populated in full before it’s returned.

Like what you read? Give Jo Christian Oterhals a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.