The Enumerable module in Ruby: Part II
In this article we’re going to explore the following topics:
- The Enumerator::Generator class
- The Enumerator::Yielder class
- The Enumerator::Lazy class
NB: feel free to have a look at Part I if you’re unfamiliar with the Enumerable module in Ruby.
The Enumerator::Generator class
Another way to instantiate a new enumerator is to use the Enumerator::new method.
The Enumerator::new method returns an enumerator with an instance of Enumerator::Generator as the data source and the each method as the default data consumer.
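A minimal sketch of such an enumerator, assuming a block that yields the values 1 and 2 (the same values used in the Enumerator::Yielder walkthrough). The names e and doubled are illustrative. Inspecting the enumerator shows the Enumerator::Generator data source and the each consumer method:

```ruby
# The block is stored, not run: it only executes when a consumer
# method (map, each, to_a, ...) iterates over the enumerator.
e = Enumerator.new do |yielder|
  yielder.yield 1
  yielder.yield 2
end

# The block runs now, feeding 1 and then 2 into the map block as `n`.
doubled = e.map { |n| n * 2 }
p doubled # => [2, 4]

# The inspect output names the data source and the consumer method,
# e.g. #<Enumerator: #<Enumerator::Generator:0x...>:each>
puts e.inspect
```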
The built-in Enumerator::Generator class is not meant to be manipulated directly by the developer. It’s an internal class whose instances have a very particular job.
Their job is to store the block passed to the Enumerator::new method, as a Proc, so that the returned instance of the Enumerator class can run it later.
Ruby then executes this Proc to build the data source provided to the data consumer method: in the previous example, the values received as the n argument of the #map block.
Note that Enumerator::Generator includes the Enumerable module and implements its own each method, which is the method Enumerable relies on.
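Both facts can be checked directly. The hand-built Generator at the end is for inspection only, in line with the advice above to go through Enumerator::new in real code:

```ruby
# Enumerator::Generator mixes in Enumerable...
p Enumerator::Generator.include?(Enumerable) # => true

# ...and defines its own each method, which Enumerable's methods build on.
p Enumerator::Generator.instance_method(:each).owner # => Enumerator::Generator

# For inspection only: a Generator built by hand runs its stored block
# whenever each is called (here, indirectly through Enumerable#to_a).
gen = Enumerator::Generator.new { |yielder| yielder << 1 << 2 }
p gen.to_a # => [1, 2]
```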
NB: feel free to read the Proc and Lambda article if you’re unfamiliar with the Proc class in Ruby.
Now, let’s pay attention to the yielder argument of the Enumerator::new block.
The Enumerator::Yielder class
The yielder argument is an instance of the built-in Enumerator::Yielder class.
This class is in charge of building and serving the data source through the Enumerator::Yielder#yield and Enumerator::Yielder#<< methods.
The data source is built and served on the fly: each call to one of these methods produces one value.
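A minimal sketch using both methods; the seen_class variable is only there to capture the class of the block argument:

```ruby
seen_class = nil

e = Enumerator.new do |yielder|
  seen_class = yielder.class # the block argument is an Enumerator::Yielder
  yielder.yield 1            # serve a value with #yield...
  yielder << 2               # ...or with #<<
end

values = e.to_a
p values     # => [1, 2]
p seen_class # => Enumerator::Yielder
```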
Let’s describe step by step what happens during the e.map iteration.
During the first iteration, the yielder yields the value 1 as the n argument of the block passed to the e.map method call.
Then the yielder yields the value 2 as the n argument of the block passed to the e.map method call.
So the data is built and served to the map enumeration one value at a time, on each iteration.
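To see that the values really are produced on demand, the sketch below logs each step. It assumes the same two-value enumerator as in the walkthrough, with a log array added for illustration:

```ruby
# Record the order of events: the enumerator block and the map block
# take turns instead of the whole data source being built up front.
log = []

e = Enumerator.new do |yielder|
  log << "yielder: 1"
  yielder.yield 1
  log << "yielder: 2"
  yielder.yield 2
end

result = e.map do |n|
  log << "map block: #{n}"
  n * 10
end

p result # => [10, 20]
puts log
# yielder: 1
# map block: 1
# yielder: 2
# map block: 2
```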
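The Enumerator::Yielder#yield and Enumerator::Yielder#<< methods differ in their arguments and return values: #yield accepts any number of values and returns whatever the consuming block returned, while #<< takes exactly one value and returns self (the yielder), so calls can be chained, as in yielder << 1 << 2.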
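A small sketch that records what each method returns when the enumerator is consumed with each; the n * 100 block is arbitrary:

```ruby
returns = []

e = Enumerator.new do |yielder|
  # #yield hands back whatever the consuming block returned for this value.
  returns << yielder.yield(1)
  # #<< returns the yielder itself, which is what makes chaining possible.
  returns << (yielder << 2 << 3)
end

e.each { |n| n * 100 }

p returns[0]       # => 100
p returns[1].class # => Enumerator::Yielder
```

The value returned by #yield depends on the consumer: here it is the result of the each block.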
The Enumerator::Lazy class
What happens when we want to chain enumerations?
(1..100_000).map {|n| n * 2}.first(10)
In the above example, the first(10) enumeration waits for the map {|n| n * 2} enumeration to return its whole collection before it starts enumerating.
In total, that’s roughly 100,010 iterations (100,000 for map, then 10 for first) to get the first 10 values of the map enumeration.
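A quick way to check this is to count how many times the map block runs in the eager version; the calls counter is purely illustrative:

```ruby
calls = 0

# Eager version: map runs over the whole range and builds a full array
# before first(10) ever sees a value.
doubled = (1..100_000).map do |n|
  calls += 1
  n * 2
end
first_ten = doubled.first(10)

p calls        # => 100000
p doubled.size # => 100000
p first_ten    # => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```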
From a performance standpoint, this is terrible: 99,990 of the computed values are simply thrown away.
So how do we solve this performance issue? By using the Enumerator::Lazy class:
(1..100_000).lazy.map {|n| n * 2}.first(10)
The Enumerable#lazy method returns an instance of the Enumerator::Lazy class.
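This class is a subclass of Enumerator, so it includes the Enumerable module, but it redefines many of its methods (map, select, reject, etc.) so that they return a new lazy enumerator instead of an array.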
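A short sketch checking these claims; the (1..3) range is arbitrary:

```ruby
lazy = (1..3).lazy
p lazy.class                            # => Enumerator::Lazy
p Enumerator::Lazy.superclass           # => Enumerator
p Enumerator::Lazy.include?(Enumerable) # => true

# Lazy#map returns another lazy enumerator instead of an array...
mapped = lazy.map { |n| n * 2 }
p mapped.class # => Enumerator::Lazy

# ...and nothing is computed until a method such as first, to_a or force
# asks for values.
p mapped.to_a # => [2, 4, 6]
```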
Let’s have a look at a benchmark comparing both approaches before detailing the concept of lazy enumeration. Running benchmark.rb produces:
$> ruby benchmark.rb
Warming up --------------------------------------
Enumerations 16.000 i/100ms
Lazy Enumerations 13.679k i/100ms
Calculating -------------------------------------
Enumerations 162.247 (± 1.2%) i/s - 816.000 in 5.030263s
Lazy Enumerations 142.999k (± 0.8%) i/s - 724.987k in 5.070187s
Comparison:
Lazy Enumerations: 142999.2 i/s
Enumerations: 162.2 i/s - 881.37x slower
As we can see, in this benchmark the lazy enumeration is about 881 times faster than the normal one, because it only computes the values that are actually needed.
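The original benchmark.rb is not shown here; its output format suggests the benchmark-ips gem. To reproduce a similar comparison without extra gems, here’s a rough sketch using Ruby’s standard Benchmark module. The RUNS count is arbitrary, and absolute numbers will vary by machine and Ruby version:

```ruby
require "benchmark"

RUNS = 20

# Eager: builds a 100_000-element array, then keeps the first 10 values.
eager_time = Benchmark.realtime do
  RUNS.times { (1..100_000).map { |n| n * 2 }.first(10) }
end

# Lazy: stops as soon as first(10) has collected 10 values.
lazy_time = Benchmark.realtime do
  RUNS.times { (1..100_000).lazy.map { |n| n * 2 }.first(10) }
end

puts format("eager: %.4fs, lazy: %.4fs", eager_time, lazy_time)
puts format("lazy is about %.0fx faster on this machine", eager_time / lazy_time)
```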
So, how does Enumerator::Lazy work?
(1..100_000).lazy.map {|n| n * 2}.first(10)
The Enumerator::Lazy instance returned by the Enumerable#lazy method receives the map call, so Enumerator::Lazy#map is used instead of Enumerable#map.
Like most methods of this class, it behaves in a particular way. Each iteration follows this execution flow:
- 1/ It fetches the next value of the 1..100_000 data source (1 on the first iteration) and yields it to the {|n| n * 2} block
- 2/ It takes the return value of the block and passes it to the next enumeration (first(10)) through a yielder
This mechanism is called the enumeration chain.
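To watch the enumeration chain at work, here’s a sketch that adds a second, purely illustrative map step (the + 1 step and the log array are not part of the original example) and records the order in which each block runs:

```ruby
log = []

result = (1..100_000).lazy
  .map { |n| log << [:double, n]; n * 2 }
  .map { |n| log << [:inc, n]; n + 1 }
  .first(2)

p result # => [3, 5]
# Each value travels through the whole chain before the next one is fetched:
p log    # => [[:double, 1], [:inc, 2], [:double, 2], [:inc, 4]]
```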
OK, but why does this allow Ruby to avoid iterating through the entire collection?
In lazy enumeration, the final enumeration (first(10) in our case) is in charge of controlling how long the enumeration runs.
So it’s smart enough to say: “I’ve got enough data, so we can stop the enumeration chain”.
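Because first(10) decides when to stop, the data source doesn’t even need to be finite. A sketch with an infinite range (Float::INFINITY as the upper bound) that also counts the map block calls; the calls counter is illustrative:

```ruby
calls = 0

# An infinite data source: an eager map over it would never return.
doubled = (1..Float::INFINITY).lazy.map do |n|
  calls += 1
  n * 2
end

result = doubled.first(10)

p result # => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
p calls  # => 10, since first(10) stopped the chain after the 10th value
```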