The Enumerable module in Ruby: Part II

In this article we’re going to explore the following topics:

  • The Enumerator::Generator class
  • The Enumerator::Yielder class
  • The Enumerator::Lazy class

NB: feel free to have a look at Part I if you’re unfamiliar with the Enumerable module in Ruby.

The Enumerator::Generator class

Another way to instantiate a new enumerator is to use the Enumerator::new method.

The Enumerator::new method returns an enumerator with an instance of Enumerator::Generator as data source and the each method as default data consumer.
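The example this section refers to does not appear in this text. As a minimal sketch, assuming an enumerator that produces the values 1, 2 and 3 (values chosen purely for illustration), such a call could look like this:

```ruby
# Build an enumerator from a block. The block receives a yielder
# and pushes each value of the data source through it.
enumerator = Enumerator.new do |yielder|
  yielder << 1
  yielder << 2
  yielder << 3
end

# The enumerator can be consumed by any Enumerable method, here #map.
doubled = enumerator.map { |n| n * 2 }

p enumerator.class # prints Enumerator
p doubled          # prints [2, 4, 6]
```

The block is not executed when Enumerator::new is called. It only runs once a consumer such as map starts iterating.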

The built-in Enumerator::Generator class is not supposed to be manipulated directly by the developer. It’s an internal class whose instances have a very particular job.

This job is to save the block passed to the Enumerator::new method — as a Proc — in the returned instance of the Enumerator class.

Then Ruby will execute this Proc to build the data source provided to the data consumer method — the n argument of the #map block in the previous example.

Note that the Enumerator::Generator includes the Enumerable module. So, it implements its own each method.
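This can be checked with a little introspection. The snippet below is only a quick sanity check; the variable names are illustrative:

```ruby
# Does Enumerator::Generator mix in Enumerable?
includes_enumerable = Enumerator::Generator.include?(Enumerable)

# Which class defines the #each method that Generator instances respond to?
each_owner = Enumerator::Generator.instance_method(:each).owner

p includes_enumerable # prints true
p each_owner          # prints Enumerator::Generator
```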

NB: feel free to read the Proc and Lambda article if you’re unfamiliar with the Proc class in Ruby.

Now, let’s pay attention to the yielder argument of the Enumerator::new block.

The Enumerator::Yielder class

The yielder argument is an instance of the built-in Enumerator::Yielder class.

This class is in charge of building and serving the data source using the Enumerator::Yielder#yield and Enumerator::Yielder#<< methods.

The data source is built and served on the fly for each call to these methods.
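The example originally shown here is not part of this text. The following is a minimal sketch, assuming a yielder that serves the values 1 and 2 to a map call. The log array is only there to make the order of execution visible:

```ruby
log = []

enumerator = Enumerator.new do |yielder|
  log << "building 1"
  yielder.yield 1 # serve the first value
  log << "building 2"
  yielder << 2    # serve the second value
end

result = enumerator.map do |n|
  log << "consuming #{n}"
  n * 10
end

p result # prints [10, 20]
p log    # prints ["building 1", "consuming 1", "building 2", "consuming 2"]
```

The interleaved log entries show that each value is handed to the map block as soon as it is yielded, not after the whole block has run.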

Let’s describe step by step what happens during the iteration.

During the first iteration, the yielder yields the value 1 as the n argument of the block passed to the method call.

Then the yielder yields the value 2 as the n argument of the block passed to the method call.

So the data is built and served to the map enumeration for each iteration.

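The main difference between the Enumerator::Yielder#yield and Enumerator::Yielder#<< methods is their return value: the first returns whatever the consuming block returns (nil when the consumer is map, as here), while the second returns self, which allows calls to be chained.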
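A small sketch can make the two return values visible. It assumes the enumerator is consumed with each and a regular block; all variable names are illustrative:

```ruby
yield_results    = []
push_result      = nil
captured_yielder = nil

enumerator = Enumerator.new do |yielder|
  captured_yielder = yielder
  # #yield hands 1 to the consumer and returns what the consumer's block returned.
  yield_results << yielder.yield(1)
  # #<< returns the yielder itself, so calls can be chained.
  push_result = (yielder << 2 << 3)
end

consumed = []
enumerator.each do |n|
  consumed << n
  n * 10
end

p consumed                             # prints [1, 2, 3]
p yield_results                        # prints [10]
p push_result.equal?(captured_yielder) # prints true
```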

The Enumerator::Lazy class

What happens when we want to chain enumerations?

(1..100_000).map {|n| n * 2}.first(10)

In the above example, the first(10) enumeration has to wait for the collection returned by the map {|n| n * 2} enumeration before it can start enumerating.

In total, there will be about 100,010 iterations to get the first 10 values of the map enumeration.
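To make the cost concrete, here is a small sketch that counts how many times the map block runs. The map_calls counter is added only for illustration:

```ruby
map_calls = 0

first_ten = (1..100_000).map do |n|
  map_calls += 1
  n * 2
end.first(10)

p first_ten # prints [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
p map_calls # prints 100000
```

The block runs once for every element of the range, even though only ten results are kept.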

From a performance standpoint, this is terrible.

So how can we solve this performance issue? The answer is to use the Enumerator::Lazy class.

(1..100_000).lazy.map {|n| n * 2}.first(10)

The Enumerable#lazy method returns an instance of the Enumerator::Lazy class.

This class includes the Enumerable module but it redefines almost all of its methods.
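These claims can be verified in a few lines. This is only an introspection sketch with illustrative variable names:

```ruby
lazy   = (1..100_000).lazy
mapped = lazy.map { |n| n * 2 }

p lazy.class   # prints Enumerator::Lazy
p mapped.class # prints Enumerator::Lazy (no array has been built yet)

# #map is redefined on Enumerator::Lazy rather than inherited from Enumerable.
map_owner = Enumerator::Lazy.instance_method(:map).owner
p map_owner # prints Enumerator::Lazy

first_three = mapped.first(3)
p first_three # prints [2, 4, 6]
```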

Let’s have a look at the following benchmark before detailing the concept of lazy enumeration.


$> ruby benchmark.rb
Warming up --------------------------------------
Enumerations 16.000 i/100ms
Lazy Enumerations 13.679k i/100ms
Calculating -------------------------------------
Enumerations 162.247 (± 1.2%) i/s - 816.000 in 5.030263s
Lazy Enumerations 142.999k (± 0.8%) i/s - 724.987k in 5.070187s
Lazy Enumerations:   142999.2 i/s
Enumerations: 162.2 i/s - 881.37x slower
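The script that produced this output is not included in this text. As a rough way to reproduce the comparison, the sketch below uses only the Benchmark module from the standard library, so its output format differs from the one above and the timings will vary from machine to machine:

```ruby
require "benchmark"

range = (1..100_000)
eager_result = nil
lazy_result  = nil

# Time a single run of the eager chain.
eager_time = Benchmark.realtime do
  eager_result = range.map { |n| n * 2 }.first(10)
end

# Time a single run of the lazy chain.
lazy_time = Benchmark.realtime do
  lazy_result = range.lazy.map { |n| n * 2 }.first(10)
end

puts format("Enumerations:      %.6fs", eager_time)
puts format("Lazy Enumerations: %.6fs", lazy_time)
```

Both chains return the same ten values; only the amount of work done to obtain them differs.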

As we can see, using lazy enumerations instead of normal ones can bring a huge speed gain (about 880 times faster in this benchmark).

So, how does Enumerator::Lazy work?

(1..100_000).lazy.map {|n| n * 2}.first(10)

The Enumerator::Lazy instance returned by the Enumerable#lazy method receives the call to the Enumerator::Lazy#map method.

This method, like almost all the methods of this class, behaves in a particular way. Each iteration follows this execution flow:

  • 1/ It fetches the next value of the 1..100_000 data source (1 on the first iteration) and yields it to the {|n| n * 2} block
  • 2/ It gets the return value of the block and passes it to the next enumeration (first(10)) through a yielder

This mechanism is called the enumeration chain.
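The two steps above can be imitated with Enumerator::new. This is only a hand-rolled sketch of the idea, not how Enumerator::Lazy is actually implemented; chained_map is a hypothetical helper and fetched exists only to record which values were pulled from the source:

```ruby
# Hypothetical helper: wraps a source in an enumerator that transforms
# each value and passes it along through a yielder, one value at a time.
def chained_map(source, &transform)
  Enumerator.new do |yielder|
    source.each do |value|
      yielder << transform.call(value) # steps 1/ and 2/ for a single value
    end
  end
end

fetched = []

result = chained_map(1..100_000) do |n|
  fetched << n
  n * 2
end.first(10)

p result  # prints [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
p fetched # prints [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Only the first ten values are ever fetched from the range, because first(10) stops iterating as soon as it has collected enough values.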

OK, but why does this allow Ruby to avoid iterating through the entire collection?

Because in lazy enumeration, the final enumeration (first(10) in our case) is in charge of controlling how long the enumeration runs.

So, it’s smart enough to say: “I’ve got enough data, so we can stop the enumeration chain”.
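This early stop is easy to observe by counting how many times the map block runs. The map_calls counter below is added for illustration only:

```ruby
map_calls = 0

first_ten = (1..100_000).lazy.map do |n|
  map_calls += 1
  n * 2
end.first(10)

p first_ten # prints [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
p map_calls # prints 10

# Because the chain stops on its own, it even works on an infinite source.
evens = (1..Float::INFINITY).lazy.map { |n| n * 2 }.first(3)
p evens # prints [2, 4, 6]
```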