#scan_left: A lazy, incremental alternative to Ruby’s #inject

Parker Finch
Building Panorama Education
4 min readJun 17, 2020

tl;dr: There’s not much code and it lives (along with a summary of this post) here. Get it from rubygems with gem install scan_left!

Background

At Panorama we use an event-sourced architecture to persist schools’ data. Rather than just storing the current state of a school’s data, like we would with a CRUD approach, we instead store events that describe how the data changes. (Stay tuned for a future post focused on event sourcing!)

We view schools’ data at a granular level — for example, we record an event every time we observe that a student’s GPA (grade point average) has changed. From these events we want to generate not only the student’s final GPA, but also how that student’s GPA changed over time.

The fold operation, ubiquitous in functional programming and event sourcing, would get us the final result but discards the intermediate states. In order to record each step along the way we leverage the scan operation. This is similar to a fold, in that it applies a binary operation over the elements of a collection, but retains the intermediate states that are generated. The following pseudocode (based on Ruby syntax) shows this difference:

[1, 2, 3].fold(0, &:+) => 6
[1, 2, 3].scan(0, &:+) => [0, 1, 3, 6]

If we apply a scan to our event-sourced application then, by implementing a state transition function that takes the current state and an event and applying that function over all the events, we get a view of all the states that occur!

Concretely, say we had three events:

  1. On May 13, GPA for Student A is 3.5.
  2. On May 20, GPA for Student A is 3.6.
  3. On May 27, GPA for Student A is 2.8.

In the end, we want to be able to display the student’s final grade and how that grade is trending, either up or down. Whether we fold or scan, the final result is that the student’s GPA is 2.8. However, if we fold over these events then we lose the ability to tell how the student’s grade is trending. We know only the latest grade, not how it compares to the grade before. If we scan then we retain that historical context and can signal to the user that the student’s grade has significantly declined — this allows educators to check in with a student when there’s an abrupt change in the student’s performance.

Implementation

We created (and are now open-sourcing) a ScanLeft class with a single public method, #scan_left. The “left” part indicates that function application proceeds left to right, that is, the operation is applied with the first element of the array, then the second element, and so on. Instances of ScanLeft are initialized with some enumerable, and the #scan_left method takes an initial element (the starting state) and a block (the binary operation). This is similar to Scala’s scanLeft and Haskell’s scanl functions.

In order to scan over the GPA events above, we would have something like:

> gpa_events = [3.5, 3.6, 2.8]
> ScanLeft.new(gpa_events).scan_left(State.new) { |state, gpa| state.gpa = gpa }

The syntax there is a little clunky, so we added a refinement on Enumerable called EnumerableWithScanLeft. This is a thin wrapper around the ScanLeft object that allows us to call #scan_left directly on any enumerable:

> using EnumerableWithScanLeft
>
> gpa_events = [3.5, 3.6, 2.8]
> gpa_events.scan_left(State.new) { |state, gpa| state.gpa = gpa }

In practice we handle many more than three events — over time a single school generates more data than can fit into the memory of a typical machine. Thus it is important that we are able to process events lazily.
Ruby supplies lazy enumerators and our implementation supports chaining a scan as a lazy operation. In fact, the scan operation generalizes to lazy enumerables much more naturally than the fold operation. Since a fold represents a state after processing all the elements of a collection it is impossible to evaluate it incrementally or apply it to an infinite sequence. The scan operation, on the other hand, has a distinct output element for each input. This means that it can be processed incrementally and applied to infinite sequences. For example, (1..).lazy.scan_left(0, &:+) returns a sequence of the triangular numbers.

Conclusion

We’re hopeful that the scan operation will be added to Ruby proper and have proposed doing so in Ruby’s bug tracker. (Indeed, part of the motivation for this blog post is to document concrete usage of this functionality.) We think this is a common requirement — for example, there has been discussion of this on Stack Overflow. Furthermore, Ruby currently has no built-in operation to fold over a lazy enumerator while preserving laziness. Adding a scan_left method to Enumerable and Enumerator::Lazy will fill this gap.

Scanning over large sequences is fundamental to our product and we are hopeful that this discussion will contribute in some small way to the evolution of Ruby and its adoption of functional programming patterns.

Edited 2020–07–14: Added link to the now-created Ruby issue to add the scan operation.

--

--