Avoiding the Primitive Obsession Pit in Ruby: Customized Collections


Ruby and collection handling go well together. Whether it’s an Array, a Set or a Hash, the language provides us with a variety of useful methods that ease iterating, selecting and manipulating collections.

Sometimes, though, we find ourselves writing long lines of code just to pick the right objects out of a collection. While the tools supplied by Ruby are powerful, it is up to us to design our objects in a way that makes their usage simple and smooth. In this post I will try to demonstrate one approach to easing complex queries on collections.

Books, Anyone?

Let’s get us a basic Book class:
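A minimal sketch of what such a class might look like, inferred from the attributes and constants used in the rest of this post. The keyword-argument constructor, the CATEGORY_POETRY constant and the value of HIGH_PAGE_COUNT are assumptions made for illustration:

```ruby
# Sketch of a basic Book class; constant values are assumed.
class Book
  CATEGORY_NOVEL  = 'novel'.freeze
  CATEGORY_POETRY = 'poetry'.freeze # assumed name for the poetry category
  HIGH_PAGE_COUNT = 300             # assumed threshold between short and long

  attr_reader :title, :author, :category, :page_count

  # Keyword arguments are an assumption; positional ones would work as well.
  def initialize(title:, author:, category:, page_count:)
    @title = title
    @author = author
    @category = category
    @page_count = page_count
  end
end
```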

Next, let’s create some books and store them in an array named books :
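As a sketch, the array could be built as follows. Only the four books that appear in the outputs of this post are listed. Book is repeated in compact form so the snippet runs on its own; its constructor signature and the value of HIGH_PAGE_COUNT are assumptions:

```ruby
# Compact stand-in for Book (assumed constructor and threshold value).
class Book
  CATEGORY_NOVEL  = 'novel'.freeze
  CATEGORY_POETRY = 'poetry'.freeze
  HIGH_PAGE_COUNT = 300

  attr_reader :title, :author, :category, :page_count

  def initialize(title:, author:, category:, page_count:)
    @title = title
    @author = author
    @category = category
    @page_count = page_count
  end
end

# Titles, authors, categories and page counts are taken from the outputs in this post.
books = [
  Book.new(title: '1984', author: 'George Orwell',
           category: Book::CATEGORY_NOVEL, page_count: 328),
  Book.new(title: 'The Dwarf', author: 'Pär Lagerkvist',
           category: Book::CATEGORY_NOVEL, page_count: 240),
  Book.new(title: 'Lolita', author: 'Vladimir Nabokov',
           category: Book::CATEGORY_NOVEL, page_count: 317),
  Book.new(title: 'The Odyssey', author: 'Homer',
           category: Book::CATEGORY_POETRY, page_count: 560)
]
```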

Querying the Collection

With everything set up, we can now get to the matter of selecting and rejecting objects from the array according to our needs. For example, let’s assume we only need the novels from our books array; Ruby equips us with the useful select method:

books.select { |book| book.category == Book::CATEGORY_NOVEL }

That would return an array with the three novels, as expected:

=> [#<Book:0x007f89398510c8 @title="1984", @author="George Orwell", @category="novel", @page_count=328>, #<Book:0x007f8939838ac8 @title="The Dwarf", @author="Pär Lagerkvist", @category="novel", @page_count=240>, #<Book:0x007f893981a118 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>]

Of course, we can combine several attributes in order to narrow down our picking and be more specific. For example, we might want to fetch only short novels:

books.select do |book|
  book.category == Book::CATEGORY_NOVEL &&
    book.page_count < Book::HIGH_PAGE_COUNT
end

That would return solely The Dwarf (heartily recommended, btw!):

=> [#<Book:0x007fe4cb803420 @title="The Dwarf", @author="Pär Lagerkvist", @category="novel", @page_count=240>]

Up till here, that’s all plain, simple Ruby. If our querying needs happen to be that direct and simple, it might be a reasonable idea to leave the implementation as it is. However, as we shall soon see, the case might be different at times.

Make the Object Talk

It is probably a good call to equip our Book class with domain-relevant boolean methods, in order to allow us easier querying:
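One way these methods might look is sketched below. The method names novel?, poetry? and long? come from the queries used in this post; their bodies, the constructor and the constant values are assumptions. The predicates read the instance variables directly, so no attr_reader is needed:

```ruby
# Sketch of Book with domain-relevant predicates (bodies are assumed).
class Book
  CATEGORY_NOVEL  = 'novel'.freeze
  CATEGORY_POETRY = 'poetry'.freeze
  HIGH_PAGE_COUNT = 300 # assumed threshold

  def initialize(title:, author:, category:, page_count:)
    @title = title
    @author = author
    @category = category
    @page_count = page_count
  end

  def novel?
    @category == CATEGORY_NOVEL
  end

  def poetry?
    @category == CATEGORY_POETRY
  end

  # A book counts as long once it reaches the (assumed) page threshold.
  def long?
    @page_count >= HIGH_PAGE_COUNT
  end
end
```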

With these new methods, things can get prettier. Crafted right, they allow us to drop the attr_reader defined at the top of the class, and the querying syntax becomes more fluid:

books.select(&:novel?)
=> [#<Book:0x007fe4cb82b308 @title="1984", @author="George Orwell", @category="novel", @page_count=328>, #<Book:0x007fe4cb803420 @title="The Dwarf", @author="Pär Lagerkvist", @category="novel", @page_count=240>, #<Book:0x007fe4cb00a480 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>]

books.select(&:long?)
=> [#<Book:0x007fe4cb82b308 @title="1984", @author="George Orwell", @category="novel", @page_count=328>, #<Book:0x007fe4cb00a480 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>, #<Book:0x007fe4ca152320 @title="The Odyssey", @author="Homer", @category="poetry", @page_count=560>]

We cannot, though, combine the two query conditions into one call: a method accepts only a single block argument, so something like books.select(&:novel?, &:long?) results in a SyntaxError.

Our Book class is a fairly simple example, but if you happen to be working on some real-world codebase, you might have objects with a significant amount of attributes and methods. In this case, selecting and rejecting the relevant objects from a collection might become a true problem: at best, you end up with a lot of repetitive, boilerplate code; at worst, you’ll find yourself struggling with inconsistencies and business-level mistakes due to different usages.

Defining a boolean method for each common combination can be seen as a possible solution, but in fact it often deepens the clutter, since it creates an interface which is too rigid (yes, I am talking about you, long_novel_with_all_numeric_title). Users (i.e. other coders) tend not to fix such cases but to make up ‘creative’ (and possibly destructive) ways to work around them.

So how could we better handle the process of picking the right objects?

Come in BookCollection

That select method we’ve been using up till here is available on instances of Array because Array includes the Enumerable module. Yet, perhaps we need a different tool for the job in question; perhaps we need some object more specific than a general array.

I like to solve design problems by imagining the interface I’ll end up using. It helps me get some kind of an idea about the object that I need to be building.

Ideally, I could use a syntax like this to get all the long novels:

books.select_by_all(:novel?, :long?)

Or something like this in order to fetch all the books that are either poetry or long:

books.select_by_any(:poetry?, :long?)

That feels so clear! A query over multiple attributes needs to specify the relation between them, either “and” or “or”, and these two methods convey this intention clearly.

To achieve that, we need an object other than an array to hold our collection of books. In a way, sticking to a mere array in this case might be a form of Primitive Obsession.

A decorated array, suited to our needs, might be what we are looking for. Using Ruby’s SimpleDelegator, available to us through the Ruby standard library, this is a rather simple task. Consider the following class as a possible solution:
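A minimal sketch of such a class, assuming it wraps an Array of objects that respond to the given trait methods. The four method names come from this post; the bodies are one plausible way to write them, not necessarily the original ones:

```ruby
require 'delegate'

# Sketch: a SimpleDelegator wrapping an Array of books.
# Every query method returns a new BookCollection, so calls can be chained.
class BookCollection < SimpleDelegator
  # Keeps the books for which the single trait is truthy.
  def select_by(trait)
    self.class.new(select { |book| book.send(trait) })
  end

  # Drops the books for which the single trait is truthy.
  def reject_by(trait)
    self.class.new(reject { |book| book.send(trait) })
  end

  # Keeps the books for which every trait is truthy ("and").
  def select_by_all(*traits)
    self.class.new(select { |book| traits.all? { |trait| book.send(trait) } })
  end

  # Keeps the books for which at least one trait is truthy ("or").
  def select_by_any(*traits)
    self.class.new(select { |book| traits.any? { |trait| book.send(trait) } })
  end
end
```

The bare select and reject calls are not defined on BookCollection itself; they are forwarded to the wrapped array, and the result is wrapped again in a new collection.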

Basically, inheriting from SimpleDelegator gives us decorating abilities upon an object: the new object will have all the functionality of the object with which it was initialized, along with any additional behavior we wish to add to it. On a side note, SimpleDelegator’s source code is a very impressive display of Ruby’s metaprogramming abilities, which is totally worth a separate post, and I encourage you to look it up and toy with it.

Ultimately, inheriting from SimpleDelegator allows BookCollection to use methods like select, reject, any? and all? in a straightforward way. We evaluate each trait using dynamic dispatch with book.send(trait) (note that send also reaches private methods; public_send is the stricter choice if that matters).

We are now ready to initialize our new class:

book_collection = BookCollection.new(books)

And perform our desired query, using our shiny new interface:

book_collection.select_by_all(:novel?, :long?)
=> [#<Book:0x007fe4cb82b308 @title="1984", @author="George Orwell", @category="novel", @page_count=328>, #<Book:0x007fe4cb00a480 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>]

Since each of BookCollection’s interface methods returns a new, filtered BookCollection object, we can chain messages to sharpen our queries even further. This is the reason I’ve implemented select_by and reject_by, which take a single trait, similarly to the familiar select and reject from Enumerable.

Complex Querying

Assuming Book had the following methods as well:
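The two title predicates might be sketched like this. The method names come from the queries in this post; the bodies are assumptions, and the snippet reopens Book and relies on a @title instance variable holding a String:

```ruby
# Sketch: title-related predicates added to Book (bodies are assumed).
class Book
  # True when the title consists only of digits, e.g. "1984".
  def all_numeric_title?
    @title.match?(/\A\d+\z/)
  end

  # True when the title is a single whitespace-free term, e.g. "Lolita".
  def single_term_title?
    @title.split.size == 1
  end
end
```

Under these definitions a title such as "1984" is both all-numeric and a single term, while "The Dwarf" is neither.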

We could easily get all the books which are novels, long and whose title is not all numeric:

book_collection.select_by_all(:novel?, :long?)
  .reject_by(:all_numeric_title?)

Which happens to be only Lolita:

=> [#<Book:0x007fe4cb00a480 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>]

Or, to level things further up, we could query for all the books which are long novels whose title is either all-numeric or made up of a single term (‘word’):

book_collection.select_by_all(:novel?, :long?)
  .select_by_any(:all_numeric_title?, :single_term_title?)
=> [#<Book:0x007fe4cb82b308 @title="1984", @author="George Orwell", @category="novel", @page_count=328>, #<Book:0x007fe4cb00a480 @title="Lolita", @author="Vladimir Nabokov", @category="novel", @page_count=317>]

Had we stuck to a regular array, that last query would probably look something like this:

books.select do |book|
  book.long? && book.novel? &&
    (book.all_numeric_title? || book.single_term_title?)
end

Not too messy; then again, on a larger system the clutter might be more intimidating.

An Array, Enhanced

Thanks to the use of SimpleDelegator, any Array method is available on any instance of our BookCollection. One thing to notice is that while those methods are available to us, the ones that return a collection (map, sort and the like) hand back a regular Array rather than a BookCollection (with the same objects in it, no worries!). Hence, for optimal usage, it is better to perform the selection and rejection methods before any other array method.
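A small sketch of that behavior, using an empty BookCollection stand-in and plain strings in place of books (both are simplifications for illustration):

```ruby
require 'delegate'

# Stand-in with no custom methods; enough to show the delegation behavior.
class BookCollection < SimpleDelegator; end

collection = BookCollection.new(%w[Lolita 1984])

# sort is forwarded to the wrapped Array, which returns a plain Array.
sorted = collection.sort

collection.class # BookCollection
sorted.class     # Array

# Wrapping the result again restores the custom interface.
rewrapped = BookCollection.new(sorted)
rewrapped.class  # BookCollection
```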

This pattern could be stretched and extended to support more complicated use cases; from adding a titles method that maps all the titles in a collection to including a reference to another object (for example, a case in which each Book had a few related RecommendedBooks). Take that object for what it is — an enhanced Array.
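As an illustration of the first extension, a titles method could be sketched as follows. It assumes each element of the collection exposes a public title reader, which is a requirement introduced here rather than something the collection enforces:

```ruby
require 'delegate'

# Sketch: BookCollection extended with a titles helper.
class BookCollection < SimpleDelegator
  # Returns a plain Array of titles; assumes each book responds to #title.
  def titles
    map(&:title)
  end
end
```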

I might post about those use cases sometime soon, yet the main idea of this post is that while Ruby’s collection methods are truly awesome, sometimes business logic or certain data modeling decisions should prompt us to avoid the generic array and use a customized collection object for our queries and manipulations. Primitive Obsession is a common and easy-to-miss pit: just as we would normally use a Money class to handle money objects (and not stick to Float), arrays are no different.

This is my first post here on Medium. I hope you enjoyed reading it as much as I enjoyed writing it!

Software Developer @ Riskified; Theorbo Player