This article looks at the current state of collection pipeline libraries in PHP. It introduces what a collection pipeline actually is, goes over some of the libraries available on Packagist, and explains why I created another one. In follow-up articles I will cover their usage in the wild and refactoring loops into collection pipelines.
Introduction to collection pipelines
Some time last year I was reading an article by Martin Fowler named Refactoring with Loops and Collection Pipelines, and I was really impressed by the readability achieved by refactoring dataset-transformation logic into a collection pipeline.
What the hell is a collection pipeline, you might ask. Well, let's borrow the definition from Mr. Fowler:
Collection pipelines are a programming pattern where you organize some computation as a sequence of operations which compose by taking a collection as output of one operation and feeding it into the next.
If that doesn't really tell you much, it works much like chaining UNIX commands with a pipe, where the output of one command is passed as the input to the next. The main advantages of a collection pipeline over nested loops and ifs are readability and, when the implementation supports it, laziness. It probably will cost some performance, but rarely enough to matter, and CPU time is cheap nowadays. In PHP, an object-oriented collection pipeline is typically written as a chain of method calls on a collection object.
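To make the idea concrete, here is a minimal sketch of what such a chained pipeline could look like. The `Pipeline` class, its `filter()`, `map()` and `toArray()` methods, and the sample order data are all made up for illustration; this is not the API of any library discussed in this article, and it assumes PHP 7.4 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration only; it is not the API of any
// library discussed in this article. Assumes PHP 7.4+.
final class Pipeline implements IteratorAggregate
{
    private iterable $items;

    public function __construct(iterable $items)
    {
        $this->items = $items;
    }

    // Keep only the items for which $predicate returns true.
    public function filter(callable $predicate): self
    {
        $items = $this->items;
        $generate = static function () use ($items, $predicate): Generator {
            foreach ($items as $key => $item) {
                if ($predicate($item)) {
                    yield $key => $item;
                }
            }
        };

        return new self($generate());
    }

    // Transform every item with $fn.
    public function map(callable $fn): self
    {
        $items = $this->items;
        $generate = static function () use ($items, $fn): Generator {
            foreach ($items as $key => $item) {
                yield $key => $fn($item);
            }
        };

        return new self($generate());
    }

    public function getIterator(): Traversable
    {
        yield from $this->items;
    }

    // Run the whole pipeline and collect the results, discarding keys.
    public function toArray(): array
    {
        return iterator_to_array($this, false);
    }
}

// Made-up sample data.
$orders = [
    ['customer' => 'alice', 'total' => 120, 'paid' => true],
    ['customer' => 'bob',   'total' => 80,  'paid' => false],
    ['customer' => 'carol', 'total' => 45,  'paid' => true],
];

// Reads top to bottom: take the orders, keep the paid ones, extract totals.
$paidTotals = (new Pipeline($orders))
    ->filter(static fn (array $order): bool => $order['paid'])
    ->map(static fn (array $order): int => $order['total'])
    ->toArray();
```

Because each step is backed by a generator, nothing is computed until `toArray()` pulls values through the chain. The trade-off in this simplified version is that a pipeline built this way can only be iterated once.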
Coincidentally, I was learning Clojure (as it's customary that everyone should know functional programming principles today) around the same time that article came out, and I was amazed by how nearly everything in Clojure, even a function definition, is a collection (they call them sequences) and how powerful its sequence functions are. Since so much in Clojure is a collection, its sequence functions are really robust and cover a very wide range of use cases.
Collection pipelines in PHP
So when I came back to a PHP project, of course I looked around for a collection pipeline library. What I found was that, while there were some, each fell short in at least one way: it wasn't general enough (it didn't implement all the functions you would need, or it implemented bogus functions specific to one domain), it wasn't immutable, it wasn't lazy, or all of the above. So which libraries did I look into (in case I overlooked some awesome library)?
A split from the CakePHP framework, this library exists to serve the framework, so it has some functions with limited uses, like `nest()`, while not being general enough. That being said, it was probably the best of the libraries I saw. It is lazy where possible, nicely implemented, tested, and it even provides a CollectionTrait that you can use in your own collection-like objects. So if you want a time-tested collection pipeline library that provides most of the required functionality, this could be your choice. Of course, if you are already using CakePHP, there is no need for another collection pipeline library.
If you have ever used the most hyped PHP framework of these days, you will know this one. For example, the Eloquent (Laravel's ORM) collection extends it. It is available as a split from the main Laravel repo. As with Cake's collection, the main issue with this one is that it is just a "support" library for Laravel, hence the namespace. It does not implement all the functions that you'd expect from a collection pipeline, but more importantly it works with arrays (which means no laziness) and has mutable operations. With that said, if you are using Laravel/Lumen, aren't working with large datasets (so you don't need the laziness) and use the mutable functions carefully, you might not need any other collection pipeline library.
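Why do arrays mean no laziness? The sketch below uses only plain PHP, not Laravel's collection API, and the helper names `lazyMap()` and `lazyTake()` are made up for illustration. It squares the numbers 1 to 1000 and keeps the first three results, counting how many times the callback runs in an array-based version and in a generator-based one.

```php
<?php
declare(strict_types=1);

// Eager: array_map() visits every element before array_slice() runs.
$eagerCalls = 0;
$eager = array_slice(
    array_map(function (int $n) use (&$eagerCalls): int {
        $eagerCalls++;
        return $n * $n;
    }, range(1, 1000)),
    0,
    3
);

// Lazy: generators produce values only when the consumer asks for them.
// Both helpers are hypothetical names used for this example.
function lazyMap(callable $fn, iterable $items): Generator
{
    foreach ($items as $item) {
        yield $fn($item);
    }
}

function lazyTake(int $count, iterable $items): Generator
{
    if ($count <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        // Stop before pulling another item from the source.
        if (--$count === 0) {
            return;
        }
    }
}

$lazyCalls = 0;
$lazy = iterator_to_array(
    lazyTake(3, lazyMap(function (int $n) use (&$lazyCalls): int {
        $lazyCalls++;
        return $n * $n;
    }, range(1, 1000))),
    false
);

// Both variants produce [1, 4, 9], but the eager callback ran 1000 times
// and the lazy one only 3 times.
printf("eager: %d calls, lazy: %d calls\n", $eagerCalls, $lazyCalls);
```

With small datasets the difference is negligible, which is why the array-based approach is fine for many applications. It starts to matter with large or unbounded sources, or when each step is expensive.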
The best framework-agnostic collection pipeline library I have found. It's well written, although a little overengineered. It is lazy and immutable. The biggest drawback is that the code is not very well commented and its wiki/docs are written in Japanese. The fact that I considered this overengineered, basically undocumented library the best at what it does is a statement about the state of collection pipelines in PHP.
Can we do better?
Once I felt I had reviewed all the available options, still remembering the power of Clojure's sequences, I decided to create a collection pipeline library that would mirror Clojure's sequence operations. So after a few months and a few rounds of rewriting everything, I released version 1.0 of my collection pipeline library, Knapsack.
So what makes it better? Well, for starters, its design is very simple. All of the operations are defined as functions, so you can use them on their own. If they return a collection, it will be an instance of Knapsack's Collection class. The Collection class is itself iterable and accepts a Traversable object, an array, or even a callable that produces a Traversable object or array as its constructor argument. It provides most of Clojure's sequence functionality plus some extra features. It is also lazy where possible and immutable: operations performed on a collection return a new collection (or value) instead of modifying the original. It also provides a CollectionTrait, as I really liked that feature in Cake's collection.
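The sketch below illustrates these design ideas in isolation: a constructor that accepts an array, a Traversable or a callable, and an operation that returns a new collection without touching the original. It is not Knapsack's source code or API. The `LazyCollection` class and its `map()` method are hypothetical stand-ins written for this example, assuming PHP 7.4 or newer.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of the design ideas only; NOT Knapsack's actual code.
final class LazyCollection implements IteratorAggregate
{
    /** @var callable Factory returning a fresh iterable on every call. */
    private $factory;

    /** @param iterable|callable $input */
    public function __construct($input)
    {
        if (is_iterable($input)) {
            // Arrays and Traversables are wrapped so they look like a factory.
            $this->factory = static fn () => $input;
        } elseif (is_callable($input)) {
            $this->factory = $input;
        } else {
            throw new InvalidArgumentException(
                'Expected an array, a Traversable or a callable.'
            );
        }
    }

    public function getIterator(): Traversable
    {
        yield from ($this->factory)();
    }

    // Immutable and lazy: returns a new collection, computes nothing yet.
    public function map(callable $fn): self
    {
        $source = $this;

        return new self(static function () use ($source, $fn): Generator {
            foreach ($source as $key => $item) {
                yield $key => $fn($item);
            }
        });
    }
}

$numbers = new LazyCollection([1, 2, 3]);
$doubled = $numbers->map(static fn (int $n): int => $n * 2);

$original = iterator_to_array($numbers, false);
$firstRun = iterator_to_array($doubled, false);
$secondRun = iterator_to_array($doubled, false);

// A callable input: the generator is recreated on each iteration.
$letters = new LazyCollection(static function (): Generator {
    yield 'a';
    yield 'b';
});
$lettersTwice = [
    iterator_to_array($letters, false),
    iterator_to_array($letters, false),
];
```

One reason a callable input is useful in a design like this: a generator can only be consumed once, but a callable that creates the generator can be invoked again, so the collection stays re-iterable. In the sketch, `$numbers` is left unchanged by `map()`, and `$doubled` yields the same values on both runs.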
The library itself has 100% code coverage and follows semantic versioning. It's already used in production and there have been only positive reactions to it.
So what's next? Well, I hope this article (or the more general and in-depth Martin Fowler article mentioned above) has convinced you to start using collection pipelines, and hopefully Knapsack :) I will follow this article with more on why, when and how to refactor logic into collection pipelines, with real-world examples and so on. Thanks for reading.