Developing an Algorithm and Data Structures collection in Elixir Part I — Priority Queue
This will be a multi-part series about developing an Algorithm and Data Structures collection with Elixir. The source can be found at https://github.com/sashaafm/exads.
Exads stands for EliXir Algorithms and Data Structures. This is a side project of mine with the aim to build the most complete and correct library/collection of data structures and algorithms. A roadmap can be found at the README. Since I’m developing on my own free time I’ll gladly take pull requests for new ADS’s, algorithms or corrections to my implementations, I’m a beginner at Elixir after all :)
There is already an implementation for the Stack and Queue Abstract Data Structures, however since these ADS are trivial to implement I’ll refrain from presenting them. This first post will be about the Priority Queue ADS.
Specification
The Priority Queue is an ADS similar to the Queue, where each element has an associated priority. Instead of removing/serving the elements by a First In, First Out policy, the PQ serves the element with the highest most associated priority. Only if there are elements sharing the same priority will the PQ resort to the FIFO policy. Introduction to Algorithms, 3rd Edition (2010) offers the following specification for the Priority Queue:
INSERT(S, x) inserts the element x into the set S, which is equivalent to the operation S =S U {x}.
MAXIMUM(S) returns the element of S with the largest key.
EXTRACT-MAX(S) removes and returns the element of S with the largest key.
INCREASE-KEY(S, x, k) increases the value of element x’s key to the new value k,
which is assumed to be at least as large as x’s current key value.
The book also refers the Priority Queue as usually being implemented with the aid of a Heap Sort ADS. Since one of my aims is to keep all implementations close to Elixir’s main library (for performance and correctness purposes), I shall implement mine resorting to the language’s List module.
Implementation
Although Elixir is not Object-Oriented, my first instinct was to write a new() function to return a new empty Priority Queue (PQ). Since my implementation is done by the means of a List, this function does nothing more than to return an empty List (an interesting concept in Elixir and other Functional languages is that even if you do not intend the function to return anything, it will always return the result of the last operation. Over time, this builds a good sense of how data is transformed and pipelined around in FP languages).
Next, I implemented the INSERT(S, x) operation, which I named insert_with_priority(queue, {_item, _prio} = elem). Right away we can get on with some great pattern matching, one of my favourites FP features and specially Elixir’s amazing implementation of it. In the parameters, I pass {_item, prio} = elem, which means the function expects a tuple with two parameters I called item and prio (the priority of the element) and then I matched that tuple with elem to enable the calling of the whole tuple, instead of adressing the contents individually. This leads to great expressiveness and to writing short and clear code. The item also takes an underscore as a prefix in order to tell the Virtual Machine that although I match that parameter and give it a name, I won’t be using it in the function. This makes for a pretty good way to make the code more readable and understandable, without having to resort to bad practices (I could just name the parameter and just leave an underscore, which in Elixir means that a parameter is passed, but it doesn’t care what it is and doesn’t try to match it, however this way makes it quite clear that the elements are supposed to come in a tuple format, where the first parameter is the item/value of the element). The same applies to prio.
Quite a short implementation. The specification tells it expects to be passed a list of tuples and tuple and that a list of tuples will be returned. The function itself just concats the original queue with the desired element in a list.
Next is the EXTRACT-MAX(S) which I named get_frontmost_element(queue). This function returns the element that is next up to be served and the remainder of the queue in a tuple {element, new_queue}. This function actually just makes a call to a private helper function that gave me quite some trouble to implement. This was the first version of the helper (don’t bother trying to understand):
Not pretty, right? I tried debugging and testint this mess for nearly two hours before I decided there should be an easier way. Then I remembered my intent of using Elixir’s main library as much as possible, and alas I found the perfect function in Enum. Enum is a module for operating on any other module that implements the Enumerable Protocol. The List module obeys to this requirement and after reading the available functions with some attention I found group_by(enumerable, dict \\ %{}, fun) or simply group_by/3. This function takes an enumerable, an optional Dictionary and a function. It then groups all the elements of the enumerable into a Map where each group is defined according to the specified function (it may be the length of each string in a list of string, the date in a map of dates or in my case a list of priorities). After the elements are grouped I could just take the last element in the highermost group (since the elements are grouped as they are processed, the first element with the highest priority will go in the end of highermost group) and then remove it from the original queue:
Much easier, clearer and readable than my first version. A tuple is pattern matched against the group_by/3 function and then the Map is transformed into a List and that last element (the highermost group) is retrieved from the List. Finally a tuple is built with the first item being the last element of that highermost group (it will work even if the list has size 1) and the second item is the original list minus that element. At the top we can also see Elixir’s pattern matching at work on function definitions, where in case of the queue coming as an empty List, it will just return nil.
MAXIMUM(S) is pretty much the same as the previous function’s implementation, however, instead of returning the tuple with the removed element and the new queue without the said element, it just returns that highest priority element. The spec shows that a list of tuples must be passed and that either a tuple or nil will be returned. Once again, pattern matching is used to reuse the helper function and just return the highermost element, not caring about the transformed queue.
Finally, I implemented the INCREASE-KEY(S, x, k) operation. This function actually uses another public function I implemented (position_by_order/2 which retrieves the position by FIFO policy of a given element). I named the function increase_element_priority(queue, {item, prio} = elem, new_prio) when new_prio >= prio. This last bit is important. This is a guard which are Elixir macros to pattern match functions, when they obey the condition it stablishes. Its purpose is to comply with the operation’s specification of the new priority being at least as large as the old one. The function itself uses the Enum module again, for the functions take/2 and slice/3. First it determines the position of the element in the queue. Then it uses take/2 to get the first part of the queue, up to where the element is (which before being stored into the variable, is removed from this first part). Slice/3 gets the remaining part of the queue up to its length. Lastly, a new queue is returned through the concatenation of the first part (without the element), the element with the new priority and the remainder of the queue. This whole concatenation section is necessary to preserve the FIFO order of the queue.
Testing
Since testing this ADS didn’t show anything particularly interesting, I won’t include any details in this first part. However, my test suite wasn’t that strong or robust (I pretty much just did some border/boundary value tests) so I’ll probably get back to it in the future to increase my confidence in this implementation.
Conclusion
This was the implementation of the Priority Queue in Elixir for my Exads side-project. It has been quite interesting so far to develop these known ADS’s and algorithms using a languages particular mechanisms and features. Pattern matching, specially, is the most useful feature that brings a lot of expressiveness and readibility to the source code. The Enum module is also a great asset to rely on if a developer decides to his implementations through Enumerables. Although this post only focuses on the four main operations of the PQ, I also implemented other common operations and some others I thought, along the way, could be useful for anyone using this implementation.
Next on my roadmap is the famous Tree data structure. Like I said at the beginning of this story, my project is available on GitHub (https://github.com/sashaafm/exads) and I will gladly take all the help I can take to make it the most complete Algorithms and Data Structures collection for Elixir. If you want to contribute, just clone the code and do a Pull Request, or hit me on Twitter :)
Have a good one!