Binary Search 🌳

Lets make a Binary Search Tree

Kennet Postigo
ReasonTraining
Published in
11 min readJun 26, 2018

--

Throughout this post we’re going to learn some of the basics of ReasonML and use them to build a binary search tree (BST). ReasonML provides a C-like syntax on top of the OCaml programming language, and tools to make building and sharing projects easier if you’re coming from JavaScript. OCaml is a 30+ year old programming language with a long academic history and lots of industry usage.

The goal of this post is to use enough of the semantics and type system of ReasonML to make you more comfortable in using it in the future.

What is a Binary Search Tree?

Sample BST with values

A Binary Search Tree is a tree where each of it’s nodes has at most two children, a left and a right child. Hence binary in the name , but theres a constraint that is very important. Lesser values must be placed on the left child of the node, and greater values must be placed on the right child node. In doing so all the values that are placed in the binary search tree are sorted. A node consists of a value for the current node, and a pointer to the left and right child of that node.

Why would I ever use a BST?

They have some very good characteristics that are suitable for several problems. They keep keys in order and provide fast look up of a key. This is the whole basis of binary search, on average you’ll only have to traverse the left or right subtree(half). Very often the case is that you’ll have fast look up, insertion, deletion. Of course a BST could become unbalanced, where one subtree begins to look like a list and causes look up to be slower(linear), but lets not worry about that, after all this is a post about ReasonML!

A ReasonML Primer

Lets take a moment to get acquainted with some of the basics and language features that are unique and incredibly empowering!

Primitives

In ReasonML, we have much of the same primitives that are found in most languages– string, int, float, char, and bool. There is also a less common but extremely useful primitive called unit, it is used to represent the type of expressions that have “no value”, expressions that are evaluated for a side-effect only. Lets take a look at these:

Function

Functions are pretty much just like functions in your typical language, except the last value is implicitly returned, no need for return keyword:

Types

In ReasonML you aren’t forced to annotate every value and function, the compiler is smart enough to be able to infer most values. If you want to annotate them, you can still go right along and do so:

Tuples

Tuples are immutable, a product type allowing us to represent an “And” relationship, they can hold multiple values and data types. Tuples have a fixed size that you determine and maintain the order you define. They are created with parens, and it’s values are delimited with commas.

Records

Records are immutable by default, a product type, they also allow us to represent “And” relationships. Records are similar to objects and dictionaries. You can think of them as Typed objects, but you can’t use them without declaring it’s shape ahead of time with a type constructor, for example the type dog in the next example must be declared before myFirstDog record.

Variants

Variants are the sum type, they allows us to represent “Or” relationships. They are very handy when describing different states or possible values that are allowed for a type.

Alone this may not seem all that amazing but boy are you in for an awesome ride when you mix them with pattern matching.

Pattern Matching

In ReasonML, you would use pattern matching mainly for matching an expression to a pattern, and destructuring data. Lets take a look at the typical syntax for pattern matching:

Pattern Matching is especially powerful because of the help the compiler gives you. Exhaustive pattern matching that will warn you when you miss a case, warn you when you handle a specific case twice, and when you handle a case that can never happen. Lets look at some examples:

Algebraic Data Types and Pattern Matching

You made it through the primer! In ReasonML, there are few types that are particularly special, those are Tuples, Records, and Variants. They comprise what we know as Algebraic Data Types in ReasonML. Why are these special? Well they are types that are created by combining/composing other types, often referred to as composite types. Lets take a look at a more “real world” example:

Algebraic Data Types + Pattern Matching = ❤

In the code above, you see how you can combine values and then pattern match against them, and the kicker is that if you ever forget to handle a case, you are guaranteed to be warned about it by the compiler. This is especially powerful to prevent bugs and ensures that the code you wrote is safe and prevents your app from getting into inconsistent states! Pattern matching also allows us to get rid of a lot of control flow code that you may have had to use in other languages before.

Modeling our Binary Search Tree in ReasonML

A binary search tree is composed of nodes that may point to other nodes. A node, should have the current node value, point to the left node, and a point to the right node. From there you can keep adding nodes until you find yourself with a full blown tree!

We model a BST in ReasonML by creating node and binarySearchTree types. A node type is a record comprised of the current node value , the left , and right child node fields. The binarySearchTree has value constructors of Empty or Node with a record node as an argument.

In ReasonML theres no hoisting of values (thankfully), you can only reference values that are already in scope. Pay close attention to the type declaration of binarySearchTree, it references the record type node in it’s Node value constructor, but node isn’t yet defined. You might suggest defining node first, but if we were to switch the order in which they are defined node would reference a yet to be defined binarySearchTree. You might call this a mutual recursive relationship. This is precisely why after the definition of binarySearchTree we use the and keyword to signify this relationship between the binarySearchTree and node types.

You probably noticed the ‘a type parameter. This is a placeholder for a more concrete type like int, string, or some user defined type. Don’t confuse this with the any type that you may see in flow or typescript. Think of it as any one type can go here. For example, the variable bst in the snippet above has binarySearchTree of type int, you can tell by looking at the bst value field that has been set to the integer 1. This means that all nodes (left and right) must also be of type int. You cannot mix and match this with other types. Then we define two BSTs, one thats Empty, and one populated with a Node. We can then infinitely compose them together manually.

Insert

To insert a node into our tree we must obey the rules of a binary tree. Nodes with lesser values must be placed on the left side of a tree and Nodes with greater values must be placed on the right of a tree. This check is done at every level of our tree until we reach a leaf node, a node without children. We should keep this in mind when implementing our insert function.

Let’s unpack what is going on in the code above. Our insert function has a nuance, you are probably wondering what rec is doing in our function. It is a keyword in ReasonML meant to signal to the compiler that this function will be recursive. This is not something you see everyday, but it grows on you and soon you will be able to quickly scan through code knowing what functions are recursive before diving into it’s meat and potatoes. Our insert function will take 3 arguments, a tree that we are inserting into, a compare function that takes 2 arguments and compares them to determine where to place our new node, and a v (a value) that we want to insert.

We start off by pattern matching our tree to determine what we are working with, Empty or Node. If we come across an Empty, we create a new Node with the v argument as our value, and Empty in our left and right fields. If we come across an existing Node, we pattern match our Node and pun it’s record fields so we can refer to them. In the body of our patterns expression, we want to compare our argument v and the Node's value in order to determine whether we should insert the function on the left or right of the Node. A compare function will take 2 values of the same type that will be compared, but they can be of any one type because of the type constructor argument of ‘a and it will return an int. If the first value is smaller than the second it will return -1. If the values are the same it will return 0. If the first value is larger than the second value it will return 1. Lets take a look at a compare function for int:

This is rather straight forward for int, but you can write a compare function for any type using rules you determine as long as it conforms to the compare signature. In the body of our pattern for Node({…}) we have 2 checks, one for the compare returning -1 where the v is lesser than the value at the node we are comparing. In this case we will return a Node that calls insert on the tree’s left node. Another check for compare returning 1 where v is greater than the value at the node we are comparing. In this case we will return a Node that calls insert on the tree’s right node. Otherwise, if they are equal we just return the tree, because there can only be that value once in our tree. The recursion ends when we call insert on a Empty left or right field of a Node or when we match on the Empty pattern. In this case we just insert our value with Empty children. That is all there is to our insert function! I forgot to mention, we aren’t mutating the tree that was passed in! We are creating a new one! We couldn’t mutate our tree even if we wanted to because the node type is a record that by default are immutable in ReasonML!

How should we go about removing a node from our tree?

Remove

The process of removing a Node from our BST requires more scenarios to check for.

  1. Empty: return Empty
  2. A Node without children: if the value of the Node is equal to the value we want to remove, return Empty, otherwise return the Node.
  3. A Node with one child: If the value of the Node is equal to the value we want to remove, return the child of the Node, otherwise return the Node calling remove it’s one child.
  4. A Node with two children: If the value of the Node is less than the value we want to remove, return the Node calling remove on it’s left child. If the value of the Node is greater than the value we want to remove, return the Node calling remove on it’s right child. If it is equal then find the smallest child in the right subtree to replace the Node being removed.

ReasonML makes it easy for us to pattern match on these cases that we might run into!

Our remove function will take 3 arguments, a tree that we are removing a value from, a compare function that takes 2 arguments and compares them to determine if we should remove the node. This is the same compare function that we defined above. Lastly a v (a value) that we want to remove from the tree.

We kick off our function by pattern matching on the tree that was passed in, If the pattern that is hit is Empty we immediately stop and return Empty. This is our base case that will end the recursion. If the pattern that is matched is a Node with no children and it is equal to the value, then we return Empty in order to remove it from the tree, otherwise if this is not a match there's nothing more we can do because there are no children, in this case we simply return the Node and recursion will end.

Then we have 2 patterns that are nearly identical that match when a child has exactly one Node. The distinction between the both of them is on which side the Node is found, however both work the same. If the value from the Node is equal to v return it’s only child, otherwise return Node calling remove on it’s only child.

Our last pattern, is a Node with two children in which case we check if v is less than the Node's value, if so we return a Node calling remove on it’s left child, if it’s v is greater than the Node’s value we return a Node calling remove on it’s right child. If it happens to be equal to the Node’s value which child should we return? It has two children but how can we be sure that the child we do replace the Node we’re removing abides by the rules of a binary search tree? We’re looking for a child node that will be greater than all the child nodes on the left of the Node we are removing AND that is less than all the child nodes on the right of the Node we are removing. This will be the smallest value on the right of the Node we are removing. We’re going to use a min function that will find the smallest value on the right subtree of the Node being removed and replace it at the removed Node location. We pattern match on the result of min, if the result is Empty return Empty, otherwise return a Node with the min value and calling remove on the right node to remove the min value from it’s previous location.

You might be wondering how the min function works, so here it is, it simply recursively traverses the left side of the node that was passed to it until it finds the Node with the smallest value:

There are a few other functions that can be applied to the Binary Search Tree, search, traverseInOrder, etc. For a full look at these functions I’ve made a gist housing the implementation of the rest of the functions here.

Last regards

We hope that after reading this post you can see a few of the many benefits that ReasonML has to offer — safety, immutability, expressivity, correctness, and more.

We could have simplified how we modeled our BST, by using a tuple instead of a record as the node type, but we wanted to touch on mutual recursive types we’ve noticed people often come into discord or reason.chat asking about for types and functions.

We have yet to actually run any of the code, I have confidence that the tooling and compiler has caught any bugs and issues with the code. If you do decide to run this code, let me know how it goes!

I decided to implement this module by passing a compare argument to most functions, however this could have been avoided by using a functor, but functors really deserve to have a dedicated post of it’s own. Which may be coming soon.

We’d like to give huge thanks to all the nice ReasonML/OCaml community members who have take the time out to review this post and give feedback, but especially Anton Bachin for reviewing several drafts of this post!

Until Next time!

If you want to receive updates about when ReasonTraining is going to be fully launched and notifications when we release new blog posts, please go ahead and drop your email here and follow us on twitter.

👋

--

--