From List to Immutable Hierarchy-Tree (with Scala)

Tomer Sela
HiBob Engineering & Security Blog
8 min readDec 1, 2019

--

As part of my job at Hibob, I’ve been looking for a way to construct an Organizational-Chart out of a list of employees. This article shows one approach to creating an immutable tree structure out of a list of hierarchical objects.

Org-Chart / bob

Our input is a list of employees — each employee entry has an “id” and “managerId” property. The managerId can be empty or reference another employee on the list as its manager.

List ⇒ Tree

I googled my way as a good lazy citizen does and found some efficient ways to do the job.

Basically, all the solutions I’ve found involve creating a “Node” object with an empty children array — then adding children to it on the fly pointing to other nodes. Adding children to a node object means we mutate this object after creation and therefore construct a tree made of mutable objects. This isn’t necessarily bad and depends on one’s needs, but I wanted to explore a way to construct nodes in an Immutable way without sacrificing much efficiency compared to mutable solutions.

The code examples in this article are written in Scala which encourages immutability as the default way of writing code, but the same concepts can be applied with any language.

Before we carry on, let’s define how a Node object looks. We’ll have 2 properties:

  • data — contains the original object from the list (Employee object in our case)
  • children — an array pointing to the children Nodes
A Tree made of Nodes

In Scala, we’ll use case-classes for Employee and Node objects:

Employee Node

Our Node object is coupled to the Employee object type. Later on, I’ll show a more generic version of Node (Node[N]) that can work with any type of parent-referencing object.

Let’s review the Employee object:

  • id — unique identifier of the Employee entry
  • managerId — Referencing the manager of the employee. This property is an Option as the employee might not have a manager (i.e. a CEO of a company).

Note the “name” property doesn’t play a role in the algorithm, it’s just a property on our Employee. In a real-world example, an Employee object is likely to have many other properties. But a name is enough for our example.

Immutably building a tree

An object is immutable only if all of its members and sub-members are immutable. So if we want to create an immutable node, it means we need to construct it with all of its children who also need to be pre-constructed with their children, and so on…

When creating a node, we can assure its children are already created by building our tree from bottom to top.

Bottom nodes (a.k.a “Leaf nodes”) don’t have children, so they can be constructed with an empty children list. Then their parent Nodes can be constructed with the already constructed leaf Nodes, then their parents can be constructed, and up we go until we reach the top node(s) (Nodes that don’t reference any parent).

It would be easier to process our entries from bottom to top if they were sorted by hierarchy — meaning, bottom nodes appear first, then their parents, up to the top. The algorithm assumes nothing about the ordering of the input list, so the first phase will be to sort the input entries by their depth in the tree.

We’ll define the depth of a node by the number of its ancestors (parent nodes).

Nodes depth

A top node will have a depth of 0. Children of a top node will have a depth of 1, their children have a depth of 2, etc…

In this article, I’ll also use the word “layer” to describe a set of nodes of the same depth.

Phase I — Sort entries by depth

The algorithm starts by grouping together entries with the same manager by creating a map with the manager-id as the key and a sequence of entries as value.

val nodesByParent = nodes.groupBy(_.managerId)

This will produce a map of type Map[Option[String], Seq[Employee]]. This map will be also used for the second phase of the algorithm.

Then, we’ll fetch the top Node(s) from the map. A top node has no manager, therefore mapped to the None value:

val topNodes = nodesByParent.getOrElse(None, Seq.empty)

Important to note — there might be several top nodes, it basically means there are multiple trees within our list input. There’s also a case of having an empty list of top-nodes — it can happen if the incoming list is empty or has a circular manager-pointing.

Now that we have our nodesByParent map and a list of top-nodes, we can sort by depth:

Sort nodes by depth

Breaking it down:

  • Notice the new EntryWithDepth case class. This is a temporary structure we’ll use to assign depth for every entry. The depth value will be used for the second phase of the algorithm.
  • The first sortNodesByDepth function is tail-recursive. It receives a list of nodes with the same depth. We’ll first run it with the topNodes list we fetched earlier (nodes of depth 0), then let the recursion do its thing and carry on to deeper nodes.
  • The function has an acc parameter that holds the sorted entries. On each iteration, we’ll add more entries to it.
  • The second sortNodesByDepth function just calls the first sortNodesByDepth with the initial recursion values (depth = 0, acc = Set.empty).

As said, we start with the top-nodes and depth = 0. On each iteration, we add the nodes in the depth to acc in reverse order:

val calculated = withDepth ++ acc

This means that depth 0 nodes will appear last in the list, and higher depth nodes will appear first.

We then fetch the next layer of nodes. We do this by going entry by entry in the current depth and fetch entries referencing them as manager using the nodesByParent map.

val children = nodesInDepth.flatMap(n => nodesByParent.getOrElse(Some(n.id), Seq.empty))

Last, we check whether the list of children is empty — in case it is, we’re at the deepest level and we can return acc (the sorted list).
If the list of children isn’t empty, we’ll recursively call sortNodesByDepth with the next depth and children as the entries in that next depth.

if (children.isEmpty)
calculated
else
sortNodesByDepth(depth + 1, children, nodesByParent, calculated)

The first phase is made by calling calculateNodesDepth:

val bottomToTop = calculateNodesDepth(topNodes, nodesByParent)

Phase II — Build a tree from the bottom

Great, we have entries sorted by depth. Now, to the main event of this evening — Building the tree!

This is also done recursively, but this time starting from the bottom layer of entries. Before building the tree, we need to know first how deep it is (this info will be used later). That’s done by checking the first entry in bottomToTop we created earlier.

val maxDepth = bottomToTop.headOption.map(_.depth).getOrElse(0)

Now we can call buildFromBottom:

Breaking it down:

  • Since this time we go from bottom to top, we’ll call buildFromBottom with depth = maxDepth.
  • The “remaining” parameter, represents the remaining entries for processing. On the first call, we’ll use the bottomToTop sequence we created in phase I. It will be reduced on every iteration of the recursive call.
  • We’ll use nodesByParent here as well to get the children of each node we process (this time we only need to id values).
  • processedNodesById will contain nodes we already created in previous iterations. Since we process the bottom layer first and go up, we guaranteed to have the previous layer of nodes contained in this parameter. This is a map of id→Node.

We start by splitting the “remaining” parameter:

val (nodesOnCurrentDepth, rest) = remaining.span(_.depth == depth)

We then process nodes of the current depth and construct them with their children.

val newProcessedNodes = nodesOnCurrentDepth.map { n =>
val nodeId = n.employee.id
val children = nodesByParent.getOrElse(Some(nodeId), Seq.empty)
.flatMap(c => processedNodesById.get(c.id))
nodeId -> Node(n.employee, children)
}.toMap

We first fetch the children’s id values using the nodesByParent parameter. Then we lookup for the initiated children instances using processedNodesById. On the first iteration (bottom layer) processedNodesById map will be empty, but since it’s the bottom layer, it’s fine as bottom nodes have no children to lookup for. On the next iterations (higher layer / lower depth), processedNodesById will contain the children when we lookup for them.

The final step is to check the recursion condition:

if (depth > 0) {
buildFromBottom(depth - 1, rest, nodesByParent, processedNodesById ++ newProcessedNodes)
} else {
// top nodes
newProcessedNodes.values.toSeq
}

If depth = 0, it means we’ve reached the highest layer — the nodes we’ve just processed are the top nodes and we return them, that’s it.

But if depth > 0, we still have higher layers to visit — we call buildFromBottom again, this time with depth decreased by one (one layer up). We set the “remaining” entires parameter with the nodes we haven’t processed yet.

Putting it all together

The following code example combines Phases I & II

In this Scala script, a new EmployeeTreeConstructor created. Then our tree constructor is being called with a list of employees.

val constructor = new EmployeeTreeConstructor()
constructor.construct(employees)

What about other trees?

An organizational employee-tree is just one use case. A Tree-structure can be used to model other types of hierarchies. Some classic examples would be:

  • Organizational tree — Employees and Managers
  • Categories/Sub-Categories of products in a store
  • Folders and files on a storage device

Bellow, you can find a generic version of Node and the algorithm:

The main differences to notice:

  • Node is now generic and can contain any type of data object (N) as long it can reference a parent object
case class Node[N](data: N, children: Seq[Node[N]])
  • We also need a way to get an identifier property of an entry and the identifier of parent entry. For that purpose, the “construct” method receives 2 functions —
    - id: N => Stringextracts id out of N
    - parentId: N => Option[String]extracts id of the parent node out of N
    In order to get the id of an entry, we’ll use id(entry) and for the parent id —parentId(n).
    Note — We can use the type-class pattern to be able to write our implementation with entry.id and entry.parentId instead of id(entry) and parentId but that’s out of the scope of this article and you’re welcome to challenge yourself and enhance TreeConstructor.

Conclusion

In this article, I presented a way to convert a list of a parent-referencing list into a tree structure by keeping Immutability. I hope this article will inspire some readers and encourage finding more approaches to the challenge.

--

--