JavaScript: Trees

12 min readMar 25, 2019

The tree is one of the most common data structures in computer science and the natural way to model certain subject areas. Trees (as a data structure) are found one way or another by all people, even those who are far from not only programming but also from computers in general. The most obvious example is the family tree , and the more specialized one is the file tree . And, of course, HTML (as well as json , xml and more), it also has a tree structure. Any hierarchy is a tree by definition.

There is one very interesting aspect to the trees. I often observe how the level of understanding of the topic of trees and the ability to work with them are incredibly strongly correlated with the level of the developer.

JS: Trees → Definitions

The key feature of the tree structure is that it is recursive . In other words, a tree consists of subtrees, which in turn consist of subtrees, which in turn … In the end, at the very bottom, there are leaves — and now they have no descendants and do not lead anywhere.

A tree consists of nodes and edges between them. The ribs in reality do not exist, they are needed only to visualize the connection and, if necessary, to describe it. Nodes are divided into two types: internal (those that have descendants) and leaf nodes (those that have no descendants). In the case of a file system, leaf nodes are represented by files, and internal ones are directories.

In trees, each vertex has a parent (or ancestor). The only exception is the root node — it has no parents, and the tree begins with it. The number of descendants at any internal vertex, in general, can be any. In addition, trees highlight the concept of depth (depth), which determines how many steps you need to go along the peaks from the root in order to reach the current (the one you are looking at). The peaks at the same depth with the common parent are called brotherly or sisterly.

Definition

The number of ways to describe trees is infinite. Here we consider only those that are based on the repetition of the tree structure in the data structure that describes it. The most primitive option is nested arrays:

In the examples above, the root is the array itself, and all its elements are children. If the child is not an array, then it is considered as a leaf node, otherwise — as an internal node. The internal node, in turn, consists of children. Such a structure has a limitation. Internal nodes cannot store data, but, in general, such a scheme can be used and encountered in life.

You can go further and achieve greater flexibility . Imagine each element of the tree as an array in which the first element is the value stored in the node, and the second element is the array of children. If the second element is missing, then we assume that the current node is leaf.

This option is more verbose, but allows you to store data (arbitrary) in any node, not even a sheet. Now, I hope you understand the principle of organization, and you can practice yourself inventing ways to pack trees into an array. By the way, an interesting detail. The source code for high-level languages also has a recursive structure. Look at the code:

Function arguments are expressions, which means they can be function calls, which causes recursion. If the code above is rewritten in the style of lisp languages, then we get the most real tree consisting of lists:

The agreement here is this: the first element of the list is a function (any operations are considered as functions), and everything else is its arguments.

Another definition is based on associative arrays, and specifically in javascript — on objects:

By and large, that the array, that the object itself can always be considered as a tree, regardless of its internal structure and the presence of nested elements. Such is their nature. The structure is present where you need to store exactly the tree data, for example, the file system. To begin with, I’ll reveal the secret; most file systems do not just have a tree structure, but are also represented at the implementation level as data structures.

We will work with the same structure, creating on top of it various support functions for searching and modifying content. Below is the code that creates the required tree object:

As a result of code execution, the following result is obtained:

Directory view:

File view:

Some statements regarding the tree above are:

Files — leaf nodes
The directories are internal. May contain both files and directories inside the propertychildren
meta - an object with arbitrary data, for example, size, date of creation, and so on
The directory from the file is distinguished, first of all, by the type specified by the corresponding property

About the last I will say a few words. The very understanding of OOP, which is talked about so much, is that the code is considered from the point of view of types, and you can see them. It is important to allocate them explicitly, and not to analyze them by indirect signs — for example, determining what kind of node is in front of us (what type it is!) Should be done through testing type, and not through checking the availability of the property children.

JS: Trees → Traversal

A step-by-step enumeration of tree elements along links between ancestor nodes and descendant nodes is called a tree walk . It is understood that during the crawl process, each node will be affected only once. By and large, everything is the same as bypassing any collection using a loop or recursion. Only in the case of trees are there more ways to go around than just left to right and right to left. One traversal order is used — detour in depth , since it is naturally obtained by recursive traversal. On the other ways you can read on Wikipedia or in the recommended books Heksleta.

Depth-first search

One of the tree traversal methods (graph in general). The strategy for this search is to go as far as possible into one subtree. This algorithm naturally falls on the recursive solution and turns out by itself.

Consider this algorithm on the example of the following tree:

I marked each non-leaf peak with an asterisk. Crawling begins at the root node.

Check if vertex A has children. If so, we run the traversal recursively for each child independently.
Inside the first recursive call is the following subtree:

Repeat the logic of the first step. We fall to the level below.

Inside there is a leaf element E. The function makes sure that the node has no children, does the necessary work and returns the result to the top.
Again we find ourselves in a situation:

In this place, as we remember, the recursive call was launched on each of the children. Since the first child has already been visited, the second recursive call enters the node Fand does its work there. After that, a higher return occurs, and everything repeats until it reaches the root.

JS: Trees → Map

An example of the previous lesson is extremely simple to generalize to a higher order function map. Let me remind you that map is a map . If earlier we displayed flat lists, now we will display the nested structure. From the point of view of semantics, nothing changes. map maps to each node a new node, so the structure of the entire tree remains the same, but the data inside the node changes.

JS: Trees → Filter

In contrast map, filterfor trees, in the form in which it is used for collections, practically unsuitable. Judge for yourself. Suppose we want to search the file system for files matching the pattern *.log. The task is immediately divided into two subtasks. Filtering leaves the leaf nodes and the subsequent filtering by the template. Obviously, after the first filter, we already want to work with the list, but not with the tree. Moreover, what will the tree look like in which there are only one files and what is the practical meaning of it? Like it or not, most filters of tree structures should produce flat lists at the output.

But still, sometimes, it filtercan be used, moreover, to complete the picture, it is worth going through all the steps. First you need to decide what you need to return in a situation if the node does not satisfy the predicate. It is logical to use null.

Looking at the implementation of the filter, it is clear that if the node does not satisfy the predicate, then its children are not considered at all. In the above example node B . Accordingly, her children e and Fare not even analyzed and filtered with Bed and .

It looks interesting and this entry children.map(c => filter(f, c)).filter(v => v). The fact is that our filter does not work with the list, but works with a specific root node. Accordingly, if we process children, then as a result of filtering the number of children does not decrease. In their place appears null. Therefore, children are crawled using mapan array childrenfollowed by filter. Then all nullelements will be filtered.

JS: Trees → Reduce

reducePerhaps the most interesting and useful function for working with trees. Data aggregation is a very common operation. Count the number of files in a folder, the total weight of all files in a folder, get a list of all files in a folder, find all files by template, it is more convenient to do all this with reduce.

The main feature reducein the presence of the battery, which is dragged through all the challenges to the very depths, and then rises to the top and, eventually, returns outside.

The above is an example test that is used reduceto count the number of nodes in a tree. With the help reduce, the task is executed in one line.

As well as in other functions of higher order, the node is transferred to the handler function as a whole, and not just the name. The most interesting thing happens when handling children. A call reduceon each child should occur with the battery dragged through the processing of each child. Because of this, it turns out that there is a reducefingering outside children, while each child takes the current accexternal input reduceand starts the internal one with this battery.

JS: Trees → Search

File system search is an operation that is performed extremely often. It opens a lot of interesting tasks. Let us examine one of them with several variations. Suppose we want to list all empty directories. Using reduce, to perform this task is trivial. Take the following file structure:

This structure has two empty directories: /etc/apacheand /etc/consul/data. The search algorithm is primitive:

We initialize an reducearray
If the type of the node directoryand it does not have children - add to the array

This example reveals the beauty of higher-order functions. We are completely hidden from the mechanism to bypass the tree, moreover, you can not even know that we work with the tree.

Let’s try to complicate the task. Find all the empty directories, but with a maximum depth of search level 2. That is, an empty directory /etc/apachematches this condition, but /etc/consul/datanot. Offhand, you can immediately call the following solutions:

Since our implementation reducedoes not provide information on how deep the current node is, we can perform a manual walk by passing two batteries: one with a final array, the second with a nesting level.
You can expand the description of the tree itself by adding information about the nesting level there. On the one hand, such a method will not require a change in the application code, since it suffices to change the implementation of the interface functions mkdirand mkfile. On the other hand, in such a scheme, the depth of all nodes will be calculated relative to the root, which is inconvenient.
You can write reducewho can tell how deep to go. And then solve the problem with its use.

Let’s try to solve this problem using the first method.

A few important points:

Now you need to keep track of the current depth, so it is transmitted to iter, increasing with each iteration.
If you go deeper than necessary, then return the result.

The rest of the code is the same as in the usual reduce.

JS: Trees → Aggregation

Let’s practice with one more variant of data aggregation on file systems. We write a function that takes a directory as input and returns a list of directories of the first nesting level and the number of files in them, including all subdirectories. This function can then be used in a command line utility that performs the corresponding task.

As can be seen from the statement, the task cannot be solved using it reducein its pure form, since we do not need to crawl all the nodes, on the other hand, we need a crawl to count the number of files. Consequently, our task falls into two subtasks: extracting the child directories of the current root and invoking the counting of files inside each child.

Let’s start by counting the number of files. This task contains one primitive reduce:

Absolutely typical reducewhich increases the battery by one if the type of the node file. The next step is to extract all the children from the source node and apply a count to each of them.

That is, we turned to the children directly by first filtering them, and then performing a mapping to the necessary array containing for each directory thetmber of files in it.