5 Data Structures You May Have Overlooked
Data structures are the core building blocks to any program you write, whether on the front-end, back-end mobile, game development, or whatever tech stack you use. I like to think of programs as data + algorithms + data structures. Data structures are crucial components, with the most common ones being arrays, trees, graphs, maps, and linked lists. For this article, I will go over five data structures that you may have overlooked.
1. Trie
A Trie is a tree data structure where the nodes store letters inside of the alphabet. It is also known as a prefix tree. Using this data structure, you can traverse branches to search for specific words or prefixes of those words. You can also search for suffixes depending on how the data is inserted inside your Trie. The interesting part about this data structure is every single node can have a character as a child. For example, if we have 230 available characters, we could have 230 children for every single node inside the Trie. Let’s say we were to insert the words cow, cats, and Cobra inside our Trie.
If we wanted to insert the word ‘Cow’, we would first create the C node since it does not exist yet, then we would create a node for O, which connects to C, then create a W node, which connects to O.
For the word ‘Cat’, we already created a node for C. With that, we would move immediately to a node for A which connects to C, node for T, which connects to A, and create a node for S, which connects to T.
For the word ‘Cobra’, we would move directly to B since we already have the first two nodes connected. After creating the node for B, which connects to O, we could create a node for R which connects to B, and then a node for A which connects to R.
As you can see, this data structure is efficient for storing a wide variety of words inside it. The real power behind it is we can search for prefixes inside it. Say we wanted to search for ‘Cat’ to see if it exists. We would check if C is present. If yes, we check if A is present, and then T. If everything checks out, the return value would be ‘true’. Note that both inserting words and searching for words are an O(M) operation where M is the length of the word.
The most common application of this is auto-completion. If you are typing into your phone and doing a text message, you can have different options that pop up to get as close to what you are trying to type. In parallel, another use is spellchecking. If I am incorrectly typing, the Trie can recommend the correct word for me to use.
2. Disjoint Set
A Disjoint set is also known as Union-Find. It tracks the set of elements into a number of disjoint non-overlapping subsets. I know that may sound complicated, but it is really not. All it is, is you have two main functions, which are Union and Find. ‘Union’ will merge two subsets together into one, and then ‘Find’ will perform a search to determine if two objects are in the same subset. Let’s say I have three subsets total, each containing a group of numbers.
Each subset has a parent where all the numbers inside of the subset point to that specific parent.
I can choose to merge subset zero and subset one causing zero and one to just become a single subset. This is done in an efficient manner because all we have to do to merge is change the parent of subset one to point to the parent of subset zero, and now all the numbers in subset one have been merged into subset zero.
I can also decide to check if two numbers are in the same subset. This is done using the Find operation. All there is to this is to check if their parents are the same. If their parents end up being the same, they are in the same subset.
This data structure is used in Kruskal’s minimal spanning tree, and to compute least common ancestors. Since you do not really have to know Kruskal’s minimal spanning tree or computing least common ancestors, the best use case for this structure is in solving a bunch of different interview problems. Some of the most common interview problems that you have seen, like number of islands, friend circles, number of connected components in an undirected graph, accounts merge, and a bunch of other ones can be solved with the disjoint set data structure.
3. TreeMap
A Treemap inherits from a Hashmap. The difference is that the entries are sorted by the ordering of the keys or by a custom comparative class that you provide to the constructor. If we were trying to store the entry of an integer mapping to a string, all of the entries inside your treemap would be sorted in ascending order based on that integer value. Under the hood, this data structure is using a red-black tree to maintain order efficiently. The performance for insertions and lookup is actually O[log(N)].
Let’s say I added the following entries inside of my Treemap. If I were to look for the first key, which is the lowest that would return one, I could also look for all keys that are less than a number. So, I could look for all numbers less than two that would just return and tree one. I could also say, find me all keys greater than or equal to one. So that would return all of our entries in our Treemap.
This data structure is perfect for performing queries such as finding the largest or smallest number or finding keys that are less than or greater than some specific value.
4. Skiplist
Skip list is a faster way to search for nodes inside linked lists. In a normal linked list, you would have to search through every single node in case you wanted it to search for a specific node, that would take O(N) time, where N is the number of nodes. In a skip list, you have an express lane on top of your normal linked list. So, you have your normal lane, which has all of the connections to every single node inside your linked list, and then your express lane is skipping over a bunch of different nodes.
Let’s say I wanted to search for node 45 in the following image.
I would first go to my express lane, which starts at node 10. That is less than that 45 so I need to move forward to node 30. As a result of doing that, I got to skip a bunch of nodes in my normal lane. 30 is also less than 45 so I will move to node 58, which is greater than 45. Now, I know where I need to start my search in my normal lane. So, from node 30, I will move to 43 and then 45. For skip lists, search, insert, and delete will all be O[log(N)].
When would you use this? Well, if we need an ordered thread-safe data structure in a distributed environment. If we have a balanced tree and we need to rebalance it in a distributed environment where we have many different threads trying to access pieces of memory inside this tree, that requires us to have a new text lock on a huge portion of the tree potentially. That is very inefficient because while this tree is being rebalanced, there may be large portions of the tree that are not even being modified. However, for a skiplist, we can have the same functionality as the balanced trees, and we can put a mutex on every single node thus, we only need to lock nodes that are directly affected.
5. B-Tree
A B-Tree is a self-balancing binary tree. In normal binary trees, we have the potential to have a degraded tree if the tree is not balanced well. We could have a bunch of nodes inside of this tree that are essentially just a linked list. If our tree degrades to this, search, insert, and delete become O(N). When a tree is balanced, they become O[log(N)]. A B-tree will ensure that our tree is always balanced each time an insertion or deletion happens. Each node inside a B-tree contains keys, which are just the data represented inside of the nodes. The order of a bee tree is the number of children that each node can have at maximum. So, for every key inside a node, we can have further nodes as our children, and that is if they are greater than the key on the left and less than the key on the right.
You will probably never have to implement one of these from scratch. However, a B-tree is the default index for most storage engines, specifically my SQL. The power behind a B-tree in terms of storage engines is that you do not have to scan all of the data to find the desired rows.
So those were the five data structures that you may have not heard about. Some of them are easier to understand than others. Like I said, you might use data structured or not in your job depending your job description. The thing to take away from this is that they will pretty much be a part of your life if you are going into software development. You will have to understand them for your interviews and for your workflow. I hope you found this piece useful.