Data Structures — The Why

Kurt Maurer
Jul 22, 2017 · 7 min read

After starting my programming journey, I quickly realized that a big topic in the field is data structures. You’re gonna get asked about them in interviews over and over again. Well, great, let’s learn some data structures. *Commence hours of reading, meditation, crying, and questioning.* Linked lists, doubly linked lists, stacks, queues, graphs, heaps, hash tables, binary search trees, AVL trees, red-black trees, oh and don’t forget to traverse them…

A grand stack of complex topics like this ultimately led me to the age-old question that, I imagine, every math teacher gets in their career: when are we ever going to use this in the real world? Why am I spending all this time learning data structures that I’ll be asked about in an interview only to forget them all after I’ve landed a job?

A few quick Google searches reveal countless college and coding students asking this very question, with some helpful folks on sites like Quora and Stack Overflow offering amazing insight. But no overarching articles or books really delve into real-world applications, which, many of us have realized over the years, are a fantastic motivator for learning abstract topics like upper-level math and data structures (if my math teacher had just answered the damn question, I’d be in a completely different place right now). So here’s a brief overview of just a few data structures.

An important note: data structures are often abstract. If you look up the dictionary definitions:

  1. Thought of apart from concrete realities, specific objects, or instances…
  2. Theoretical; not applied or practical…
  3. Difficult to understand; abstruse… (lol)

This means they are often not used explicitly, but can be used to aid understanding of the world around us. This leads us to our first data structure and example…

Linked Lists

My favorite metaphor for this, at least for a singly linked list, is a conga line. Imagine you’ve decided to throw a strange get-together where the host of the party is the head of the conga line. They start doing the conga by themselves. No one else is at the party yet, so they’re a little awkward, but they know people will be arriving shortly. The next guest arrives and, instead of saying or doing anything that might identify themselves, walks up to the host, puts their hands on the host’s hips, and gets into the conga groove. (sssshhh….no talk…only dance…) The new friend knows who the host is, but the head of the conga line doesn’t know anything about the person whose hands are on their hips. Steadily, more folks join the party, each person knowing the person in front but not the person behind. Voila, a linked list is born… until Bill starts that weird laugh that makes everyone else laugh, and now everyone in line knows who everyone else is.
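The conga line translates almost directly into code. Here’s a minimal sketch in Python (the class and guest names are my own invention, not anything standard): each dancer holds exactly one pointer, to the person in front of them.

```python
class Dancer:
    """A node in a singly linked conga line: each dancer knows only
    the person in front of them (the node they point to)."""
    def __init__(self, name, in_front=None):
        self.name = name
        self.in_front = in_front  # pointer toward the head of the line

# The host starts the line alone; each new guest latches on behind.
host = Dancer("Host")
guest1 = Dancer("Guest 1", in_front=host)
guest2 = Dancer("Guest 2", in_front=guest1)

# From the tail, we can walk the whole line by following pointers;
# from the host, we can't reach anyone -- they have no pointer back.
names = []
node = guest2
while node is not None:
    names.append(node.name)
    node = node.in_front
print(names)  # ['Guest 2', 'Guest 1', 'Host']
```

Notice that traversal only works in one direction, which is exactly the limitation the next example fixes.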

I’m sure there are some explicit, hard-coded examples of linked lists in the real world, but they can be easily seen in something as simple as a web browser. In this case, a doubly linked list appears. You open a new Chrome tab and decide not to open any more, since the other two browser windows with twenty-four tabs each are already pushing your Chromebook to its limits. Your home page pops up. You now have a doubly linked list with your home page as both the head and the end of the line.

To start your morning routine, you quickly check Facebook. After 45 minutes of navigating extreme opinions, old memes, and baby pictures, you realize your data structure has changed a little. Facebook is now at the end of the line, with a pointer back to the head of the line, and the head now points forward to Facebook. You can see these pointers at the top left of your browser screen: the left arrow points back to the previous page (your home page), and navigating back to the home page will show a right arrow pointing at Facebook.

From Facebook, you decide to check out what’s new on Medium. After reading some enlightening posts, you check on your doubly linked data structure.

You now have Medium at the end of the line, Facebook in the middle, and your home page still at the head. On Medium, your browser shows a greyed-out right arrow and a nice dark left arrow. Pressing the left arrow puts you back on Facebook where, now, both arrows are dark…the left points toward the head of the line and the right toward the end. Navigating back once more shows the home page with a greyed-out left arrow and a dark right arrow pointing at Facebook. Continuing your typical internet surf in one tab will leave pages in the middle with pointers to the left and right (or up and down, depending on how you like to visualize it).

There’s your real-world example of a doubly linked list! Browser back and forward arrows are a simple, everyday implementation of a doubly linked list. There are other things we can do with them, but let’s not get into that now.
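The morning-routine walkthrough above can be sketched in a few lines of Python. This is a toy model, not how Chrome actually stores history — the `Page`/`visit` names are made up — but the pointer structure is the same: each node carries a back pointer and a forward pointer, and a `None` pointer is a greyed-out arrow.

```python
class Page:
    """A node in a doubly linked browser history: every page carries a
    pointer back (the left arrow) and forward (the right arrow)."""
    def __init__(self, url):
        self.url = url
        self.back = None     # previous page; None = greyed-out left arrow
        self.forward = None  # next page; None = greyed-out right arrow

def visit(current, url):
    """Navigate from the current page to a new one, linking the two."""
    page = Page(url)
    page.back = current
    if current is not None:
        current.forward = page
    return page

# Homepage -> Facebook -> Medium, just like the morning routine above.
home = Page("homepage")
facebook = visit(home, "facebook.com")
medium = visit(facebook, "medium.com")

print(medium.forward)        # None: the right arrow is greyed out on Medium
print(medium.back.url)       # facebook.com: the left arrow
print(home.back)             # None: the left arrow is greyed out at the head
```

Unlike the conga line, you can now walk this list in both directions, which is precisely what the back and forward buttons do.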

As you navigate back to Facebook, you decide to do a little research on how Facebook works, which brings us to our next example.

Graphs

“Oh, you mean like a typical chart with x and y.” I wish… it turns out there’s another type of graph that applies easily to our Facebook interactions. In fact, you’ve probably seen one before.

Each circle is a node that points to some other node, much like in a linked list. But instead of pointing only forward and backward, a node can point to any number of other nodes. This is easy to see on Facebook: you and each of your friends are nodes. The relationship between you, or the pointer, is called an edge. There’s an edge between you and each of your friends. To follow the definition of an edge exactly, a friendship is really two edges: your friendship with your sister is one edge, and her friendship with you is another. A better example of this is Twitter, where there don’t have to be (and typically aren’t) two edges between nodes: you can follow someone, which creates an edge, but they don’t have to follow you back.

Edges can also have weights. Facebook can add these weights however they please. If you like posts by your sister or message her frequently (I don’t know exactly what metrics Facebook uses), the weight of the edge will increase. This can be used to assess patterns in the relationships of nodes.

As graphs developed, grew, and became more complex, they also became more useful in processes such as data processing and preparation (see directed acyclic graphs).

Last, we’re left with our most explicitly used data structure.

Binary Search Tree

Deep inside your machine lies the kernel, the beating heart of your computer. It’s a very, very busy conglomeration of code that takes care of running all of your processes and makes your operating system what it is. With everything it’s doing, it needs to both keep the CPU busy and operate efficiently. To help accomplish this, it dedicates part of its processing power to (you might say it hires) a scheduler. The scheduler determines which processes get how much runtime on the CPU. Many different algorithms were developed to meet these requirements, and in 2007, Linux settled on the Completely Fair Scheduler.

“Wasn’t this article about data struct — ” And that’s where we arrive at our binary search tree, particularly a self-balancing binary search tree, and particularly-particularly a red-black binary search tree.

Previous schedulers relied on heuristics to determine which processes got the CPU’s attention, and they required a lot of code and a lot of time to make that determination. The red-black tree used in the Completely Fair Scheduler helps with this: rather than relying on heuristics, every process is given an equal share of time with the processor. As with any binary search tree, nodes with lower values sit on the left and nodes with higher values on the right. Processes with the least spent execution time end up on the left, and those are the ones prioritized.
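To make the left/right ordering concrete, here’s a plain (unbalanced) binary search tree in Python, keyed by runtime spent. A red-black tree is this same structure plus coloring rules that keep it balanced; those rules are omitted here. The runtimes are made-up numbers.

```python
class Node:
    """A plain binary search tree node. A red-black tree layers
    balancing rules (node colors, rotations) on top of this."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Standard BST insert: smaller keys go left, larger go right."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def leftmost(root):
    """The node the scheduler cares about: the least-runtime process."""
    while root.left is not None:
        root = root.left
    return root

root = None
for runtime in [18, 7, 25, 3, 12]:  # made-up runtimes, in arbitrary units
    root = insert(root, runtime)

print(leftmost(root).key)  # 3 -- the process that has run the least
```

Finding that leftmost node only means walking down the left spine, so in a balanced tree it costs O(log n).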

The leftmost process is executed until it either completes (and is removed from the tree) or hits its maximum run time (which depends on how many processes there are at a given moment), at which point its execution pauses and it’s reinserted into the red-black tree further to the right. Since the tree is self-balancing, the scheduler continually runs and rebalances. It turns out the red-black tree is pretty efficient at this inserting and removing: no path from the root to a leaf is more than twice as long as any other, which means even the worst case stays efficient, and that’s exactly what that busy, busy kernel needs its scheduler to maintain.
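The run-or-reinsert loop can be sketched in a few lines. One caveat up front: CFS really does use a red-black tree, but to keep this sketch short I’m standing in a `heapq` min-heap for the tree — it gives the same “always pick the least-runtime process” behavior, just without the ordered traversal a real tree supports. The process names, time slice, and remaining-work numbers are all invented.

```python
import heapq

TIMESLICE = 10  # made-up maximum run time per turn
remaining = {"editor": 25, "browser": 12, "compiler": 7}  # work left, made up

# CFS keys its tree by runtime spent; a min-heap of (spent, name) pairs
# lets us pop the "leftmost" (least-runtime) process the same way.
tree = [(0, name) for name in remaining]
heapq.heapify(tree)

order = []  # the sequence in which processes get the CPU
while tree:
    spent, name = heapq.heappop(tree)      # leftmost node: least runtime
    ran = min(TIMESLICE, remaining[name])  # run for up to one time slice
    remaining[name] -= ran
    order.append(name)
    if remaining[name] > 0:                # unfinished: reinsert to the right
        heapq.heappush(tree, (spent + ran, name))

print(order)
```

Each iteration is the paragraph above in miniature: pop the left edge of the tree, run, and either drop the process or push it back with a larger key.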

Whoa, that red-black tree is a little more abstract and difficult to understand. As a reminder, I’m trying to show you ways that data structures are implemented directly or found commonly in the real world, thus justifying the time it takes to learn them. As we’ve seen, they’re used both directly and indirectly, whether in a conga line, a browser, or something as complex as an operating system kernel. The data structures those interviewers ask you about are important both for understanding our world and for making it better. So the next time you have that lingering why? while reviewing data structures, just remember: you’re using them every day, and you may be asked to implement one at your new job.
