Data Structures — Diving Into Data Structures (Part 1)

Data Structures are often our biggest blind spot.

Omar Elgabry
OmarElgabry's Blog
26 min readNov 9, 2016

--

We naturally think in data structures, don’t we?

When we have a programming problem, we dive into the algorithm, while ignoring the underlying data structure. And even worse, we think that using another data structure won’t make much of a difference, although we could vastly improve the performance of our code by picking an alternative data structure.

What Is A Data Structure

It doesn’t start without asking: What the heck data structure is?

It’s an intentional arrangement of a collection of data that we’ve built.

Intentional Arrangement

The intentional arrangement means the arrangement done on purpose to impose, to enforce, some kind of systematic organization on the data.

Because it’s useful, it makes our lives easier, and it’s easy to manage when you keep related information together.

Data Structures In our Life

We need data structures in our programs because we think this way as human beings.

A recipe is an actual data structure, as is a shopping list, a telephone directory, a dictionary, etc. They all have a structure, they have a format.

Data Structure and Object Oriented Programming

Now if you’re an object oriented programmer, you may be thinking, Well isn’t this what we do with classes and objects?.

I mean, we define these real-world objects in a program because we think this way as human being, or at least we’re supposed to.

And yes, absolutely. Objects are a type of data structure, and not the only one.

Five Fundamental Behaviors

How to access, insert, delete, find, & sort. These are the operations that you will most likely going to perform.

Not all of data structures have the five fundamental behaviors.

For example, many data structures don’t support any kind of searching behavior, It’s just a big collection, a big container of stuff, and If you need to find something, you just go through all of it yourself. And many don’t provide any kind of sorting behavior. Others are naturally sorted.

Each data structure has it’s own different way, or different algorithm for sorting, inserting, finding, …etc, Why? Because, due to the nature of the data structure, there are algorithms used with specific data structure, where some other can’t be used.

The more efficient & suitable the algorithm, the more you will have an optimized data structure. These algorithms might be built-in or implemented by developer to manage and run these data structures.

Always read the language documentation and check the performance of the algorithms used with the underlying data structure. Different algorithms might be used based on the data (it’s size, type, …) you have.

One-Dimensional Arrays

The array is the most fundamental, the most commonly used data structure across all programming languages. The support for one-dimensional arrays is generally built directly into the core language itself.

An array is an ordered collection of items, where each item inside the array has an index.

Arrays

Indexes

The indexes are in order, they are zero-based index in most languages, but, that’s not the case with other few languages.

Size

The simplest arrays are fixed size, also called immutable, meaning unchangeable arrays. They can be created initially at any size. But once the array is created, you can’t add or remove elements from it.

Sometimes the ability to dynamically add or remove elements from it while the program is running is made available in the standard array in a language, and sometimes you have multiple different array types, depending on whether you need a fixed or resizable array.

Data Type

Simple arrays are typically restricted to a specific type of data. It’s an array of integers, or an array of booleans. But some languages allow you to create arrays of just generic objects, meaning you could put different data types.

Multidimensional Arrays

Taking the one dimensional array one step further, we can have arrays with two dimensions.

It’s basically an array of arrays, where each item in this array itself contains another array.

Therefore, any single array element is not accessed with just one index, but we need two numbers to get to it. Sometimes referred to as a matrix or a table because this is, effectively, rows and columns of information.

We can take the two-dimensional array further to three-dimensional arrays and even more.

Jagged Arrays

When we have a Multidimensional array, where each row, each item needs to have different number of items. It’s where Jagged arrays come into play.

It’s a Multidimensional array where each item can have different sizes.

Rows have different number of stones

When To Use Jagged Arrays

If you have a multidimensional array which records the number of sales every day. The first index is for the month, while the second is for the day.

You can notice that Jan has 31 days, while Feb 29, and so on. So, either leaving the irrelevant elements empty, or setting them to 0 or -99 or some other value is not always desirable. But, you shouldn’t have elements that represent impossibilities. Like In February, where days 30 & 31 don’t exist.

So, Using Jagged Arrays, we can get the average of sales in a month by grabbing all the sales in that month, add them all, and divide by the number of days in that month, without having to add logic to figure out the days we should be ignoring.

Resizable Arrays

Most languages provide some kind of resizeable array, or dynamic array, or mutable. In Java, the standard array is fixed-size and a fixed data type, but, you can also create a resizeable array using ArrayList.

Adding & Removing

The location where we add a new element or remove an existing one does matter, because adding or removing an element at the end is faster than at some where else.

So, elements will have to be shifted to the left or to the right, and re-indexed; re-ordered. Therefore, it does have a performance impact.

The way of shifting, differs from language to another, some will be shifting items in place, but, some other will just copy the entire contents of the old array into a new one with the item being added or removed.

Sorting Arrays

Sorting is always computationally intensive, you might need to do it but we want to minimize it. So, keeping aware about how much data we have and how often we’re asking to sort might lead us into choosing different data structures.

Sorting people by height —pleacher.com

The Built-In Sorting Functionality

When we’re sorting arrays, there are things you need to understand the built-in sort functionality.

First, Is the built-in sort functionality will look at how big your array is?. Your language may automatically switch between different sorting algorithms based on the size of the array.

Second, Is the built-in sort functionality will attempt to sort an existing array in place or will create another copy?. Most languages will attempt to sort an existing array in place, while a few will create new copy of the original array to contain the sorted array.

Sorting Custom Objects

In object orientated languages, you often have arrays of your own custom objects, not just arrays of simple numeric or even string values.

When you then ask that array to sort those objects using the built in sort functionality in the language. It simply won’t know how to do it. Because you need to provide a little more information on which property to sort accordingly.

So, for example, sort the users by their id, name, or birth date. This defines the object’s property to sort accordingly.

Usually only needs a few lines and it’s typically called a comparator, or compare function, or compare method.

Searching Arrays

If we wanted to know if a specific value exists somewhere in an array or not, we could loop over array elements, and check the value at that current position, and see if it’s equal to goal or not.

Keep searching even under the bed

The best case scenario is that the first element is the value we’re looking for. The worst case is that the value is at the end or does not appear anywhere in the array. This method is referred to as a linear search or a sequential search. This is an inelegant brute-force method.

While linear searches are simple to understand and they are easy to write and they will work, but, they’re slow. And the more elements you have, the slower they get.

Data Needs To Be Ordered

If there is no order, no predictable sequence to the values in the array, then, there may be no other options than check all the array elements.

But, if the data is ordered. If it has some kind of predictable sequence where the items are in strict ascending, or descending order. Then, there are much better options for searching than a linear search like binary search.

So, data must be ordered in a way so that we can use an algorithm other than linear search. Therefore, having some kind of order to those elements is more and more significant.

The Inherit Challenge of Data Structures

You may sometimes agree to use a slow search ability, because the only way of fixing that is to sort the array, which in turn will add a performance hit that it’s simply not worth it.

So, If you are going to search on an array once, It’s better to have a complexity of O(N) for linear search, than O(NLogN) for sorting + O(LogN) for searching. If, however, you are going to search a lot, then, you may first sort the array once at the beginning, and now, you can search with O(LogN) every time instead of linear search O(N).

You cannot have a data structure that is equally good in all situations. A naturally sorted data structure requires less time in searching for an element, and more time in inserting because it keeps the array sorted. While a basic array requires more time in searching, and less time in inserting elements to the end of the array.

Lists

Lists are pretty simple data structures. It’s structure keeps track of sequence of items.

It’s a collection of items (called nodes) ordered in a linear sequence.

These nodes don’t need to be allocated next to each other in the memory like an array is.

There is a general theoretical programming concept of a list, and there is the specific implementation of a list data structure, which may have taken this basic list idea and added a whole bunch of functionality. Lists are implemented either as linked lists (singly, doubly, circular, …) or as dynamic array.

An Array Vs A List

A list is a different kind of data structure from an array.

The biggest difference is in the idea of direct access Vs sequential access. Arrays allow both; direct and sequential access, while lists allow only sequential access. And this is because the way that these data structures are stored in memory.

The structure of the list doesn’t support numeric index like an array is.

Linked Lists

It’s a collection of nodes, where each node has a value and a link to the next node in the list.

The pointer of the last node points to NULL, or terminator, or a dummy node.

Linked List — Wikipedia

Because of the way it’s built, adding, and removing elements is much, much easier than with an array.

But, with arrays, it requires shifting to the left or to the right to keep its meaningful ordered structure. Even adding elements at the end of the array can still require reallocation of the entire array in order to store the array in a contiguous area of memory.

Doubly Linked List

What we have introduced is a linked list, But to be yet a little more specific, this is a called a “Singly Linked List”. We can also have a “Doubly Linked List”.

Instead of each node having a reference just to the next node, we add one more piece of data that it also has a reference to the previous node as well. So, it allows us to go forward and backward, to traverse the list in any direction.

Doubly Linked List — Wikipedia

Linked lists in most of the languages are typically implemented as doubly linked lists.

The Singly Vs Doubly Linked List

The doubly linked lists require more space per node because it maintains a pointer to the next and previous nodes.

Adding operation is clearly less work in a singly linked list, because doubly linked list requires changing more links than a singly linked list.

For singly linked list, we assume that we insert at the head or after some given node.

In case of adding before some given node, you need to know the node before that given node, which will require you to sequentially access all the nodes until you find the node you’re looking for.

Removing is simpler and potentially more efficient in doubly linked list, because in singly linked list, you need to know the node before the node to be deleted.

Circular Linked List

In a singly linked list, If the next pointer of the last node points back to the first node. Then, it’s “Circular Linked List”. And if the previous pointer of the first node points to the last node as well. Now this would be considered a “Circular Doubly Linked List”.

Circular Linked List — Wikipedia

It’s not common, but it can be useful for certain problems. So, if we get to the end of the list and say next, we just start again at the beginning.

Stacks

Just like the arrays and lists, stacks and queues are also collection of items. They are just different ways we can hold multiple items. It’s last in, first out data structure, with no care about numeric indexes.

It’s a collection of items where we add and remove items to and from the top of the stack.

Stack of dirty dishes

Stack Implementation

A stack can be easily implemented either using an array or a linked list

Usage of Stacks

Stack is not limited to only model the real world situations (same for queues). One of the best uses for a stack in programming is when parsing code or expressions, where you need to do something like validating the correct amount of opening and closing curly braces, square brackets or parenthesis.

The Basic Operations of Stacks

They are: push(), pop(), & peek(). push is for pushing a new element on the top of the stack, and pop will return (and remove) the element at the top, while peek will get the element at the top without removing it.

Stacks Vs Arrays Vs Linked Lists

Working with stack is simpler than working with arrays or linked lists, because there is less you can do with a stack.

This is an intentionally limited, an intentionally restricted data structure. All we do is push and pop and maybe peek. And if you’re trying to do anything else with this stack, you’re using the wrong data structure.

Queues

The key difference between a stack and a queue. Stacks are last in first out (LIFO), while queues are first in first out (FIFO). And as with stacks, we should not even be thinking about numeric indexes.

It’s a collection of items where we add items to the end and remove items from the front of the queue.

Queue of people

Queue Implementation

As with stacks, a queue can be implemented either using an array or a linked list.

Usage of Queues

Queues are very commonly used in concurrency situations to keep track of what tasks are waiting to be performed and making sure we take them in that order.

The Basic Operations of Queues

Just like a stack: add(), remove(), & peek().

Priority Queues

Some languages offer a version of a queue, called a priority queue. This allows you to arrange elements in the queue based on their priority.

It’s a queue, where items with higher priority step ahead of items with lower priority in the queue.

How Priority Queues Works

When you add items that have the same priority, they will queue as normal in a first in, first out order. If something comes along with a higher priority, then it will go ahead of them in the queue.

Defining The Priority

You can define based on what an element has higher, lower, or equal priority. This done by implementing a comparator or a compare function (as when sorting arrays), where you provide your own logic for comparing the priority between elements.

Deque

The deque, pronounced “DEK”, is used when we want to leverage the power of the queue and the stack, where you can add or remove from start or end.

It’s a queue and a stack at the same time.

Associative Arrays

They give the ability to use meaningful keys to work with elements in our data structures, rather than working with numeric indexes as keys. This gives a meaningful relationship between the key and the value.

It is a collection of key-value pairs.

The implementation of associative arrays have different names. In Objective-C and Python, they’re called dictionaries.

The Order of Elements

Unlike a basic array, In an associative array the keys do not need to be in any specific order. Because order is not a concern in associative arrays.

Sure, you might find it useful to sort them by key. You might find it useful to sort them by value, or, you might not need to sort it at all.

Key Duplicates

The same way that you don’t get the same index number appearing twice in a basic array, there can be no duplicate keys, and keys must be unique in an associative array.

Keys & Values Data Types

You can usually use any data type as either the key or the value. It is common to use a string as a key.

Most associate arrays, whether they are called dictionaries or maps or hashes, are implemented using hash table data structure. So, we’ll start by hashing and then dive into hash tables.

Hashing

Hashing is a valuable concept in programming. It’s used, not just in data structures, but in security, cryptography, graphics, audio.

It’s a way to take our data and run it through a function, which will return a small, simplified reference generated from that original data.

Hashing is commonly used with passwords

The reference might just be an integer, or letters and numbers, …etc.

Why Using Hashing?

Because being able to take a complex object, and hash it down to a single integer representation. So, we can use that integer value to get to a certain location in the data structure.

Hashing Is Not Encryption

Hash functions are not reversible; They are one way. So, you cannot convert a hash value result back to the original data. Therefore, you lose information when hashing, that’s okay, that’s intentional.

Hashing Vs Encryption — ssl2buy.com

Hashing Function Implementation

Say we have a person class defined, and we want a hash function defined on this class. The hash function should return a specific single reference (usually an an integer) for a specific person object. This single integer is generated using the data in a person object (firstname, lastname, birth date, …etc).

Hashing Rules

  1. If we take the exact same object, with this same data, and feed it into the hash function again, I would expect the same hash result.
  2. If you have two different objects that you consider equal, they should return the same hash value.
  3. While two equal objects should produce the same hash value, two equal hash values does not guarantee they came from equal objects, Why?Because two different objects might, under some circumstances, deliver the same result out of a hash function (See Hashing Collision).

Hashing Collision

It’s when we have different objects with different data, but giving the same hash value result.

It could be because the hashing function is simple hash function, but it’s also possible even with more complex hash functions. In most cases that’s okay, we can manage the hash collision.

Hash Tables

The idea of hashing was fundamental to understand the hash table data structure.

It’s a typical data structure to implement an associative arrays; mapping keys to values.

Mapping numbers (keys) to boxes

Hash Tables Vs Arrays Vs Lined List

The big benefit of hash tables over arrays and over linked lists is they’re very fast, both for looking at whether an item exists or finding a specific item in a hash table, and for inserting and deleting items.

How Hash Tables Work?

A small phone book as a hash table — Wikipedia

When a hash table is created, it’s really internally a very array-based data structure, and can usually be created with an initial capacity. The hash table consists of an array of slots, or buckets which contain the values.

When adding a key-value pair, It will take our key and it’ll run it through the hash function, getting a specific hash value out (usually an integer).

If this hash value is large, you might need to simplify it with respect to the current size of the hash table.

Then, it will assign the value to an array element with the index equals to the returned hash integer value.

Accessing a value given a key follows the same idea. It will take that key, run it through the exact same hash function, and it can then go directly to a specific location that contains the value we’re looking for.

There’s no linear search, no binary search, no traversing a list. We just go straight to the element we need.

Managing Collision

We can expect that a hash collision will arise; when we get the same hash value for different keys. But, how to manage it?

When we add a new key value pair, and collision happens, It’s going to put that value in the same location, even if there’s another existing value in that location.

Now, hash table implementations have different ways to automatically deal with this. These range from that each location contains a simple collection, like an array, or a linked list.

So, we would still be able to get to any location very fast, but once we get to the location with multiple values, the hash table will traverse that internal list to find what we’re looking for.

Each node in the linked list in turn will contain the value associated with a specific key. In addition, It will contain that key, or a reference to that key. Because in case of multiple values inside the same location, we need to know which key this value belongs to.

Hash collision resolved by separate chaining — Wikipedia

This is what is called the “Separate Chaining Technique” in hash tables. Now, there are other techniques for managing collisions inside hash tables, including open addressing, Cuckoo hashing, hop-scotch, and Robin Hood hashing.

Writing Hash Functions

When you want to store key-value pairs using hash table data structure, It’s probably most of objects already have hash function. The default behavior is usually returns an integer; calculated from the memory address of that object.

If you ever override the equality behavior in your class, you must override the hash behavior. Because hash codes are so tied to equality, And, if you have two objects that you consider equal, they should return the same hash value.

Again, If you change what it means for your objects to be equal, you should also change what it means to hash these objects.

String objects already overridden their own equality and hashing behavior. So if you have two separate string objects, they will still count as equal, and return the same hash, if they have the same value, even if they’re actually separate string objects allocated in different parts of memory.

Sets

When all what you need is a big container you can put a bunch of items into it, where you don’t care about the sequence.

There is no specific sequence as with a linked list, or a stack, or a queue. There is no key-value pairs as with a hash table.

It’s an unordered collection of items, with no repeated values.

By unordered, I mean there is no index like an array.

A grocery bag of food

Set Implementation

Sets are actually use the same idea of hash tables data structure most of the time. But, instead of key-value pairs (hashing a key and store it’s value), when you’re using a set, the key is also considered as the value (or the value is assigned to a dummy or a default value).

So, to get any specific value, or any specific object in the set, you need to have the object itself. And the only reason to do this is to check for it’s existence. This is often referred to as “checking membership”.

Sets can be implemented using a self-balancing binary search tree for sorted sets, or a hash table for unsorted sets.

The Advantage of Sets

Unlike an array, or values in an associative array, or a linked list, sets do not allow duplicates. You cannot add the same object, the same value twice to the same set.

Sets are designed for very fast lookup, to very quickly be able to see if we already have a value contained in a collection.

You can also iterate through all the elements in a set, you just may not have any guaranteed order.

Trees

The idea of a tree data structure is that we have a collection of nodes, and the nodes have connections, they have links between each other.

This sounds similar to linked lists. But in a linked list, we always go from one node to a single specific next node. While in a tree, each node might link to one, or two, or more nodes.

It’s a collection of nodes, where each node might link to one, or two, or more nodes.

A Tree

Tree Terminologies

There are some terminologies that come with the tree. You can find more about them here. They are essential to understand and work with trees.

Binary Trees

A binary tree is just a tree with maximum of two child nodes for any parent node. Binary trees are often used for implementing a wonderfully searchable structure called a “Binary Search Tree” or “BST”.

Binary Search Trees (BST)

It’s a specific type of binary tree, where the left child node is less than its parent, and a right child node is greater than its parent.

A Binary Search Tree (BST) — Wikipedia

How Binary Search Trees Work?

The the rule is that the left child node must be less than its parents, and a right child node must be greater than its parents, and that rule follows all the way down in the tree.

And as we insert new nodes with values, the rule will always be followed to make sure that the tree stays sorted.

So, It is a data structure that naturally stays sorted and it’s sometimes called “A Sorted Tree” or “An Ordered Tree”.

Storing Nodes As Key-Value Pairs

A binary search trees are often used to store key value pairs, meaning, the nodes consists of a key, and an associated value. And it’s the key that would be used to sort the nodes accordingly in a binary search tree.

Duplicates

You can’t have duplicate keys, just as you don’t have duplicate keys in a hash table or even an array.

Adding & Accessing Nodes

Adding & Accessing nodes follows the same rule mentioned above. If the current node is less than, then go right, if greater than, go left.

Retrieve Nodes In Order

The other benefit is binary search trees are staying sorted. So, If we retrieve the items from a left to the right, bottom to top, we will get them all out in order.

Unbalanced Tree

It’s when the tree has more levels of nodes on the right hand side than on the left (or vice-versa).

Although it’s unusual for this kind of thing to happen, but, we can’t always guarantee that the way data is added will allow us to build a tree with a perfectly symmetrical structure all the way down.

In this case we say that our tree is unbalanced; There are more levels on one side than on the other. And we would have to perform more checks to find or insert or delete any values on the right hand side than we would on the left (or vice-versa).

Binary Search Tree Implementations

What we have been talking about is the abstract idea of a binary search tree data structure. But, there are several implementations of this binary search tree idea that are self-balancing binary search trees.

The important idea with balancing a binary search tree is that the number of levels are roughly equal to each other. We don’t have three levels on the left and twenty on the right.

Examples of self-balancing trees could be: Red-Black, AVL or Adelson-Velskii and Landis’, Splay, Scapegoat trees, and more.

Binary Search Tree Vs Hash Table

Both are fast for insertion, fast for deletion, fast for accessing any element even at large sizes. But, because binary search trees stay sorted, they allow us to get all the items out of the tree in order, where the hash table doesn’t guarantee a specific order.

Heaps

Heaps are usually implemented using the idea of a binary tree, not a binary search tree but still, a binary tree. They’re a way of implementing other abstract data types like the priority queue.

It’s a specific type of binary tree, where we add nodes from top to bottom, left to right, and child nodes must be less (or greater) than or equal their parents.

Max Heap — Wikipedia

Therefore, we completely fill out any level before moving on to the next. So we don’t have to worry about the tree becoming unbalanced like a binary search tree can.

Min Vs Max Heap

A min heap states that any child node must be greater than (or equal) its parent node, while a max heap states that any child node must be less than (or equal) its parent node. However, we don’t care if a node is less than or greater than it’s sibling.

How Heaps Work?

So, In case of Min Heap:

  1. We keep adding elements from top to bottom, left to right,
  2. Then, compare with the parent node; Is it less than its parent?
  3. If so, then swap the node with it’s parent,
  4. Keep doing steps from 2 through 3 until the node is greater than it’s parent (or it becomes the root node).

This little swapping of nodes is how a heap keeps itself organized.

Heap Is Not A Fully Sorted

Unlike a binary search tree, which does stay sorted, and where we can easily traverse the tree and retrieve everything out in order.

Because if you notice, that at any particular level past the root, the values do not have to be in any specific order, just as long as they’re all greater (or less) than their parent.

One of the benefits of this is a heap does not have to do as much reorganization of itself as may be necessary with a binary search tree.

The one thing we can be sure about, is that the parent node will always be less or greater than it’s child nodes, and hence the minimum or the maximum value will be always at the top.

And, therefore, heaps are most useful for the idea of a priority queue.

Graphs

The limitations of a tree are no longer exist here. A node can link to multiple other nodes, no specific sequence, no root node.

It’s a collection of nodes, where a node can link to multiple other nodes, no specific sequence, no root node.

A graph of a social network

Graph Theory In Mathematics

Because graphs in computer science are so closely linked to graph theory in mathematics. It is common to hear mathematical terms used. So, In graph theory, we call the nodes “vertices”, and the links between them are referred to as “edges”.

Graphs Usage

We could use a graph to model a social network with each node a person. Or, modeling the distances between cities. We could model a ethernet network in an office, or in an entire building, or a city.

Direct & Undirect Graphs

We can also say whether those edges should have a direction to them or not.

In some situations, it makes sense that any edge, any connection between two vertices, is only one-way; So, node A is connected to node B, while the reverse is not true; node B is NOT connect to node A.

In other situations, you might want to be able to follow that edge, that link, in either direction; So, node A is connected to node B, and also node B is connected to node A.

Weighted Graphs

You can also give every edge a weight; associating a number, with each of the edges.

You could do this to represent, say, the distances between cities, if you were trying to calculate the shortest route between multiple locations. Or, you could use a weight to indicate which edges take priority over other edges.

Abstract Data Types (ADTs)

Before diving into the abstract data types, what they are, and the difference between them and other concepts. Let’s define what’s meant by a data type.

Data Type

A variable’s data type determines the values it may contain, plus the operations that may be performed on it.

Data Types consists of: Primitive, Complex or Composite, and Abstract Data Types.

An Abstract Data Type (ADT)

It’s a data type, just like integers and booleans primitive data types. An ADT consist not only of operations, but also of values of the underlying data and of constraints on the operations.

A constrains for a stack would be that each pop always returns the most recently pushed item that has not been popped yet.

The actual implementation for an ADT is abstracted away, and we don’t have to care about. So, how a stack is actually implemented is not that important. A stack could be and often is implemented behind the scenes using a dynamic array, but it could be implemented with a linked list instead. It really doesn’t matter.

Lists, stacks, queues, and more are all abstract data types.

An Abstract Data Type & Data Structure

ADT is not an alternative to a data structure, they are different concepts.

When we say the stack data structure, we refer to how stacks are implemented and arrangement in the memory. But, when we say the stack ADT, we refer to the stack data type which has set of defined operations, the operations constrains, and the possible values.

We don’t care about the underlying implementation, same as when using integers, we are abstracted away from how integers are represented in the memory, we are only interested in the operations (add, subtract, …), and the possible values (…, −2, −1, 0, 1, 2, …).

Data structures can implement one or more particular abstract data types (ADT), which specify the operations that can be performed on a data structure.

An Abstract Data Type Is NOT An Abstract Class

Both have nothing to do with each other. An abstract class defines only the operations, sometimes with comments that describe the constraints, leaving the actual implementation to the inheriting classes.

Although ADTs are often implemented as Interfaces (abstract classes) as in Java. For example, The List interface in Java defines the operations of the List ADT, while the inheriting classes define the actual implementation (data structure).

But, this is not always the case. For example, Java has a stack class, It is a regular concrete class, but a stack is also an abstract data type, meaning, it represents the idea of having a last in, first out data structure. So, stack is a real concrete class and it’s also an abstract data type.

An Abstract Data Types Is A Theoretical Concept

An abstract data type (ADT) is a theoretical concept that’s irrelevant to programming keywords.

So, If we have a stack, we have a last in, first out data structure, that we can push and pop and peek. That’s what we expect a stack to be able to do. Whether a stack is implemented as a concrete class or whatever, It still have the idea of a last in, first out data structure.

Conclusion

Deciding on The Data Structure

There are some key questions you need to ask before deciding on the data structure

  • How much data do you have?
  • How often does it change?
  • Do you need to sort it, do you need to search it? Faster to Access, Insert, or Delete?

If any place where you only have trivial amounts of data, it isn’t going to matter very much. Unless of course, those amounts of data change so frequently, or you have large amounts of data.

Immutable Data Structures

Now, a general principle that immutable or fixed size data structures tend to be faster and smaller in memory than mutable ones.

Why should we restrict the features of arrays? The reason is, that the more constraints you have (like fixed size, specific data type, …), the faster and smaller your data structure is able to be.

The pitfalls of having a data structure with many features (non-restricted) is the compiler cannot allocate the ideal amount of space, therefore, it must introduce overhead to support the flexibility of the data structure (like different data types, …).

Improve Your Code Performance

If you’re looking at existing code to see what to improve on, then see whatever holds the most data. There is no better way to really get the most out of these data structures than when you see some of the performance improvements you can get, just from simply changing from one structure to another.

Differences In Languages

The algorithms used with each data structure differs from one language to another, and the actual implementation of the data structure differs from one language to another. So, It’s worth to look closely at your language documentation.

--

--

Omar Elgabry
OmarElgabry's Blog

Software Engineer. Going to the moon 🌑. When I die, turn my blog into a story. @https://www.linkedin.com/in/omarelgabry