Data Abstraction and Problem Solving with C++

Walls and Mirrors. By Frank M. Carrano, Timothy Henry, Janet Prichard.

Vardan Grigoryan (vardanator)
Computer Science Reader
9 min read · Sep 24, 2019


Amazon link to the book (you can find the same book with examples in Java).

Novice programmers often struggle to understand data structures and algorithms. Experienced developers suggest reading Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein (usually referred to as CLRS).

Many newcomers, and experienced coders who realized early enough that fundamental algorithms and data structures are a must for a successful programming career, have struggled with “hard” books like Introduction to Algorithms (despite what the title suggests).

One of the main reasons programmers find algorithms hard to understand is measuring the complexity of an algorithm (or of a data structure’s operations). Another difficulty for beginners is the variety of data structures and the pitfalls in their implementations.

I found “Data Abstraction and Problem Solving with C++” when I was searching for introductory material that wouldn’t leave the reader with only the easy stuff but would dive into rather complex topics with an easy-to-follow approach. This book is a pearl for those of you who have, or have had, a similar problem. It introduces data structures and algorithms in a way that is both engaging and fruitful. Most of the topics are discussed in great detail, so even experienced programmers will find it truly attractive.

Why are algorithms and data structures so damn popular?

Because of companies like Google, Facebook, and the like. They tend to ask algorithmic problems at technical interviews. Strong problem-solving skills are a must for engineers working at big companies (and now at small companies, too).

Too many times, people (sometimes I refer to developers as people) complain about the problems asked at companies (mostly big ones) during interviews. Developers who have written successful or famous applications get rejected by Google, Facebook, or others because they don’t know how to invert a binary tree.

These companies face challenging engineering problems every day, from planning and managing cheap, fast hardware resources to loading content onto the end-user’s device as quickly as possible. You might have written a couple of successful mobile or desktop applications, but the approach you choose to solve particular problems in the application defines your level as a programmer. For example, you can design an email client that shows the ten most recent emails [on its Home screen, I guess] and it could have the best UI out there; displaying ten recent emails will work smoothly on almost any device, trust me.

Now, let’s suppose the user of your email application receives hundreds of thousands of emails over, say, two years of using it. When the user needs to search for a particular email (by subject, content, or whatever), that’s where your skills play a significant role. The way you stored those hundred thousand emails and the methods (algorithms) you used to sort and search them will make you either a rockstar developer or a hotdog beggar in the eyes of your end-user. Why so? Suppose you store all the incoming emails in an array.

#include &lt;string&gt;

// "Date" is a stand-in for whatever date type the application uses.
struct Date { int year, month, day; };

struct Email {
    std::string subject;
    std::string body;
    std::string from;
    Date when;
};

// let's suppose a million emails is the max for anyone
const int MAX_EMAILS = 1000000;
Email inbox[MAX_EMAILS];

We can store the ten most recent emails in any form without affecting the performance of the application. Issues arise when we try to manipulate thousands of emails stored in the inbox array. What if we want to search for the word “friend” in all emails? We have to scan every email in the array and collect the ones containing the word “friend” into a separate container.

#include &lt;vector&gt;

// Linear scan over the whole inbox: O(N) in the number of stored emails.
std::vector&lt;Email&gt; search(const std::string&amp; word) {
    std::vector&lt;Email&gt; search_results;
    for (int i = 0; i &lt; MAX_EMAILS; ++i) {
        if (inbox[i].subject.find(word) != std::string::npos) {
            search_results.push_back(inbox[i]);
        }
    }
    return search_results;
}

Obviously, the complexity of the algorithm (the speed at which it runs) is linear, which is far from perfect: if the number of emails is N, then the number of operations performed to find all the emails containing the search term will be on the order of N (in the worst case). More than that, each step also performs a substring search inside the subject line of each email (the find() call), and considering that the search function would also try to match the email body, the running time rises even more. So if one operation takes a millisecond, then searching among one hundred thousand emails will take more than a minute (100 seconds, to be exact). I guess you’ve never waited that long when searching for emails in any of your email clients. Here comes the need for proper use of data structures and algorithms.

Just to make the idea clear, let’s suppose we process each incoming email’s subject and body and store each word in a hashtable. The hashtable slot for a word refers to the email objects that contain it. When the user searches for any word, for example “friend”, we return the list referred to by the slot of the hashtable with the “friend” key.

// "hashtable" maps each word to the emails containing it (see the sketch below).
std::vector&lt;const Email*&gt; search(const std::string&amp; word) {
    return hashtable[word];
}

This is (on average) a constant-time operation, and in computer science it’s considered the ultimate goal of efficiency. In simple words, searching among the emails would take less than a second.
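Here is a minimal sketch of how such an index could be built, assuming a hypothetical tokenize() helper that splits text into whitespace-separated words; a real client would also lowercase words, strip punctuation, and avoid duplicate entries:

#include &lt;sstream&gt;
#include &lt;string&gt;
#include &lt;unordered_map&gt;
#include &lt;vector&gt;

// Inverted index: each word maps to the emails that contain it.
// This is the "hashtable" used by the search() function above.
std::unordered_map&lt;std::string, std::vector&lt;const Email*&gt;&gt; hashtable;

// Hypothetical helper: split text into whitespace-separated words.
std::vector&lt;std::string&gt; tokenize(const std::string&amp; text) {
    std::istringstream stream(text);
    std::vector&lt;std::string&gt; words;
    std::string word;
    while (stream &gt;&gt; word) {
        words.push_back(word);
    }
    return words;
}

// Index a newly received email so later searches are O(1) on average.
// The Email object must outlive the index (it lives in the inbox array).
void index_email(const Email&amp; email) {
    for (const std::string&amp; word : tokenize(email.subject + " " + email.body)) {
        hashtable[word].push_back(&amp;email);
    }
}

The indexing work is paid once per incoming email, off the critical path, which is exactly why the user-facing search can afford to be a single hashtable lookup.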

Now imagine that you have a great startup that produces some fascinating application or provides an internet service. Would you hire someone who hasn’t mastered the art of problem-solving, or who doesn’t know when and how to use a hashtable (or other data structures)?

Now let’s explore one of the best books for studying algorithms and data structures.

Mmm… Algorithms

I divide the book into three imaginary sections: “The fundamentals”, “Algorithms and faster data structures”, and “Trees and beyond”.

The fundamentals

Chapter 1 (Data Abstraction: The Walls) introduces the concept of abstraction. The book dives into abstract data types (ADTs), which are then used as the base on which the data structures introduced in the book are built. This chapter also has a short intro to C++ classes that helps the reader implement an ADT (using the example of a Bag).

Chapter 2 (Recursion: The Mirrors) is one of my favorite chapters. It was the first place I read a detailed explanation of recursion, and where I found out that not every function that calls itself is a proper recursive solution (there are a couple more rules, like having a base case that eventually stops the calls). The chapter contains plenty of examples, such as the Fibonacci sequence, the Towers of Hanoi, the binary search algorithm, and more.
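To give a taste of those examples, here is my own compact sketch of recursive binary search (not the book’s code): the base case stops the calls when the range is empty, and every recursive call works on a smaller range.

// Recursive binary search over a sorted array.
// Returns the index of target, or -1 if it is not present.
int binary_search(const int sorted[], int low, int high, int target) {
    if (low &gt; high) {           // base case: empty range, stop recursing
        return -1;
    }
    int mid = low + (high - low) / 2;
    if (sorted[mid] == target) {
        return mid;
    }
    if (sorted[mid] &lt; target) {                            // search the upper half
        return binary_search(sorted, mid + 1, high, target);
    }
    return binary_search(sorted, low, mid - 1, target);    // search the lower half
}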

Chapter 3 (Array-Based Implementations) and Chapter 4 (Link-Based Implementations) start by leveraging arrays to implement abstract data types, using the example of the Bag ADT introduced in the first chapter. The third chapter also has an interlude discussing pointers, polymorphism, and memory allocation in C++ (topics used extensively in further chapters). The link-based implementations show the reader how to leverage pointers to implement the Bag ADT with a linked data structure.

Chapter 5 (Recursion as a Problem-Solving Technique) is another great chapter on recursion. It discusses algebraic expressions and evaluating them using recursion. I skipped those parts to get to the most interesting part of the chapter: backtracking. You might have already encountered the Eight Queens problem (how to place 8 queens on a chessboard so that no two attack each other). It has an elegant solution using backtracking. You should definitely read that part of the chapter.
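For a taste of backtracking (again, my own compact sketch, not the book’s implementation), the solver places one queen per column and abandons a partial placement as soon as it conflicts with an earlier queen:

#include &lt;cstdlib&gt;

const int BOARD_SIZE = 8;
int queen_row[BOARD_SIZE];  // queen_row[col] = row of the queen placed in that column

// Does a queen at (row, col) clash with queens already placed in columns 0..col-1?
bool is_safe(int row, int col) {
    for (int prev = 0; prev &lt; col; ++prev) {
        if (queen_row[prev] == row ||                        // same row
            std::abs(queen_row[prev] - row) == col - prev) { // same diagonal
            return false;
        }
    }
    return true;
}

// Try to place queens in columns col..BOARD_SIZE-1; backtrack on dead ends.
bool place_queens(int col) {
    if (col == BOARD_SIZE) return true;        // all queens placed
    for (int row = 0; row &lt; BOARD_SIZE; ++row) {
        if (is_safe(row, col)) {
            queen_row[col] = row;              // tentative placement
            if (place_queens(col + 1)) return true;
            // otherwise discard this row and try the next one
        }
    }
    return false;                              // no row works: backtrack
}

Calling place_queens(0) fills queen_row with one valid arrangement.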

Chapter 6 (Stack) and Chapter 7 (Stack Implementations) introduce the stack, which is rather an adapter over other data structures. I can’t say much about these chapters; the stack is a popular topic in many books.

Chapter 8 (Lists) and Chapter 9 (List Implementations) are great chapters that discuss the linked list in great detail. There is a false feeling when studying linked lists that you understand them, right up until you try to implement one. These chapters will put things in order, and you will finally grasp linked lists completely.
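To show what “implementing one” boils down to, here is a minimal sketch of my own (not the book’s list ADT): a singly linked node, insertion at the front, and a traversal:

// A node of a singly linked list holding an int.
struct Node {
    int value;
    Node* next;
};

// Insert a new value at the front of the list; returns the new head.
Node* push_front(Node* head, int value) {
    return new Node{value, head};
}

// Walk the list and count its nodes.
int length(const Node* head) {
    int count = 0;
    for (const Node* cur = head; cur != nullptr; cur = cur-&gt;next) {
        ++count;
    }
    return count;
}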

Algorithms and faster data structures

Finally, we’ve made it to Chapter 10 (Algorithm Efficiency). This chapter answers all those nasty questions about “Big O”s, “small P”s, and “middle F”s (the last two are fake). You will understand how to measure the efficiency of an algorithm, which will help you estimate the running time of your applications and find better solutions to the problems you encounter later.
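To give a feel for what those measures buy you: the linear email scan from the beginning of this article performs on the order of N operations, so 100,000 emails mean about 100,000 steps, while a logarithmic-time algorithm such as binary search on sorted data needs only about log₂(100,000) ≈ 17 steps for the same N.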

Chapter 11 (Sorting Algorithms and Their Efficiency) dives into various sorting algorithms, such as selection sort and merge sort, and discusses the efficiency of each one. It is one of the most important chapters in the book.
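As a reminder of the simplest of those algorithms, here is my own sketch of selection sort, which the chapter analyzes as O(N²):

#include &lt;utility&gt;  // std::swap

// Selection sort: repeatedly move the smallest remaining element
// to the front of the unsorted part. O(N^2) comparisons.
void selection_sort(int values[], int count) {
    for (int i = 0; i &lt; count - 1; ++i) {
        int smallest = i;
        for (int j = i + 1; j &lt; count; ++j) {
            if (values[j] &lt; values[smallest]) {
                smallest = j;
            }
        }
        std::swap(values[i], values[smallest]);
    }
}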

Chapter 12 (Sorted Lists and Their Implementations) combines the linked list discussed in the first part with the need to keep data sorted and introduces the sorted list. A sorted list keeps its elements in sorted order, which can matter a great deal for efficiency in applications that constantly require sorted data.

Chapter 13 (Queues and Priority Queues) and Chapter 14 (Queue Implementations) move further and introduce priority queues: queues that return the top (max or min) element in constant time and reorder themselves to properly handle the next request. Although the plain queue (not the priority queue) should have been introduced in earlier chapters (as it was in previous editions of the book), these chapters do a good job of describing the queue and its various implementations.
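The C++ standard library already ships this adapter as std::priority_queue, so the usage pattern (with made-up task priorities) looks roughly like this:

#include &lt;iostream&gt;
#include &lt;queue&gt;

int main() {
    // Max-priority queue of (made-up) task priorities.
    std::priority_queue&lt;int&gt; tasks;
    tasks.push(3);
    tasks.push(10);
    tasks.push(7);

    while (!tasks.empty()) {
        std::cout &lt;&lt; "handling priority " &lt;&lt; tasks.top() &lt;&lt; '\n';  // 10, then 7, then 3
        tasks.pop();  // the queue reorders itself for the next request
    }
    return 0;
}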

Trees and beyond

Chapter 15 (Trees) introduces trees, binary trees, and binary search trees. The applications of the latter are many, so it’s a great tool in a programmer’s inventory. I strongly suggest reading this chapter twice if you are new to trees; almost all advanced data structures are based on them. Chapter 16 (Tree Implementations) is the continuation of the previous chapter. It shows the various ways to implement trees, the most popular of which is the link-based implementation.
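A link-based binary search tree boils down to a node with two child pointers plus an ordering rule (smaller keys to the left, larger to the right); here is a minimal sketch of my own:

// A node of a link-based binary search tree.
struct TreeNode {
    int key;
    TreeNode* left;
    TreeNode* right;
};

// Insert a key, keeping the BST ordering: left subtree &lt; key &lt; right subtree.
TreeNode* insert(TreeNode* root, int key) {
    if (root == nullptr) return new TreeNode{key, nullptr, nullptr};
    if (key &lt; root-&gt;key) root-&gt;left = insert(root-&gt;left, key);
    else if (key &gt; root-&gt;key) root-&gt;right = insert(root-&gt;right, key);
    return root;  // duplicates are ignored in this sketch
}

// Search is O(height): about O(log N) when the tree stays balanced.
bool contains(const TreeNode* root, int key) {
    if (root == nullptr) return false;
    if (key == root-&gt;key) return true;
    return key &lt; root-&gt;key ? contains(root-&gt;left, key)
                           : contains(root-&gt;right, key);
}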

Chapter 17 (Heaps) introduces the heap, a binary tree organized so that specific operations are served faster than others. For example, a max heap returns the element with the maximum value in constant time (the ultimate goal of efficient operations). The priority queues discussed in the previous chapters are typically implemented with a heap.

Chapter 18 (Dictionaries and Their Implementations) is one of the most important chapters in the book. The email-search example at the beginning of this article used a hashtable as an efficient solution to the problem, and a dictionary is essentially the same thing under another name. This chapter also dives into topics such as resolving collisions on hashtable insert operations. As a proven fast data structure, hashtables are used everywhere they fit, so you have to understand them.
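To illustrate what collision resolution means, here is a minimal sketch of separate chaining (my own code, with a deliberately tiny table and a toy hash function so that collisions actually happen):

#include &lt;list&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

const int TABLE_SIZE = 13;  // deliberately tiny so collisions happen

// Toy hash: sum of character codes modulo the table size.
int hash_word(const std::string&amp; word) {
    int sum = 0;
    for (char ch : word) sum += static_cast&lt;unsigned char&gt;(ch);
    return sum % TABLE_SIZE;
}

// Separate chaining: each bucket holds a list of the words that hash to it.
std::vector&lt;std::list&lt;std::string&gt;&gt; buckets(TABLE_SIZE);

void insert_word(const std::string&amp; word) {
    buckets[hash_word(word)].push_back(word);  // colliding words share a bucket
}

bool contains_word(const std::string&amp; word) {
    for (const std::string&amp; stored : buckets[hash_word(word)]) {
        if (stored == word) return true;  // scan only the short chain
    }
    return false;
}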

Chapter 19 (Balanced Search Trees) is an advanced topic for many readers. Still, I suggest diving into it, because balanced search trees play a fundamental role in data-intensive applications; databases, for example, leverage balanced search trees to organize table indexes. You should spend a good deal of time thoroughly studying this chapter along with Chapter 21 (Processing Data in External Storage), which introduces B-trees, used extensively in relational (and not only relational) databases.

Finally, Chapter 20 (Graphs) introduces the popular data structure used in almost every gigantic internet service in the world. You might have heard of the Facebook friends graph or Google’s Knowledge Graph. The graph data structure is used extensively in many applications and services. Though the book doesn’t dive deep into graphs, it still offers a good introduction to the topic.
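If you want a concrete picture before reading the chapter, one common representation (an adjacency list, sketched here with hypothetical names) stores, for every person, the list of their friends:

#include &lt;string&gt;
#include &lt;unordered_map&gt;
#include &lt;vector&gt;

// Adjacency list: each person maps to the people they are connected to.
std::unordered_map&lt;std::string, std::vector&lt;std::string&gt;&gt; friends;

// Add an undirected "friendship" edge between two people.
void add_friendship(const std::string&amp; a, const std::string&amp; b) {
    friends[a].push_back(b);
    friends[b].push_back(a);
}

After add_friendship("Alice", "Bob"), friends["Alice"] contains "Bob" and vice versa, and graph algorithms like breadth-first search walk these lists.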

To sum up, understanding and using algorithms and data structures should be the number one priority in your study plans. Even if you have years of experience building applications, studying fundamental algorithms and data structures should be the first thing on your mind. It’s worth your time!

Final score

Useful for programmers: 10/10

Fundamental knowledge: 10/10

Real-world projects and examples: 5/10

Detailed and friendly: 10/10

Worth buying: 10/10
