Let’s Talk About Data Structures & Algorithms (2)–Array

黃翌軒
Ian 的摳頂小學堂
7 min readJun 29, 2021

This series of articles will focus on some of the commonly used data structures in computer science and their related algorithms or use cases. Contents of this series come from many sources, including “Data Structures & Algorithms”, which is a course I took in National Taiwan University, CSIE department. If you have any questions regarding the contents or sources, you’re welcome to comment below or send me an email!

這篇文章為英文版,如果要閱讀中文版文章的話,請點以下連結。

👉 淺談資料結構與演算法 (2) — Array

Today’s topic is “Array”. The term “array” could have different meanings & definitions in different fields, so we’ll only focus on array in computer science.

What is an Array

First, let’s take a look at wikipedia’s explanation.

a collection of same type data items that can be selected by indices computed at run-time

“indices” is the plural of index. Basically, you could think of an array as lockers and indices as their numbers. An array stores a bunch of data of same type (e.g. integers, floating numbers, characters…)

If we know the number of the locker, we could go straight to it and retrieve the things stored in it. Likewise, if we know the index of the item stored in an array, we could take out the data we want immediately.

Memory as an array

Array is the most basic and important data structure. It is the foundation of other data structures. Why is that? Because the whole memory (RAM) inside our computers could be seen as a giant array. The difference is that memory can only save either 0 or 1 (binary).

When a program executes, it needs to put relevant data into memory so that CPU could retrieve them and perform some calculations. After the calculations are done, it then stored the results back into memory for the program to retrieve them later.

Whenever a program needs to have some places to store data, OS (operating system) will find some space with appropriate size in memory according to its needs. And it will tell the program the “starting location/address” of this space. So, the program could jump to a certain location to retrieve the data it needs afterwards. This is an extremely simplified explanation, you could deep dive into “Computer Architecture” & “Operating Systems” if you want to know more. Since this series will mainly focus on data structures and algorithms, I’ll just leave it here.

Properties

  • fast access by index
    When using arrays in program, normally the variable will store the array’s starting address (reference). So for example, if the starting address of array numbers is 12(it’s just a number I made up) and I want to retrieve the fifth element in numberswhich is numbers[4] (array in most programming languages starts from index 0), OS knows that the element you want stores in address (12 + (size of one block) * 4).
  • fixed size
    An array’s length is fixed when declared. Why is that? Take a look at the picture below. Suppose that the array in the picture is the memory in your computer. A red cell means that there’s data stored in it whereas a white cell means it’s empty.

You could see that the biggest consecutive spaces are four cells. It’s impossible for OS to let you have an array that has more than four cells without removing or relocating others. You probably would say “That’s easy. Why don’t we just split the array and store them separately?” In fact, you could totally do that; however, that won’t be a consecutive space, which means you cannot retrieve data immediately by adding starting address and offset.

common operations

The following are some of the common operations that we would perform on an array.

  • construct(length)
    Construct and get an array of certain length.
  • get(index)
    Get the item of that index in an array, which is super fast since we could do that by adding starting address and offset like we mentioned before.
  • update(index, value)
    Update the item of that index in an array. Still pretty fast since all we do is find that item and change its value.
  • remove(index)
    If all you want to do is simply delete the data in that cell, it’s kind of easy too. However, if we leave that cell with no valid data in it, it will cause some problems like the length of array doesn’t match the number of items in that array and memory waste. As array gets bigger and bigger, it could become a pain in the ass to manage.
  • insert(index, value)
    What if we’re gonna add some new items into an array? We might have some different scenarios.

    1. The cell of that index is empty -> Great! Just store the data into it. Done!

    2. The cell of that index already has data in it -> Move all the elements that have bigger index to the right by one cell and then put the data we want in the cell of that index.

    3. The cell of that index already has data in it. What’s worse, the entire array is full -> Either dump some data or move all the data to a bigger array.

    Things are not too bad in case 1, but not so good if it’s case 2 or 3. With more elements in an array, it would result in longer processing time for these two scenarios. This has something to do with “time complexity”, which we will explain and discuss in next article. At this moment, we just need to realize that unlike other operations, we cannot guarantee that insertion could always be done in a short amount of time.

Ordered Array

Next, we will discuss a special case of array. That is ordered array which means an array that is ordered / sorted with some predefined conditions. Depending on the types of data in the array, it could have different ordering mechanism. For instance, if the array stores a group of people’s heights, we could sort them in ascending or descending order. If we store many people’s name, we could order them alphabetically.

Why do we need to order / sort an array? That’s because it could speed-up a lot when we’re looking for a particular item in an array. Just like when you’re looking up a word in a dictionary (e.g. “order”), if the page you turn to is full of words start with “H”, you would search larger pages since you know “O” comes after “H”. However, this solution only works under the premise of that dictionary sort the words from A to Z.

binary search

The concept we just talked about is actually the root of “binary search”, one of the most iconic searching algorithm. If we want to apply binary search on an array, it must be sorted beforehand.

The word “binary” means that after each search, we could get rid of half of data and just continue searching in the other half. I guessed most of you have the experience of playing “Guess Number”, which is a game that one person choose a number from 1 to 100 and the other must guess what that number is. If the guess is too much or too less, the other must tell the guesser. For example, if the answer is 79 and the initial guess is 50. Then the one who has the answer must reply “Too Less” so that the one who’s guessing knows that he/she only needs to consider 51~100 from now on.

In every search, we find the middle element of the rest of array. Suppose the array is sorted in ascending order. (the most left one is the smallest and the most right one is the biggest) If the element we’re searching for is smaller than the middle element, then we continue our search in the left half. Contrarily, if the element we’re searching for is bigger than the middle element, we continue searching in the right half.

Assume that we have 10000 elements in the array we‘re searching. We would search 10000 times at most to get the element we want if the array is not sorted. On the other hand, we would only search 14 times at most if the array is sorted in advance using binary search. As the size of array gets bigger, the gap will even become bigger.

Insert & Update

We just mentioned the insertion & updating of an ordinary array in the previous section. What about these operations in an ordered array? If we’re gonna insert an element into an ordered array, as long as the element inserted is not located at the end of the ordered array, it must push every element that comes after it one block to the right. After that, it could insert into the right position. For example, there’s an array 1, 2, 4, 7, 10, 14 , if we’re putting 3 into this ordered array, we must push 4, 7, 10, 14 right by one block and put 3 after 2 . Also, when the array gets bigger, it would become more and more time-consuming.

As for updating, it is no longer as simple as finding the element and just updating its value since the order may get corrupted after you altered one of the element’s value. In the example above, if you change 10 to 5 , the array is no longer ordered. You have to take some actions to restore the order.

Alright, that’s all for today! We covered the basic of array and the famous binary search algorithm. In next article, we will talk about “time complexity”. If you like my articles, feel free to give me some claps. I’ll really appreciate it! Thank you.

--

--