Low-level implementation of array

Nazar Ivanchuk
The Startup
Published in
5 min readNov 5, 2019

Over the past few years, I conducted dozens of interviews for a software developer position. During an interview, I always try to devote some time to the basic computer science questions. Usually, I start with array related questions.

  • Why it takes constant time Θ(1) to access an element in an array?
  • How can we compute the address of any given element of an array?
  • How to represent a multidimensional array in memory space?

It’s surprising how many developers are not able to answer these simple questions. Meanwhile, they manage to answer questions about more complicated data structures like List, LinkedList, Map, etc.. This fact urged me to write this article. Here we will have a deal with low-level implementation of one the most common data structure. Let’s start…

An array is a data type that contains all members(elements) of the same type. An element can be selected from the array by the index with an integer value (or with some other value which might be represented as an integer, such as character, enumerated and boolean type). Below is an array representation in the memory(let’s assume that array occupies contiguous locations in memory).

The base address is the address of the first element of the array and the lowest memory cell. The next elements follow the first in the memory directly. Note: there is no requirement that the index starts at zero. It may start with any number as long as they are contiguous. It will be more straightforward to discuss array if the first index is zero.

If our array is in the contiguous memory location and the first element index of the array is zero, then accessing an element of an array is simple. For obtaining an element from the array we can use the following formula:

Element_Address = Base_Address + index * Element_Size

The Element_Size item is a number of bytes that each array element occupies. For example, Element_Size for the short type is 2 bytes. And so on.

Consider the following Java array declaration:

int [] tenInts = new int [10];

To access an element of the tenInts, assuming 4-byte integer, we will use this calculation:

Element_Address = AddressOf(tenInt) + index * 4

When we deal with assembler language we actually have to do this calculation manually rather than having the compiler do the work for us. For getting the element from the array we will use the code as follows:

mov(index, ebx);

mov(tenInt[ebx * 4], eax);

Multidimensional array

Most CPUs can easily handle one-dimensional arrays. Unfortunately, there is no easy way to access elements of multidimensional arrays. For that reason, all matrices are presented as a one-dimensional array with additional information about the matrix dimension. A matrix with dimension h * w can be represented by table M in memory.

Mapping a 4x4 matrix to sequential memory locations

While developing we don’t really care about mapping function as long as two things occur.

  • No two entries in the array occupy the same memory location
  • Each element in the array always maps to the same memory location

There is a large number of functions that fit this bill. However, modern high-level languages usually use one of these functions: row-major ordering and column-major ordering.

Row-major ordering assigns array elements to successive memory locations by moving across the rows and then down the columns. The image below demonstrates this mapping.

Row-major ordering for a 4x4 matrix

This method employed by most high-level programming languages: C/C++, Java, etc… The row-major ordering method is very easy to implement and use in machine language. The transformation from a matrix to a linear sequence is highly intuitive.

With the slight modification to the formula for computing the address of an element of a one-dimensional array. We can compute the offset for a 4x4 two-dimensional row-major ordered array given an access of this form:

A[columnIndex][rowIndex]

… using the following formula:

Element_Address = Base_Address + (columnIndex * row_size + rowIndex) * Element_Size

  • Base_Address is the address of the first element of the array (A[0][0])
  • Element_Size is the size of an individual element in bytes
  • row_size is the number of elements in a single row

The formula can be adapted to the three-dimensional array. It will be slightly more complex. Let’s consider a Java array declaration:

int [][][] A = new int[depth_size][col_size][row_size];

The computation for obtaining an element from the array is the following

Element_Address = Base_Address + ((depthIndex * col_size + columnIndex) * row_size + rowIndex) * Element_Size

For a four-dimensional array with the following declaration

int [][][][] A = new int[bounds0][bounds1][bounds2][bounds3];

The calculation of the element address in a four-dimensional A[i][j][k][m]:

Element_Address = Base_Address ((( i * bounds1 * j) * bounds2 + k) * bounds3 + m) * Element_Size

Column-major ordering is the other common array element address function. Used in Fortran, MATLAB, GNU Octave, S-Plus, R, Julia, etc.. The picture below shows the representation of column-major ordering array.

Columns-major array element ordering

The formula for computing address for this column-major ordering is very similar to the one that we already reviewed (row-major ordering). We just need to reverse the order of the index and size variables in the computations.

The formula to determine an element in a two-dimensional column-major array is the following:

Element_Address = Base_Address + (rowIndex * column_size + columnIndex) * Element_Size

and for a three-dimensional columns-major array is the following:

Element_Address = Base_Address + ((rowIndex * column_size + columnIndex) * depthIndex) * Element_Size

Accessing an element of a multidimensional array in a high-level language is very easy. For that reason, a lot of developers will do so without any consideration for what is going on under the hood.

--

--