Why Do Arrays Start With Index 0?

Whether you code in JavaScript, Python, Java, PHP, Ruby, or Go, to access the first element of an array you refer to array[0]. This often confuses new developers: why do we start numbering from zero? Wouldn’t array[1] be a more obvious choice for the first item?


First, it’s worth pointing out that not every programming language follows zero-based numbering. Before deciding to write this post I was aware of Lua breaking the convention, but I was surprised to discover that this is not such a rare occurrence.

AWK, COBOL, Fortran, R, Julia, Lua, MATLAB, Smalltalk, Wolfram Language — in all of these languages the default index of the first element of an array is one.

(A full list can be found on Wikipedia.)


The most common answer to the array numbering question points out that zero-based numbering comes from language design itself. In C, an array variable refers to a location in memory, so in the expression array[n], n should not be treated as an index but as an offset from the array’s head.

“Referencing memory by an address and an offset is represented directly in computer hardware on virtually all computer architectures, so this design detail in C makes compilation easier … “ — Wikipedia

To illustrate that, we can run the C program below:

#include <stdio.h>

int main(void)
{
    int data[6] = {1, 2, 3, 4, 5, 6};
    int i = 0;

    printf("Array address: %p\n", (void *)data);

    do {
        printf("Array[%d] = %p\n", i, (void *)&data[i]);
        i++;
    } while (i < 6);

    return 0;
}

Output:


Array address: 0x7ffe9472bad0
Array[0] = 0x7ffe9472bad0
Array[1] = 0x7ffe9472bad4
Array[2] = 0x7ffe9472bad8
Array[3] = 0x7ffe9472badc
Array[4] = 0x7ffe9472bae0
Array[5] = 0x7ffe9472bae4

As we can see in this example, the first element and the array itself point to the same memory location: the first element is zero elements away from the start of the array, hence offset 0.


On the other hand, I also found that many answers point to a note written by Dijkstra: Why numbering should start at zero. It addresses the problem of how best to denote a sub-sequence of natural numbers, and I find its reasoning interesting from a mathematical perspective. However, since the C language predates this paper, it does not seem relevant to the question of why C chose zero-based indexing.