Arrays Start at Zero: What’s up with that?

Jonathan Bluks
14 min read · Feb 19, 2019


I see…arrays…Everywhere! (Source: Matthew Henry on Unsplash. Aside: My career as a software developer actually started in these very buildings.)

I have recently been exploring the world of Data Science through an online intro program. It begins by introducing Python, the basics of programming using Python, and of course, arrays, which are particularly important for “data”.

Python, like many programming languages, uses 0-based indexing for arrays, and the course contained a link to a post intended to explain this situation to a person with no programming experience.

It is an interesting read that sheds factual, historical, and social light onto the origin of 0-indexed arrays, and points out that many decisions are arbitrary and not necessarily because they are best for us humans.

[Spoiler Alert: IBM’s use of computers for yacht races may have influenced the use of 0-based indexing in their systems at MIT in the mid-1960s. That decision stuck and found its way into most programming languages today.]

A summary of the post is on the Zero-Based Indexing Wikipedia page, as well as a general overview of the various ideas around 0 vs 1-based indexing.

While this account may be accurate, the details are hard for a new programmer with little experience to grasp, and it still leaves an unresolved feeling around the question. The whole point of this tangent in the material was to try and answer an assumed question in the budding programmer’s mind:

Why don’t arrays start at 1? Don’t we usually start counting things at 1? What’s up with that?

While it may be a quirk of history that led to a potentially confusing convention, what I would like to suggest is that 0-based indexing is not that foreign of an idea, and is something that we regularly deal with in other areas of life. Understanding that can help a new programmer adapt to a convention that they will likely be using for the rest of their career.

tl;dr

Indexing from 0 occurs in various places in the life of humans, and is not specifically a programming language issue. We generally focus on counting from 1 because we count things we can see, not things we can’t. But in many cases, such as the passage of time, we often use 0-based counting. While array indexing seems like an arbitrary choice, in a binary system using binary-based addressing in computer memory, 0-based indexing is a reasonable choice when working with arrays.

Arrays Start at 1…

Arrays starting at 1: It’s a bit of a joke in the programmer community.

A recurring joke in the programming community is the “Arrays Start at 1” meme and its many variations. They always make me chuckle, and they poke fun at the opinionated nature of programmers and our sense of the “rightness” and “wrongness” of things.

The crux of the joke is that most programming languages index arrays from 0, but as humans, we more naturally start counting from 1.

To an aspiring programmer, this seems kinda confusing at first.

An array is just a sequential list of items. If you know the address of an item, then you can find it in the list. The addresses form a sequence of numbers with a starting and ending point, and each one is known as an index. In real life, think of the floors of an apartment building, or house numbers on a street. Arrays in many programming languages look like this:
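In Python, for instance, a five-item list could be sketched as follows. The item names are placeholders of my own, not anything from the course material.

```python
# A hypothetical five-item list; the item names are placeholders.
items = ["apple", "banana", "cherry", "date", "elderberry"]

# enumerate() pairs each item with its index, starting from 0.
for index, item in enumerate(items):
    print(index, item)

first_item = items[0]               # the 1st item lives at index 0
last_item = items[len(items) - 1]   # the 5th item lives at index 4
```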

You’ll note that while there are 5 items, the indexing only goes up to 4, and that the 1st item is index 0. This is the basis of all “Arrays start at 1” memes.

The new programmer then asks:

Why is the 1st position of an array 0? The 2nd position 1, the 3rd position 2…and so on, such that the nth position is n-1? Shouldn’t the 1st element be position 1 and the nth position be the nth element?

Why Might this be?

Since most programming languages work this way, how can we understand it in a way that makes some logical sense? There are, in fact, other domains of our lives where this type of issue shows up.

1 | Counting

Mmmmm…donuts. Look kinda like 0s…but yummier!

It seems like the origin of numbers would have begun in counting things. Whether it was stars, apples, sheep, seashells, or donuts, humans probably started counting physical objects they could see, hear, and touch. It doesn’t make sense to count things we can’t experience, so starting from 1 is completely logical and intuitive. Physical objects, in that way are binary — they exist, or they don’t exist. If they don’t exist, it doesn’t really matter because we only care about counting things that do exist.

In fact, we tend to orient ourselves to what exists. We naturally think in terms of shapes and color, not negative space. We see the foreground, and not the background. We pay attention to sound more than we do the absence of sound, and consider music the arrangement of sounds, not of silences. It takes a certain kind of genius to make music out of silence.

So when we start counting physical objects we generally start with 1, 2, 3, 4…etc all the way up to 9.

But what happens after 9? We get 10.

Hmmm…why is there a zero there? Shouldn’t we have 11, which would be the 1st thing of “ten” things?

And actually, if we think about it, we did have a 0 at the beginning of our number sequence, we just left it out and started at 1. In our common decimal system, the 0 is ever-present, but it is often implied, so we leave it out of our general day to day talk. Especially when talking about physical things.

Going back to elementary school math when we first learned about numbers, we learned that numbers are constructed as follows:

1977 = (1 x 1000) + (9 x 100) + (7 x 10) + (7 x 1)

We talk about the places of a number: in the thousands place we have 1, in the hundreds place we have 9, and so on. Meaning we have 1 1000, 9 100s, 7 10s and 7 1s. This can be rewritten as:

1977 = (1 x 10³) + (9 x 10²) + (7 x 10¹) + (7x10⁰)

And now we have a Base-10 numbering system, commonly known as the decimal system. There are 10 possible digits in the set 0,1,2,3,4,5,6,7,8,9. If you count out 10 numbers in sequence starting from 0, the 10th number will be 9. Also notice that to get the 1s position, we use 10⁰, which equals 1. So for a number of n digits, the exponents will always run from 0 to n-1.

In a way, to determine the position of a particular digit in a number, we are using a 0-based indexing system, counting from the right: exponent 0 marks the 1st digit position, exponent 1 marks the 2nd digit position, etc.
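To make that concrete, here is a small Python sketch that pairs each digit of 1977 with its exponent. The helper name digit_places is my own invention, and it assumes a non-negative whole number.

```python
def digit_places(number):
    """Return (exponent, digit) pairs, rightmost digit first."""
    digits = str(number)
    # Reverse the digits so the 1s place gets exponent 0.
    return [(exponent, int(d)) for exponent, d in enumerate(reversed(digits))]

places = digit_places(1977)
# places == [(0, 7), (1, 7), (2, 9), (3, 1)]

# Rebuilding the number from its places gives back the original value.
rebuilt = sum(digit * 10 ** exponent for exponent, digit in places)
```

A 4-digit number uses exponents 0 through 3, which is exactly the 0 to n-1 range described above.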

Now consider the number 2000:

2000 = (2 x 1000) + (0 x 100) + (0 x 10) + (0 x 1)

This gives us the explanation of why zero is important. Since we have 2 1000s, but 0 100s, 0 10s, and 0 1s, the 0 functions as a placeholder saying that we have a position, but the position is empty.

Therefore, for the number 10, which comes after the number 9, we have 1 “10”, and 0 “1s”. For the number 11, we would have 1 “10” and 1 “1”.

And of course, technically speaking there is always a zero present, but there isn’t really any point in writing it, since adding 0 has no effect on the value of the number, and anything multiplied by 0 is always 0.

2000 = (2 x 1000) + (0 x 100) + (0 x 10) + (0 x 1) + 0

As humans, we tend to focus on what is, more than what isn’t. Yet what isn’t is always there. Our numbering system reflects the possibility of representing things that exist, and their absence. Even in our familiar decimal system while we focus on counting in 1s, the 0 is always there behind the scenes.

2 | Dates

21st Century Fox…Wait, isn’t it the year 2019?

Dates are a more obvious day-to-day example, and we actually use BOTH 0-based counting and 1-based counting in our dates. For example, it is currently the year 2019, and yet we are in the 21st century. Previously, we were in the 20th century, but it could be referred to as the “Nineteen Hundreds”, the 19th century was the “Eighteen Hundreds”, and so on.

Why? Because at some point a guy decided our Western calendar should count from a new starting point, for political and religious reasons. He saw the birth of Christ as more significant than a ruler from that time, and wanted the calendar to reflect a different aspect of history: a kind of “zero-point” tied to the moment when Jesus apparently showed up. (Strictly speaking, that calendar has no year 0, since 1 BC is followed directly by AD 1, but the hundreds in the year number still run one behind the ordinal century.) And while this has become our Western calendar, other calendar systems do it differently and pick their own starting points.

The Western Calendar Starts at (roughly) 0.

Starting from 0 kinda makes sense — depending on the perspective you take.

For example, we count babies’ ages in terms of months at first because they haven’t been alive a full year. We just leave off the part about them being “0 years…and 5 months old”. Since they are less than a year old, we focus on the more relevant units like days and months. We have to use smaller units before we get to the larger unit of 1 year.

With years, we are counting in reference to the number of years passed. And a year seems to be the sweet spot of not too small, but not too large for measuring the span of a human life.

Our age always runs one behind the year of life we are in: if we are living our nth year, then n - 1 full years have passed since our birth. So when you turn 20 years old, you are beginning your 21st year of life, because the 1st year of your life was the year before you turned 1 year old. That would make it your 0 year.

An exception to this is months and days. We consider January as the 1st month and then count forward, so on January 15, we say “we are halfway through the 1st month of the year”. And we do the same with days of the month, and days of the week: “It is the 1st day of January. Sunday is the 1st day of the week.”

That means we think about dates using both a 0-based counting system, AND a 1-based counting system. So, January 1st, 2019 can be written as the date 1/1/2019 and uses 1-based numbering for the month and day, but 0-based numbering for the year.
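Python’s own standard library carries the same mixed conventions, as this small sketch suggests. Note one difference from the text above: the datetime module treats Monday, not Sunday, as the start of the week. The century formula is my own illustration of the ordinal label, not something taken from the module.

```python
from datetime import date

new_year = date(2019, 1, 1)

# Months and days of the month are 1-based.
month, day = new_year.month, new_year.day   # 1 and 1

# Python itself offers both conventions for the day of the week:
zero_based_weekday = new_year.weekday()     # Monday is 0, so Tuesday is 1
one_based_weekday = new_year.isoweekday()   # Monday is 1, so Tuesday is 2

# The century is an ordinal (1-based) label for the year number.
century = (new_year.year - 1) // 100 + 1
```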

Hmmm…that seems kinda inconsistent and confusing. But, what are you gonna do? ¯\_(ツ)_/¯

3 | Time

Look closely. If you count the 0, or first position, of a stopwatch as “1”…then at 5 seconds, you would be in position 6…

With time, we also start at 0.

For example, stopwatches start at the 0th second. In other words, no seconds have passed yet, which makes logical sense. You are in the 0th position, until you start counting. But once a second passes, we then are in the 1st position, 2nd position, 3rd position…and so on. It just so happens, that if you start from 0, the positions and the seconds line up. And, if we were counting milliseconds, we would have a value within the 0th second — a certain amount of milliseconds before the 1st second. So the 0th second could be an index for the number of milliseconds that have passed.

If we were to use 1-based indexing, then we would actually be in position 2 on the watch for the 1st second. Once we get to 5 seconds, we would actually be in position 6, and so on. The nth second would always be position n + 1, which does seem kind of confusing. So 0 seconds makes more sense as the 0th position.
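As a rough sketch of the stopwatch idea in Python, the hypothetical helper below splits elapsed milliseconds into whole seconds plus the milliseconds left over inside the current second.

```python
def stopwatch_position(elapsed_ms):
    """Split elapsed milliseconds into (whole seconds, leftover ms)."""
    return divmod(elapsed_ms, 1000)

start = stopwatch_position(0)        # nothing has elapsed yet
partway = stopwatch_position(250)    # still inside the 0th second
later = stopwatch_position(5250)     # five full seconds have passed
```

With 0-based counting, the seconds value and the number of seconds that have passed are the same number, so no “+ 1” adjustment is ever needed.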

Another example, the 24-hour clock, similarly starts from zero and is based around hours. So the start of the clock is 0 hours, 0 minutes, and 0 seconds, and the last hour of the day is hour 23. The clock rolls over to 0 after 23:59:59, which is the last second of the 24th hour.

The 24-hour clock is the most commonly used time format in the world, and it is not a modern invention: dividing the day into 24 hours goes back to ancient Egypt.

In the 12-hour system, we never have to face a 0, but instead we use the number 12, with an AM/PM indicator.
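Both clock conventions can be sketched in a few lines of Python. The function names here are hypothetical, and hours are assumed to be whole numbers from 0 to 23.

```python
def next_hour(hour_24):
    """Advance a 24-hour clock by one hour, wrapping 23 back to 0."""
    return (hour_24 + 1) % 24

def to_12_hour(hour_24):
    """Convert a 0-23 hour to a (1-12 hour, AM/PM) pair."""
    suffix = "AM" if hour_24 < 12 else "PM"
    hour_12 = hour_24 % 12 or 12   # 0 and 12 both display as 12
    return hour_12, suffix
```

Notice that the 0-based version needs only the remainder operator, while the 12-hour version needs a special case to swap the 0 for a 12.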

(Raise your 🖐 if you have ever been late for work because you mixed up AM/PM on your alarm…That’s the human equivalent of a programmer’s “off-by-one error” when indexing arrays.)

So we see that in counting time, it is commonly accepted to start from 0, counting forward towards the next unit.

4 | Events and Places

Visualizations of the impact of nuclear detonations: http://www.carloslabs.com/ground-zero-2-map/

When we talk about significant events and the point of origin, we often refer to “Ground Zero” as the point on the ground from which the event originated. There was a particular state at a certain point in time and space, and then that state changed, and the effects rippled out from that point.

According to Wikipedia, it is a military term originating from nuclear testing, but it has become a commonly used expression that represents the starting point of a significant event. With its origins in nuclear testing, the events it describes tend to be catastrophic ones.

So while we don’t use the phrase “Ground One”, we very easily understand the idea of 0 as a starting point. We do however also use the phrase “Back to Square 1”. Whatever the origin, this does suggest going back to a “1st position”. In the case of Ground Zero, there is the notion of a single origin out of which effects emanate, while “Back to Square 1” suggests not a geographic location, but a linear journey with many steps.

Again, as in the other examples, we see both 0-based counting and 1-based counting used to indicate a position.

5 | Computers

Typical Punched Card Commonly used in the 20th century…err…the 1900s…err…

In electrical computers, everything is based around being either “on” or “off”. In the old days of punched cards (inspired by looms) a position on the card either had a hole, or no hole. This is a Base-2 system, or a binary system.

In binary, the “0-state” is the opposite of the “1-state”, so it’s not technically used for counting the way our numerical digits are. Computers don’t care about what type of data they hold and don’t care about numbers the way we do. Binary is simply a signalling system that encodes things we do care about. But to create a numbering system using binary, we need a start digit. We consider 0 to be less than 1, so it seems like counting from 0 in binary makes the most sense.

Counting in binary works exactly the same as the decimal system but there are only 2 digits to represent all numbers: 0 and 1.

For example, consider the number 13:

13 in binary is 1101
which is (1 x 8) + (1 x 4) + (0 x 2) + (1 x 1)
which is derived from (1 x 2³) + (1 x 2²) + (0 x 2¹) + (1 x 2⁰)

Each position in binary is called a “bit”, as in a “binary digit”. So while 4 bits can represent 16 possible values, those values run from 0 to 15 rather than 1 to 16, because 0 is one of them. And, as with decimals, we again see that in order to create the positions for each digit, we start with a zero-based counting system for the exponent sequence.
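The same arithmetic can be checked with a short Python sketch. It uses only built-in functions, and the variable names are my own.

```python
binary_13 = format(13, "b")        # "1101"
back_to_decimal = int("1101", 2)   # 13

# Rebuild 13 from its bits: the rightmost bit pairs with exponent 0.
rebuilt = sum(int(bit) * 2 ** exponent
              for exponent, bit in enumerate(reversed(binary_13)))

# 4 bits give 16 distinct patterns, numbered 0 through 15.
four_bit_values = list(range(2 ** 4))
```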

So in a binary system, it makes sense that 0 is the first “thing”, and 1 is the second “thing”, and we go up from there. If we had a very small computer that only had 2 possible positions for memory, it would make sense if the first position was 0, and the second position was 1. In fact, computer memory is built this way, and the addressing locations in computer memory start at the smallest binary address, 0.

Typical way that memory is addressed in a computer: Beginning from the smallest possible binary number. In this case, there are 8 bits (1 byte), which creates 256 potential addresses corresponding to the decimal values of 0–255

In the case of computers, our starting position is limited to two possible values, 0 or 1. While we can combine those values together to create larger numbers that match the numbers in our decimal system, and then ignore the 0 position, from the perspective of a computer, starting from 0 makes more sense.
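Here is a rough Python sketch of that 8-bit address space. The element_address helper goes a step beyond what I describe above: it illustrates one commonly cited convenience of 0-based indexing, namely that an element’s address can be found by adding its index straight onto the array’s starting address. Treat the helper and its numbers as an assumption for illustration, not a description of any particular machine.

```python
ADDRESS_BITS = 8
address_count = 2 ** ADDRESS_BITS             # 256 possible addresses
lowest = format(0, "08b")                     # "00000000"
highest = format(address_count - 1, "08b")    # "11111111"

def element_address(base_address, index, element_size=1):
    """Address of the element at a 0-based index (illustrative only)."""
    return base_address + index * element_size

first = element_address(100, 0)   # the 1st element sits at the base itself
fifth = element_address(100, 4)   # the 5th element sits 4 steps further on
```

With 1-based indexing, the same calculation would need an extra “- 1” every time.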

The Base-Index Choices

Well, we’re pretty much screwed.

When building a computer and designing a programming language, there are 4 possible choices for how to index an array:

  1. Computer Centric: Always start counting from 0 as seems natural in a binary system.
  2. Human Centric: Always start counting from 1 as humans are generally accustomed to — although as we have seen, not exclusively so.
  3. Context/Domain Centric: Allow the user to specify where they want their indexing to start from based on their preferences. Behind the scenes, the computer can convert to whatever indexing system the hardware/assembly language is using.
  4. Arbitrary: Always start counting from some other arbitrary number…2? 3? 9? 42?

#4 wouldn’t make much sense since you might as well pick either 0 or 1 if there was no other good reason to pick a different base index.

#3 provides flexibility and allows for the reality that indexing from 0 or 1 may be too limiting — there may be situations where some other index makes sense regarding some data set. The problem with allowing the index to change would be keeping track of what the starting index is across the code base. If different parts of a codebase were using different indexes, things could get confusing.
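To get a feel for option #3, here is a minimal sketch of a list wrapper with a configurable starting index. OffsetList is a hypothetical class of my own, not part of Python, and it only supports reading a single item.

```python
class OffsetList:
    """A hypothetical list wrapper whose first index is configurable."""

    def __init__(self, items, base=0):
        self._items = list(items)
        self._base = base

    def __getitem__(self, index):
        offset = index - self._base   # translate to the 0-based position
        if not 0 <= offset < len(self._items):
            raise IndexError(f"index {index} out of range")
        return self._items[offset]

    def __len__(self):
        return len(self._items)

one_based = OffsetList(["a", "b", "c"], base=1)
years = OffsetList([10, 20, 30], base=2019)
```

Here one_based[1] returns "a" and years[2019] returns 10. It also hints at the problem: a reader of the code has to know which base each list was created with before any index makes sense.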

That leaves #1 and #2 — the source of all those sweet, sweet memes. For better or worse, the most popular languages currently in use all use 0-based indexing. There are a few other languages that use 1-based indexing, but likely as a professional developer you will be stuck using 0-based indexing.

There is one saving grace that many languages offer so that you can avoid all of this grief: higher-order functions. It is common to operate on arrays doing things like mapping, filtering, sorting, finding, forEach-ing, and reducing. In these cases, the index usually doesn’t matter, so as a programmer you can skip all this 1 versus 0 nonsense, and just get the result you want.
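In Python, those operations might look something like this sketch. The scores list is placeholder data, and not a single index appears anywhere.

```python
from functools import reduce

scores = [3, 8, 5, 10, 1]   # placeholder data

doubled = list(map(lambda s: s * 2, scores))            # mapping
big_ones = list(filter(lambda s: s >= 5, scores))       # filtering
ordered = sorted(scores)                                # sorting
total = reduce(lambda acc, s: acc + s, scores, 0)       # reducing
first_over_seven = next((s for s in scores if s > 7), None)   # finding
```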

At least until you need to get a value from a specific position…then it’s back to Square Zero. 0️⃣

Disclaimer: This post is not intended as an argument for or against a particular approach to indexing, but simply to provide some context for the most common indexing convention in widely used programming languages, and how to think about it in a way that is practical. I accept that there may be other historical, technical, mathematical, or esoteric reasons for the choice of indexing in computer systems that are well beyond my pay grade.

I’ll just leave this here.


Jonathan Bluks

Software Engineer @ Plenty Of Fish | Organizer @ ReactVancouver | Former Lead Educator @ Brainstation | Tech Enthusiast.