On Pointers: The Wall of Lockers

Audience: those with some programming experience who know that pointers exist and what they are, but not the details of how they work. Bonus points if pointers are frustrating for you. I want to give you a mental model by which to understand pointers and pointer errors.

xkcd on point. Or rather, on pointers. [Credit]

In my second year of college at the University of Illinois at Urbana-Champaign, I took a class called CS 225: Data Structures. In addition to being the class that opened up most of the interesting senior-level classes to my eager eyes, and — according to the experienced senior students — got you an internship, it was the class that challenged me with the strange world of pointer management.

Data Structures at UIUC was taught in C++, which means code looks something like this:

class ReferenceFrame {
    ReferenceFrame *next;
    Pose pose;
    Mat grayscale_image;
    /* other code */
};

Those programmers that are familiar with Java [as I was] can handle the class definition, access modifiers, and variable declaration. The strange statement is the mysterious asterisk that separates “ReferenceFrame” from “next.”

I’m not going to tell you what pointers are and show you code that uses pointers. I assume you’ve seen that and you’ve been confused by it. Or worse, it seems to make sense, but you still end up stuck with a pointer bug in your code that’s harder to catch than Entei in Pokémon Gold after your brother used the Master Ball on a friggin’ Caterpie.

I am going to give you a mental model that has worked well for me. Memory works like a bunch of lockers, and pointers let you use locker numbers.

The memories of high school that come flooding back. [image source]

Now consider the bread-and-butter example of pointers: the linked list. I assume you’re familiar with it. If not, here’s a link.

Here’s a C++ type declaration of a singly-linked list item. We won’t worry about managing the whole list in this post, just one element.

class ListNode {
    Thing thing;
    ListNode* next;
};

The ListNode declaration is a recipe for each ListNode. If we use the lockers analogy, the ListNode is exactly two lockers next to each other: one has a Thing, and the other has a ListNode* — whatever that means.

Here’s the kicker: the ListNode* is a locker number. It tells you the location of the next ListNode, which has its own Thing and ListNode pointer (locker number) next to it. So now the computer can go to that ListNode and do whatever you need it to do — maybe the Thing is song data, and you just started the next song in your Indie Pool Party playlist, or the Thing is an edit in Microsoft Word and you want to undo the accidental “I just deleted everything” in your 1,800-page epic about corn.
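
If you want to see the locker numbers being followed, here is a minimal sketch. It makes a few assumptions that are mine and not part of the class above: Thing is just an int (I never defined it), the node is a struct so its members are public, and the helper names count_nodes and sum_three_nodes are made up for illustration.

```cpp
#include <cstddef>

// Assumption: Thing is never defined in the post, so an int stands in for it.
using Thing = int;

struct ListNode {
    Thing thing;     // first locker: the data itself
    ListNode* next;  // second locker: the locker number of the next node
};

// Hypothetical helper: follow locker numbers from node to node and count them.
// nullptr plays the role of "there is no next locker".
std::size_t count_nodes(const ListNode* head) {
    std::size_t count = 0;
    for (const ListNode* node = head; node != nullptr; node = node->next) {
        ++count;
    }
    return count;
}

// Hypothetical helper: build a three-node list and add up its Things.
Thing sum_three_nodes() {
    ListNode third{30, nullptr};
    ListNode second{20, &third};  // &third is third's locker number
    ListNode first{10, &second};

    Thing total = 0;
    for (const ListNode* node = &first; node != nullptr; node = node->next) {
        total += node->thing;
    }
    return total;
}
```

Notice that the loop never needs to know where the nodes live ahead of time. Each node hands it the locker number of the next one.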

I assume you’ve tried to use pointers before and can tell me, with some trial-and-error, what is wrong with each of these code fragments. That’s boring, and that’s why you don’t remember what’s wrong even though you’ve seen these errors before. Because pointers are locker numbers, I can tell a story — and you can come up with your own stories — for each error that you see.

Modifying an Uninitialized Pointer

Consider this code snippet:

int *x;
*x = 12;

It’s boring and therefore unmemorable, though it shows the gist of the problem. By the second line, the pointer is declared, which means there is space for it, but uninitialized, which means the space doesn’t have any meaningful value inside of it. Here’s another, more common example, in a different form:

int *potatoes_per_player;
potatoes_per_player[0] = 500; // Player 1 is potato king.

Arrays in C++ are laid out contiguously, such that the next element in an array occupies the place in memory right after the current element in the array. For our locker metaphor, this means that parts of an array are all locker neighbors.

This contiguity (locker-neighbor-ness) of the array allows a useful representation of the entire array. Any element of the array can be found if you know where the array starts (the first locker number), how big each element is (how many lockers it takes up), and in which position of the array the element is. That’s how array indexing works.
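
Here is a small sketch of that arithmetic, in case seeing it helps. The helper names byte_offset and read_by_hand are hypothetical, and the array is assumed to hold plain ints.

```cpp
#include <cstddef>

// Hypothetical helper: how many bytes separate element 0 from element `index`?
// The answer is index * sizeof(int), because the elements are locker neighbors.
std::size_t byte_offset(const int* first, std::size_t index) {
    const char* start = reinterpret_cast<const char*>(first);
    const char* element = reinterpret_cast<const char*>(first + index);
    return static_cast<std::size_t>(element - start);
}

// Hypothetical helper: read element `index` without the [] syntax.
// first[index] is shorthand for exactly this expression.
int read_by_hand(const int* first, std::size_t index) {
    return *(first + index);
}
```

With `int lockers[4] = {7, 8, 9, 10};`, calling `read_by_hand(lockers, 2)` gives the same 9 as `lockers[2]`, and `byte_offset(lockers, 3)` is three ints' worth of bytes.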

But to use this array indexing, you need a pointer to begin with. And so we return to our problem: an uninitialized pointer.

The first line in each example declares an int*. In Locker Land, that means you have a locker for your little slip of paper that will tell you the locker number you’re actually looking for.

The next line in each example tries to read what you’ve written on that slip of paper — but the slip of paper wasn’t written by you!

Your program: “So what do I deliver to the White House?” Mysterious data: “Ah, here it is. This box with clay, wires, and an old cell phone.”

In the best case, you read some locker number you don’t have access to, and the program crashes immediately. In the worst case, the locker at that number holds something your program needs to run, and you change it, and you make your program do very, very strange things.

Yes, the better of the two is having your program crash.
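
The way out is to write a real locker number on the slip before you use it. Here is one possible sketch of both examples with that fix applied. The function names are made up, and swapping in std::vector for the potato array is my suggestion, not the only way to do it.

```cpp
#include <cstddef>
#include <vector>

// The first example, fixed: point x at a real int before writing through it.
int write_through_initialized_pointer() {
    int value = 0;
    int* x = &value;  // the slip now holds the number of a locker we own
    *x = 12;
    return value;     // value was changed through x
}

// The second example, fixed. std::vector requests the lockers and keeps
// track of where they are, so there is never a blank slip of paper.
std::vector<int> make_potato_counts(std::size_t player_count) {
    std::vector<int> potatoes_per_player(player_count, 0);
    if (!potatoes_per_player.empty()) {
        potatoes_per_player[0] = 500;  // Player 1 is potato king.
    }
    return potatoes_per_player;
}
```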

Dereferencing a Null Pointer

Sometimes when you try to access a memory location (locker number) that you’re not supposed to use, you’ll get a segfault. Short for segmentation fault, segfaults are what happens when the computer gets a fault (error) when it tries to access a protected memory segment.

This code snippet produces a segfault on the vast majority of computers:

int *potatoes_per_player = 0x0; // 0x0 is NULL or nullptr
potatoes_per_player[0] = 500; // Player 1 is potato king.

On those computers, there is a large section of memory beginning at zero [aka “null”] that no program can ever access. It seems wasteful, but it’s not. Look up physical and virtual memory if you’re curious.

In our locker model, think of these sections of memory as halls that are off-limits to anyone except the building staff. Or more accurately, there are no lockers in that hall — so that any program that tries to access that memory (like a person trying to get into that hall saying they need some data) must be mistaken or malicious.

Every computer, at the unreachable memory address 0x-1, stores a secret. I found it, and it is that all humans ar —

A null pointer error is like a black beetle running across a white tile floor: bothersome, noticeable, but easily taken care of. The other kinds of pointer errors, like uninitialized data or dangling pointers, are like speckled carpet covered in crumbs of wasabi & ginger potato chips: not noticeable until you sit on your heels while playing Settlers of Catan and feel the junk stuck to your shins and two weeks later you still haven’t done anything about it because your vacuum cleaner sucks because it doesn’t suck.

In other words, null pointer errors are the better kind to have because you can find them and fix them faster.
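
Because null is a known value, you can also check for it before you walk into the off-limits hall. A minimal sketch, with a hypothetical function name and a made-up convention of returning false when there is nothing to write to:

```cpp
#include <cstddef>

// Hypothetical helper: writes the potato count only when it is handed a
// real locker number. Returns false for the null pointer or an empty array.
bool crown_potato_king(int* potatoes_per_player, std::size_t player_count) {
    if (potatoes_per_player == nullptr || player_count == 0) {
        return false;  // no lockers to write into, so do nothing
    }
    potatoes_per_player[0] = 500;  // Player 1 is potato king.
    return true;
}
```

This only works for null. There is no equivalent check for an uninitialized or dangling pointer, which is part of why those errors are the wasabi & ginger chips.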

Memory Leaks

There’s one more error I want to cover: memory leaks.

int *potatoes_per_player = new int[player_count];
/* ... code ... */
/* set to null pointer so future [illegal] accesses are black
* beetles and not wasabi & ginger chips */
potatoes_per_player = 0x0;

You’ve gotten this far, so I’m going to congratulate you. You’ve avoided the Labyrinth of Uninitialized Pointers and the False Door of Dereferenced Null.

But you’ve just stumbled into the Bottomless Pit of the Memory Leak.

Lockers aren’t free. You have to talk to the staff at Locker Land and request some lockers each time you use the word “new” in C++. At some point, when you’re done using those lockers, you need to tell Locker Land that you don’t need to have them reserved anymore by using the “delete” word (“delete[]” for an array like the one above). But the code above overwrote the only copy of the locker number with null. How can you give lockers back without knowing which lockers they are?

What if you take six lockers, then another six lockers, and another six lockers, again and again until Locker Land gets fed up with you taking so many damn lockers and they refuse to give you any more? That’s called an “out of memory” error. And it happened because you didn’t remember to return your lockers. So be a good customer at Locker Land, and return your lockers when you’re done.
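
Here is a sketch of two ways to return the lockers. The function names are hypothetical. The first gives the lockers back by hand before nulling the pointer. The second assumes C++14 or later and uses std::unique_ptr, which gives them back for you when it goes out of scope.

```cpp
#include <cstddef>
#include <memory>

// Return the lockers by hand: delete[] first, then null the pointer.
int total_potatoes_manual(std::size_t player_count) {
    int* potatoes_per_player = new int[player_count]();  // () zero-fills
    if (player_count > 0) {
        potatoes_per_player[0] = 500;  // Player 1 is potato king.
    }
    int total = 0;
    for (std::size_t i = 0; i < player_count; ++i) {
        total += potatoes_per_player[i];
    }
    delete[] potatoes_per_player;   // give back while we still know the number
    potatoes_per_player = nullptr;  // now nulling loses nothing
    return total;
}

// Let std::unique_ptr return the lockers automatically.
int total_potatoes_owned(std::size_t player_count) {
    // make_unique<int[]> also zero-fills the array.
    std::unique_ptr<int[]> potatoes_per_player =
        std::make_unique<int[]>(player_count);
    if (player_count > 0) {
        potatoes_per_player[0] = 500;  // Player 1 is potato king.
    }
    int total = 0;
    for (std::size_t i = 0; i < player_count; ++i) {
        total += potatoes_per_player[i];
    }
    return total;  // unique_ptr's destructor calls delete[] on the way out
}
```

The second version is harder to get wrong, since there is no line you can forget to write.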

Let’s sum it up:

  • A pointer is the address of some memory, which works like a locker number.
  • An uninitialized pointer is like using something inside a locker that was just sitting in there when you opened it. In other words, ew.
  • Dereferencing a null pointer is like asking Locker Land for some locker that you shouldn’t have and that doesn’t actually exist. And it puts your program on the hit list.
  • A memory leak is like repeatedly asking for more and more lockers without ever returning any, until Locker Land gets tired and tells you that you can’t have any more.

Now that you know more about the stories of these errors, you can remember them better, and look for them in your own code and find the solutions.

Got questions, comments, ideas, problems, or free money? I’m not famous or busy enough to ignore your responses yet. Let me know what you think.