Over the past few weeks I’ve responded to a number of questions/assertions about pointers as a performance optimization. It seems to confuse many people, which is understandable as it’s a complex subject. I hope this post will help.
The short answer: No, pointers are not inherently a performance optimization.
A thorough explanation of all the minutiae involved would make this post longer than you’re likely willing to read and exceed my own knowledge of the subject. So I’m going to go with a medium-sized answer aiming to cover the high-level concepts at play.
One thing I’d like you to keep in mind: The performance aspects discussed in this article are micro-optimizations. Don’t apply micro-optimizations without benchmarking your code to make sure they’re actually improving the performance of your application in a noticeable way. Always prefer the most readable solution first.
What are pointers?
At a basic level a pointer is a memory address. To access the value being pointed at, the program must follow the address to the beginning of the value. This is referred to as “dereferencing.”
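Here is a minimal sketch of taking an address and dereferencing it. The function name `deref` is made up for this illustration.

```go
package main

import "fmt"

// deref follows the pointer p and returns the value stored at that address.
// (Hypothetical helper, used only for this illustration.)
func deref(p *int) int {
	return *p
}

func main() {
	x := 42
	p := &x // p holds the address of x

	fmt.Println(deref(p)) // 42

	*p = 7         // writing through the pointer changes x itself
	fmt.Println(x) // 7
}
```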
How could using a pointer be an optimization?
When passing a variable to a function, a copy of the variable is given to the called function. In many cases, a pointer is smaller than the value being pointed at.
Typically, a pointer is the same size as your system’s architecture: 32 bits on a 32-bit system and 64 bits on a 64-bit system. If the argument is a scalar type (an integer, a float, etc.), it’s usually going to be less than or equal to the size of a pointer. If the argument is a compound type, such as a struct with multiple fields, it’s likely the pointer is smaller.
So, the idea is that copying the pointer is more efficient than copying the entire value being pointed at. This is true, to some degree, but there are more considerations than just copying memory when talking about performance.
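To make the size difference concrete, here is a small sketch that compares the size of a struct with the size of a pointer to it using `unsafe.Sizeof`. The `user` type and the `sizes` helper are hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// user is a hypothetical compound type: 8 + 8 + 64 = 80 bytes.
type user struct {
	ID    int64
	Score float64
	Tags  [8]int64
}

// sizes returns the size in bytes of a pointer to user and of a user value.
func sizes() (ptr, val uintptr) {
	var u user
	return unsafe.Sizeof(&u), unsafe.Sizeof(u)
}

func main() {
	ptr, val := sizes()
	// The pointer size depends on the architecture (8 bytes on a 64-bit system).
	fmt.Printf("pointer: %d bytes, value: %d bytes\n", ptr, val)
}
```

Passing `user` by value copies 80 bytes, while passing `*user` copies one pointer. Whether that difference matters in practice is a separate question.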
Can pointers negatively affect performance?
Absolutely. There are two major considerations here:
- Dereferencing pointers isn’t free. It’s not a huge cost, but it can add up.
- Sharing data via pointers will likely cause the data to be placed in the “heap.” The heap is a section of memory for data that lives longer than a single function call. There is overhead to adding data to the heap and heap data can only be cleaned up by the garbage collector. The more data in the heap, the more work the garbage collector has to do, and the more impact it’ll have on your application.
Stack vs Heap
The stack and heap can be intimidating concepts, but they’re very important to this discussion. Here I’m going to try to give you a brief overview. Don’t worry if it doesn’t make sense right away; it didn’t to me either.
Stack: Function-local memory
Each time a function is called, it gets its own section of the stack to store local variables. The function’s stack size is known at compile time. When the function is called, the next area of free memory in the stack is given to the function. When the function returns, that area is available for the next function call; no other cleanup is necessary. While not free, this process is relatively cheap.
Heap: Area for shared data
As explained above, function-local variables “disappear” after the function returns. This isn’t a problem if only non-pointer values are returned, because the returned values are copied into the stack of the calling function.
However, if pointers are returned, the pointed-at data needs to be placed somewhere outside the stack so that it will not “disappear.” This is what the heap is for.
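The two cases can be sketched as follows. The `counter` type and both constructor names are hypothetical. Where each value actually ends up is decided by the compiler, so treat the comments as the typical outcome rather than a guarantee.

```go
package main

import "fmt"

// counter is a hypothetical type used only for this illustration.
type counter struct {
	n int
}

// newCounterValue returns a copy of its local variable.
// The local can live in this function's stack area.
func newCounterValue() counter {
	c := counter{n: 1}
	return c
}

// newCounterPointer returns the address of its local variable.
// The data must outlive the call, so it typically ends up on the heap.
func newCounterPointer() *counter {
	c := counter{n: 1}
	return &c
}

func main() {
	v := newCounterValue()
	p := newCounterPointer()
	p.n++ // the pointed-at data is still valid after the function returned

	fmt.Println(v.n, p.n) // 1 2
}
```

Building with `go build -gcflags=-m` prints the compiler’s decisions about which variables are moved to the heap.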
There are a few performance related concerns with the heap:
- Placing data in the heap requires asking for memory from the runtime. Again, not a huge overhead, but not free either.
- If there’s not already enough heap space, the runtime will have to ask for additional memory from the OS, which is additional overhead.
- Once a value has been placed in the heap it needs to stay there until no functions have a pointer to it anymore. When there are no more pointers to the data it needs to be cleaned up. In Go, this is the job of the garbage collector. It has to find all the unreferenced values and mark their space in the heap as free. The more values placed in the heap, the more work the garbage collector has to do, and the more potential there is to impact your application.
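One way to see heap allocations is to count them. The sketch below uses `testing.AllocsPerRun` from the standard library. The `record` type, the two constructors, and `countAllocs` are hypothetical, and `//go:noinline` is used on the assumption that it keeps the compiler from optimizing the calls away. Exact numbers may vary by Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// record is a hypothetical type used only for this measurement.
type record struct {
	ID   int
	Data [16]int
}

// Package-level sinks keep the results alive so the calls are not discarded.
var (
	valueSink   record
	pointerSink *record
)

// makeValue returns a record by value.
//
//go:noinline
func makeValue(id int) record { return record{ID: id} }

// makePointer returns a pointer, so the record has to outlive the call.
//
//go:noinline
func makePointer(id int) *record { return &record{ID: id} }

// countAllocs reports the average number of heap allocations per call.
func countAllocs() (byValue, byPointer float64) {
	byValue = testing.AllocsPerRun(1000, func() { valueSink = makeValue(1) })
	byPointer = testing.AllocsPerRun(1000, func() { pointerSink = makePointer(1) })
	return byValue, byPointer
}

func main() {
	v, p := countAllocs()
	fmt.Printf("allocations per call: value=%v pointer=%v\n", v, p)
}
```

Every allocation counted here is something the garbage collector eventually has to clean up.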
So why use pointers?
Pointers allow you to share data. If you want a function to be able to modify the data you’re passing it, a pointer is appropriate.
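A short sketch of the difference, using a hypothetical `account` type and two made-up functions:

```go
package main

import "fmt"

// account is a hypothetical type used only for this illustration.
type account struct {
	Balance int
}

// depositValue receives a copy, so the caller's account is unchanged.
func depositValue(a account, amount int) {
	a.Balance += amount
}

// depositPointer receives the address, so it modifies the caller's account.
func depositPointer(a *account, amount int) {
	a.Balance += amount
}

func main() {
	acct := account{Balance: 100}

	depositValue(acct, 50)
	fmt.Println(acct.Balance) // 100

	depositPointer(&acct, 50)
	fmt.Println(acct.Balance) // 150
}
```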
Pointers can also be useful when you need to distinguish between a zero value and an unset value.
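For example, a `nil` pointer can mean “not set” while a pointer to `0` means “explicitly set to zero.” The `settings` type, the `defaultTimeout` constant, and `effectiveTimeout` below are hypothetical names chosen for this sketch.

```go
package main

import "fmt"

// defaultTimeout is an assumed fallback used when no timeout was provided.
const defaultTimeout = 30

// settings is a hypothetical configuration type.
// A nil Timeout means the caller did not set one.
type settings struct {
	Timeout *int
}

// effectiveTimeout returns the configured timeout, or the default if unset.
func effectiveTimeout(s settings) int {
	if s.Timeout == nil {
		return defaultTimeout
	}
	return *s.Timeout
}

func main() {
	unset := settings{}

	zero := 0
	explicitZero := settings{Timeout: &zero}

	fmt.Println(effectiveTimeout(unset))        // 30
	fmt.Println(effectiveTimeout(explicitZero)) // 0
}
```

With a plain `int` field, both cases would hold `0` and could not be told apart.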
Yes, pointers can avoid copying memory, but the tradeoffs are additional levels of indirection and increased work for the garbage collector. In my opinion, this shouldn’t be a consideration until you’ve profiled the application and found that the copying is actually causing a problem. Computers are very fast at copying memory.
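If you do want to measure, here is a rough sketch of a comparison using `testing.Benchmark`, which lets a benchmark run from an ordinary program. The `point` type and both functions are hypothetical, and `//go:noinline` is there on the assumption that it stops the compiler from removing the calls. Results depend on the machine, the Go version, and the size of the struct, so measure your own types rather than trusting this one.

```go
package main

import (
	"fmt"
	"testing"
)

// point is a hypothetical small struct used only for this comparison.
type point struct {
	X, Y, Z float64
}

// sumValue receives a copy of the struct.
//
//go:noinline
func sumValue(p point) float64 { return p.X + p.Y + p.Z }

// sumPointer receives the address of the struct and dereferences it.
//
//go:noinline
func sumPointer(p *point) float64 { return p.X + p.Y + p.Z }

// sink keeps the compiler from discarding the results.
var sink float64

func main() {
	testing.Init() // registers testing flags; needed outside "go test"
	p := point{1, 2, 3}

	byValue := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumValue(p)
		}
	})
	byPointer := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sumPointer(&p)
		}
	})

	fmt.Println("by value:  ", byValue)
	fmt.Println("by pointer:", byPointer)
}
```

In a real project the same loops would usually live in a `_test.go` file as `BenchmarkXxx` functions and run with `go test -bench=.`.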
What I’d like you to take away from this post is that pointers can be useful, but don’t use them just because you think they might give you better performance.
Default to using values except when you need the semantics a pointer provides.
- I intentionally simplified many concepts in this article to keep it relatively short and approachable.
- A few of the related concepts I did not cover include: escape analysis, interface conversions, function inlining, stack growth.
- Many types, such as slices, strings, and maps, contain pointers to underlying data, so passing pointers to these types rarely makes sense.
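As a small illustration of the last note, a slice passed by value still refers to the same underlying array, so the callee can modify the caller’s elements without a pointer to the slice. The function name `zeroFirst` is made up for this sketch.

```go
package main

import "fmt"

// zeroFirst sets the first element of s to zero.
// s is a copy of the slice header, but it points at the caller's array.
func zeroFirst(s []int) {
	if len(s) > 0 {
		s[0] = 0
	}
}

func main() {
	nums := []int{1, 2, 3}
	zeroFirst(nums)
	fmt.Println(nums) // [0 2 3]
}
```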