Golang: simple optimization notes

Published in Scum-Gazeta, Mar 26, 2022

Today I want to bring to your attention an article on the simplest optimizations in Golang projects.

To begin with, I want to say a little about optimization itself. We will not dwell on theory, though, and will move straight on to practical examples.

In the era of cloud computing, we often build serverless applications. When we build them for our pet projects, the infrastructure budget becomes a top priority. A lightly loaded service really can be practically free, but if something goes wrong, you will pay a lot for it! And when money is involved, you will definitely have to react somehow.

The second case: you have, for example, your own VPS running several services, and one of them sometimes consumes all the resources, so much that you can no longer reach the server via ssh.

We move to a Kubernetes cluster, set limits for all our applications and see that some of them keep restarting: the OOM killer comes along and “solves” the problem of our memory “leaks”.

Of course, this is not always a leak; it can also be ordinary overspending of resources, and that is what we will try to avoid today. Memory leaks are closer to bugs and deserve a separate discussion.

Too much resource consumption hurts the wallet, which means it requires immediate action.

Now let's talk about optimization. I hope you all understand why we don't optimize prematurely and why we argue against it. It is likely to be wasted work: we would first need to study the entire application, and your piece of code will most likely not be the bottleneck. We need a quick result, an MVP, and only then do we think about its problems.

Second, every optimization must be justified: each one should be backed by a benchmark. We must prove, with numbers, how much it gained us.
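A minimal sketch of what such a measurement might look like. It uses the standard library's testing.Benchmark and testing.AllocsPerRun, both of which can be called from an ordinary program. The two functions being compared, fillAppend and fillPrealloc, are hypothetical stand-ins for the "before" and "after" versions of your code.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot optimize the work away.
var sink []int

// fillAppend is the hypothetical "before" version: the slice grows as we append.
func fillAppend(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// fillPrealloc is the hypothetical "after" version: capacity is reserved up front.
func fillPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	const n = 1000
	cases := []struct {
		name string
		fn   func(int) []int
	}{
		{"append", fillAppend},
		{"prealloc", fillPrealloc},
	}
	for _, c := range cases {
		// testing.Benchmark increases b.N until the run takes about one second.
		r := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				sink = c.fn(n)
			}
		})
		fmt.Printf("%-9s %s %s\n", c.name, r.String(), r.MemString())
	}
	// AllocsPerRun gives a quick allocation count per call, without timing.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = fillPrealloc(n) })) // 1
}
```

In a real project the same comparison would more typically live in BenchmarkXxx functions in a _test.go file, run with go test -bench . -benchmem.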

You also need to understand that most optimizations make the code harder to read. You will have to find the right balance for your own projects.

That's it for theory. Now let's look at practical advice, grouped by the language's standard entities.

Arrays and slices

Allocate memory for slices in advance

Try to always pass the capacity as the third argument: make([]T, 0, n)

If you don't know the exact size in advance and the slice is short-lived, you can allocate a bit more than needed, as long as that keeps the slice from growing at runtime.
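For example, a filter never returns more elements than it receives, so len(input) is a safe upper bound for the capacity even though the exact result size is unknown. The function name evens is hypothetical.

```go
package main

import "fmt"

// evens returns the even numbers from in. The result length is unknown in
// advance, but it cannot exceed len(in), so reserving that much capacity
// guarantees append never has to reallocate.
func evens(in []int) []int {
	out := make([]int, 0, len(in))
	for _, v := range in {
		if v%2 == 0 {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	res := evens([]int{1, 2, 3, 4, 5, 6})
	fmt.Println(res, len(res), cap(res)) // [2 4 6] 3 6
}
```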

Don’t forget to use “copy”

We try not to use append when copying a slice or when merging two or more slices. Instead, we allocate the destination once with the final length and fill it with copy.
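A minimal sketch of merging with copy; concat is a hypothetical helper name. It sums the lengths first, allocates once and copies each part into place. Since Go 1.22, the standard library's slices.Concat also allocates only once.

```go
package main

import "fmt"

// concat merges any number of slices into a new one with a single
// allocation: first sum the lengths, then copy each part into place.
func concat(parts ...[]int) []int {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	out := make([]int, total)
	pos := 0
	for _, p := range parts {
		pos += copy(out[pos:], p) // copy returns the number of elements copied
	}
	return out
}

func main() {
	a := []int{1, 2}
	b := []int{3}
	c := []int{4, 5, 6}
	fmt.Println(concat(a, b, c)) // [1 2 3 4 5 6]
}
```

The result has its own backing array, so modifying it never affects the inputs.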

We iterate correctly

If a slice has many elements, or large ones, we use a classic for loop or range with only the index (for i := range s) and access elements as s[i]. With this approach we avoid copying each element into a loop variable.
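A small sketch of the difference, assuming a hypothetical record type that is deliberately large. Whether the compiler can elide the per-element copy in the first version depends on the compiler version, so measure before relying on either behavior.

```go
package main

import "fmt"

// record is a hypothetical, deliberately large element type (over 8 KiB).
type record struct {
	id      int
	payload [1024]int64
}

// sumByValue: on each iteration range copies the whole element into v,
// unless the compiler manages to optimize that copy away.
func sumByValue(rs []record) int {
	total := 0
	for _, v := range rs {
		total += v.id
	}
	return total
}

// sumByIndex reads the field in place via rs[i]; no element is copied.
func sumByIndex(rs []record) int {
	total := 0
	for i := range rs {
		total += rs[i].id
	}
	return total
}

func main() {
	items := make([]record, 3)
	for i := range items {
		items[i].id = i + 1
	}
	fmt.Println(sumByValue(items), sumByIndex(items)) // 6 6
}
```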

Reusing slices

If we need to transform an incoming slice and return the result, we can modify the slice in place and return it, reusing its backing array. This way we avoid new memory allocations.
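A minimal sketch of in-place filtering, using a hypothetical dropBlank function. The s[:0] idiom gives an empty slice that shares the input's backing array; slices.DeleteFunc (Go 1.21+) works along the same lines.

```go
package main

import (
	"fmt"
	"strings"
)

// dropBlank removes blank strings from s in place and returns the result.
// out starts as s[:0], so it shares s's backing array and no new memory is
// allocated. The caller should use only the returned slice afterwards.
func dropBlank(s []string) []string {
	out := s[:0]
	for _, v := range s {
		if strings.TrimSpace(v) != "" {
			out = append(out, v)
		}
	}
	// Zero the leftover tail so the dropped strings can be garbage-collected.
	for i := len(out); i < len(s); i++ {
		s[i] = ""
	}
	return out
}

func main() {
	words := []string{"a", "", "b", "  ", "c"}
	fmt.Println(dropBlank(words)) // [a b c]
}
```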

We do not leave unused slices

If we cut a small piece out of a large slice and keep only that piece, remember that the whole backing array stays in memory for as long as the piece is alive. We copy the piece into a new slice so that the old array can be reclaimed by the GC.
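A sketch of the technique with a hypothetical header function: it returns an independent copy of the first n bytes instead of buf[:n], which would keep the entire buffer reachable. slices.Clone(buf[:n]) (Go 1.21+) achieves the same effect.

```go
package main

import "fmt"

// header returns the first n bytes of buf as an independent slice.
// Returning buf[:n] directly would keep the whole (possibly huge) backing
// array of buf alive for as long as the result is in use.
func header(buf []byte, n int) []byte {
	if n > len(buf) {
		n = len(buf)
	}
	h := make([]byte, n)
	copy(h, buf)
	return h
}

func main() {
	big := make([]byte, 10<<20) // 10 MiB, e.g. a whole file read into memory
	copy(big, "MAGIC")
	h := header(big, 5)
	fmt.Println(string(h), len(h), cap(h)) // MAGIC 5 5
}
```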

Strings

Doing concatenation correctly

If strings can be joined in a single statement, we use “+”; if we need to join them in a loop, we use strings.Builder. Specify the builder's size in advance with “Grow”.
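A minimal sketch of both cases, with a hypothetical joinWords helper: compute the final length, reserve it with Grow, then write. The standard library's strings.Join is implemented along similar lines.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords concatenates words separated by sep in a loop. It computes the
// final length first and reserves it with Grow, so the builder's buffer is
// allocated once instead of being regrown as it fills up.
func joinWords(words []string, sep string) string {
	if len(words) == 0 {
		return ""
	}
	n := len(sep) * (len(words) - 1)
	for _, w := range words {
		n += len(w)
	}
	var b strings.Builder
	b.Grow(n)
	for i, w := range words {
		if i > 0 {
			b.WriteString(sep)
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	// In a loop: strings.Builder with Grow.
	fmt.Println(joinWords([]string{"go", "is", "fun"}, " ")) // go is fun

	// In a single statement: plain "+" is fine.
	greeting, name := "hello", "gopher"
	fmt.Println(greeting + ", " + name) // hello, gopher
}
```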

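Using conversion optimizations

Under the hood, a string is a pointer to bytes plus a length, much like a byte slice, so in some cases the compiler can convert between the two types without allocating memory. For example, a map lookup m[string(b)] or a comparison string(b) == "GET" does not create a temporary string copy.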
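A small sketch of the two patterns the gc compiler recognizes; the counts map and the lookup and isGet helpers are hypothetical. testing.AllocsPerRun is used to confirm that the lookup does not allocate.

```go
package main

import (
	"fmt"
	"testing"
)

var counts = map[string]int{"GET": 1, "POST": 2}

// lookup uses the conversion string(key) directly in the map index
// expression. The compiler recognizes this pattern and does not allocate
// a temporary string for the lookup.
func lookup(key []byte) int {
	return counts[string(key)]
}

// isGet compares without allocating, for the same reason.
func isGet(method []byte) bool {
	return string(method) == "GET"
}

func main() {
	key := []byte("POST")
	fmt.Println(lookup(key), isGet(key)) // 2 false
	allocs := testing.AllocsPerRun(1000, func() { _ = lookup(key) })
	fmt.Println("allocs per lookup:", allocs) // allocs per lookup: 0
}
```

The optimization applies only when the conversion appears directly in the index or comparison expression; storing string(key) in a variable first brings back an ordinary conversion.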

Using interning

We can pool (intern) frequently repeated strings so that identical values created at runtime are stored in memory only once.

Avoiding Allocations

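Instead of concatenating strings into a map key, we can use a composite (struct) key, and we can build strings in a byte slice. We try not to use the fmt package on hot paths, because its functions accept interface{} arguments and fall back on reflection to format them.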

Structures

Avoid copying large structures

By small structures we mean structures with no more than 4 fields, each no larger than a machine word.

Typical cases where a value is copied:

conversion to an interface
sending to and receiving from a channel
storing a value in a map
appending an element to a slice
iteration with range (the value variable)
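A small sketch of two of these copy cases, using a hypothetical config type: passing by value versus by pointer, and the copy made by range. The printed size assumes a 64-bit platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// config is a hypothetical "large" structure, far beyond 4 machine words.
type config struct {
	name    string
	limits  [32]int64
	enabled bool
}

// byValue receives a full copy of c.
func byValue(c config) int64 { return c.limits[0] }

// byPointer receives a single machine word: the address of c.
func byPointer(c *config) int64 { return c.limits[0] }

func main() {
	cfgs := make([]config, 2)
	cfgs[0].limits[0] = 10
	fmt.Println(unsafe.Sizeof(cfgs[0])) // 280 on 64-bit platforms

	// range copies each element into c, so only the copies are modified.
	for _, c := range cfgs {
		c.enabled = true
	}
	fmt.Println(cfgs[0].enabled) // false

	// Indexing modifies the elements in place and copies nothing.
	for i := range cfgs {
		cfgs[i].enabled = true
	}
	fmt.Println(cfgs[0].enabled) // true

	fmt.Println(byValue(cfgs[0]), byPointer(&cfgs[0])) // 10 10
}
```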

Avoid accessing struct fields through pointers

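Dereferencing is not free, so we do it as rarely as possible, especially in a loop: read a field into a local variable once and work with the local. Repeated access through a pointer also makes it harder for the compiler to keep the value in a fast register.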
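A minimal sketch of the pattern, assuming a hypothetical stats accumulator reached through a pointer. Whether the second version is measurably faster depends on the compiler version, so benchmark it.

```go
package main

import "fmt"

// stats is a hypothetical accumulator reached through a pointer.
type stats struct {
	total int
	count int
}

// addSlow goes through the pointer s on every iteration.
func addSlow(s *stats, xs []int) {
	for _, x := range xs {
		s.total += x
		s.count++
	}
}

// addFast reads the fields once into locals, which the compiler can keep
// in registers, and writes them back once after the loop.
func addFast(s *stats, xs []int) {
	total, count := s.total, s.count
	for _, x := range xs {
		total += x
		count++
	}
	s.total, s.count = total, count
}

func main() {
	s1, s2 := &stats{}, &stats{}
	nums := []int{1, 2, 3, 4}
	addSlow(s1, nums)
	addFast(s2, nums)
	fmt.Println(*s1, *s2) // {10 4} {10 4}
}
```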

Work with small structures

The compiler optimizes operations on small structures (for example, it can keep their fields in registers), so working with them is cheap.

Reduce structure size with alignment

We can align our structures by ordering the fields by size (typically from largest to smallest), so that the compiler has to insert less padding between them. This reduces the size of the structure itself.

Functions

Use inlinable functions or inline them yourself

We try to write small functions that the compiler can inline. That is fast, but embedding a function's code at the call site yourself is even faster. This matters most for functions on the hot path.

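What won't be inlined? A function containing any of the following (the exact rules change between Go releases):

calls to recover
select statements
type declarations
defer statements
go statements (starting a goroutine)
for-range loops (inlinable since Go 1.18)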
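To see which functions the compiler actually inlines in your build, compile with go build -gcflags=-m. A sketch with a hypothetical clamp function, plus a copy marked with the //go:noinline directive, which is handy as a baseline when benchmarking the effect of inlining.

```go
package main

import "fmt"

// clamp is small and simple, so the gc compiler will normally inline it.
// Building with `go build -gcflags=-m` reports the decision with lines
// such as "can inline clamp" and "inlining call to clamp".
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// clampNoInline has the same body, but the directive below forbids
// inlining; this is useful as a baseline when benchmarking.
//
//go:noinline
func clampNoInline(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

func main() {
	fmt.Println(clamp(15, 0, 10), clampNoInline(-3, 0, 10)) // 10 0
}
```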

Choose your function arguments wisely

We try to use “small” arguments, since copying them is specially optimized. We also try to strike a balance between copying values, which uses more stack, and passing pointers, which can move values to the heap and add load on the GC.
Avoid long argument lists: since Go 1.17 (on amd64, with more architectures following), arguments are passed in registers, and there is only a limited number of them.

Declaring a named result

This seems to be a bit more performant than declaring these variables in the body of the function.

Saving intermediate results

Help the compiler: save intermediate results in local variables, and it will have more opportunities to optimize your code.
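A small sketch of saving an intermediate result, using a hypothetical countPrefixed function: the lowered prefix is the same on every iteration, so it is computed once before the loop.

```go
package main

import (
	"fmt"
	"strings"
)

// countPrefixed counts lines that start with prefix, ignoring case.
// strings.ToLower(prefix) gives the same result on every iteration, so it
// is computed once and saved in lp; the compiler will generally not hoist
// such a function call out of the loop on its own.
func countPrefixed(lines []string, prefix string) int {
	lp := strings.ToLower(prefix)
	n := 0
	for _, l := range lines {
		if strings.HasPrefix(strings.ToLower(l), lp) {
			n++
		}
	}
	return n
}

func main() {
	logs := []string{"ERROR disk full", "error: timeout", "info: ok"}
	fmt.Println(countPrefixed(logs, "Error")) // 2
}
```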

Use “defer” carefully

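Try not to use defer on the hot path, and above all not inside a loop: deferred calls run only when the surrounding function returns, so in a loop they pile up until then. Since Go 1.14 most defers are cheap (open-coded), but a defer inside a loop is not.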

Facilitating the “hot path”

Avoid allocating memory on the hot path, especially for short-lived objects. Put the most common branches first in if and switch statements.

Map

Allocate memory in advance

Everything is the same as elsewhere: when creating a map, specify its expected size, make(map[K]V, n).
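A minimal sketch with a hypothetical wordCounts function: the number of distinct words is at most the number of words, so that count is passed as the size hint.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCounts counts word frequencies. The number of distinct words is at
// most len(words), so that is passed as a size hint to make; the map then
// should not need to grow while it is being filled.
func wordCounts(text string) map[string]int {
	words := strings.Fields(text)
	counts := make(map[string]int, len(words))
	for _, w := range words {
		counts[w]++
	}
	return counts
}

func main() {
	c := wordCounts("to be or not to be")
	fmt.Println(c["to"], c["be"], c["or"], len(c)) // 2 2 1 4
}
```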

Using an empty structure as values

struct{} occupies zero bytes, so using it as the value type, for example in sets or for signal values, is very beneficial.
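A sketch of both uses, assuming a hypothetical set type: a map whose values take no space, and a channel used purely as a signal.

```go
package main

import (
	"fmt"
	"unsafe"
)

// set is a hypothetical helper type: a map whose values occupy no space.
type set map[string]struct{}

func (s set) add(v string) { s[v] = struct{}{} }

func (s set) has(v string) bool {
	_, ok := s[v]
	return ok
}

func main() {
	fmt.Println(unsafe.Sizeof(struct{}{})) // 0

	seen := set{}
	for _, id := range []string{"a", "b", "a"} {
		seen.add(id)
	}
	fmt.Println(len(seen), seen.has("a"), seen.has("z")) // 2 true false

	// struct{} as a pure signal: only the event matters, not a value.
	done := make(chan struct{})
	go func() {
		defer close(done)
		// ... some work ...
	}()
	<-done
	fmt.Println("done")
}
```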

Clearing the map

A map can only grow; it never shrinks. We need to keep this under control and reset maps completely and explicitly by replacing them with a new map, because deleting all of the elements won't release the memory.
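A minimal sketch of an explicit reset, using a hypothetical cache type. The old map becomes unreachable as a whole, so the GC can reclaim all of its storage.

```go
package main

import (
	"fmt"
	"strconv"
)

// cache is a hypothetical map-backed cache that gets reset periodically.
type cache struct {
	m map[string][]byte
}

// reset replaces the map with a fresh one, so the GC can reclaim all of the
// old map's storage. Deleting every key (or calling clear(c.m) in Go 1.21+)
// empties the map but does not shrink the storage it has already grown to.
func (c *cache) reset() {
	c.m = make(map[string][]byte)
}

func main() {
	c := &cache{m: make(map[string][]byte)}
	for i := 0; i < 100_000; i++ {
		c.m[strconv.Itoa(i)] = nil
	}
	c.reset()
	fmt.Println(len(c.m)) // 0
}
```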

We try not to use pointers in keys and values

If neither the keys nor the values of a map contain pointers, the GC does not waste its precious time scanning it. And keep in mind that strings contain pointers too: for keys of a fixed length, use a byte array ([N]byte) instead of a string.

Reducing the number of changes

Again, we don't want to store pointers, yet map values cannot be modified in place. Instead, we can combine a map and a slice: the map stores the keys together with indices into the slice, and the slice stores the values, which we can then modify without restrictions.
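A minimal sketch of the map-plus-slice combination, with hypothetical account and ledger types.

```go
package main

import "fmt"

// account is a hypothetical value type that we want to update in place.
type account struct {
	id      string
	balance int
}

// ledger stores values in a slice and maps each key to its index there.
// Map values are not addressable (m[k].balance += x does not compile),
// but slice elements are, so updates need neither pointers in the map nor
// copying the whole struct out of the map and back in.
type ledger struct {
	index map[string]int
	items []account
}

func newLedger() *ledger {
	return &ledger{index: make(map[string]int)}
}

// add credits amount to the account id, creating the account if needed.
func (l *ledger) add(id string, amount int) {
	i, ok := l.index[id]
	if !ok {
		i = len(l.items)
		l.index[id] = i
		l.items = append(l.items, account{id: id})
	}
	l.items[i].balance += amount
}

func main() {
	l := newLedger()
	l.add("alice", 10)
	l.add("bob", 5)
	l.add("alice", -3)
	fmt.Println(l.items) // [{alice 7} {bob 5}]
}
```

Deleting entries takes more bookkeeping in this layout (for example, moving the last element into the freed slot and updating its index), which is the price of cheap in-place updates.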

Interface

Counting memory allocations

Remember: to assign a value to an interface, Go first copies the value somewhere (usually to the heap) and then stores a pointer to that copy in the interface. The key word is copy. As a result, the cost of boxing and unboxing is roughly the size of the structure plus one allocation.

Choosing the optimal types

In some cases boxing and unboxing do not allocate at all. For example: constants, single-byte values such as booleans, small integer values (below 256), structures with a single field of one of these kinds, and pointers (including maps, channels and funcs).

Avoiding memory allocation

As elsewhere, we try to avoid unnecessary allocations. For example, assign one interface value directly to another interface instead of boxing the same concrete value twice.
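A small sketch that counts boxing allocations with testing.AllocsPerRun; the big type and the sink variable are hypothetical. The counts in the comments are what the standard gc compiler produces with default settings; other compilers or flags may differ.

```go
package main

import (
	"fmt"
	"testing"
)

// big is a hypothetical 64-byte value type.
type big struct{ a [8]int64 }

// sink is a package-level interface variable, so boxed values must live on the heap.
var sink any

func main() {
	v := big{}
	v.a[0] = 1
	n := 7

	// Boxing a large struct copies it to the heap: one allocation.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = v })) // 1
	// Boxing a small integer (below 256) uses a static table in the runtime.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = n })) // 0
	// A pointer fits directly into the interface value.
	fmt.Println(testing.AllocsPerRun(100, func() { sink = &v })) // 0

	// Assigning an interface to an interface copies the two-word interface
	// header; v is not boxed a second time.
	var x any = v
	fmt.Println(testing.AllocsPerRun(100, func() { sink = x })) // 0
}
```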

Use only when needed

Avoid interface types in the parameters and results of small, frequently called functions: we don't need the extra boxing and unboxing.
Call methods through interfaces less often, if only because such dynamic calls usually prevent inlining.

Pointers, channels, BCE (bounds-check elimination)

Avoid unnecessary dereferences

Especially in a loop, where the cost adds up. A dereference is not a single cheap operation: it involves a memory load and often a nil check, and we don't want to pay for that repeatedly.

Channel usage is inefficient

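Channels are slower than other synchronization methods such as mutexes and atomic operations. In addition, the more cases a select has, the slower it is. However, a select with a single case plus default is optimized by the compiler into a simple non-blocking operation.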
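A minimal sketch of the optimized form, using a hypothetical trySend helper: a single case plus default, which gives a non-blocking send.

```go
package main

import "fmt"

// trySend attempts a non-blocking send. A select with exactly one case
// plus default is compiled into a single non-blocking channel operation
// instead of going through the general select machinery.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // buffer full or no receiver ready: give up instead of blocking
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1)) // true: the buffer had room
	fmt.Println(trySend(ch, 2)) // false: the buffer is full
	fmt.Println(<-ch)           // 1
}
```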

Try to avoid unnecessary bounds checks

Bounds checks are also expensive, and we should avoid unnecessary ones wherever possible. For example, it is better to access (and thereby check) the largest index of a slice once, up front, instead of checking several indices separately: once the compiler knows the largest index is in range, it can drop the checks for the smaller ones.
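A sketch of the well-known bounds-check hint, in a hypothetical readUint32 function; encoding/binary uses the same `_ = b[3]` line. To see which checks remain, you can build with go build -gcflags=-d=ssa/check_bce/debug=1.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readUint32 decodes a little-endian uint32 from the first four bytes of b.
// The line `_ = b[3]` performs one bounds check up front (panicking if b is
// too short); after it, the compiler knows b[0]..b[2] are in range as well
// and can drop their individual checks.
func readUint32(b []byte) uint32 {
	_ = b[3] // bounds check hint to the compiler
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	buf := []byte{0x01, 0x02, 0x03, 0x04}
	fmt.Printf("%#x\n", readUint32(buf))                            // 0x4030201
	fmt.Println(readUint32(buf) == binary.LittleEndian.Uint32(buf)) // true
}
```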

Conclusion

Throughout this article, the same few optimization rules keep coming up.

Help the compiler make the right decision and it will thank you. Allocate memory in advance, preferably with sizes known at compile time, save intermediate results, and try to keep your code readable.

And I repeat once again: benchmarks are mandatory for non-obvious optimizations, if only because the compiler changes a lot from version to version, and what worked yesterday may not work tomorrow, and vice versa.

Don’t forget to use the built-in profiling and tracing tools.

Good luck with your optimizations!