Good Code vs Bad Code in Golang

Teiva Harsanyi
Apr 16, 2018 · 9 min read
Image for post
Image for post

Recently, I was asked to detail what makes a good code or a bad code in Golang. I found this exercice very interesting. Actually, interesting enough to write a post about that. To illustrate my answer, I have taken a concrete use cases I faced in the Air Traffic Management (ATM) domain. The project is available in Github.

Context

Eurocontrol is the organization managing the air traffic across Europe countries. The common network for exchanging data between Eurocontrol and an Air Navigation Service Provider (ANSP) is called AFTN. This network is mainly used to exchange two different message types: ADEXP and ICAO messages. Each message type has its own syntax but in terms of semantic, both types are equivalent (more or less). Given the context, performance must be a key element for the implementation.

This project has to provide two implementations for parsing ADEXP messages (ICAO is not managed in the frame of this exercise) based on Go:

  • A bad implementation (package name: bad)
  • A refactored implementation (package name: good)

An example of an ADEXP message can be found here.

In the frame of this exercise, the parsers handle only a subset of the fields we can find in an ADEXP message. Yet, It is still relevant to illustrate common Go mistakes.

Parsing

-ARCID ACA878

Meaning the ARCID (aircraft identifier) is ACA878.

-EETFIR EHAA 0853
-EETFIR EBBU 0908

This example is a list of FIR (Flight Information Region). The first FIR is EHAA 0853 whereas the second one is EBBU 0908.

-GEO -GEOID GEO01 -LATTD 490000N -LONGTD 0500000W
-GEO -GEOID GEO02 -LATTD 500000N -LONGTD 0400000W

A repeating list of tokens. Each line contains a sublist of tokens (in this example GEOID, LATTD, LONGTD).

Given the context, it is important to implement a version leveraging parallelization. So the algorithm is the following one:

  • A preprocessing step to clean and rearrange the input message (we have to clean the potential white spaces, rearrange the tokens which are multi-lined like COMMENT etc.)
  • Then splitting each line in a given goroutine. Each goroutine will be in charge to process one line and to return the result.
  • Last but not least, gathering the results and returning a Message structure. This structure is a common one regardless of the message type (ADEXP or ICAO).

Each package contains an adexp.go file exposing the main function ParseAdexpMessage().

Step-by-step comparison

String vs []byte

Error management

The good implementation deals with each potential error:

We can also find some mistakes in the bad implementation like in the following code:

The first mistake is a syntax one. An error string shall neither be capitalized nor end with a punctuation according to Go standards.
The second mistake is due to the fact that if an error string is a simple constant (no formatting is required), a call to errors.New() is slightly more performant.

The good implementation looks like:

Avoid nesting

On the opposite, the good implementation is a flat representation:

This makes the code easier to read in my opinion. Furthermore, this flat representation must also be applied to errors management. As an example:

Should be replaced by:

Once again, the second code version is easier to read.

Passing data by reference or by value

Given the context of this project (performance does matter) and considering a message can potentially be quite heavy, a better option was to pass a pointer to the container structure instead. Otherwise, in the previous example the container value will be copied during each call.

The good implementation does not face this problem as it deals with slices (a simple 24-byte structure regardless of the underlying data):

More generally speaking, passing data either by reference or by value must not be an idiomatic choice.
Passing data by value could also help to make sure a function will not cause any side effect (like mutating the data passed in the function input). This has several benefits like unit testing or refactoring a code for parallelization for example (otherwise we need to check each subfunction to see if a mutation is made).

I do believe such choice must really be done carefully depending on the project context.

Parallelization

This is achieved in the bad implementation by iterating over the number of lines and spawning a mapLine() call in a goroutine.

The mapLine() function takes in arguments three parameters:

  • A pointer to the final Message structure to be returned. It means each mapLine() will enrich the same variable.
  • The current line
  • A channel used for sending a notification once the processing of the line is done

Sending a pointer to a shared Message variable breaks one of the main Go principles:

Don’t communicate by sharing memory, share memory by communicating.

There are two main drawbacks to passing this shared variable:

  • Drawback #1: Slices concurrent modifications

Because the structure contains some slices which can be modified concurrently (by two or more goroutine at the same time), in the bad implementation we had to deal with mutexes.

For example, the Message structure contains a Estdata []estdata.
Modifying the slice by appending another estdata must be done this way:

Actually, except very specific use cases, having to use a mutex in a goroutine might be a code smell.

  • Drawback #2: False sharing

Sharing memory across threads/goroutines is not a good idea due to potential false sharing (a cache line in a given CPU core cache can be invalidated by another CPU core cache). This means we should avoid as much as possible sharing the same variable across threads/goroutines if they intend to mutate it.

In this very example, though, I don’t think false sharing has a huge impact as the input file is quite light (running a performance test with padding fields in the Message structure gives more or less the same result). Yet, that’s always something important to bear in mind in my opinion.

Let’s see now how is the good implementation dealing with the parallelization:

Now, the mapLine() takes only two inputs:

  • The current line
  • A channel. This time this channel is not used to simply send a notification once a line processing is done but also to send the actual result. It means it is not up to the goroutines to modify the final Message structure.

Gathering the results is done this way by the parent goroutine (the one spawning the mapLine() calls in separate goroutines):

This implementation is more aligned, in my opinion, with Go principles to share memory only by communicating. The Message variable is modified by a single goroutine to prevent potential concurrent slices modifications and false sharing.

One potential criticism even with the good code is to spawn a goroutine for each line. Such implementation will work because an ADEXP message will not contain thousands of lines. Yet, the simple implementation one request triggering one goroutine does not scale very much under very high throughput. A better option would have been to create a pool of reusable goroutines for example.

Edit: This assumption (one line = one goroutine) was definitely not a good idea as it leads to way too much context switches. For additional information, please take a look at the link in the further reading chapter (at the end of the post).

Line processing notification

As the parent does not actually check the value sent by the channel, a better option would have been to use chan struct{} with a ch <- struct{}{} or even better (GC wise) to use a chan interface{} with a ch <- nil.

Another approach (even cleaner in my opinion) would have been to use a sync.WaitGroup as the parent goroutine just need to continue its execution once every mapLine() is done.

If

An improved version of:

Can be the following implementation:

It slightly improves the code readability.

Switch

The default can be optional if the developer thought about all the different cases. Yet, it is definitely better to catch this specific case like in the following example:

Handling the default case would help in catching potential bugs made by developers as soon as possible in the development process.

Recursion

Yet, Go does not support tail-call elimination to optimize sub-function calls. The good code produces the exact same result but using an iterative algorithm:

The second code will be then more performant than the first one.

Constants management

Whereas the good code is a more elegant solution based on Go (elegant) iota:

It produces exactly the same result but it reduces potential developer mistakes.

Receiver functions

The bad code implements it this way:

Meaning we have to pass a Message as an input of the function.
Whereas the good code is simply a function with a Message receiver:

The second approach is preferable. We simply indicate the Message struct implements a specific behavior.

It might also be a first step to using Go interfaces. For example, if someday we need to create another structure with the same behavior (IsUpperLevel()), the initial code does not even need to be refactored (as Message already implements this behavior).

Comments

On the other side, I tried to comment the good code as I would do in a real project. Even though I’m not the kind of developer who likes to comment every single line, I still believe it is important to comment at least each function and the main steps in a complex function.

As an example:

One concrete example in addition of a function comment might also be very useful:

Such concrete examples can really help another developer in better understanding an existing project.

Last but not least, according to Go best practices the package itself is also commented.

Logging

go fmt

DDD

Performance results

  • Bad implementation: 60430 ns/op
  • Good implementation: 45996 ns/op

The bad code is more than 30% slower than the good one.

Conclusion

The first obvious characteristic of a good code is to provide a correct solution according to given functional requirements. A code can be performant if it does not fit the requirements, it is pretty useless.

Meanwhile, it is important for a developer to care about simple, maintainable and performant code.

The performance improvement does not materialize from the air, it comes with code complexity increase.

A good developer is someone able to find the right balance between these characteristics according to a given context.

Just like in DDD, context is key :)


Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store