Goroutines & WaitGroups: Writing Concurrent Programs in GoLang

Rakib Al Hasan
AirAsia MOVE Tech Blog
5 min read · Jun 8, 2022

by Syed Rakib

It’s been a few days since I started exploring GoLang and I was impressed with how easily concurrency can be achieved in this language. This article demonstrates the use of Goroutines & WaitGroups for writing concurrent programs in GoLang.

P.S. Note that concurrency is not parallelism.

In GoLang, the go keyword starts a new goroutine that performs a specific job in the background. For the purposes of this article, think of a goroutine as a lightweight thread - although goroutines are not OS threads. Meanwhile, the caller function carries on and continues to execute its next lines.

Note that the go keyword only instructs the program to start a new background job; it says nothing about how those background jobs return and merge back into the caller function. That is the job of WaitGroups, which we will also cover in this article.
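
To make the behaviour of the go keyword concrete, here is a minimal sketch. The function background_job() is a hypothetical name used only for this illustration, and the short sleep at the end is a crude stand-in for proper synchronisation.

```go
package main

import (
	"fmt"
	"time"
)

// background_job is a hypothetical job used only for this illustration.
// It prints a message and also returns it.
func background_job(name string) string {
	msg := "background job " + name + " is running"
	fmt.Println(msg)
	return msg
}

func main() {
	// Start the job in a new goroutine. main() does not wait for it,
	// and any return value of the called function is discarded.
	go background_job("A")

	// The caller carries on immediately with its next lines.
	fmt.Println("main() keeps going")

	// Crude pause so the goroutine gets a chance to run before main() returns.
	// Nothing here guarantees that it will.
	time.Sleep(100 * time.Millisecond)
}
```

The order of the two printed lines is not guaranteed, which is exactly the point: the go keyword starts the job but does not coordinate with it.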

First, let’s take a look at a simple Go program:

  1. The main() function calls do_some_work() 4 times
  2. Each call to do_some_work() performs 5 jobs
    - so that’s a total of 20 jobs to be completed by this program.
  3. Each job uses the random_wait() function to simulate a random execution time between 50 and 350 milliseconds.
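
A minimal sketch of such a program could look like the following. The function names follow the description above; the function bodies, the print format and the use of math/rand are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// random_wait simulates one job by sleeping for 50 to 350 milliseconds.
// It returns the duration it slept for.
func random_wait() time.Duration {
	d := time.Duration(50+rand.Intn(301)) * time.Millisecond
	time.Sleep(d)
	return d
}

// do_some_work performs 5 jobs, one after another, for the given worker.
func do_some_work(worker int) {
	for job := 1; job <= 5; job++ {
		d := random_wait()
		fmt.Printf("worker %d finished job %d in %v\n", worker, job, d)
	}
}

func main() {
	start := time.Now()

	// 4 sequential calls: each one blocks until its 5 jobs are done.
	for worker := 1; worker <= 4; worker++ {
		do_some_work(worker)
	}

	fmt.Printf("program time: %v\n", time.Since(start))
}
```

Idiomatic Go would normally use camelCase names such as doSomeWork; the snake_case names are kept here only to match the text.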

The program (20 jobs assigned across 4 workers) took about 4.16 to 4.86 seconds to complete, because every job runs one after another and the program time is the sum of all 20 job times.

Next, let’s introduce concurrency using Goroutines.

  1. Each call to do_some_work() is now done via a goroutine (using the go keyword).
  2. The same 20 jobs will be completed by this program.
  3. The main() function now sleeps for a fixed 1 second (the 1-second-wait-time) to wait for all the child jobs to complete.
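
The steps above can be sketched as follows. As before, only the function names come from the description; the function bodies and print format are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// random_wait simulates one job by sleeping for 50 to 350 milliseconds.
func random_wait() time.Duration {
	d := time.Duration(50+rand.Intn(301)) * time.Millisecond
	time.Sleep(d)
	return d
}

// do_some_work performs 5 jobs, one after another, for the given worker.
func do_some_work(worker int) {
	for job := 1; job <= 5; job++ {
		d := random_wait()
		fmt.Printf("worker %d finished job %d in %v\n", worker, job, d)
	}
}

func main() {
	start := time.Now()

	// Each worker now runs in its own goroutine; main() does not block here.
	for worker := 1; worker <= 4; worker++ {
		go do_some_work(worker)
	}

	// Fixed wait: we only hope that all 20 jobs finish within 1 second.
	// Any job still running when main() returns is simply abandoned.
	time.Sleep(1 * time.Second)

	fmt.Printf("program time: %v\n", time.Since(start))
}
```

A single worker needs anywhere from 250 to 1,750 milliseconds for its 5 jobs in this sketch, so a 1-second sleep will sometimes cut workers off before they finish.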

Now, the program time (for the same 20 jobs across 4 workers) has dropped significantly from 4.16-to-4.86 seconds to a mere 1.00 second - thanks to concurrency using goroutines. Note that this 1.00 second is simply the fixed sleep in main(), not the time the jobs actually needed.

However, it is not yet fully reliable ❌

In the 1st execution of the program, the individual jobs took a combined 3,393 milliseconds. All the jobs were executed concurrently (via goroutines) and all of them managed to complete (and return) within the 1-second-wait-time.

In the 2nd execution of the program, the individual jobs took a combined 3,538 milliseconds. But, despite being executed concurrently (via goroutines), only 17 jobs managed to complete (and return) within the 1-second-wait-time.

This program rests on the naive assumption that our 1-second-wait-time will be long enough for ALL the child jobs to complete (and return). That assumption is not guaranteed to hold: when main() returns, the program exits and any goroutine still running is abandoned. A fixed sleep is never a reliable way to write concurrent programs.

Let’s revise our program using WaitGroups

  1. Here, we declare a WaitGroup object wg inside the main() function.
  2. For every goroutine call made [from the main() function to the do_some_work() function], we pass a pointer to the wg object (&wg) into the called function, so that every goroutine shares the same WaitGroup rather than a copy of it.
  3. For every goroutine call that the wg object is passed into, we must increment the WaitGroup count by 1
    - in this case, we are incrementing by 4 for 4 goroutine calls.
  4. Inside the do_some_work() function, the wg object must announce when that function has completed
    - that is achieved by calling the wg.Done() method.
  5. Meanwhile, inside the main() function, further execution of its next lines is halted using the wg.Wait() method. The wg.Wait() method waits until the WaitGroup count comes back down to zero - which happens once wg.Done() has been called from every child function.

Using WaitGroups with goroutines, regardless of how many times we run this program, all the child jobs will always complete (and will always return) to the main() function gracefully before the program exits. The main() function no longer has to make any naive assumption about how long the child jobs may take to complete.

This technique is not unique to GoLang. In most languages, we need some mechanism to keep track of the number of child jobs created and of when each of them finishes.

However, what makes GoLang interesting is how easily all of this can be achieved natively with just 3 small lines of code.

  1. wg.Add(n)
  2. wg.Wait()
  3. wg.Done()

ProTips:

  1. The count added via wg.Add() must equal the actual number of goroutines spun up via the go keyword, and every goroutine must announce when it finishes by calling wg.Done(). If wg.Done() is called too few times, wg.Wait() will block forever in a deadlock; if it is called too many times, the program panics because the WaitGroup count goes negative.
  2. It is generally good practice to defer the call to wg.Done() at the beginning of the child function itself. This ensures wg.Done() is never missed when the child function returns early due to any if-else / switch-case conditions inside the function.
  3. Avoid using plain increments (like n++ or n+=1) when updating shared variables from multiple goroutines. These are not atomic operations, so concurrent child jobs can run into a race condition and end up incorrectly updating the counter. Use atomic counters (from the sync/atomic package) instead.
  4. For cases where you need to update a shared variable from a concurrent program, but cannot use simple atomic counters - you can use mutexes instead. This ensures safe updates can be done without race conditions.
    P.S. Atomic counters are not built on mutexes. Go's atomic operations rely on hardware-level atomic instructions, and mutexes are themselves built on top of such atomic operations.

Rakib Al Hasan
AirAsia MOVE Tech Blog

DevOps Engineer, Backend Developer, Cloud Architect, Night time drive-outs & nice hangouts