Modern Concurrency with Go

From Learn Concurrent Programming with Go by James Cutajar

Manning Publications
CodeX

--

This excerpt covers:

  • Introducing concurrent programming
  • Improving performance with concurrent execution
  • Scaling our programs
  • Why you should choose Go for concurrency

Read on to learn more.

Meet Jane Sutton. Jane has been working at HSS international accountancy as a software developer for only 3 months. In her latest project she got involved with an issue that has been occurring in the payroll system. It’s a software module that runs at the end of the month, after close of business and computes all the salary payments of HSS clients’ employees. This morning her manager arranged a meeting with the product owner, infrastructure team, and a sales representative to try to get to the bottom of the problem. Unexpectedly Sarika Kumar, the CTO, has joined the meeting room via video call.
Thomas Bock, the product owner, starts, “I don’t understand. The payroll module has been working fine for as long as I can remember. Suddenly last month the payment calculations weren’t completed in time, and we got loads of complaints from our clients. It made us look really unprofessional with Block Entertainment, our new and biggest client yet, threatening to go to our competitor.”
Jane’s manager, Francesco Varese, chimes in, “The problem is that the calculations are just too slow and take too long. They are slow because of their complex nature, considering many factors such as employee absences, joining dates, overtime and a thousand other factors. Parts of the software were written more than a decade ago in C++. There is no developer left in the firm that understands how this code works.”
“We are about to signup our biggest ever client, a company with over 30,000 employees. They have heard about our payroll problem, and they want to see it resolved before they proceed with the contract. It’s really important that we fix this as soon as possible,” replies Rob Gornall from the sales and acquisitions department.
“We have tried adding more processor cores and memory to the server that runs the module. This has made absolutely no difference. When we execute the payroll calculation using test data, it’s taking the same amount of time no matter how many resources we allocate. It’s taking more than 20 hours to calculate the all the clients’ payrolls, which is too late for our clients” continues Frida Norberg from infrastructure.
It’s Jane’s turn to finally speak. As the firm’s newest employee, she hesitates a little but manages to say, “If the code is not written in a manner that takes advantage of the additional cores, it won’t matter if you allocate multiple processors to it. The code needs to use concurrent programming for it to run faster when you add more processing resources to it.”
Everyone seems to have established that Jane is the most knowledgeable about the subject. There is a short pause. Jane feels as if everyone wants her to come up with some sort of an answer, so she continues, “Right. Ok… I’ve been experimenting with a simple program written in Go. It divides the payrolls into smaller employee groups and then calls the payroll module with each group as input. I’ve programmed it so that it calls the module concurrently using multiple goroutines. I’m also using a Go channel to load balance the workload. In the end I have another goroutine that collects the results via another channel.”
Jane looks around quickly and sees only a blank look on everyone’s faces, so she adds, “In simulations it’s at least 5 times faster on the same multicore hardware. There are still a few tests to run to make sure there are no race conditions, but I’m pretty sure I can make it run even faster, especially if I get some help from accounting to migrate the old C++ logic into clean Go concurrent code.”
Jane’s manager has a big smile on his face now. Everyone else in the meeting seems surprised and speechless. The CTO finally speaks up and says, “Jane, what do you need to get this done by the end of the month?”
Practical concurrent programming knowledge is an increasingly sought-after skill by tech companies. Businesses want to utilize their hardware resources to their full capacity as this saves them time and money. For this they understand that they have to hire the right talent — developers who can write scalable concurrent applications.
Apart from getting hired or promoted as developers, knowing concurrent programming gives us a wider set of skills which we can employ in new scenarios. For example, we can model complex business interactions that happen at the same time. We can also use concurrent programming to improve our software’s responsiveness by picking up tasks swiftly. Unlike sequential programming, concurrent programming can make use of multiple CPU cores. This allows us to increase the work done by our program, by speeding up the program execution. Let’s try to look at some of these scenarios in more detail.

Interacting with a concurrent world

We live and work in a concurrent world. The software that we write models complex business processes that interact together concurrently. Even the simplest of businesses typically have many of these concurrent interactions. For example, we can think of multiple people ordering online at the same time or a consolidation process grouping packages together while simultaneously coordinating with ongoing shipments, as shown in Figure 1.

Figure 1 A consolidation shipping process showing complex concurrent interactions.

In our everyday life we deal with concurrency all the time. Every time we get in our car’s driving seat, we interact with multiple concurrent actors, such as other cars, cyclists, and pedestrians. We can easily predict and navigate such environment. At work we put a task on hold while we’re waiting for an email reply and pick up the next. When cooking, we plan our steps, so we maximize our productivity and shorten time. Our brain is perfectly comfortable managing concurrent behavior. In fact, it does this all the time without us even noticing.
Concurrent programming is about writing instructions so that multiple tasks and processes can execute and interact at the same time. If two customers place an order at the same time and only one stock item remains, what happens? If the price of a flight ticket goes up every time a client buys a ticket, what happens when multiple tickets are booked at the same exact instant? If we have a sudden increase in load due to extra demand, how will our software scale when we increase the processing and memory resources? These are all sample scenarios that developers deal with when they are designing and programming concurrent software.

Increasing throughput

For the modern developer it is ever more important to understand how to program concurrently. This is because the hardware landscape has changed over the years to benefit this type of programming.
Prior to multicore technology, processors’ performance increased proportionally to clock frequency and transistor count, roughly doubling every 2 years. Processor engineers started hitting physical limits due to overheating and power consumption, which coincided with the explosion of more mobile hardware such as notebooks and smartphones. To reduce excessive battery consumption and CPU overheating while increasing processing power, the engineers introduced multicore processors.
In addition, with the rise of cloud computing services, developers have easy access to large, cheap processing resources to run their code. All this extra computational power can only be harnessed effectively if our code is written in a manner that takes full advantage of the extra processing units.

Figure 2 Improving performance by adding more processors.

Having multiple processing resources means we can scale horizontally. We can use the extra processors to compute executions in parallel and finish our tasks quicker. This is only possible if we write code in a way that takes full advantage of the extra processing resources.
What about a system that has only one processor? Is there any advantage in writing concurrent code when our system does not have multiple processors? It turns out that writing concurrent programs is a benefit even in this scenario.
Most programs spend only a small proportion of their time executing computations on the processor. Think for example about a word processor that waits for input from the keyboard. Or a text files search utility spending most of its running time waiting for portions of the text files to load in memory. We can have our program perform a different task while it’s waiting for input/output. For example, the word processor can perform a spell check on the document while the user is thinking about what to type next. We can have the file search utility looking for a match with the file that we have already loaded in memory while we are waiting to finish reading the next file into another portion of memory.
Think for example when we’re cooking or baking our favorite dish. We can make more effective use of our time if, while the dish is in the oven or stove, we perform some other actions instead of idling and just waiting around. In this way we are making more effective use of our time and we are more productive. This is analogous to our system executing other instructions on the CPU while concurrently the same program is waiting for a network message, user input, or a file writing to complete. This means that our program can get more work done in the same amount of time.

Figure 3 Even with one processor, we can improve performance if we utilize idle times.

Improving responsiveness

Concurrent programming makes our software more responsive because we don’t need to wait for one task to finish before responding to a user’s input. Even if we have one processor, we can always pause the execution of a set of instructions, respond to the user’s input, and then continue with the execution while we’re waiting for the next user’s input.
If again we think of a word processor, multiple tasks might be running in the background while we are typing. There is a task that listens to keyboard events and displays each character on the screen. We might have another task that is checking our spelling and grammar in the background. Another task might be running to give us stats on our document (word count, pages, and so on). All these tasks running together give the impression that they are somehow running simultaneously. What’s happening is that these various tasks are being fast-switched by the operating system on CPUs. Figure 4 illustrates a simplified timeline showing these three tasks executing on a single processor. This interleaving system is implemented by using a combination of hardware interrupts and operating system traps. We go into more detail on operating systems and concurrency in the next chapter. For now, it’s important to realize that if we didn’t have this interleaving system, we would have to do each task one after the other. We would have to type a sentence, then hit the spell check button, wait for it to complete and then hit another button and wait for the document stats to appear.

Figure 4 Simplified task interleaving diagram of a word processor.

Programming concurrency in Go

Go is an ideal language to use to learn about concurrent programming because its creators designed it with high-performance concurrency in mind. The aim was to produce a language that was efficient at runtime, readable, and easy to use. This means that Go has many tools for concurrent programming. Let’s take a look at some of the advantages to using Go for concurrent programs.

Goroutines at a glance

Go uses a lightweight construct, called a goroutine, to model the basic unit of concurrent execution. As we shall see in the next chapter, goroutines give us a hybrid system between operating system and user-level threads, giving us some of the advantages of both systems.
Given the lightweight nature of goroutines, the premise of the language is that we should focus mainly on writing correct concurrent programs, letting Go’s runtime and hardware mechanics deal with parallelism. The principle is that if you need something to be done concurrently, create a goroutine to do it. If you need many things done concurrently, create as many goroutines as you need, without worrying about resource allocation. Then depending on the hardware and environment that your program is running on, your solution will scale.
In addition to goroutines, Go gives provides us with many abstractions that allow us to coordinate the concurrent executions together on a common task. One of these abstractions is known as a channel. Channels allow two or more goroutines to pass messages to each other. This enables the exchange of information and synchronization of the multiple executions in an easy and intuitive manner.

Modelling concurrency with CSP and primitives

The other advantage of using Go for concurrent programming is its support for Communicating Sequential Processes (CSP). This is a manner of modelling concurrent programs in order to reduce the risk of certain types of programming errors. CSP is more akin to how concurrency happens in everyday life. This is when we have isolated executions (processes, threads, or goroutines) working concurrently, communicating to each other by sending messages back and forth.
The Go language includes support for CSP natively. This has made the technique very popular. CSP makes our concurrent programming easier and reduces certain types of errors.

Figure 5 A concurrent Go application using CSP.

Sometimes the classic concurrency primitives found in many other languages used with memory sharing will do a much better job and result in a better performance than using CSP. These primitives include tools such as mutexes and conditional variables. Luckily for us, Go provides us with these tools as well. When CSP is not the appropriate model to use, we can use the other classic primitives also provided in the language.
It’s best to start with memory sharing and synchronization. The idea is that by the time you get to CSP, you will have a solid foundation in the traditional locking and synchronization primitives.

Building our own concurrency tools

In this book, we will learn how to use various tools to help us build our concurrent applications:

  • Programming concurrent solutions to create more responsive, higher performance and scalable software.
  • Recognizing and avoiding common concurrency programming problems such as deadlocks and race conditions.
  • Managing concurrency using goroutines, mutexes, readers-writer locks, semaphores, waitgroups, barriers, channels, condition variables and how to build some of these tools.
  • Identifying concurrency patterns such as pipelining, worker pools, shared memory and message passing.
  • Discovering advantages, limits and properties of parallel computing.
  • Improving coding skills in Go with more advanced, multithreading topics.
  • And much more!

Knowing how to use these concurrency tools is good, but what about having knowledge of the inner workings? We’re going to go one step further and take the approach of building them together from scratch. Knowing the inner workings of concurrent programming is essential to getting the most out of it.
We will pick common concurrency tools and understand how they can be implemented using other concurrency primitives as building blocks. For example, Go doesn’t come with a bundled semaphore implementation, so apart from understanding how and when to use semaphores, we go about implementing one ourselves. We do this also for some of the tools that are available in Go, such as waitgroups and channels. The idea is that if we know how to build something, we will also know how and when to use it.

Scaling performance

Performance scalability is the measure of how well our program speeds up in proportion to the increase in the number of resources available to the program. To understand this, let’s try to make use of a simple analogy.
Imagine a world where we are property developers. Our current active project is to build a small multi story residential house. We give your architectural plan to a builder, and she sets off to finish the small house. The works are all completed in a period of 8 months.
As soon as we finish, we get another request for the same exact build but in another location. To speed things up, we hire two builders instead of one. This second time around, the builders complete the house in just 4 months.
The next time that we get another project to build the same exact house, we agree to hire even more help so that the house is finished quicker. This time we pay 4 builders, and it takes them 2 and a half months to complete. The house has cost us a bit more to build than the previous one. Paying 4 builders for 2.5 months costs you more than paying 2 builders for 4 months (assuming they all charge the same).
Again, we repeat the experiment twice more, once with 8 builders and another time with 16. With both 8 and 16 builders, the house took 2 months to complete. It seems that no matter how many hands we put on the job, the build cannot be completed faster than 2 months. In geek speak, we say that we have hit our scalability limit. Why does this even happen? Why can’t we continue to double our resources (people/money/processors) and always reduce the time spent by half?

Amdahl’s Law

In 1967 Gene Amdahl, a computer scientist, presented a formula at a conference that measured speedup with regard to a problem’s parallel-to-sequential ratio. This famously became known as Amdahl’s law.

Amdahl’s Law states that the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used.

In our house build scenario, the scalability here is limited by various factors. For starters, our approach to solving the problem might be limiting us. For example, one cannot build the second floor before finishing up the first. In addition, several parts of the build can only be done sequentially. For example, if only a single road leads to the building site, only one transport can use the road at any point in time. In other words, some parts of the building process are sequential (one at a time after each other), and other parts can be done in parallel (at the same time). These factors influence and limit the scalability of our task.
Amdahl’s law tells us that the non-parallel parts of an execution act as a bottleneck and limit the advantage of parallelizing the execution. Figure 6 shows this relationship between the theoretical speedup obtained as we increase the number of processors.

Figure 6 Chart showing the speedup against number of processors according to Amdahl’s Law.

If we apply this chart to our construction problem, when we use a single builder and she spends 5% of her time on the parts that can only be done sequentially, the scalability follows the topmost line in our chart (95% parallel). This sequential portion is the part that can only be done by one person, such as trucking in the building materials through a narrow road.
As you can see from the chart, even with 512 people working on the construction, we only finish the job about 19 times faster than if we had just one person. After this point, the situation does not improve much. We’ll need more than 4096 builders to finish the project just 20 times faster. We hit a hard limit around this number. Contracting more workers does not help at all, and we would be wasting our money.
The situation is even worse if a lower percentage of work is parallelizable. With 90% we would hit this scalability limit much quicker, around the 512-workers mark. With 75% we get there at 128 and with 50% at just 16. Notice also that it’s not just this limit that goes down but also the speedup is greatly reduced. In the 90%, 75% and 50% we get maximum speed ups of 10, 4 and 2 respectively.
Amdahl’s law paints a pretty bleak picture of concurrent programming and parallel computing. Even with concurrent code that has a tiny fraction of serial processing, the scalability is greatly reduced. Thankfully this is not the full picture.

Gustafson’s Law

In 1988 two computer scientists, John L. Gustafson and Edwin H. Barsis, reevaluated Amdahl’s law and published an article addressing some of its shortcomings. It gives an alternative perspective on the limits of parallelism. Their main argument is that, in practice, the size of the problem changes when we have access to more resources.
To continue with our house building analogy, if we did have thousands of builders available at our disposal, it would be wasteful to put them all into building a small house when we have more future projects in the pipeline. Instead, what we would do is try to put the optimal number of builders in our house construction and allocate the rest of the workers on the other projects.
If we were developing some software and we had a large number of computing resources, if we noticed that utilizing half the resources resulted in the same performance of that software, we could allocate those extra resources to do other things, such as increasing the accuracy or quality of that software in other areas.
The second point against Amdahl’s law is that when you increase the problem size, the non-parallel part of the problem typically does not grow in proportion with problem size. In fact, Gustafson argues that for many problems this remains constant. Thus, when you take these two points into account, the speedup can scale linearly with the available parallel resources. This relation is shown in figure 7.

Figure 7 Chart showing the speedup against the number of processors according to Gustafson’s Law.

Gustafson’s Law tells us that as long as we find ways to keep our extra resources busy, the speedup should continue to increase and not be limited by the serial part of the problem. This is only if the serial part stays constant as we increase the problem size, which according to Gustafson, is the case in many types of programs.
To fully understand both Amdahl’s and Gustafson’s laws, let’s take a computer game as an example. Let’s say a particular computer game with rich graphics was written in a way that makes use of multiple computing processors. As time goes by and computers become more parallel processing cores, we can run that same game with a higher frame rate, giving us a smoother experience. Eventually we get to a point when we’re adding more processors, but the frame rate is not increasing further. This happens when we hit the speed up limit. No matter how many processors we add, we won’t have the game run with higher frame rates. This is what Amdahl’s law is telling us — that there is a speedup limit for a particular problem of fixed size if it has a non-parallel portion.
However, as technology improves and processors get more cores, the game designers will put those extra processing units to good use. Although the framerate might not increase, the game can now contain more graphic detail and a higher resolution due to the extra processing power. This is Gustafson’s law in action. When we increase the resources, there is an expectation of an increase in the system’s capabilities and the developers would make good use of the extra processing power.

Summary

  • Concurrent programming allows us to build more responsive software.
  • Concurrent programs can also provide increased speedup when running on multiple processors.
  • We can also increase throughput even when we have one processor if our concurrent programming makes effective use of the input/output wait times.
  • Go provides us with goroutines which are lightweight constructs to model concurrent executions.
  • Go provides us with abstractions, such as channels, that enable concurrent executions to communicate and synchronize.
  • Go allows us the choice of building our concurrent application either using concurrent sequential processes (CSP) model or alternatively using the classical primitives.
  • Using a CSP model, we reduce the chance of certain types of concurrent errors; however, certain problems can run more efficiently if we use the classical primitives.
  • Amdahl’s Law tells us that the performance scalability of a fixed-size problem is limited by the non-parallel parts of an execution.
  • Gustafson’s Law tells us that if we keep on finding ways to keep our extra resources busy, the speedup should continue to increase and not be limited by the serial part of the problem.

Learn more about the book here.

--

--

Manning Publications
CodeX
Writer for

Follow Manning Publications on Medium for free content and exclusive discounts.