A cup of Go’s concurrent programming for Python developers

Go routines, Courtesy: dot Go conference

Whenever someone starts working with Go’s go routines and channels, they first try to compare them with threads in other programming languages. Concurrent programming in languages like C and C++ is painful. As my senior colleague Chaitra. M says, “Oh man! You got a new tool called Go where all the burden in concurrent programming is lifted! Today’s kids are taking things for granted.”

Coming from a Python and JS background, I can only think about asynchronous programming using Twisted and NodeJS. Python does have threads, but the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time, so a single process never gets to utilize multiple cores for CPU-bound work, even on a machine with two or more cores. After truly empathizing with Go’s standpoint, I am attempting to teach Python developers to think in terms of Go’s ideology. I think Go provides very high-level constructs for concurrent programming. For a Python guy like me, synchronous blocking code is the norm. Go is not Yet Another Programming Language, but much more than that.

I am a full stack developer and author of the book “Building RESTFul web services with Go”.

As experts always say, dynamic programming languages are good for web development, where CPU operations are limited. I/O can be handled well by frameworks like Django and Express JS. Complex systems like flight/movie booking, real-time collaborative editors, and payment gateways need careful design because there are a lot of variables in the processing pipeline. Go can give an average Python/JS developer the power to think beyond traditional programming. Heed me! Designing complex systems is relatively easy in Go.

Most of the tutorials out there on Go’s channels and go routines are more confusing than friendly. Here, I attempt to simplify the high-level constructs available in Go so that we can do some serious tasks in a few lines of code. I lay the foundation and build examples on top of it. It will be a fun ride, so stick with me.

Go’s concurrency is based on Communicating Sequential Processes (CSP), a formal language proposed by C. A. R. Hoare. This world is full of sequential processes which can run in parallel to complete a single operation. The famous example is the washing machine. We place the first set of clothes in the washer. When it is done, we move that set to the dryer and add the second set of clothes to the washer. Now both washer and dryer are running in parallel. We are not going to discuss concurrency vs parallelism here. There is already a holy grail talk on that by Rob Pike: https://talks.golang.org/2012/waza.slide#1

What we are going to do here is understand Go’s concurrency and how the Go language differs from dynamic languages like Python and JavaScript. By the end of this article, the concepts of Go’s concurrency will be clear to you. When someone says “Go”, simpler solutions to complex problems should come to mind. Go can unleash all your CPU cores and execute things in parallel. That sounds like faster job completion, but it depends on the problem we are solving. Most of the time, we deal with concurrent problems whose steps still have to execute in sequence.

Concurrency is a better way of solving a sequential problem. We can achieve concurrency through parallelism when tasks are independent and not much communication is needed. When tasks are dependent (producer/consumer), copying data to and fro between CPU cores makes the program slow and becomes a performance bottleneck.

Parallel solutions work for problems which are intrinsically parallel. Concurrent solutions are better in the majority of cases.

If you have a 4 core CPU, then, roughly speaking, your computer can execute at most 4 different instruction streams at a given time. Even if you have hundreds of threads, only 4 of them can be busy while the others are paused. On a single core, it is only one at a time. Many developers mistakenly believe that on a single core, multiple lightweight threads can do multiple things at a given time. That is not true. Execution-wise, only one thread can occupy the CPU. There is an exception when I/O comes into the picture. Even on a single core, one thread can occupy the CPU while another waits on I/O. For example, you can write a Python program that copies the standard input to the standard output (the console by default) and also runs some busy algorithm at the same time. One is an I/O operation, the other is a CPU bound operation.

We stick to a single core for now. Why are we bothering about all those details? The intention of this article is not to compare how single-core and multi-core setups scale. It is to show how, even on a single thread, Go’s concurrency differs from thread-based concurrency in other programming languages, and how that lets us solve problems in a clear way.

Think of it like this. A postmaster stamps letters. A postman brings letters to the office from the postbox. A sender puts a letter in the postbox. All three are working concurrently. Once the postmaster has stamped all available letters, he waits for the postman to bring a few more. If the postbox is empty, the postman waits for more letters to be placed in it. A concurrent Go solution for this problem is easier to design than a traditional thread-based solution.

Threading In Python

In Python, the threading package enables us to perform concurrent tasks. When you need to make concurrent I/O requests, or work with a combination of CPU + I/O, you use threads. Let us write a small example showing how to calculate Fibonacci and copy the STDIN to STDOUT at the same time.

This program is creating two long-running threads:

  1. One collects the user input and prints it back to the console
  2. Another one computes Fibonacci for random numbers

Here, I used the class syntax of the thread. We can also create a thread using an instance of threading.Thread by specifying a target function to run.

If you look closely, one is a CPU operation and the other is an I/O operation. Both are running at the same time. The output is this.

If we have tens of threads, starting each one by hand is impractical. In that case, we can create the threads in a loop, start each one, and keep them all in a list.

threads = []
for i in range(10):
    t = threading.Thread(target=target)
    threads.append(t)
    t.start()

We can also run threads as a daemon, by passing the daemon=True keyword argument.

This program lets Python do two things at a time. What if the problem needs you to execute things in parallel, but on shared data? Then a few more things come into the picture: locks and semaphores.

Concurrent reads are safe, but concurrent writes are not. In a traditional thread-based system, threads communicate using shared data. There, shared data is compulsory even for communication between threads. One thread can read a value only after the other thread has written it. We use a variable or object to communicate between threads. That is where the real problem begins.

Gevent for basic concurrency

If a computing problem can be broken into individual pieces and has a lot of I/O interaction, we can easily bring concurrency into the picture. An example is zipping the individual files in a directory using different workers. There is a lightweight library in Python called Gevent, which provides concurrency through cooperative greenlets (coroutines scheduled on an event loop) instead of OS threads. Greenlets can work independently. This is an excellent fit when working with network sockets. We can also launch greenlets to apply the same operation to multiple data values.

This program spawns multiple greenlets to work on different files and compress them concurrently. Since they are independent of each other, it is easy for Gevent to switch between them.

What if there is shared data? Then the same story repeats. You should use locks and semaphores to communicate using shared memory. That is why Gevent provides synchronization primitives such as locks and semaphores too. Remember, Gevent is very good at independent I/O operations.

We saw threads and greenlets in action. In Go, when communication is needed between threads, a different strategy is used: “Do not communicate by sharing memory; instead, share memory by communicating.” To understand how this statement is justified in Go, we should know the thread counterpart in Go and a few message passing mechanisms.

Note: for the above example, you need to install gevent library separately.
pip install gevent

Go routines

Go routines (usually written goroutines) are similar to threads, but they are lightweight co-routines that the Go runtime schedules onto OS threads. Unlike other programming languages, where a class or interface is provided to create a thread, here any valid function can be executed as a separate co-routine using the go keyword.

I mean instead of doing this in Python

import threading

def target():
    pass

t1 = threading.Thread(target=target)
t1.start()

we can simply do this in Go

func target() {
}

func main() {
    go target() // a go statement must appear inside a function
}

If you observe carefully, in Python’s case we have a handle (t1) to the thread, whereas in Go we are just spawning a co-routine freely. You might have a doubt by now: how do we access that thread (co-routine here) after some time?

We can do that using channels. In Go, different co-routines communicate through a channel. You can also work with shared data by locking variables when the situation demands it. But most of our everyday problems can be solved with co-routines and message passing between them through channels.

A Go routine can be in one of these two states:

  • Blocked (Waiting on a channel for someone to pass a message)
  • Active (Doing some work like writing to a channel, reading from channel, and calculating something etc)

Channels

Channels are built-in types in Go. A channel is something like a UNIX pipe, where you can pass a value from one process to the other. Instead of creating a reference to a go routine, in Go we create a reference to a channel and establish an agreement between two or more go routines. This style is different from traditional threading systems. In Go, we create a channel using the make function.

mych := make(chan int)

:= is the short variable declaration operator: it declares a variable and infers its type from the RHS (Right Hand Side).

chan int means it is a channel that carries values of type int. For your information, Go is a statically and strongly typed language. You can create channels of other types in a similar way.

name := make(chan string)
age := make(chan int32)

A value can be written into a Go channel like this

name <- "Gopher"

Reading a value from a channel is like this

<-age
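To see the three operations together, here is a minimal sketch. The function name greet and the channel name messages are made up for illustration; the point is only that the receive in main blocks until the go routine sends.

```go
package main

import "fmt"

// greet sends one greeting on out. On an unbuffered channel the send
// blocks until some other go routine is ready to receive.
func greet(name string, out chan<- string) {
	out <- "Hello, " + name
}

func main() {
	messages := make(chan string) // create: an unbuffered channel of strings
	go greet("Gopher", messages)  // write happens inside the go routine
	fmt.Println(<-messages)       // read: prints "Hello, Gopher"
}
```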

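We saw how to create a channel and how to read from and write to it. Before putting channels to work, let us see what go routines can do without them: first a small echo program, and then wait groups, which play a role similar to starting a list of threads in Python and joining them. Those examples will not use any channel. Still, they achieve concurrency for independent tasks.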

With the knowledge we got above, a simple STDIN to STDOUT copying program can be written like this.

The io.Copy function copies a stream from a reader to a writer. This program runs only for 5 seconds because our main go routine is blocked while sleeping. After that, the program exits, which also kills the child go routine.

Go routines working without Channels

Go provides sync.WaitGroup to manage a pool of go routines that execute independent tasks. We don’t need to care about how they start, align, and stop. We register a go routine with the pool by calling the WaitGroup’s Add method before starting it. The go routine then calls Done to notify the WaitGroup that it has finished its task. The main go routine blocks on Wait() and, once all go routines have finished their tasks, proceeds from there. An example of parallel file compression, similar to the Python version above, is here.

This program compresses the files passed as command line arguments. A new go routine is spawned for each file. They run in parallel (because it is I/O) too. We didn’t use any channels in our program, only go routines and sync.WaitGroup. We can achieve the same result by using channels, but since the go routines are not talking to each other much and do independent tasks, we can safely coordinate their work using wait groups.

With Channels

Channels, along with go routines, are the building blocks of concurrency in Go. A Go developer can do many great things with the help of channels. Think of a channel as a typed pipe that carries values of one type. You can pass a channel into a function as an argument, return one from a function, and so on. There are two parties involved in a channel operation at a given time.

  • Sender
  • Receiver

A sender sends some data to the channel, and the receiver by default waits (blocks) for that data. They communicate to perform a task. Whenever the sender writes something to the channel, the receiver gets it instantly; on an unbuffered channel the sender also waits until a receiver is ready. We form this contract when designing the program. Let us rewrite the “copy STDIN to STDOUT for 5 seconds” program from above.

This program executes the same way as the echo program above, except that it replaces time.Sleep with time.After. time.After is a function that returns a channel, which the time package creates. Reading from that channel blocks until a value arrives (here in the main block on line 12). We already saw the syntax of “read from channel”.

<-time.After(5 * time.Second)

Who is writing the value to the channel? Go’s time package does! After 5 seconds, a value is sent on that channel. We throw the value away and use it only as a notification. The main go routine exits after reading the value, and the program ends. This example is small. Let us create something useful. We can easily write a producer/consumer pattern in Go with the help of channels.

We can create two go routines, one for producing and one for consuming. But how do we tell the consumer to wait until the producer generates something? Sometimes it is the other way around: the consumer is busy and the producer waits until the generated item is consumed.

The output is as follows.

The program is creating two go routines

  • Producer
  • Consumer

and two channels

  • isProducerDone — which notifies production is done
  • buffer — a channel that stores data to be written/read

Both go routines are declared with the anonymous function syntax, just to avoid passing channels as function arguments. An anonymous function (a closure) can access the variables of its enclosing scope.

We can range over a channel. The loop blocks while waiting for the next value and ends only when the channel is closed.

for product := range buffer {
    fmt.Println("Consuming..", product)
}

This is similar to iterating over an iterable object in Python.

for i in iterable:
    consume(i)

The producer go routine generates an integer and writes it to the channel buffer. After that, it sleeps for 5 seconds (we use sleep to simulate a time-consuming blocking process).

Finally, in the main block, we block on the channel isProducerDone. When the producer runs out of integers, it writes to isProducerDone. Our main program receives that value and the program ends.

Note: If you observe here, we never explicitly tell the consumer that production is over. We take advantage of the fact that when the main program ends, the consumer is killed automatically; that is why the producer writes to isProducerDone instead of notifying the consumer. This is clever but not clean.

There could be another scenario where the consumers are the last ones to exit and should signal the main program to wait until they finish their jobs. Another scenario is the worker queue, in which a producer fills a queue and consumers (workers) take data from the queue and process it.

A single channel is good when we have a simple producer/consumer pattern. Go also provides a special kind of channel, called a buffered channel, which can hold up to n values. It blocks the receiver when the buffer is empty and the sender when the buffer is full. Let us implement one.

Don’t worry about the code. I will explain it clearly. We are spawning 5 worker go routines to consume a queue, or data pool, that gets written by some producer. In the above program, a producer (the for loop) fills the buffered channel called jobs every 5 seconds. All worker go routines are listening to the channel and are blocked initially. Whenever a data value is written to the buffered channel, the first worker pulls the item and consumes it. It then sleeps for 7 seconds (just a stand-in for a time-consuming process like a DB call or an external HTTP request). After 5 seconds, one more data value is written and a second worker, which is idle, picks it up. Whenever a go routine (worker) finishes consuming data, it becomes available to pick the next value.

The most important thing here is that when production is finished, we need to notify the main go routine so it can terminate the children. But if we put that logic in the producer, a few workers may die prematurely without finishing their tasks. Go has a special keyword, defer, that postpones a call until the surrounding function returns.

defer close(jobs)

This statement schedules close(jobs) to run when the surrounding function (the producer) returns, that is, after the last job has been written. Closing a channel does not discard what is still in its buffer: workers keep receiving until the channel is empty. A worker go routine that is ranging over the buffered channel then comes out of its loop and writes a value to the done channel, to notify the main go routine to pack things up.

Implementing this in Python or any other language takes considerable time, with an increased chance of bugs creeping in. Go’s buffered channels are better suited to this kind of use case.

Problem solving thought process

Whenever we are presented with a concurrent problem, it is easy to come up with a solution quickly in Go. There is a reason for that. Using normal channels and buffered channels, we saw how to write the producer/consumer and worker queue patterns respectively. There is one more construct which, in my opinion, is very powerful and can do crazy things in Go. That is the select keyword.

Select — A magical switch in Go

You might have seen the switch keyword in other programming languages. Python traditionally manages without one, using multiple if/elif statements (a match statement arrived only in Python 3.10). But if you are familiar with C, Java, or JavaScript, you know that a switch statement checks a variable’s value against multiple conditions and runs only the branch that matches. Go has both switch and select. They look alike, but select chooses among channel operations instead of values. Just as Go’s range keyword can iterate over arrays and slices as well as channels, the select keyword can switch on channels. Channels are first class citizens in Go.

// Keep listening to channels
for {
    select {
    case <-chan1:
        // if chan1 receives a value, do something
    case <-chan2:
        // if chan2 receives a value, do something
    default:
        // runs when no channel is ready: cleanup & housekeeping
    }
}

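The select statement waits on all of its channel cases, runs the first one that becomes ready, and then finishes. If there is a default case, select does not block at all: default runs whenever no channel is ready. So, we should use an infinite loop to keep on listening to the channels.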

The select is like a giant octopus with n number of tentacles looking for a different fish to arrive in a channel. When it catches fish, you can consume it.

Why are we discussing select in the problem-solving thought process? We are going to see how, by combining the knowledge we gained above with select, we can solve real-world problems.

Problem: Design a flight booking system with a 15-minute timeout for the user.

Let us write a small Go program to solve the above problem at a high level. We need two things.

  • User activity(Adding information, payment etc)
  • Timeout & cleaning up resources

There should be a sound design for both clients and servers. That means deciding whether an API request blocks on the client side or on the server side. For example, we all know a jQuery AJAX request can be asynchronous even though the server call is blocking for that request. JavaScript can attach a callback to that server request saying “Hey mate, whenever you are done with that long taking request, just execute this piece of code. It takes care of updating UI. It won’t block UI”. For our server design, we assume the client does this:

  1. Whenever the user books a ticket, show him/her the GUI of the booking status
  2. When the user starts the process, make an API call to the server and start a timer
  3. Display a countdown timer on the screen
  4. If the operation finishes within the maximum booking time, show the user the success page; if not, the response returns with an “Operation timed out” message.

Don’t hesitate to read the code, it is barely 80 lines. Give it a look. We are doing these things in the program.

  • Create a route and attach an HTTP handler
  • Whenever API is called with GET method with an ID, book ticket
  • Only allow user operations under the time limit
  • Use channels for communication
  • Use time.After to discontinue the user operation through timeout.

    go func() {
        status <- " Seat selection going on..."
        // Use customer details to make DB queries, third party API/Service call
        time.Sleep(5 * time.Second)
        status <- " Making payments from bank..."
        time.Sleep(5 * time.Second)
        // Everything looks good. Notify customer
        transactionSuccess <- true
    }()

    for {
        select {
        case update := <-status:
            log.Println("ID:", customerID, update)
        case <-timeout:
            // status is not closed here: the go routine above may still
            // send on it, and a send on a closed channel panics.
            log.Println("Operation timed out!")
            isDone <- false
            return
        case <-transactionSuccess:
            log.Println("ID:", customerID, "Successfully booked ticket!")
            isDone <- true
            return
        }
    }
} // end of the enclosing function, whose beginning is not shown

This code is responsible for our core logic. We are selecting on three channels: the status channel, which keeps sending updates; the timeout channel, which tracks the maximum booking time; and the transactionSuccess channel, which notifies the handler function that processing is done.

The time.Sleep function is used as a stub for DB queries on customerID. It could also be any third party API (REST/RPC) request. In other programming languages, it is really tricky to write this kind of timeout logic, because we need to track the time ourselves instead of being handed a channel where we can put our exit logic.

If you look at the sleep intervals, the timeout is 15 seconds (it would usually be 15 minutes in real-world scenarios). Our handler finishes its job in 10 seconds (5 seconds sleep + 5 seconds sleep).

Running the above Go program starts an HTTP server. Now make two GET requests in two terminal windows.

curl -X GET http://localhost:8000/v1/flight-booking/1
curl -X GET http://localhost:8000/v1/flight-booking/2

The output log on console will be:

Both curl requests receive this response.

{"Status":"success","Message":"Dear passenger, your ticket is booked sucessfully!"}

Now change the timeout to 10 seconds and re-run the program. You will see the following output.

{"Status":"failure","Message":"Operation timed out! Please try again"}

The reason is that a timeout occurred before the booking operation could succeed, resulting in a failure message. The beauty of this program is that we didn’t use any external package to do it. Everything is built into Go.

Essence of Go

We saw how to implement simple concurrent programs in Go. We compared Python’s thread-based concurrency with Go’s concurrency based on go routines plus channels. The key difference is that Go’s concurrency is based on CSP (Communicating Sequential Processes). While designing a solution for a concurrent problem, think about sharing memory by communicating, not communicating by sharing memory. Go provides sync.WaitGroup to run independent go routines that rarely communicate. It also provides synchronization with sync.Mutex for when multiple go routines are racing to use a shared variable, such as one that holds a database driver.
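For the shared-variable case, a minimal sketch with sync.Mutex might look like this. The counter type and the countWith function are made up for illustration; a plain integer stands in for the shared resource.

```go
package main

import (
	"fmt"
	"sync"
)

// counter guards n with a mutex so that concurrent increments are not lost.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// countWith starts the given number of go routines, each incrementing once,
// and returns the final value after all of them have finished.
func countWith(goroutines int) int {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	return c.n
}

func main() {
	fmt.Println(countWith(1000)) // prints 1000
}
```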

Go is indeed designed for writing concurrent programs. Many complex problems can be solved in Go by making good use of these constructs. There are a few pitfalls too when working with channels, such as writing to a closed channel, or leaving every go routine blocked in some state of the application, which results in a deadlock.
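The closed-channel pitfall is easy to reproduce. In this sketch (function names are made up for illustration), a send on a closed channel panics, while a receive from a closed channel first drains the buffer and then yields the zero value with ok set to false.

```go
package main

import "fmt"

// sendOnClosed reports whether sending on a closed channel panicked.
func sendOnClosed() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics: send on closed channel
	return false
}

// drainClosed receives twice from a closed channel holding one value.
func drainClosed() (first, second int, ok bool) {
	ch := make(chan int, 1)
	ch <- 42
	close(ch)
	first = <-ch      // the buffered value is still delivered
	second, ok = <-ch // zero value; ok is false because ch is closed and empty
	return
}

func main() {
	fmt.Println(sendOnClosed()) // true
	fmt.Println(drainClosed())  // 42 0 false
}
```

A common convention that avoids the panic is to let only the sending side close a channel.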

Hope you enjoyed the article.

I am a full stack developer and also wrote a book on building RESTFul web services in Go.