If a goroutine creates a new goroutine, which one does the scheduler pick up first?

Genchi Lu
Jan 30, 2019


Golang’s concurrency mechanism works roughly as follows:

A context (P) picks a goroutine (G) from its run queue and arranges for it to be executed by an OS thread (M). When a context’s queue is empty, it steals a goroutine from another context’s queue and runs it on its own thread.
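To make the pick-then-steal idea concrete, here is a toy model. It is only a sketch under loud assumptions: the type `P`, the method `next`, and the choice to steal a single goroutine from the tail of the victim's queue are all made up for illustration, and the real Go runtime is considerably more involved.

```go
package main

import "fmt"

// P is a toy scheduling context with a local run queue of goroutine IDs.
// It is a hypothetical model, not the runtime's own data structure.
type P struct {
	id    int
	queue []int
}

// next returns the next goroutine ID for p to run. It takes from the head
// of p's own queue first; if that queue is empty, it steals one goroutine
// from the tail of the first peer that still has work.
func (p *P) next(peers []*P) (g int, ok bool) {
	if len(p.queue) > 0 {
		g, p.queue = p.queue[0], p.queue[1:]
		return g, true
	}
	for _, victim := range peers {
		if victim != p && len(victim.queue) > 0 {
			last := len(victim.queue) - 1
			g, victim.queue = victim.queue[last], victim.queue[:last]
			return g, true
		}
	}
	return 0, false
}

func main() {
	p0 := &P{id: 0, queue: []int{1, 2, 3}}
	p1 := &P{id: 1} // starts empty, so it has to steal
	peers := []*P{p0, p1}

	// Alternate between the two contexts until neither finds work.
	for _, p := range []*P{p1, p0, p1, p0} {
		if g, ok := p.next(peers); ok {
			fmt.Printf("P%d runs G%d\n", p.id, g)
		} else {
			fmt.Printf("P%d finds no work\n", p.id)
		}
	}
}
```

The point of the model is only the order of preference: local queue first, other queues second.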

So the question is: when one goroutine creates a new goroutine, which one does the scheduler pick up first?

The book Concurrency in Go says:

when creating a goroutine, it is very likely that your program will want the function in that goroutine to execute. It is also reasonably likely that the continuation from that goroutine will at some point want to join with that goroutine. And it’s not uncommon for the continuation to attempt a join before the goroutine has finished completing. Given these axioms, when scheduling a goroutine, it makes sense to immediately begin working on it.

Based on that, if a newly created goroutine has to wait for its parent goroutine to run more code before the two can join, the Go scheduler wastes some additional time switching the thread between goroutines, since it picks the new goroutine to execute first.

That’s interesting! So I wrote some code to try out this scenario. It works as follows:

  1. The function consume receives a channel and starts a goroutine that selects on that channel.
  2. The function produce receives a channel, performs some arbitrary math operation, and passes the result to that channel.
  3. The function BenchmarkProducerFirst creates a channel named buffer, calls produce with buffer in a loop 100000 times, and finally passes buffer to consume.
  4. In contrast, BenchmarkComsumerFirst passes buffer to consume first, and then calls produce in a loop 100000 times.

I ran the benchmark on a 2017 MacBook Pro, and here is the result:

BenchmarkProducerFirst-4        2000000000               0.06 ns/op
BenchmarkComsumerFirst-4        2000000000               0.15 ns/op

To my surprise, BenchmarkComsumerFirst is about 2.5 times slower than BenchmarkProducerFirst (0.15 ns/op vs 0.06 ns/op), even though I merely rearranged the order of consume and produce.

Using go tool trace, we can look at the scheduler latency profiles:

Scheduler latency profile of BenchmarkProducerFirst
Scheduler latency profile of BenchmarkComsumerFirst

They show that the scheduler latency of BenchmarkProducerFirst is clearly better than that of BenchmarkComsumerFirst: 9834345.77us (about 9.8 s) vs 306062.93ms (about 306 s).

In practice, I don’t think this has a critical effect on performance unless your system has extreme requirements. It’s just a small experiment that tries to demonstrate how the Go scheduler’s algorithm works.
