Concurrency in Three Flavors

Lam Chan
Feb 22, 2017


I’ve been working with Golang this past week, and it really got me thinking about how different languages manage control flow when combining concurrency with asynchronous tasks.

Concurrency

Definition from Wikipedia

concurrency is the decomposability property of a program, algorithm, or problem into order-independent or partially-ordered components or units

We tend to use concurrency to help solve many problems in software, especially when the problem is big. We leverage concurrency to break down a big problem into many smaller problems whose solutions we piece back together; this process is called map-reduce. The benefit is revealed when all of the smaller problems can be solved at the same time.

An example of this can be seen in Elasticsearch and its sharding technique for search indices. When an index is big, it can be broken apart into shards spread across a cluster of nodes, so each node is only responsible for looking up what it holds. When a search request comes in, every node takes part in the lookup at the same time, and when the results return from each node, a reduction process stitches the list back together for the caller.

Concurrency and Back Pressure

When building for concurrency, it is important to understand processing velocity in order to handle back pressure. Processing velocity is the speed at which a system can handle requests. When designing a middle-man component for a data pipe, such as a stream processor, awareness is necessary around how fast requests are made versus how fast the downstream systems can absorb the results. In the event of back pressure, strategies need to be in place to handle the imbalance, such as buffering, back-off signals to upstream clients, or internal rate-limit caps.

The Problem and Solution

The problem I worked on involved correcting for a bug in S3. When you upload objects using pre-signed URLs and no ACL is declared as part of the PUT request, the final object stored in S3 is not assigned an owner. The object ends up in an orphaned state that cannot be operated on. The only way to repair it is to copy the object out, delete the original, and PUT it back with the corrected ACL.

We had several million objects in a single bucket, with objects of variable size. Ideal processing of this would be anything under a day of compute.

This was when I chose to try Golang, having heard of its great I/O performance and low compute overhead. I picked up Golang within a day and engineered a throwaway application in another. The application repaired our S3 bucket in 40 minutes, processing at a good clip of 256 MB/sec. What amazed me was how easy it was to write a fully concurrent application in a new language operating at this performance level.

Golang Channels

Two things define concurrency in Golang: goroutines and channels, which are incredible when used correctly in combination. The go keyword acts like a decorator around a function call, executing it in the context of a separate *light* thread. An example of this being useful is when you need to spawn a dispatcher that creates background workers to listen to a queue. With all the threads that are created, there will come a time when you need the contexts to be coalesced back together. This is where channels come into play. Golang has a lot of language features built around channels as first-class citizens; the for loop is a good example, where you can set up a loop to wait indefinitely on items coming off a channel.

This example above represents a job dispatcher, and it highlights a for loop with a select expression. The for loop runs indefinitely and is conditionally held together by the select statement, which dequeues jobs off the job queue. When that happens, an IIFE is created and executed as a goroutine. Once the IIFE starts running, a worker is dequeued to process the job; when it finishes, the worker is enqueued back into the worker pool.

Let’s focus on make(chan Job) and make(chan Worker, 40). These expressions create the two channels for the jobs and the worker pool. They differ in that the jobs channel is unbuffered, so a send simply blocks until the dispatcher is ready to receive, while the worker pool channel has a capacity of 40. That capacity helps us rate limit how fast we process, matching the processing limit (CPU/cores) of the system we are running on: when a job is dispatched, the goroutine must first acquire one of the available workers, which limits how many jobs we can process in parallel.

As a worker processes a job, the results can be sent back on another channel to a component handling the merging (reduction) of the results for the caller, or sent further downstream.

NodeJs Event Loop

Concurrency in Node.js is a sticky subject because the user space your application runs in is a single-threaded context. This user space is also known as the event loop, which is akin to co-routines or co-threading. The participants in this event loop (your code) require that all actors (functions) be good citizens and share the resource evenly to prevent blockages. When that holds, the event loop is very fast because there is no threading and hence no context switching at the lower level. This methodology is akin to how some low-level networking frameworks (e.g., Akka) are written.

NodeJs treats asynchronous concurrency as a first class citizen. Users can pick from several styles of available control flow. Here, we will talk about two.

Async

The native Node.js callback style was the first built-in control flow for handling asynchronous concurrency. Basically, when a task is done, a callback function is invoked to hand the result(s) or error(s) back to the caller.

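The post's snippet is not reproduced here; as an illustration of the callback control flow, here is a hand-rolled simplification of the async library's mapLimit(collection, limit, iteratee, done) shape. listJobs() and the doubling "work" are assumptions for illustration.

```javascript
// Hand-rolled simplification of the "async" library's
// async.mapLimit(collection, limit, iteratee, done) control flow.
function mapLimit(items, limit, iteratee, done) {
  if (items.length === 0) return done(null, []);
  const results = new Array(items.length);
  let next = 0, inFlight = 0, finished = 0, failed = false;

  function launch() {
    while (inFlight < limit && next < items.length) {
      const i = next++;
      inFlight++;
      iteratee(items[i], (err, res) => {
        if (failed) return;
        if (err) { failed = true; return done(err); }
        results[i] = res;
        inFlight--;
        finished++;
        if (finished === items.length) return done(null, results);
        launch(); // a "worker slot" freed up: dequeue the next job
      });
    }
  }
  launch();
}

// listJobs() is a stand-in for fetching real work.
function listJobs(callback) {
  setImmediate(() => callback(null, [1, 2, 3, 4, 5]));
}

listJobs((err, jobs) => {
  if (err) throw err;
  // Process at most 2 jobs at a time, akin to Go's bounded worker pool.
  mapLimit(jobs, 2, (job, cb) => setImmediate(() => cb(null, job * 2)),
    (err, results) => {
      if (err) throw err;
      console.log(results); // [ 2, 4, 6, 8, 10 ]
    });
});
```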
The listJobs() method returns a collection of jobs. We then use the async callback control flow library to map the collection of jobs through asynchronous processing (which does not hold up the event loop). async.mapLimit() takes a number for how many of these jobs to execute concurrently, similar to the Golang example's cap on how many workers are active at once.

Promises

Promises were added to Node.js as follow-up support, changing the control flow for asynchronous programming to resemble synchronous styles more closely.

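The original snippet used a promise-based map with a concurrency option, as provided by, for example, Bluebird's Promise.map(jobs, fn, { concurrency: n }). Here is a hand-rolled sketch over native promises; the function and job names are assumptions.

```javascript
// Hand-rolled concurrency-capped map over native promises; Bluebird's
// Promise.map(jobs, fn, { concurrency: n }) provides this out of the box.
function mapWithConcurrency(items, mapper, concurrency) {
  return new Promise((resolve, reject) => {
    if (items.length === 0) return resolve([]);
    const results = new Array(items.length);
    let next = 0, finished = 0;

    function launchOne() {
      if (next >= items.length) return;
      const i = next++;
      Promise.resolve(mapper(items[i])).then((res) => {
        results[i] = res;
        finished++;
        if (finished === items.length) return resolve(results);
        launchOne(); // a slot freed up: start the next task
      }, reject);
    }

    // Prime at most `concurrency` tasks, like Go's worker pool cap.
    for (let k = 0; k < Math.min(concurrency, items.length); k++) launchOne();
  });
}

const processJob = (job) => Promise.resolve(job * 2); // stand-in async work

mapWithConcurrency([1, 2, 3, 4, 5], processJob, 2)
  .then((results) => console.log(results)); // [ 2, 4, 6, 8, 10 ]
```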
The notable difference in this example is the control flow framework: we replaced the earlier async library with promises. The map() method still takes a max concurrency number controlling how many tasks we can process at one time.

.Net TPL

The advent of LINQ gave C# an easy way to handle collections and enabled lazy evaluation. A good side effect of LINQ that was not effectively promoted are the Rx (Reactive Extensions) and TPL (Task Parallel Library) frameworks. I used TPL extensively to maximize concurrency for batch processing, because TPL leverages the concept of building blocks to allow for composability in a data processing chain. There is a myriad of built-in data block control flows like ActionBlock, TransformBlock, and JoinBlock, to name a few.

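The embedded C# snippet from the original post is not reproduced here; a minimal TPL Dataflow sketch of the same shape might look like the following. The block names, the doubling "work", and the summing reduction are assumptions for illustration, and the code assumes the System.Threading.Tasks.Dataflow package.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // TPL Dataflow NuGet package

class Program
{
    static async Task Main()
    {
        int total = 0;

        // Transform each job, at most 4 at a time.
        var processBlock = new TransformBlock<int, int>(
            job => job * 2,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // Reduce the results (single-threaded by default, so += is safe).
        var aggregationBlock = new ActionBlock<int>(result => total += result);

        // Blocks are linked together by self-managed in-memory queues.
        processBlock.LinkTo(aggregationBlock,
            new DataflowLinkOptions { PropagateCompletion = true });

        // Feed jobs into the starting block, akin to Go's job channel.
        for (int i = 1; i <= 100; i++) processBlock.Post(i);
        processBlock.Complete();
        await aggregationBlock.Completion;

        Console.WriteLine(total); // 10100
    }
}
```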
This example is somewhat similar to the Node.js promises and Golang examples above. In TPL, we create building blocks which perform our actions but are linked together by self-managed in-memory queues; in Golang, we created channels to represent this. The concurrency restriction is declared as part of the block instantiation using the MaxDegreeOfParallelism option. The processBlock handles the job processing and the aggregationBlock handles the reduction. Having to feed the jobs into the starting block is somewhat similar to tying a channel to the dispatcher in Golang and feeding jobs into the channel.

Conclusion

Reflecting on these different control flow mechanisms, I found it interesting how closely the frameworks and languages resemble one another at times, but also how far the concepts diverge in style. In the end, it is about understanding the tools available to you and picking the right one for the job at hand.

About the Author

Lam Chan

Lam is a Software Architect for the Locals Squads @ XO Group. He is a seasoned polyglot engineer with over 16 years of professional experience working with startups and multiple Fortune 500 companies. When he is away from the office, he enjoys contributing to OSS projects and dabbling with woodworking projects. Find out more about Lam on LinkedIn.
