Mastering Concurrent Processing: A Step-by-Step Guide to Building a Scalable Worker Pool in Go

Sourav Choudhary
7 min read · May 26, 2024

(Process 10k requests per second)

Connect with me on LinkedIn 🤝 to talk about crafting scalable systems

In this blog post, we’ll dive into building a scalable worker pool in Go. This implementation efficiently manages a pool of workers to handle a large number of requests while dynamically scaling the number of workers based on the load. We’ll also discuss potential pitfalls and how to avoid them.

Overview

We’ll create a worker pool with the following capabilities:

  • Dynamically scale the number of workers based on load.
  • Handle incoming requests with a timeout and retry mechanism.
  • Gracefully shut down workers.

Below is an explanation of each component, followed by the complete code.

[Diagram: Go worker pool]

Dispatcher

The dispatcher is responsible for managing the workers and distributing the incoming requests among them. It can dynamically add or remove workers based on the current load and ensures a graceful shutdown of all workers.

  • AddWorker: Adds a new worker to the pool and increments the worker count. The worker is launched to start processing requests.
  • RemoveWorker: Removes a worker from the pool if there are more than the minimum required workers. The worker is signaled to stop via the stopCh channel.
  • ScaleWorkers: Dynamically adjusts the number of workers based on the load. If the load exceeds a threshold and the pool is below the maximum size, a new worker is added. If the load falls below 75% of the threshold and the pool is above the minimum size, a worker is removed. A standalone sketch of this rule appears right after this list.
  • LaunchWorker: Launches a worker and increments the worker count. This is typically used for the initial set of workers.
  • MakeRequest: Adds a request to the input channel. If the channel is full, the request is dropped, and a message is logged.
  • Stop: Gracefully stops all workers. It waits for all workers to finish processing their current requests. If the timeout is reached, it forcefully stops all workers.
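
To make the decision concrete, here is a minimal sketch of the rule ScaleWorkers applies on every tick. scaleDecision is a hypothetical helper written only for illustration; the real code further below inlines this logic:

// scaleDecision mirrors the scaling rule in ScaleWorkers: grow above the
// threshold, shrink below 75% of it, otherwise leave the pool alone.
func scaleDecision(load, workerCount, minWorkers, maxWorkers, loadThreshold int) int {
	switch {
	case load > loadThreshold && workerCount < maxWorkers:
		return +1 // Queue is backing up and there is room to grow: add a worker
	case load < loadThreshold*3/4 && workerCount > minWorkers:
		return -1 // Queue has drained well below the threshold: remove a worker
	default:
		return 0 // Stay where we are
	}
}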

Worker

The Worker struct represents a worker that processes requests. Each worker runs in its own goroutine and listens for incoming requests on a channel.

  • LaunchWorker: Launches the worker in a separate goroutine. The worker processes incoming requests until the input channel is closed or it receives a stop signal.
  • processRequest: Processes a single request. It retries the request up to the specified maximum retries if an error occurs or if the request times out.
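
The essence of this pattern is to run the handler in its own goroutine and race it against a deadline. Here is an illustrative sketch; runWithTimeout is a hypothetical helper, not part of the pool's API:

// runWithTimeout races a single handler invocation against a timeout.
func runWithTimeout(h RequestHandler, data interface{}, timeout time.Duration) error {
	done := make(chan error, 1) // Buffered so the goroutine never blocks on send
	go func() { done <- h(data) }()
	select {
	case err := <-done:
		return err // Handler finished (err may be nil)
	case <-time.After(timeout):
		return fmt.Errorf("handler timed out after %v", timeout)
	}
}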

The Code

// struct.go

package workerpool

import "time"

// Request represents a request to be processed by a worker.
type Request struct {
	Handler    RequestHandler // Optional per-request handler (workers dispatch via their ReqHandler map, keyed by Type)
	Type       int
	Data       interface{}
	Timeout    time.Duration // Per-attempt timeout for the request
	Retries    int           // Number of retries (currently unused)
	MaxRetries int           // Maximum number of retries, i.e. MaxRetries+1 attempts in total
}

// RequestHandler defines a function type for handling requests.
type RequestHandler func(interface{}) error
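
As an illustration (the values are arbitrary), a request with a 50 ms per-attempt timeout and up to two retries, three attempts in total, could be built like this:

req := Request{
	Type:       1,                     // Dispatched via ReqHandler[1]
	Data:       "payload",
	Timeout:    50 * time.Millisecond, // Deadline for each attempt
	MaxRetries: 2,                     // Retry twice after the first failure
}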
// interface.go

package workerpool

import "context"

// WorkerLauncher is an interface for launching workers.
type WorkerLauncher interface {
	LaunchWorker(in chan Request, stopCh chan struct{})
}

// Dispatcher is an interface for managing the worker pool.
type Dispatcher interface {
	AddWorker(w WorkerLauncher)
	RemoveWorker(minWorkers int)
	LaunchWorker(id int, w WorkerLauncher)
	ScaleWorkers(minWorkers, maxWorkers, loadThreshold int)
	MakeRequest(Request)
	Stop(ctx context.Context)
}

// worker.go

package workerpool

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Worker represents a worker that processes requests.
type Worker struct {
	Id         int
	Wg         *sync.WaitGroup
	ReqHandler map[int]RequestHandler
}

// LaunchWorker launches the worker to process incoming requests.
// It runs in a separate goroutine, continuously listening for incoming requests on the input channel.
// The worker stops when the input channel is closed or when it receives a stop signal.
func (w *Worker) LaunchWorker(in chan Request, stopCh chan struct{}) {
	go func() {
		defer w.Wg.Done()
		for {
			select {
			case msg, open := <-in:
				if !open {
					// Without this check, a worker would keep reading
					// zero values from the closed channel forever.
					fmt.Println("Stopping worker:", w.Id)
					return
				}
				w.processRequest(msg)
				time.Sleep(1 * time.Microsecond) // Brief yield between requests (the select already blocks when the queue is empty)
			case <-stopCh:
				fmt.Println("Stopping worker:", w.Id)
				return
			}
		}
	}()
}

// processRequest processes a single request, retrying up to MaxRetries
// times if the handler returns an error or the per-attempt timeout elapses.
func (w *Worker) processRequest(msg Request) {
	fmt.Printf("Worker %d processing request: %v\n", w.Id, msg)
	handler, ok := w.ReqHandler[msg.Type]
	if !ok {
		fmt.Println("Handler not implemented: workerID:", w.Id)
		return
	}
	if msg.Timeout == 0 {
		msg.Timeout = 10 * time.Millisecond // Default timeout
	}
	for attempt := 0; attempt <= msg.MaxRetries; attempt++ {
		done := make(chan error, 1) // Buffered so the handler goroutine never blocks on send
		// cancel is called explicitly in each branch below rather than
		// deferred, so contexts do not pile up across retry iterations.
		ctx, cancel := context.WithTimeout(context.Background(), msg.Timeout)

		go func() {
			done <- handler(msg.Data)
		}()

		select {
		case err := <-done:
			cancel()
			if err == nil {
				return // Successfully processed
			}
			fmt.Printf("Worker %d: Error processing request: %v\n", w.Id, err)
		case <-ctx.Done():
			// The handler goroutine keeps running after a timeout; it is
			// abandoned, not cancelled.
			cancel()
			fmt.Printf("Worker %d: Timeout processing request: %v\n", w.Id, msg.Data)
		}
		if attempt < msg.MaxRetries {
			fmt.Printf("Worker %d: Retry %d for request %v\n", w.Id, attempt+1, msg.Data)
		}
	}
	fmt.Printf("Worker %d: Failed to process request %v after %d retries\n", w.Id, msg.Data, msg.MaxRetries)
}
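
One caveat in processRequest: when an attempt times out, the handler goroutine is abandoned rather than cancelled, so a slow handler keeps running in the background. A context-aware handler signature would avoid that leak. The sketch below is a possible variant, not the pool's actual RequestHandler type:

// CtxRequestHandler is a hypothetical context-aware handler type.
type CtxRequestHandler func(ctx context.Context, data interface{}) error

// slowHandler simulates work but stops promptly when the attempt's
// context is cancelled or its deadline expires.
func slowHandler(ctx context.Context, data interface{}) error {
	select {
	case <-time.After(100 * time.Millisecond): // Simulated work
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}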
// dispatcher.go

package workerpool

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// ReqHandler is a map of request handlers, keyed by request type.
var ReqHandler = map[int]RequestHandler{
	1: func(data interface{}) error {
		return nil // Type 1: a no-op handler used by the demo
	},
}

// dispatcher manages a pool of workers and distributes incoming requests among them.
type dispatcher struct {
	inCh        chan Request
	wg          *sync.WaitGroup
	mu          sync.Mutex
	workerCount int
	stopCh      chan struct{} // Channel to signal workers to stop
}

// AddWorker adds a new worker to the pool and increments the worker count.
func (d *dispatcher) AddWorker(w WorkerLauncher) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.workerCount++
	d.wg.Add(1)
	w.LaunchWorker(d.inCh, d.stopCh)
}

// RemoveWorker removes a worker from the pool if the worker count is greater than minWorkers.
func (d *dispatcher) RemoveWorker(minWorkers int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.workerCount > minWorkers {
		d.workerCount--
		d.stopCh <- struct{}{} // Signal a worker to stop
	}
}

// ScaleWorkers dynamically adjusts the number of workers based on the load.
// It runs for the life of the program, checking the queue length on every tick.
func (d *dispatcher) ScaleWorkers(minWorkers, maxWorkers, loadThreshold int) {
	ticker := time.NewTicker(time.Microsecond)
	defer ticker.Stop()

	for range ticker.C {
		load := len(d.inCh) // Current load is the number of pending requests in the channel

		// Snapshot the worker count under the lock to avoid a data race
		// with AddWorker/RemoveWorker.
		d.mu.Lock()
		count := d.workerCount
		d.mu.Unlock()

		if load > loadThreshold && count < maxWorkers {
			fmt.Println("Scaling Triggered")
			newWorker := &Worker{
				Wg:         d.wg,
				Id:         count,
				ReqHandler: ReqHandler,
			}
			d.AddWorker(newWorker)
		} else if load < loadThreshold*3/4 && count > minWorkers { // Below 75% of the threshold
			fmt.Println("Reducing Triggered")
			d.RemoveWorker(minWorkers)
		}
	}
}

// LaunchWorker launches a worker and increments the worker count.
func (d *dispatcher) LaunchWorker(id int, w WorkerLauncher) {
	d.mu.Lock()
	d.workerCount++
	d.wg.Add(1) // The worker calls Wg.Done() when it exits
	d.mu.Unlock()
	w.LaunchWorker(d.inCh, d.stopCh) // Pass stopCh to the worker
}

// MakeRequest adds a request to the input channel, or drops it if the channel is full.
func (d *dispatcher) MakeRequest(r Request) {
	select {
	case d.inCh <- r:
	default:
		// The channel is full; drop the request.
		fmt.Println("Request channel is full. Dropping request.")
		// Alternatively, you could block, buffer the request elsewhere, or return an error.
	}
}

// Stop gracefully stops all workers, waiting for them to finish processing.
func (d *dispatcher) Stop(ctx context.Context) {
	fmt.Println("\nstop called")
	close(d.inCh) // Close the input channel to signal that no more requests will be sent
	done := make(chan struct{})

	go func() {
		d.wg.Wait() // Wait for all workers to finish
		close(done)
	}()

	select {
	case <-done:
		fmt.Println("All workers stopped gracefully")
	case <-ctx.Done():
		fmt.Println("Timeout reached, forcing shutdown")
		// Forcefully signal all workers to stop. stopCh is buffered with
		// capacity maxWorkers, so these sends do not block even if some
		// workers have already exited.
		d.mu.Lock()
		count := d.workerCount
		d.mu.Unlock()
		for i := 0; i < count; i++ {
			d.stopCh <- struct{}{}
		}
	}

	d.wg.Wait()
}

// NewDispatcher creates a new dispatcher with a buffered channel and a wait group.
func NewDispatcher(b int, wg *sync.WaitGroup, maxWorkers int) Dispatcher {
	return &dispatcher{
		inCh:   make(chan Request, b),
		wg:     wg,
		stopCh: make(chan struct{}, maxWorkers), // Buffered so stop signals never block
	}
}
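
Because ReqHandler is an exported package-level map, callers can register handlers for additional request types before the pool starts (registering early also avoids a concurrent map write once workers are running). In this illustration, request type 2 and its string payload are assumptions, not something defined in this post; wp is the import alias used in main.go below:

// Register a handler for a hypothetical request type 2.
wp.ReqHandler[2] = func(data interface{}) error {
	s, ok := data.(string)
	if !ok {
		return fmt.Errorf("expected string payload, got %T", data)
	}
	fmt.Println("processing:", s)
	return nil
}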
// main.go

package main

import (
	"context"
	"fmt"
	"runtime"
	"sync"
	"time"

	wp "workerpool/workerpool"
)

func main() {
	// Set GOMAXPROCS to the number of available CPUs
	numCPU := runtime.NumCPU()
	runtime.GOMAXPROCS(numCPU)
	fmt.Printf("Running with %d CPUs\n", numCPU)

	// Configuration
	bufferSize := 50000
	maxWorkers := 20
	minWorkers := 3
	loadThreshold := 40000
	requests := 50000

	var wg sync.WaitGroup
	dispatcher := wp.NewDispatcher(bufferSize, &wg, maxWorkers)

	// Start the initial set of workers
	for i := 0; i < minWorkers; i++ {
		fmt.Printf("Starting worker with id %d\n", i)
		w := &wp.Worker{
			Wg:         &wg,
			Id:         i,
			ReqHandler: wp.ReqHandler,
		}
		dispatcher.AddWorker(w)
	}

	// Start the scaling logic in a separate goroutine
	go dispatcher.ScaleWorkers(minWorkers, maxWorkers, loadThreshold)

	// Send requests to the dispatcher
	for i := 0; i < requests; i++ {
		req := wp.Request{
			Data:    fmt.Sprintf("(Msg_id: %d) -> Hello", i),
			Handler: func(result interface{}) error { return nil }, // Unused: workers dispatch via ReqHandler, keyed by Type
			Type:    1,
			Timeout: 5 * time.Second,
		}
		dispatcher.MakeRequest(req)
	}

	// Gracefully stop the dispatcher
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	dispatcher.Stop(ctx)
	fmt.Println("Exiting main!")
}

Main

The main function initializes the dispatcher and workers, sends requests to the dispatcher, and stops the dispatcher gracefully.

  • Sets GOMAXPROCS to the number of available CPUs.
  • Initializes the dispatcher and starts the initial set of workers.
  • Sends requests to the dispatcher.
  • Gracefully stops the dispatcher with a timeout.

We’ll adjust parameters such as context timeout, buffer size, and minimum/maximum workers to maximize requests per second (RPS) and improve application performance.
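
As a rough way to measure RPS, you can time the request loop together with the graceful shutdown. This is a sketch built on the main function above; keep in mind that MakeRequest drops requests when the channel is full, so requests is an upper bound on what was actually processed:

start := time.Now()
for i := 0; i < requests; i++ {
	dispatcher.MakeRequest(req) // Build req exactly as in main above
}
dispatcher.Stop(ctx) // Wait for the queue to drain
elapsed := time.Since(start)
fmt.Printf("Handled %d requests in %v (~%.0f req/s)\n",
	requests, elapsed, float64(requests)/elapsed.Seconds())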

Stay tuned for practical insights and real-world examples!

If you've read this far, I hope you liked this article. If you did, please clap, as it motivates me to keep helping the community.

Please comment if you find any discrepancies in this article or if you have questions about it.

Thank you for your time.

Connect with me on LinkedIn 🤝
