How Batch Processing Improves Your Service (Go)

Herry Gunawan
3 min read · Sep 18, 2018

--

Have you ever tried to copy 1024 files of 1 kilobyte each to a flash disk? Compare that to copying 1 file of 1024 kilobytes (1 megabyte). Which one is faster? The single 1024-kilobyte file, because the per-file overhead is paid only once. In programming, situations like this occur often. Let’s take an example. Say we have a clapping service like the one on Medium. When the user clicks the “clap button”, the Clap function is called.

func Clap(articleID int) {
	addClapToDB(articleID)
}

This approach is not efficient, for the same reason as the 1024-files case: every click costs one database call. So how can we make it more efficient? With batch processing. Instead of running addClapToDB against the database every time a user hits the button, we can collect the claps in a batch and process them all at once. After all, the count does not need to be updated in real time in the database; it only needs to look real-time in the frontend.

Nonbatch - Batch Flow

How to do this in Go?

I made a simple batch processing library called gobatch. You can get it here: https://github.com/herryg91/gobatch. Let’s try to solve the problem using this library.

Currently, gobatch only supports an in-memory batch, which means the batched data is stored in memory until it is ready to be processed. We can initialize it like this.

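import (
	"time"

	"github.com/herryg91/gobatch"
)

batchSize := 50000             // flush once this many items are buffered
batchTime := time.Second * 15  // or once this much time has passed
batchWorker := 1               // number of workers processing batches
mBatch := gobatch.NewMemoryBatch(
	addClapToDB,
	batchSize,
	batchTime,
	batchWorker,
)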

When is the batch ready to be processed? It depends on the batchSize and batchTime settings. In the example, there are 2 conditions under which gobatch runs the addClapToDB function:

  1. The batch reaches 50k items (the service has called mBatch.Insert(data) 50k times)
  2. 15 seconds pass without the batch reaching 50k items

With these rules, we can change the behaviour of the Clap function. Instead of adding the data to the DB directly when Clap is called, we add it to the batch and let the library process it when it is ready.

func Clap(articleID int) {
	mBatch.Insert(articleID)
}

Last, we need the addClapToDB function, which is run when the batch is ready. This function is the most important part, because the efficiency of the whole process depends on the code inside it. Here is an example.

func addClapToDB(workerID int, datas []interface{}) (err error) {
	// key = article ID, value = number of claps for that article in this batch
	mapOfClaps := map[int]int{}
	// Reducing the batch first makes the process more efficient,
	// because each article needs only one DB update per batch.
	for _, d := range datas {
		// skip anything that is not an int article ID
		if articleID, okParse := d.(int); okParse {
			mapOfClaps[articleID]++ // a missing key starts at 0
		}
	}
	for articleID, score := range mapOfClaps {
		updateToDB(articleID, score)
	}
	return nil
}

In this example, counting the claps per article first reduces the number of DB updates, which makes the service more efficient, since updating the DB many times is costly.
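
The reduce step can be tried on its own, without a database. The sketch below pulls the counting loop into a helper; countClaps is a hypothetical name used only for this illustration, and the sample batch is made up.

```go
package main

import "fmt"

// countClaps groups a batch by article ID and returns claps per article.
// Items that are not ints are skipped, as in addClapToDB.
func countClaps(datas []interface{}) map[int]int {
	counts := make(map[int]int)
	for _, d := range datas {
		if articleID, ok := d.(int); ok {
			counts[articleID]++
		}
	}
	return counts
}

func main() {
	// A made-up batch: article 7 clapped 3 times, article 42 twice,
	// plus one item of the wrong type.
	batch := []interface{}{7, 7, 42, 7, "not an id", 42}
	counts := countClaps(batch)

	// One DB update per distinct article instead of one per clap.
	fmt.Printf("%d inserts -> %d updates\n", len(batch), len(counts)) // prints 6 inserts -> 2 updates
	fmt.Println(counts[7], counts[42])                                // prints 3 2
}
```

The more often the same article appears in a batch, the more updates this step saves.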

Test & Result

I tested this scenario with these criteria:

  • 2 million claps in total
  • A random articleID for each clap, drawn from 10k articles
  • DB: Redis

I tried 3 scenarios: nonbatch, gobatch with a 50k batch size, and gobatch with a 1 million batch size (all gobatch scenarios use 1 worker). The results are as expected: gobatch makes the service much more efficient. Here is the output.

2018/09/14 00:12:48 Start MemBatch 1.000.000
2018/09/14 00:12:48 Start MemBatch 50.000
2018/09/14 00:12:48 Start NonBatch

2018/09/14 00:12:55 MemBatch 1.000.000 done in: 6.977398839 s
2018/09/14 00:14:09 MemBatch 50.000 done in: 81.56769467 s
2018/09/14 00:18:05 NonBatch done in: 316.993155308 s
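
To put the timings above in perspective, the short sketch below computes the speedup of each batch size over the nonbatch run, plus a rough upper bound on the number of updateToDB calls. The bound rests on two assumptions that the log does not confirm: batches flush only when they are full, and the claps are spread over about 10k distinct article IDs. The helper names speedup and maxUpdates are hypothetical.

```go
package main

import "fmt"

// Timings copied from the log output above, in seconds.
const (
	nonBatchSec = 316.993155308
	batch50kSec = 81.56769467
	batch1MSec  = 6.977398839
)

// speedup reports how many times faster a run is than the baseline.
func speedup(baselineSec, runSec float64) float64 {
	return baselineSec / runSec
}

// maxUpdates is an upper bound on DB updates: each flushed batch issues at
// most one update per distinct article ID. It assumes size-triggered flushes
// only (an assumption, not something the log shows).
func maxUpdates(totalClaps, batchSize, distinctIDs int) int {
	batches := (totalClaps + batchSize - 1) / batchSize // round up
	perBatch := batchSize
	if distinctIDs < perBatch {
		perBatch = distinctIDs
	}
	return batches * perBatch
}

func main() {
	const totalClaps, distinctIDs = 2000000, 10000

	fmt.Printf("50k batch: %.1fx faster, at most %d updates\n",
		speedup(nonBatchSec, batch50kSec), maxUpdates(totalClaps, 50000, distinctIDs))
	fmt.Printf("1M batch:  %.1fx faster, at most %d updates\n",
		speedup(nonBatchSec, batch1MSec), maxUpdates(totalClaps, 1000000, distinctIDs))
}
```

Under those assumptions, the 50k batch is about 3.9x faster with at most 400,000 updates, and the 1 million batch is about 45.4x faster with at most 20,000 updates, compared with 2 million updates in the nonbatch run.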

By the way, you can see the full code here: https://github.com/herryg91/gobatch/tree/master/example/benchmark


Herry Gunawan

Lead Software Engineer (Personalization & Recommendation) @ Tokopedia | LinkedIn: https://id.linkedin.com/in/herryg91