Performance Benchmarks of Serial and Parallel loops in R

Amir Rahnama
Jul 24, 2016 · 3 min read


Recently, and only recently, I have been exposed to large data structures: objects such as data frames that are as big as 100MB in size (if you don’t know, you can find out the size of an object with object.size). When you come to R from another background, you are mostly used to for or foreach loops. I have since come to appreciate the expressive beauty of lapply, but the apply functions are generally slow, so in this post I’ll show you some ways to boost the performance of your loops in R.

Note: The reason for using the doParallel package in this post (and not the parallel package directly) is that parallel’s fork-based parallelization does not work on Windows, so you had to write different code for each platform. The doParallel package wraps parallel and makes the same code work across Unix, Linux and Windows, so it’s a pretty decent wrapper.

Case: Calculating Prime Numbers

Let’s imagine that we are computing all prime numbers up to n, for each n from 10 to 100000, using the following function:

getPrimeNumbers <- function(n) {
  # Sieve of Eratosthenes: returns all primes up to n
  n <- as.integer(n)
  if (n > 1e6) stop("n too large")
  primes <- rep(TRUE, n)
  primes[1] <- FALSE  # 1 is not a prime
  last.prime <- 2L
  for (i in last.prime:floor(sqrt(n))) {
    # cross out every multiple of the current prime
    primes[seq.int(2L * last.prime, n, last.prime)] <- FALSE
    # jump to the next number still marked as prime
    last.prime <- last.prime + min(which(primes[(last.prime + 1):n]))
  }
  which(primes)
}

The function is taken from this Stack Overflow answer: http://stackoverflow.com/questions/3789968/generate-a-list-of-primes-in-r-up-to-a-certain-number
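
As a quick sanity check, here is the function on a small input (the result shown as a comment):

getPrimeNumbers(30)
# [1]  2  3  5  7 11 13 17 19 23 29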

Now let’s see the code for each type of loop, and then compare their runtime performance.

for (serial)

This is one way of achieving our goal using a for loop:

index <- 10:100000
result <- vector("list", length(index))  # pre-allocate instead of growing the list
for (i in seq_along(index)) {
  result[[i]] <- getPrimeNumbers(index[i])
}

The for loop finished in 55.4708 seconds on average over 10 runs.
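
I won’t reproduce the full timing harness here, but a minimal sketch of how such an average can be measured (reusing index and getPrimeNumbers from above; the rest is just one way to do it) looks like this:

elapsed <- replicate(10, system.time({
  result <- vector("list", length(index))
  for (i in seq_along(index)) {
    result[[i]] <- getPrimeNumbers(index[i])
  }
})["elapsed"])
mean(elapsed)  # average wall-clock runtime in seconds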

lapply (serial)

Now let’s look at lapply:

index <- 10:100000  
result <- lapply(index, getPrimeNumbers)

lapply ran in 57.00911 seconds on average over 10 runs. You might also agree that it makes your code much more beautiful. It is no secret that the apply functions can be slower in R than native for or foreach loops.

doParallel::parLapply

Now let’s go parallel and use doParallel’s parLapply:

library(doParallel)
no_cores <- detectCores() - 1  # leave one core free for the rest of the system
cl <- makeCluster(no_cores, type = "FORK")  # FORK clusters only exist on Unix-likes
result <- parLapply(cl, 10:100000, getPrimeNumbers)
stopCluster(cl)  # free the workers when done

The same loop took only 19.38573 seconds on average over 10 runs. Now, remember that detectCores() finds how many cores your CPU has, and just to be safe, I used one less core. Also, make sure to invoke stopCluster so you free up resources. Note that parLapply works on the cluster object directly, so registering a doParallel backend is only needed for foreach’s %dopar%, which we will see next.
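
Since FORK clusters are not available on Windows, a portable variant would use the default PSOCK cluster type instead; a sketch of that setup follows. The clusterExport call is needed because PSOCK workers start as fresh R sessions that don’t see our function:

cl <- makeCluster(no_cores)  # default type = "PSOCK", works on every platform
clusterExport(cl, "getPrimeNumbers")  # ship the function to the worker sessions
result <- parLapply(cl, 10:100000, getPrimeNumbers)
stopCluster(cl)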

doParallel::foreach

Now let’s do the same using doParallel’s foreach:

library(doParallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores, type = "FORK")  # again, FORK is Unix-only
registerDoParallel(cl)  # register the cluster as the %dopar% backend
result <- foreach(i = 10:100000) %dopar% getPrimeNumbers(i)
stopCluster(cl)

doParallel::foreach is very fast: the loop took only 14.87837 seconds on average over 10 runs!

doParallel::mclapply

The last function I am going to showcase is the easy-to-use but not-very-impressive mclapply (it actually lives in the parallel package, which doParallel loads for you):

library(doParallel)
cores <- detectCores() - 1
# mclapply forks the current session, so on Windows it only works with mc.cores = 1
result <- mclapply(10:100000, getPrimeNumbers, mc.cores = cores)

Although you don’t need to create and register a cluster as with the other functions shown above (foreach or parLapply), it runs in 42.62276 seconds on average: only slightly better than the serial for loop while using several more cores, which makes it rather inefficient in this case.

Results

Now let’s visualize the results using the amazing ggplot2, so that we can read them in a more human-friendly way:
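
A minimal sketch of such a plot, with the data frame built by hand from the average timings reported above:

library(ggplot2)

# mean runtimes over 10 runs, in seconds, from the sections above
timings <- data.frame(
  method  = c("for", "lapply", "parLapply", "foreach", "mclapply"),
  seconds = c(55.47, 57.01, 19.39, 14.88, 42.62)
)

ggplot(timings, aes(x = reorder(method, seconds), y = seconds)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Mean runtime (seconds)",
       title = "Computing primes: serial vs. parallel loops")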

As you can see, our winners are doParallel’s parLapply and foreach: both cut the runtime to less than half of the serial versions, which is a great improvement.


Amir Rahnama

PhD Candidate in Machine Learning at KTH (Royal Institute of Technology)