Comparative study of serial and parallel processing in Python
“At 128 cores, Altra Max has double the number of cores of AMD’s 64-core EPYC Rome processors but the same number of threads, 128, because Altra Max doesn’t support hyperthreading. But even without hyperthreading, the company said it can provide ‘ideal performance scaling.’”
A quick Google search turned up the quote above. From it, we can fairly say that the growth of computing hardware has been exponential, especially in recent years. Nowadays, as developers, we often try to maximize our resources to deal with large amounts of data. In the Big Data world, space complexity is no longer the limiting factor, but programmers still have to deal with time complexity. Hence the concept of serial processing is becoming obsolete. Parallel processing is nothing but a computer using more than one CPU core at a time. Before jumping into the technical terms, let’s understand the topic in layman’s terms.
As we all know, every human being has five sense organs, and when a person uses two or more of them at once, that is essentially multiprocessing. For example, a person listening to a song while having lunch. Some people would call this multitasking, but in the technical field the definition of multitasking is quite different from that of multiprocessing.
Just like sense organs, computers nowadays have multiple cores. Whenever a programmer writes code and that code is executed by multiple processors in parallel, it is called parallel programming.
People often confuse multithreading with multiprocessing, but the two deal with different concepts: the former works on concurrency, the latter on parallelism.
Basically, multithreading creates an illusion that all the threads are running in parallel; in reality, those threads run concurrently, taking turns on the CPU. In multiprocessing, by contrast, the code fully uses the CPU cores, and each core runs in parallel. Both methods are used to improve the performance of a system, but in different situations. When the code demands a lot of computation, multiprocessing is the right choice; when the code involves a lot of I/O or network operations, multithreading is the better option because of its low overhead.
Recently, I was working with hyperspectral images for remote sensing data analysis. Unlike RGB images (3 bands), a hyperspectral image has more than 100 bands, or spectra. For the project, I had to denoise the image band by band. Initially, a serial method was used with a for loop:
for i in range(padded_X.shape[0]):
    waveletX[i] = sp.spec_trans(wav_fam, threshold_val, thresh_type, padded_X[i])
Here each band is denoised serially, and the time taken by the system is:
total time taken for serial processing: 46.92178964614868
Here I was dealing with an image that had 103 bands. If a small file (162 MB) takes this much time with serial processing, we can imagine how long a much larger file (more than 1 GB) would take if the linear method were followed. Hence I moved on to parallel programming. Python provides a package called multiprocessing, through which parallel computation can be achieved.
p = mp.Pool(4)
# first 3 arguments to spec_trans will be wav_fam, threshold_val and thresh_type
worker = partial(spec_trans, wav_fam, threshold_val, thresh_type)
suitable_chunksize = compute_chunksize(4, padded_X.shape[0])
transformedX = list(p.imap(worker, padded_X, chunksize=suitable_chunksize))
Here, a Pool object is created by passing the number of worker processes the program will use. The worker function is then passed to the imap method (similar to the imap function from itertools) along with a chunk size.
imap has an advantage over map when the data is very large and not already a list. For map to compute a suitable chunk size, it would first have to convert the iterable to a list just to get its length, which could be memory-inefficient. But if the length (or approximate length) of the data is already known, imap is the better option: we can set the chunk size ourselves without having to convert the iterable to a list.
def compute_chunksize(pool_size, iterable_size):
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize
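The heuristic aims for roughly 4 chunks per worker, rounding up so no elements are left over. As a self-contained sanity check (repeating the function so the snippet runs on its own), here is what it yields for the 103-band image above with a pool of 4 workers:

```python
def compute_chunksize(pool_size, iterable_size):
    # Aim for roughly 4 chunks per worker; round up so nothing is left over
    chunksize, remainder = divmod(iterable_size, 4 * pool_size)
    if remainder:
        chunksize += 1
    return chunksize

# For 103 bands and 4 workers: divmod(103, 16) = (6, 7),
# the remainder is non-zero, so the chunk size is bumped to 7
chunk = compute_chunksize(4, 103)
```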
Here each band is denoised in parallel, and the time taken by the system is:
total time taken for parallel processing: 12.22811770439148
Time comparison of serial and parallel processing (time for the same operation)
In both cases, the wavelet transformation is performed using Python's pywt (PyWavelets) package on the same Pavia University dataset.
The comparison between serial and parallel processing comes out to roughly 47 seconds versus 12 seconds: nearly a 4x speedup with 4 worker processes.