VLSI Digital Signal Processing: An Overview of Parallel Processing and Pipelining

Laksh Maheshwari · Published in vlsi in dsp · May 31, 2021

Using VLSI technology, DSP algorithms can be prototyped in many ways. These options include:

(i) Single- or multi-processor programmable digital signal processors

(ii) Use of a core programmable digital signal processor with customized interface logic

(iii) Semi-custom and field-programmable gate-array implementations

(iv) Full-custom dedicated hardware implementations.

High-level algorithm and architecture transformations can play an important role in improving the performance of DSP systems in all of these implementation approaches. In the context of programmable processors, high-level transformations can lead to more efficient compiled code and a reduction in the number of programmable processors required in a multiprocessor environment. On the other hand, for custom implementations, the transformations can lead to a reduction in silicon area and/or power consumption. Thus, it is important to both understand the effect of these transformations on DSP circuits and integrate these transformations in DSP hardware and software synthesis systems.

  1. Straightforward use of pipelining and parallel processing can increase the concurrency in systems that do not contain any feedback loops.
  2. Pipelining reduces the critical path by placing latches at appropriate feed-forward locations, and this reduction can be exploited to operate the system at a higher clock speed. In other words, pipelining transforms a topology (containing no feedback loops) into an equivalent form that can meet the speed demands of an application that the original topology cannot.
  3. Pipelining reduces the critical path at the expense of an increase in the number of latches and in the input-output delay, referred to as the system latency (the FIR sketch after this list illustrates both effects).
  4. Retiming can also improve the concurrency by moving delays around the system. Unlike pipelining, retiming does not alter the system latency.
  5. Parallel processing is another approach to increasing concurrency: multiple inputs are processed in the same clock cycle to generate multiple outputs. In systems with no feedback loops, this amounts to duplicating hardware. The clock speed is not increased by this approach, but the sample rate is, because the sample period, i.e., the clock period divided by the number of samples processed per clock cycle, is reduced in a parallel implementation (see the block-processing sketch after this list).
  6. Pipelining and parallel processing can also be used to reduce power consumption while keeping the system speed (sample rate) unchanged. With pipelining, the shortened critical path, and hence the smaller capacitance along it, can be charged or discharged within the same clock period using a lower charging/discharging current, which allows the supply voltage to be lowered.
  7. With parallel processing, the unchanged critical path is given L times as long to charge or discharge, where L is the number of samples processed per clock cycle, so the sample rate still matches that of the sequential system. This likewise permits a lower supply voltage and therefore lower power consumption (a rough estimate of the savings follows the list).
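
To make items 2 and 3 concrete, here is a minimal behavioral sketch in Python (plain software, not hardware description code, and not taken from the referenced paper) of a hypothetical 3-tap FIR filter modeled cycle by cycle. Inserting one register on a feed-forward cutset splits the multiply-add chain into two shorter stages; the output sequence is unchanged except for one extra clock cycle of latency. The coefficients and input values are made-up illustration numbers.

```python
# Behavioral sketch of pipelining a feed-forward datapath (illustration only).
# The filter is a hypothetical 3-tap FIR: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2]

def fir_direct(x, b0=1, b1=2, b2=3):
    """Direct form: one output per clock cycle.
    Critical path per cycle = 1 multiply + 2 additions."""
    x1 = x2 = 0                      # registers holding x[n-1], x[n-2]
    out = []
    for xn in x:                     # each loop iteration = one clock cycle
        out.append(b0 * xn + b1 * x1 + b2 * x2)
        x1, x2 = xn, x1              # registers update at the clock edge
    return out

def fir_pipelined(x, b0=1, b1=2, b2=3):
    """Same filter with one pipeline register on a feed-forward cutset.
    Stage 1 forms the partial sum b0*x[n] + b1*x[n-1]; stage 2 adds the b2 term
    one cycle later. Critical path per stage = 1 multiply + 1 addition, at the
    cost of extra latches and one extra cycle of input-output latency."""
    x1 = x2 = x3 = 0                 # input delay line, one tap deeper
    p = 0                            # pipeline register for the partial sum
    out = []
    for xn in x:                     # each loop iteration = one clock cycle
        out.append(p + b2 * x3)      # stage 2 finishes the previous sample
        p = b0 * xn + b1 * x1        # stage 1 starts the current sample
        x1, x2, x3 = xn, x1, x2      # all registers update at the clock edge
    return out

if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]          # made-up input samples
    direct = fir_direct(xs)
    piped = fir_pipelined(xs)
    print(direct)                    # [1, 4, 10, 16, 22, 28]
    print(piped)                     # [0, 1, 4, 10, 16, 22]  (same, one cycle late)
    assert piped[1:] == direct[:-1]  # identical outputs, shifted by the latency
```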
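
Item 5 can be sketched the same way: a 2-parallel (block-processing) version of the same hypothetical FIR consumes two inputs and produces two outputs per modeled clock cycle, so the sample period is half the clock period even though the clock itself is no faster. Again, this is a rough illustration with assumed numbers, not code from the source.

```python
# Behavioral sketch of 2-parallel (block) processing for the same hypothetical
# 3-tap FIR: y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] (illustration only).

def fir_block2(x, b0=1, b1=2, b2=3):
    """Each loop iteration models one clock cycle that consumes two inputs and
    produces two outputs with duplicated (feed-forward) hardware, so the sample
    period is half the clock period while the clock speed is unchanged."""
    xm1 = xm2 = 0                    # x[2k-1], x[2k-2] carried from the last block
    out = []
    for k in range(0, len(x) - 1, 2):                     # one clock per block of 2
        x_even, x_odd = x[k], x[k + 1]
        out.append(b0 * x_even + b1 * xm1 + b2 * xm2)     # y[2k]
        out.append(b0 * x_odd + b1 * x_even + b2 * xm1)   # y[2k+1]
        xm1, xm2 = x_odd, x_even     # state registers update at the clock edge
    return out

if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6]          # made-up input samples
    # Matches the sequential (direct-form) filter output, two samples per cycle.
    assert fir_block2(xs) == [1, 4, 10, 16, 22, 28]
    print(fir_block2(xs))
```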
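
Items 6 and 7 can be put into rough numbers with the standard first-order CMOS argument (a sketch under assumed device parameters, not a calculation from the referenced paper): dynamic power scales as C_total * V^2 * f and propagation delay roughly as C_charge * V / (k * (V - Vt)^2). If an M-stage pipeline cuts the critical-path capacitance by M (or an M-parallel system allows M times the charging time) while the sample rate is held fixed, the supply can drop from V0 to beta * V0 and power falls by beta^2.

```python
# Rough power-scaling arithmetic behind items 6 and 7 (first-order CMOS model,
# made-up device numbers; a sketch, not a result from the referenced paper).
import math

def voltage_scale_factor(M, V0, Vt):
    """Solve M*(beta*V0 - Vt)**2 == beta*(V0 - Vt)**2 for beta in (Vt/V0, 1].

    Assumptions: delay ~ C_charge*V / (k*(V - Vt)**2); an M-stage pipeline
    reduces the critical-path capacitance to C_charge/M (equivalently, an
    M-parallel system allows M times the charging time); the sample rate is
    kept fixed while the supply is lowered from V0 to beta*V0."""
    qa = M * V0 ** 2                             # quadratic in beta
    qb = -(2 * M * V0 * Vt + (V0 - Vt) ** 2)
    qc = M * Vt ** 2
    # The larger root satisfies beta*V0 > Vt (the physically meaningful one).
    return (-qb + math.sqrt(qb * qb - 4 * qa * qc)) / (2 * qa)

if __name__ == "__main__":
    beta = voltage_scale_factor(M=4, V0=5.0, Vt=0.6)   # example numbers only
    # Dynamic power P = C_total * V^2 * f, so scaling V0 -> beta*V0 gives beta^2.
    print(f"supply: {beta * 5.0:.2f} V, power scales by beta^2 = {beta ** 2:.2f}")
```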

Parhi, K. K. (1994). VLSI digital signal processing education. Proceedings of the 28th Asilomar Conference on Signals, Systems and Computers. doi:10.1109/ACSSC.1994.471669
