DFLAP — Dynamic Frequency Linear Array Processor
What is an Array Processor?
An array processor, or vector processor, is a CPU (Central Processing Unit) whose instruction set includes instructions that operate on one-dimensional arrays of data called vectors, in contrast to scalar processors, whose instructions operate on single data items. Array processors can greatly improve performance on certain kinds of workloads, notably numerical simulation and similar tasks.
In this blog we will talk about the concept of dynamic frequency clocking and the design of a linear VLSI array processor called DFLAP, which is used in image processing applications. As the term dynamic frequency clocking suggests, the clock frequency of the system adjusts dynamically depending on the instruction being executed, which helps manage throughput and power. In reconfigurable architectures, frequencies are switched as the circuit changes, while in the dynamic clocking strategy, frequency switching depends on the functional units being used. Different components have different critical path delays and can therefore run at different clock speeds. The DFLAP paper proposes a design in which the multiplier runs on a 50 MHz clock, the RAM on a 100 MHz clock, the shifter and the logical unit on a 200 MHz clock, and the adder on a 400 MHz clock.
DFLAP uses a clock divider to derive several clock frequencies from a master clock. The dynamic clocking unit reads the instruction fields of the processor and generates control signals. These control signals select the appropriate output of the clock divider as the clock signal. When more than one operation is carried out in the PE (Processing Element) simultaneously, the dynamic clocking unit chooses the frequency permitted by the slowest unit in use.
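The selection rule can be illustrated with a minimal Python sketch. The per-unit frequencies are the ones quoted earlier in this post; the unit names, the dictionary, and the `select_clock_mhz` function are hypothetical and only model the behaviour in software, not the actual hardware logic.

```python
# Per-unit clock frequencies in MHz, as quoted earlier in this post.
# The dictionary keys are illustrative names, not identifiers from the paper.
UNIT_CLOCK_MHZ = {
    "multiplier": 50,
    "ram": 100,
    "shifter": 200,
    "logic": 200,
    "adder": 400,
}


def select_clock_mhz(active_units):
    """Return the clock for one instruction: the slowest active unit sets the pace."""
    if not active_units:
        raise ValueError("an instruction must use at least one unit")
    return min(UNIT_CLOCK_MHZ[unit] for unit in active_units)


if __name__ == "__main__":
    print(select_clock_mhz(["adder"]))              # 400
    print(select_clock_mhz(["adder", "shifter"]))   # 200
    print(select_clock_mhz(["multiplier", "ram"]))  # 50
```

An instruction that only uses the adder can run at the full 400 MHz, while any instruction involving the multiplier is held to 50 MHz.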
Figure 2 shows the dynamic clocking unit (DCU), a series of cascaded clock divider stages whose inputs are controlled by the pass logic. The output of one clock divider stage is presented at the input of the next stage when the pass logic is enabled. The pass logic blocks are controlled by a set of signals, E[2:0], generated by the enable encoder. Based on the instruction class, the enable encoder activates the appropriate pass logic blocks. For instance, suppose a particular instruction class, encoded as S[1:0] = 10 by the instruction encoder, has an operating frequency of 100 MHz associated with it. The 400 MHz master clock must then be divided by 2 twice to obtain the 100 MHz clock, so the enable encoder translates S[1:0] = 10 into E[2:0] = 110. The master clock is thus routed through the first two divide-by-2 stages, resulting in the 100 MHz clock. This 100 MHz clock is selected at the multiplexing logic by the S[1:0] signal. The pass logic ensures that the selected clock always starts with a positive edge. The design was implemented using Cadence design tools and verified through Veritime simulations.
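The divider cascade can be modelled in a few lines of Python. This is only a behavioural sketch under stated assumptions: the text gives a single mapping (S[1:0] = 10 to E[2:0] = 110, giving 100 MHz), so the other three class encodings, the reading of E[2] as the enable for the first stage, and all function names are hypothetical.

```python
MASTER_CLOCK_MHZ = 400

# Number of divide-by-2 stages per instruction class S[1:0].
# Only the entry for 0b10 (two stages, 100 MHz) comes from the text;
# the other three entries are assumed for illustration.
STAGES_FOR_CLASS = {0b00: 0, 0b01: 1, 0b10: 2, 0b11: 3}


def enable_encoder(s):
    """Translate S[1:0] into E[2:0], assuming E[2] enables the first stage."""
    stages = STAGES_FOR_CLASS[s]
    return "".join("1" if i < stages else "0" for i in range(3))


def dcu_clock_mhz(s):
    """Route the master clock through the enabled divide-by-2 stages."""
    freq = MASTER_CLOCK_MHZ
    for bit in enable_encoder(s):
        if bit == "0":
            break  # pass logic disabled: later stages receive no clock
        freq //= 2
    return freq


if __name__ == "__main__":
    print(enable_encoder(0b10))  # 110
    print(dcu_clock_mhz(0b10))   # 100
```

With this model, enabling no stage leaves the 400 MHz master clock untouched, and enabling all three stages yields 50 MHz, which matches the slowest unit clock mentioned earlier.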
DFLAP has an array of N 8-bit Processing Elements (for an R x N image) and a host that provides data and instructions to the array. Instead of using N Processing Elements, the algorithms can also be partitioned and mapped onto a fixed-size linear array. Microinstructions are broadcast to the PEs, which operate in Single Instruction Multiple Data (SIMD) mode. The communication controller between the host and the array controls the rate at which instructions are broadcast and acts as a buffer. The PE array operates on the pixels of the current row of the image data. The next row of data is brought into the Processing Elements in parallel by the neighbor communication unit. After processing, all the data is transferred back to the host computer for analysis.
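To make the row-by-row SIMD flow concrete, here is a small Python sketch. It is a software analogy only: the function names, the `invert` operation, and the strip-wise handling of rows wider than the array are assumptions chosen for illustration, not the partitioning scheme or microinstruction set of the paper.

```python
def broadcast(instruction, pixels):
    """Apply one microinstruction to every PE's pixel (SIMD), keeping results 8-bit."""
    return [instruction(pixel) & 0xFF for pixel in pixels]


def process_image(image, instruction, num_pes):
    """Process an R x N image one row at a time on a linear array of num_pes PEs."""
    result = []
    for row in image:
        out_row = []
        # If the array is narrower than the row, handle the row in strips
        # of num_pes pixels (one simple way to partition the work).
        for start in range(0, len(row), num_pes):
            out_row.extend(broadcast(instruction, row[start:start + num_pes]))
        result.append(out_row)
    return result


def invert(pixel):
    """Example per-pixel operation: 8-bit inversion."""
    return 255 - pixel


if __name__ == "__main__":
    image = [[0, 64, 128, 255], [10, 20, 30, 40]]
    print(process_image(image, invert, num_pes=2))
    # [[255, 191, 127, 0], [245, 235, 225, 215]]
```

Every PE executes the same operation on its own pixel, so the result is the same whether the array has one PE per column or fewer PEs working through the row in strips; only the number of passes changes.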
Here is a table that shows some results of the DFLAP architecture against its competitors. The comparison is not entirely fair because the designs differ in technology and architecture, which is hard to avoid as technology keeps advancing.
[1] Vijaykrishnan, N., Ranganathan, N., & Bhavanishankar, N. (1996). DFLAP: a dynamic frequency linear array processor. Proceedings of 3rd IEEE International Conference on Image Processing. doi:10.1109/icip.1996.561076