How fast is an FPGA in image processing?

Aditya Patil
Image processing using FPGA
2 min read · Mar 23, 2021

An FPGA is a field-programmable gate array. Many image-processing applications have high inherent parallelism, and the data width of many of their operations is less than 16 bits. An FPGA can execute those operations in parallel by configuring dedicated circuits for each application. The large number of internal memory banks on an FPGA also supports this parallel processing by allowing parallel access to several hundred cached data items. Because of this high parallelism, FPGAs show extremely high performance in image processing despite their low operating frequency.

Graphics processing units (GPUs), a hardware platform with a higher operating frequency, have also been used to achieve high performance and have shown very good results in some applications. However, they were originally designed for a specific sequence of operations, and it is difficult to achieve high parallelism across a variety of applications. Microprocessors also support SIMD instructions for parallel processing, and recent processors can execute a SIMD instruction on 128-bit data in one clock cycle. These processors are also multicore, and each core can execute SIMD instructions independently. Furthermore, the cache is large enough to store all the image data for each core. Because of these advances, processors can now achieve extremely high performance in image processing. However, programming with SIMD instructions is tricky, and performance varies considerably with programming skill.

We have implemented several image-processing applications on FPGAs and tried to achieve the highest performance by minimizing the number of operations and memory accesses. The methods used for these designs can also be applied to programming with SIMD instructions. The aim here is to make clear how fast an FPGA is compared with recent processors that have SIMD instructions and multiple cores.
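To make the point about narrow data widths concrete: a 128-bit word holds sixteen 8-bit pixels, so one packed operation can update all sixteen at once. The sketch below only emulates that idea in software with a single Python integer; it is not real SIMD code, and the names `pack`, `unpack`, and `packed_add_wrap` are hypothetical helpers chosen for illustration.

```python
# Software emulation of a packed 8-bit add on one 128-bit word.
# Assumption: 16 lanes of 8 bits each; all names here are illustrative.
LANES = 16
LOW7 = int.from_bytes(bytes([0x7F]) * LANES, "little")   # low 7 bits of every lane
HIGH1 = int.from_bytes(bytes([0x80]) * LANES, "little")  # top bit of every lane


def pack(pixels):
    """Pack sixteen 8-bit pixel values into one 128-bit integer."""
    assert len(pixels) == LANES
    return int.from_bytes(bytes(pixels), "little")


def unpack(word):
    """Unpack a 128-bit integer back into sixteen 8-bit pixel values."""
    return list(word.to_bytes(LANES, "little"))


def packed_add_wrap(a, b):
    """Add all sixteen lanes at once, wrapping modulo 256 in each lane.

    The low 7 bits are added first so no carry can cross a lane boundary;
    the top bit of each lane is then fixed up with XOR.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH1)


if __name__ == "__main__":
    a = pack([250] + [10] * 15)
    b = pack([10] + [20] * 15)
    print(unpack(packed_add_wrap(a, b)))
```

An FPGA can go further than this fixed 16-lane layout, because the number of parallel units and their bit width are chosen per application rather than fixed by the instruction set.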
The comparison uses three applications: two-dimensional filters, stereo vision, and k-means clustering. In these applications, FPGA performance can be improved by using a larger FPGA. The comparison is discussed from the viewpoint of problem size, FPGA size, and memory bandwidth.
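As a rough illustration of why a two-dimensional filter parallelizes so well, here is a minimal plain-Python sketch. It is not the implementation used in the comparison; the function name `filter2d`, the clamped border handling, and the unflipped kernel (correlation form) are all assumptions made for the example.

```python
def filter2d(image, kernel):
    """Apply a 2D filter to a grayscale image given as a list of rows.

    Assumptions for this sketch: the kernel has odd dimensions, it is
    applied without flipping (correlation form), and pixels outside the
    image are replaced by the nearest edge pixel.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Each output pixel depends only on its own input window,
            # so every (y, x) iteration could run in parallel.
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - ry, 0), h - 1)  # clamp row index
                    xx = min(max(x + i - rx, 0), w - 1)  # clamp column index
                    acc += kernel[j][i] * image[yy][xx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    flat = [[2] * 4 for _ in range(3)]
    box = [[1] * 3 for _ in range(3)]
    print(filter2d(flat, box))
```

The two inner loops are the multiply-accumulate work that dedicated circuits can perform at the same time, and the window of neighboring pixels is the kind of data that on-chip memory banks can supply in parallel.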
