Google Summer of Code’18
FPGA based Image Focus-Peaking
This post gives a detailed account of my work under the mentorship of the Apertus° organisation during Google Summer of Code’18 on an open source hardware design for FPGA based image focus peaking.
Google Summer of Code is a global program focused on bringing more student developers into open source software development. Students work with an open source organization on a 3 month programming project during their break from school.
Apertus° builds free software (FOSS) and open hardware (OSHW) digital motion picture technology for the professional production environment. The Apertus° project is based on software that is free to be used for any purpose: free to be studied, examined, modified and redistributed (which includes the prospect of users distributing their own modified versions of the software and hardware).
I would like to thank Mr. Herbert Poetzl and the entire team of Apertus° Organisation for their guidance and support throughout the coding period.
What is Image Focus Peaking?
Focus peaking is an image processing algorithm that highlights the high contrast edges of a scene/frame in a user defined bright color. Photographers use this technique to achieve perfect focus, because it gives instantaneous feedback as the user racks focus through the scene.
The challenge was to design an HDL based IP (hardware accelerator) that detects high contrast edges in a video frame and displays them in a user defined color, while staying within the resource constraints of the available FPGA and meeting the real-time constraints of the video.
A top-down approach was followed. In the initial phase of the project, the focus was on designing the higher-level architecture of the system, determining the input and output ports of the IP, and deciding where to place the IP in the existing AXIOM Beta image processing pipeline.
Next, VHDL based structures (line buffer and peaking kernel) were designed and instantiated in the top-level IP, divided according to whether they store or process the data. These structures were then subdivided into the modules listed below.
Demosaic kernel converts the raw image (RG1G2B) into RGB data format.
RGB to greyscale converter converts the RGB values of the pixel to the corresponding greyscale value and appends it to the pixel data structure.
Line buffer stores the two most recent incoming lines of the frame in an internal distributed RAM.
Sobel kernel detects whether the pixel is an edge or not.
Threshold unit selects the edges whose contrast change exceeds the user defined threshold.
VHDL based Image Focus peaking kernel implementation
The complete operation of the design, as well as the results of the project, are described in the presentation. An overview of the subsections of the code is given below.
The line buffer module runs two parallel units, namely the store and release units. The incoming valid raw pixels from the DDR memory/image sensor are stored in a series of shift registers, and a ‘code’ (representing the pixel position in the frame) is concatenated with every data vector in the stream. Concurrently, a window of 9 pixels is fetched from the line buffer (shift registers). The values of the elements in the extracted window are determined by finding the position of the target pixel, which is done by decoding the ‘code’ concatenated with the data earlier.
The code for the line buffer module can be accessed from the following link: line_buffer.vhd
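The window fetch described above can be sketched as a small software model. This is only a behavioral illustration in Python, not the VHDL in line_buffer.vhd: the 4-bit layout of the position code, the zero fill for neighbors that do not exist, and the names position_code and fetch_window are all assumptions made for this example.

```python
def position_code(row, col, height, width):
    """Hypothetical 4-bit position code (layout is an assumption):
    bit 3 = top row, bit 2 = bottom row, bit 1 = left column, bit 0 = right column."""
    return (
        (int(row == 0) << 3)
        | (int(row == height - 1) << 2)
        | (int(col == 0) << 1)
        | int(col == width - 1)
    )


def fetch_window(frame, row, col, fill=0):
    """Return the 3x3 window around (row, col) as 9 values in row-major order.

    Neighbors that fall outside the frame are replaced by `fill`
    (zero fill is an assumption, the source does not state the border policy).
    """
    height, width = len(frame), len(frame[0])
    code = position_code(row, col, height, width)
    top = (code >> 3) & 1
    bottom = (code >> 2) & 1
    left = (code >> 1) & 1
    right = code & 1

    window = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            missing = (
                (dr == -1 and top)
                or (dr == 1 and bottom)
                or (dc == -1 and left)
                or (dc == 1 and right)
            )
            window.append(fill if missing else frame[row + dr][col + dc])
    return window
```

Decoding the position once per pixel keeps the border handling in one place, which mirrors the idea of generating a separate code for each corner and edge case.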
The image peaking kernel streams in the window of 9 pixels from the line buffer module. The pixels are debayered and passed through RGB to greyscale conversion. The extracted grey values are passed through multiple multiply-accumulate units, as shown in the presentation, to find the maximum gradient change across all four directions. The threshold unit raises the peaking flag if the detected contrast change exceeds a user defined threshold.
The code for the peaking kernel can be accessed from the following link: sobel_kernel.vhd
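As a rough software model of the gradient and threshold steps, the sketch below applies four 3x3 kernels to a window of grey values and compares the largest absolute response with a threshold. The kernel coefficients are the usual Sobel masks plus two diagonal variants, chosen here as an assumption; the exact coefficients, the use of a strict comparison, and the names KERNELS, max_gradient and peaking_flag are illustrative and not taken from sobel_kernel.vhd.

```python
# Four directional kernels in row-major order (coefficients are an assumption).
KERNELS = {
    "horizontal": (-1, 0, 1, -2, 0, 2, -1, 0, 1),
    "vertical": (-1, -2, -1, 0, 0, 0, 1, 2, 1),
    "diagonal_45": (0, 1, 2, -1, 0, 1, -2, -1, 0),
    "diagonal_135": (-2, -1, 0, -1, 0, 1, 0, 1, 2),
}


def max_gradient(window):
    """Largest absolute multiply-accumulate response over the four directions.

    `window` holds the 9 grey values of a 3x3 neighborhood in row-major order.
    """
    return max(
        abs(sum(k * p for k, p in zip(kernel, window)))
        for kernel in KERNELS.values()
    )


def peaking_flag(window, threshold):
    """Return 1 when the contrast change exceeds the user defined threshold."""
    return int(max_gradient(window) > threshold)
```

For example, a flat window such as [100] * 9 gives a gradient of 0 and never raises the flag, while a sharp vertical edge such as [0, 0, 255] * 3 produces a large horizontal response.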
The top VHDL module/IP combines both of the above modules into one packaged IP that can be used on the input as well as the output stream of any image processing pipeline. The IP takes in raw pixel values in RG1G2B format (12 bits each) with an additional 16 bits of overlay data, giving 64-bit input pixel data. This data is pipelined through (2 × frame width - 1) stages. The output of the IP is 65-bit data with the peaking flag as the LSB and the original raw pixel data in the remaining bits.
The code for the top module can be accessed from the following link: top_kernel.vhd
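The data widths above can be checked with a short Python sketch. Only the field widths, the LSB position of the peaking flag and the stage count come from the description; the order of the fields inside the 64-bit word (overlay in the top bits, then R, G1, G2, B) and all function names are assumptions for illustration.

```python
SAMPLE_BITS = 12   # width of each of R, G1, G2, B
OVERLAY_BITS = 16  # width of the overlay data
SAMPLE_MASK = (1 << SAMPLE_BITS) - 1
OVERLAY_MASK = (1 << OVERLAY_BITS) - 1


def pack_input(r, g1, g2, b, overlay):
    """Build the 64-bit input word (field order is an assumption)."""
    word = overlay & OVERLAY_MASK
    for sample in (r, g1, g2, b):
        word = (word << SAMPLE_BITS) | (sample & SAMPLE_MASK)
    return word


def pack_output(input_word, flag):
    """Build the 65-bit output word: original data with the peaking flag as LSB."""
    return (input_word << 1) | (flag & 1)


def unpack_output(output_word):
    """Split the 65-bit output word back into (input_word, flag)."""
    return output_word >> 1, output_word & 1


def pipeline_stages(frame_width):
    """Number of pipeline stages stated for the top module: 2 * width - 1."""
    return 2 * frame_width - 1
```

With these definitions, four 12-bit samples plus 16 overlay bits fill exactly 64 bits, and a 5-pixel-wide frame corresponds to 9 pipeline stages.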
VHDL based Simulation of the IP
For simulation purposes, a 5x5 image was considered. The input stimulus, i.e. the input data to the IP, was forced to values that describe the position of the pixel in the frame. The output was observed at the required clock frequency of 200 MHz. First, the output of the line buffer was checked to confirm that the line buffer module produces the correct window for each pixel position. Then the final output was checked to confirm that all the arithmetic is applied correctly to the target pixel without adding extra delay to the pipeline.
The results show that the IP successfully produces the required peaking signal for every incoming pixel, according to the position of the pixel in the frame.
A screenshot of the result is shown in the presentation.
The code for the VHDL based simulation of the top module can be accessed from the following link: top_kernel_tb.vhd
Instantiation of the IP in the current Apertus° AXIOM Beta Pipeline
The AXIOM Beta image processing pipeline has two parts, the input and the output pipeline. The input pipeline streams in the raw pixel data and stores it in DDR memory via a DMA type high speed AXI data writer. The output pipeline fetches the data from DDR via a high speed AXI data reader and encodes the pixel data as well as the timing information in TMDS so that the image frames can be displayed over HDMI. It was decided to place the peaking IP in the output part of the pipeline, just after the high speed data reader, so as not to disturb the synchronization of the pixel-data stream and the timing information stream.
The results were observed on the screen after the IP was instantiated in the AXIOM Beta pipeline. As the frame width was increased, the output data started to distort, which indicated a loss of synchronization between the timing information flow and the pixel-data flow of the pipeline. The current work is focused on identifying the point where synchronization is lost and delaying the timing signals accordingly.
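To make the idea of delaying the timing signals concrete, here is a minimal Python model of a fixed-length delay line. It is only an illustration of the general technique, not the fix applied in the pipeline (that work was still ongoing). The class name DelayLine, the idle value and the choice of 9 stages in the usage note are assumptions.

```python
from collections import deque


class DelayLine:
    """Shift-register model that delays a signal by a fixed number of clock cycles."""

    def __init__(self, stages, idle=0):
        if stages < 1:
            raise ValueError("stages must be at least 1")
        # Registers start in the idle state, as after reset (an assumption).
        self._regs = deque([idle] * stages, maxlen=stages)

    def clock(self, value):
        """Advance one clock cycle: shift `value` in and return the oldest value."""
        out = self._regs[0]
        self._regs.append(value)  # maxlen drops the oldest entry automatically
        return out
```

If the pixel data takes, say, 9 cycles to pass through a buffering IP, passing each timing signal through DelayLine(9) would bring the two streams back into step, provided the latency of the IP is constant.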
The simulation results show that the designed IP generates peaking information for every incoming pixel of the video frame. Hardware based processing of this kind should both increase the overall processing speed of the system and decrease the load on the main processor.
Looking ahead, my first task will be to remove the synchronization bug in the current AXIOM Beta pipeline so that my IP can be instantiated smoothly in the pipeline. Secondly, video processing units like the AXIOM Beta face serious hardware resource constraints, which could hamper further incorporation of efficient dedicated image processing hardware units. This could be dealt with quite effectively by using run time partial reconfiguration techniques. I would like to implement different components of the current AXIOM Beta pipeline on the FPGA fabric using run time partial reconfiguration, so that only the components required at a specific time are active while the others are not. This would save not only fabric resources but also power.
There were three major difficulties encountered during the coding period. The first was designing the line buffer module. The window takes different values depending on the position of the pixel in the frame, because for corner pixels some of the neighboring pixels don’t exist, which complicates the design of the window fetch module. This difficulty was overcome by generating codes for the different corner cases, which resulted in smooth storing and fetching of the pixels in the line buffer. The second difficulty was designing an arithmetic unit that doesn't introduce much delay of its own. This was overcome by targeting the multiply unit and replacing it with plain shift and add operations. The third major problem was instantiating the IP in the current pipeline, where the pixel data and timing information streams flow concurrently. Such an architecture poses a difficult situation for any IP that by nature buffers data. This problem is currently being tackled and will hopefully be resolved soon.
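The second point, replacing a multiplier with shift and add operations, can be illustrated with a short Python sketch. The source does not say which multiplications were replaced or how, so both functions are assumptions: a greyscale estimate that averages the four raw samples with a right shift, and a horizontal Sobel response in which the weight 2 becomes a left shift by one.

```python
def grey_shift_add(r, g1, g2, b):
    """Greyscale estimate (R + G1 + G2 + B) / 4 using three adds and one shift.

    This weights green twice as much as red or blue; the weighting is an
    assumption, not the conversion used in the project.
    """
    return (r + g1 + g2 + b) >> 2


def sobel_horizontal_shift_add(window):
    """Horizontal Sobel response on a row-major 3x3 window without a multiplier.

    The centre-row weight of 2 is implemented as a left shift by one bit.
    """
    right = window[2] + (window[5] << 1) + window[8]
    left = window[0] + (window[3] << 1) + window[6]
    return right - left
```

Because every kernel weight is 0, 1 or 2 in magnitude, the whole multiply-accumulate reduces to adders and fixed wiring, which is what keeps the delay of the arithmetic unit low.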
Work experience with Apertus° organisation
Working for an open source organisation is highly motivating in itself, as the work helps many developers across the globe. Apertus°, with its aim of creating an open source camera, is a unique concept that inspired me to work harder. The mentors/colleagues were highly involved and responsive throughout the project and guided me through it comfortably.
Work experience with Google summer of Code
Google Summer of Code provided me with a global platform to interact with developers across the globe. The process helped me to improve my coding technique/style, gain in-depth knowledge of hardware development for video processing and, most importantly, spend my summer in a fruitful manner that will help many developers working in this field.
The best part was that I could code anytime and anywhere I wanted. I coded through the season while enjoying the blissful European summer: I debugged timing violations while sipping cappuccino with a croissant in front of la tour Eiffel in Paris, designed IP submodules on the train to Amsterdam, designed simulation modules while enjoying the scenic beauty on the banks of the Danube in Budapest, designed the top level architecture while sunbathing on the beautiful beaches of the Côte d’Azur in Nice, and tested the module on the AXIOM Beta camera in the Apertus° lab in Vienna.