Embedded vision systems are on the rise. There is something special about seeing an image that you have created on a display. That display could be demonstrating a simple, transparent (pass-through) image that showcases an image sensor's capability.
Alternatively, it could be implementing an advanced image processing solution that identifies and classifies objects or tracks movement in the image.
Of course, with the correct sensor selection, we can push the range of vision beyond the visible into the infrared or X-ray regions of the electromagnetic spectrum.
Implementing image processing algorithms for embedded vision systems is computationally intensive, especially as image resolutions increase beyond HD and move to 4K.
A color HD image of 1920 pixels by 1080 lines with a 30-bit pixel requires a throughput of 3.73 Gbps to achieve 60 frames per second. Moving to 4K resolution, at 3840 pixels by 2160 lines with the same 30-bit pixel and 60 frames per second, requires significantly more: 14.93 Gbps.
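To make the arithmetic explicit, the raw data rate is simply width × height × bits per pixel × frame rate. A minimal Python sketch reproducing the figures above:

```python
def video_bandwidth_gbps(width, height, bits_per_pixel, fps):
    """Raw, uncompressed pixel bandwidth in gigabits per second."""
    return width * height * bits_per_pixel * fps / 1e9

print(video_bandwidth_gbps(1920, 1080, 30, 60))  # ~3.73 Gbps (1080p60)
print(video_bandwidth_gbps(3840, 2160, 30, 60))  # ~14.93 Gbps (4K60)
```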
Therefore, each stage of the image processing algorithm must be able to support this data rate to achieve the desired frame rates, even when doing complex calculations on each pixel.
The truly parallel nature of programmable logic provides an ideal technology to implement image processing pipelines.
The parallel nature frees the developer from the sequential software world where each stage of the image processing algorithm must be implemented in sequence. In programmable logic, the algorithm’s elements run in parallel, enabling increased throughput and more deterministic performance – both critical for many image processing applications that use embedded vision to interact safely with the environment. ADAS or vision-guided robotics are two good examples.
Developers can get the best of both worlds by using a heterogeneous SoC such as the Zynq-7000 SoC or Zynq UltraScale+ MPSoC.
These devices combine programmable logic with high-performance Arm processors, providing significant flexibility because the image processing algorithms can be implemented within the programmable logic.
The processing system can configure the image processing algorithm, allowing easy adaptation to new image sensors or requirements, while also implementing high-level algorithms that work with the output of the image processing pipeline.
Elements of a PL Image Processing System
Implementing an image processing system in programmable logic is not as daunting as it first may seem. The image processing pipeline can be broken down into three distinct elements: image capture, algorithm, and output pipeline.
Image Capture
The image capture pipeline connects directly to the image sensor or camera. As such, the image capture stage interfaces to the programmable logic externally over HDMI, SD/HD/UHD-SDI, MIPI, or parallel/LVDS.
Thanks to the flexible nature of programmable logic IOs, most standards can be implemented using the IO structures without the need for an external PHY.
To help capture the image, Xilinx provides a range of IP cores in the Vivado IP library that enable the image to be captured and prepared for further processing. Such preparation is often required to obtain a usable image for the image processing algorithm.
This preparation may include color filter array interpolation (debayering) to convert raw pixel values to RGB pixels.
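As a quick software model of this stage (a prototype, not the PL implementation itself), OpenCV can demosaic a raw frame; the RGGB pattern assumed below depends on the actual sensor:

```python
import numpy as np
import cv2

# Placeholder raw frame standing in for sensor data (8-bit, RGGB assumed).
raw = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)

# Demosaic: interpolate the single-channel Bayer mosaic into a 3-channel image.
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)
print(bgr.shape)  # (1080, 1920, 3)
```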
The image capture phase may also include gamma correction, noise filtering, and color space conversion. In adaptive systems, the input video timing and resolution are detected, enabling the image processing system to configure itself automatically for the received video format.
An example image capture pipeline can be seen in Figure 1 below, showing a MIPI CSI-2 receiver, demosaic (debayer), and frame buffer write to PS DDR memory.
Algorithm
This is the actual implementation of the image processing algorithm. In many cases, it consists of several stages of image processing algorithms, each one connected to the next stage using an AXI Stream.
These IP blocks may come from the Xilinx Vivado IP library, which includes blocks that can scale images up or down and mix multiple video layers on top of each other, as demonstrated in Figure 2.
Alternatively, they can be implemented using a hardware description language or high-level synthesis, enabling higher-level languages such as C/C++ to implement image processing algorithms.
Using a higher-level language enables developers to leverage the Xilinx Vitis accelerated libraries for the vision domain. These libraries provide several advanced image processing functions, such as filters, bounding boxes, bad pixel correction, warp transformation, and stereo block matching.
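It is often worth prototyping a stage's behavior in software before committing it to hardware. For instance, a simple Sobel edge filter in OpenCV models the kind of filter function these libraries accelerate (the input file name is a placeholder):

```python
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical test frame

# 3x3 Sobel gradients in x and y, the classic edge-detection filter.
gx = cv2.Sobel(img, cv2.CV_16S, 1, 0, ksize=3)
gy = cv2.Sobel(img, cv2.CV_16S, 0, 1, ksize=3)

# Combine the absolute gradients into a single edge image.
edges = cv2.addWeighted(cv2.convertScaleAbs(gx), 0.5,
                        cv2.convertScaleAbs(gy), 0.5, 0)
```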
If the image needs to be made available to the processing system for higher-level algorithms, Video Direct Memory Access (VDMA), for example, can be used to transfer the video stream to the PS DDR. The transfer can also operate in the opposite direction, moving data from the processing system to the programmable logic. Such PS-to-PL transfers can be used to provide an overlay on the image, presenting information on the display if required.
Output Pipeline
Once the image has completed the algorithm pipeline, the processed image is output to the appropriate display. This could be MIPI-DSI, HDMI, SD/HD/UHD-SDI, or traditional parallel video with pixel clock, V Sync and H Sync. In the programmable logic, the developer needs to convert the processed image from AXI Stream format into the correct output format.
The video must also be re-timed for output. Just like with the image capture and algorithm pipeline, the Vivado IP library contains the necessary IP cores to generate the output video in the correct format. Figure 3 below shows a typical output pipeline.
A frame buffer read fetches the frame from the PS DDR and passes the data to the AXI Stream to parallel video output, operating under the control of the Video Timing Controller.
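The retiming boils down to regenerating the blanking intervals: the pixel clock must cover the total frame, active video plus blanking, at the target frame rate. A small sketch using the standard 1080p60 timing totals illustrates this:

```python
# Total frame size = active video plus horizontal/vertical blanking.
H_TOTAL, V_TOTAL = 2200, 1125   # standard CEA-861 totals for 1080p
FPS = 60

pixel_clock_mhz = H_TOTAL * V_TOTAL * FPS / 1e6
print(pixel_clock_mhz)  # 148.5 MHz, the familiar 1080p60 pixel clock
```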
It is possible to move the image into the processing system DDR memory during the algorithmic processing, allowing the processing system to perform high-level algorithms on the processed image contents. The software can further process the image and output it back into the image processing stream if so desired.
Software-Defined Image Processing
Several approaches can be taken when working with the image in the processing system. Regardless of the approach, the image processing implemented within the programmable logic remains highly configurable by the application software.
Bare Metal
Bare metal developments are often used as an initial stage in developing the image processing system. They enable developers to quickly and easily demonstrate the programmable logic design and enable the image sensor to be correctly configured.
This allows for the creation of a transparent path that displays the captured image on the selected display. The bare metal application does not include the complexity of an embedded Linux stack.
As such, it is beneficial for debugging and commissioning the design using the Integrated Logic Analyzer (ILA), memory views, and debugger capabilities to inspect register contents.
PYNQ
The open-source PYNQ framework enables developers to leverage the power of Python to control IP within the programmable logic thanks to several PYNQ APIs, drivers, and libraries.
Thanks to these PYNQ provisions, developers can focus on algorithm development, because the PYNQ framework includes drivers for most AXI-connected IP blocks in the programmable logic. PYNQ runs on an Ubuntu-based distribution, allowing developers to work on their image processing algorithms using OpenCV and other popular Python frameworks.
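For example, a minimal PYNQ sketch follows; the bitstream name and IP instance name are hypothetical and depend on your design:

```python
from pynq import Overlay

# Load the bitstream and its metadata; PYNQ parses the hardware
# description and creates a driver for each AXI-connected IP block.
overlay = Overlay("design.bit")  # hypothetical bitstream name

print(overlay.ip_dict.keys())    # list the IP blocks PYNQ discovered

# Access a memory-mapped register of a custom IP block.
# Instance name and register offset are design-specific assumptions.
gamma = overlay.gamma_correct_0
gamma.write(0x10, 22)            # e.g. a gamma value in fixed point
```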
PYNQ enables developers to focus on algorithm development using real-world sensors, including observing the sensor's limitations under different conditions and their impact on the implemented algorithm.
Once we know what the algorithm is, we can implement the functionality in the programmable logic using Xilinx IP, HLS, or the Vitis accelerated library functions.
PetaLinux
An embedded Linux solution may be considered if higher-level algorithms or communications are needed.
In this instance, PetaLinux can be used in conjunction with the Video4Linux (V4L) and GStreamer packages to create higher-level image processing algorithms. This framework may also be used with USB3 cameras connected to the processing system.
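As a hedged sketch, a capture pipeline exposed through V4L can be consumed from Python using OpenCV; the device node and the processing step below are assumptions:

```python
import cv2

# Open the V4L2 device exposed by the capture pipeline; the node
# name is an assumption - check /dev/video* on the target board.
cap = cv2.VideoCapture("/dev/video0", cv2.CAP_V4L2)

for _ in range(100):  # process a bounded number of frames in this sketch
    ok, frame = cap.read()
    if not ok:
        break
    # Example of a higher-level processing step running on the PS.
    edges = cv2.Canny(frame, 50, 150)

cap.release()
```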
Using PetaLinux enables the developer to make use of the Vitis AI flow and the Xilinx Deep Learning Processor Unit (DPU). This accelerates machine learning inference in the programmable logic using the DPU, with support for frameworks such as TensorFlow, Caffe, and PyTorch.
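A sketch of the VART (Vitis AI Runtime) Python flow for running a compiled model on the DPU follows; the model file name is a placeholder, and the exact API details can vary between Vitis AI releases:

```python
import xir
import vart
import numpy as np

# Load a compiled model; "model.xmodel" is a placeholder file name.
graph = xir.Graph.deserialize("model.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()

# Pick the subgraph mapped to the DPU and create a runner for it.
dpu = [s for s in subgraphs
       if s.has_attr("device") and s.get_attr("device").upper() == "DPU"][0]
runner = vart.Runner.create_runner(dpu, "run")

# Allocate input/output buffers matching the model's quantized tensors.
in_t = runner.get_input_tensors()[0]
out_t = runner.get_output_tensors()[0]
in_buf = [np.zeros(tuple(in_t.dims), dtype=np.int8)]
out_buf = [np.zeros(tuple(out_t.dims), dtype=np.int8)]

# Submit the job to the DPU and wait for completion.
job = runner.execute_async(in_buf, out_buf)
runner.wait(job)
```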
Conclusions
The parallel nature of programmable logic frees the developer from the sequential world of traditional software image processing solutions, thereby enabling higher resolutions and frame rates for better embedded vision systems.
A software-configurable image processing pipeline can be created using the Xilinx Vivado IP library, HLS, and the Vitis accelerated libraries.