Adam Taylor, September 30, 2021

Creating Embedded Vision Systems

Table Of Contents
  1. Elements of a PL Image Processing System 
    • Image Capture
    • Algorithm 
    • Output Pipeline
  2. Software-Defined Image Processing 
    • Bare Metal
    • PYNQ 
    • PetaLinux
  3. Conclusions

Embedded vision systems are on the rise. There is something special about seeing an image that you have created on a display. That display could be showing a simple pass-through image that demonstrates an image sensor’s capability.

Alternatively, it could be implementing an advanced image processing solution that identifies and classifies objects or tracks movement in the image. 


Of course, with the correct sensor selection, we can push the range of vision beyond the visible range into the infrared or X-ray regions of the EM spectrum.

Implementing image processing algorithms for embedded vision systems is computationally intensive, especially as image resolutions increase beyond HD and move to 4K.

A color HD image of 1920 pixels by 1080 lines with a 30-bit pixel requires processing 3.73 Gbps to achieve 60 frames per second. Moving to 4K resolution, which has 3840 pixels and 2160 lines with a 30-bit pixel at 60 frames per second, requires significantly more: 14.93 Gbps.
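These figures are simply the product of the frame geometry, bit depth, and frame rate, as this quick Python sanity check shows:

    # Raw video bandwidth = pixels per line * lines per frame * bits per pixel * fps
    def video_bandwidth_gbps(width, height, bits_per_pixel, fps):
        return width * height * bits_per_pixel * fps / 1e9

    print(f"HD: {video_bandwidth_gbps(1920, 1080, 30, 60):.2f} Gbps")  # HD: 3.73 Gbps
    print(f"4K: {video_bandwidth_gbps(3840, 2160, 30, 60):.2f} Gbps")  # 4K: 14.93 Gbps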

Therefore, each stage of the image processing algorithm must be able to support this data rate to achieve the desired frame rates, even when doing complex calculations on each pixel. 

The truly parallel nature of programmable logic provides an ideal technology to implement image processing pipelines. 

The parallel nature frees the developer from the sequential software world where each stage of the image processing algorithm must be implemented in sequence. In programmable logic, the algorithm’s elements run in parallel, enabling increased throughput and more deterministic performance – both critical for many image processing applications that use embedded vision to interact safely with the environment. ADAS or vision-guided robotics are two good examples.

Developers can get the best of both worlds by using a heterogeneous SoC such as the Zynq-7000 SoC or Zynq UltraScale+ MPSoC. 

These devices combine programmable logic with high-performance Arm processors, providing significant flexibility because the image processing algorithms can be implemented within the programmable logic. 

The processing system can configure the image processing algorithms, allowing easy adaptation to new image sensors or requirements, while also implementing high-level algorithms that take the output from the image processing pipeline.

Elements of a PL Image Processing System 

Implementing an image processing system in programmable logic is not as daunting as it first may seem. The image processing pipeline can be broken down into three distinct elements: image capture, algorithm, and output pipeline.

Image Capture

The image capture pipeline connects directly to the image sensor or camera. As such, the image capture stage interfaces externally to the programmable logic over HDMI, SD/HD/UHD-SDI, MIPI, or parallel/LVDS.

Thanks to the flexible nature of programmable logic IOs, most standards can be implemented using the IO structures without the need for an external PHY. 

To help capture the image, Xilinx provides a range of IP cores in the Vivado IP library that enable the image to be captured and made ready for further processing, which is often required to obtain a usable image for the image processing algorithm.

This additional image processing may require color filter array interpolation (debayering) to convert raw pixel values to RGB pixels.

The image capture phase may also include gamma correction, noise filtering, and color space conversion. In adaptive systems, the input video timing and resolution are detected, enabling the image processing system to configure itself automatically for the received video format.
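These capture-stage corrections can be prototyped in software before being committed to the programmable logic. Here is a minimal OpenCV sketch, where the Bayer pattern, gamma value, and file name are assumptions that depend on the actual sensor:

    import cv2
    import numpy as np

    # Hypothetical raw capture from the sensor, one Bayer value per pixel
    raw = cv2.imread("sensor_raw.png", cv2.IMREAD_GRAYSCALE)

    # Debayer: interpolate the color filter array values into full RGB pixels
    rgb = cv2.cvtColor(raw, cv2.COLOR_BayerRG2BGR)  # pattern depends on the sensor

    # Gamma correction via a 256-entry look-up table (gamma = 2.2 assumed)
    gamma = 2.2
    lut = np.array([255 * (i / 255.0) ** (1.0 / gamma) for i in range(256)],
                   dtype=np.uint8)
    corrected = cv2.LUT(rgb, lut)

    # Color space conversion, e.g. to YUV for the downstream pipeline
    yuv = cv2.cvtColor(corrected, cv2.COLOR_BGR2YUV)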

An example image capture pipeline can be seen in Figure 1 below, showing a MIPI CSI-2 receiver, demosaic (debayer), and frame buffer write to PS DDR memory.

Figure 1 – Example Image Capture Pipeline

Algorithm 

This is the actual implementation of the image processing algorithm. In many cases, it consists of several stages of image processing algorithms, each one connected to the next stage using an AXI Stream. 

These IP blocks may be provided by the Xilinx Vivado IP library, including IP blocks that can scale images up or down and mix multiple video layers on top of each other, as demonstrated in Figure 2.

Alternatively, they can be implemented using a hardware description language or high-level synthesis, which enables image processing algorithms to be written in higher-level languages such as C/C++.

Using a higher-level language enables developers to leverage the vision domain Xilinx Vitis accelerated libraries. These libraries provide several advanced image processing functions like filters, bounding boxes, bad pixel correction, warp transformation, and stereo block matching.
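Many of these functions mirror familiar OpenCV operations, which makes it possible to validate an algorithm in software before accelerating it. As an illustration, the stereo block matching mentioned above has a direct OpenCV counterpart (the file names and parameter values here are illustrative only):

    import cv2

    # A hypothetical rectified stereo pair, loaded as 8-bit greyscale
    left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
    right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

    # Block matcher: numDisparities must be a multiple of 16, blockSize must be odd
    stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = stereo.compute(left, right)  # fixed-point disparity map, scaled by 16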

If the image needs to be made available to the processing system for higher-level algorithms, Video Direct Memory Access (VDMA), for example, can be used to transfer the video stream to the PS DDR. Transfers can also run in the opposite direction, from the processing system to the programmable logic. Such PS-PL transfers can be used to provide an overlay on the image, presenting information on the display if required.

Figure 2 – Image Processing Video Mixer mixing live video with a test pattern.

Output Pipeline

Once the image has completed the algorithm pipeline, the processed image is output to the appropriate display. This could be MIPI-DSI, HDMI, SD/HD/UHD-SDI, or traditional parallel video with pixel clock, V Sync and H Sync. In the programmable logic, the developer needs to convert the processed image from AXI Stream format into the correct output format.

The video must also be re-timed for output. Just like with the image capture and algorithm pipeline, the Vivado IP library contains the necessary IP cores to generate the output video in the correct format. Figure 3 below shows a typical output pipeline. 

A frame buffer read from the PS DDR passes data to an AXI Stream to Parallel Video converter, operating under the control of the Video Timing Controller.

Figure 3 – Example Output Pipeline

It is possible to move the image into the processing system DDR memory during the algorithmic processing, allowing the processing system to perform high-level algorithms on the processed image contents. The software can further process the image and output it back into the image processing stream if so desired. 

Software-Defined Image Processing 

Several approaches can be taken when working with the image in the processing system. Regardless of the approach, the image processing implemented within the programmable logic remains highly configurable from the application software.

Bare Metal

Bare metal developments are often used as an initial stage in developing the image processing system. They enable developers to quickly and easily demonstrate the programmable logic design and enable the image sensor to be correctly configured. 

This allows for the creation of a transparent path that displays the captured image on the selected display. The bare metal application does not include the complexity of an embedded Linux stack. 

As such, it is beneficial for debugging and commissioning the design, using Integrated Logic Analyzers (ILAs), memory views, and the debugger's ability to inspect register contents.

PYNQ 

The open-source PYNQ framework enables developers to leverage the power of Python to control IP within the programmable logic thanks to several PYNQ APIs, drivers, and libraries. 

Thanks to these PYNQ provisions, developers can focus on algorithm development because the PYNQ framework includes drivers for most AXI-connected IP blocks in the programmable logic. PYNQ runs on an Ubuntu-based distribution, allowing developers to concentrate on the image processing algorithms using OpenCV and other popular Python frameworks.

It enables developers to focus on algorithm development using real-world sensors, including seeing the limitations of the sensor under different conditions and their impact on the implemented algorithm.
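As a minimal sketch of how this looks in practice (the bitstream name and the DMA instance name below are assumptions that depend on the overlay actually built):

    from pynq import Overlay, allocate
    import numpy as np

    # Load a hypothetical overlay containing the image processing pipeline
    overlay = Overlay("vision_pipeline.bit")

    # Allocate a physically contiguous buffer the programmable logic can DMA into
    frame = allocate(shape=(1080, 1920, 3), dtype=np.uint8)

    # Receive one processed frame over an AXI DMA ('axi_dma_0' is an assumed name)
    dma = overlay.axi_dma_0
    dma.recvchannel.transfer(frame)
    dma.recvchannel.wait()

    # 'frame' now behaves as a NumPy array, ready for OpenCV or matplotlib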

Once we know what the algorithm is, we can implement the functionality in the programmable logic using Xilinx IP, HLS, or the Vitis accelerated library functions.

PetaLinux

An embedded Linux solution may be considered if higher-level algorithms or communications are needed.

In this instance, PetaLinux can be used in conjunction with the Video4Linux (V4L) and GStreamer packages to create higher-level image processing applications. This framework may also be used with USB3 cameras connected to the processing system.
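For example, a simple capture-and-display pipeline could be built with the GStreamer Python bindings; the device node, caps, and sink element below are assumptions for a particular board setup:

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Capture from a V4L2 device and display via the kernel mode-setting sink
    pipeline = Gst.parse_launch(
        "v4l2src device=/dev/video0 ! "
        "video/x-raw,width=1920,height=1080 ! videoconvert ! kmssink"
    )
    pipeline.set_state(Gst.State.PLAYING)

    # Block until an error or end-of-stream, then shut down cleanly
    bus = pipeline.get_bus()
    bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                           Gst.MessageType.ERROR | Gst.MessageType.EOS)
    pipeline.set_state(Gst.State.NULL)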

Using PetaLinux enables the developer to make use of the Vitis AI flow and the Xilinx Deep Learning Processor Unit (DPU). This accelerates machine learning inference in the programmable logic using the DPU, with support for frameworks such as TensorFlow, Caffe, and PyTorch.
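A minimal VART (Vitis AI Runtime) sketch for dispatching a compiled model to the DPU might look like the following; the .xmodel file name is a placeholder, and the tensor data types depend on how the model was quantized:

    import numpy as np
    import vart
    import xir

    # Load a model compiled for the DPU ("model.xmodel" is a placeholder)
    graph = xir.Graph.deserialize("model.xmodel")
    subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
    dpu_sg = next(s for s in subgraphs
                  if s.has_attr("device") and s.get_attr("device").upper() == "DPU")

    runner = vart.Runner.create_runner(dpu_sg, "run")
    in_dims = tuple(runner.get_input_tensors()[0].dims)
    out_dims = tuple(runner.get_output_tensors()[0].dims)

    # int8 is typical for a quantized model; check the tensors' actual types
    inp = np.zeros(in_dims, dtype=np.int8)    # pre-processed frame goes here
    out = np.zeros(out_dims, dtype=np.int8)

    job = runner.execute_async([inp], [out])  # returns a job handle
    runner.wait(job)                          # 'out' now holds the inference result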

Conclusions

The parallel nature of programmable logic frees the developer from the sequential world of traditional software image processing solutions, thereby enabling higher resolutions and frame rates for better embedded vision systems.

A software-configurable image processing pipeline can be implemented using the Xilinx Vivado IP library, HLS, and the Vitis accelerated libraries.


Tagged as: IoT, Machine Learning

Adam Taylor
Adam Taylor is an expert in the design and development of embedded systems and FPGAs for several end applications. He is the author of numerous articles and papers on electronic design and FPGA design, a Chartered Engineer, Fellow of the Institution of Engineering and Technology, Visiting Professor of Embedded Systems at the University of Lincoln, and an Arm Innovator. He is also the owner of the engineering and consultancy company Adiuvo Engineering and Training.
