Inferencing is the second phase of machine learning, following on from the initial training phase. During the training phase, the algorithm generates a new model or repurposes a pre-trained model for a specific application and helps the model learn its parameters. During the inferencing phase, predictions and decisions on new data are made – based on the learned parameters.
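The split between the two phases can be illustrated with a minimal sketch. It assumes a deliberately tiny model (a straight line fitted by ordinary least squares) and hypothetical data; the function names `train` and `infer` are illustrative only and do not come from any particular framework.

```python
# Training phase: learn the parameters (slope and intercept) from labelled data.
def train(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single input feature.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Inferencing phase: apply the already-learned parameters to new data.
def infer(params, x):
    slope, intercept = params
    return slope * x + intercept


# Hypothetical training data that follows y = 2x + 1.
params = train([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = infer(params, 10.0)
print(params, prediction)
```

Training walks over the whole dataset to compute the parameters, whereas inferencing is a single multiply and add per input. Real models are far larger, but the same asymmetry is why inferencing is the phase usually pushed out to constrained devices.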

Learning requires a significant amount of time, computing power and electricity. In contrast, the inferencing phase requires less processing and draws less power. However, the traditional approach of computing in the central cloud may be too resource-intensive for IoT devices. Each IoT node residing at the edge collects large datasets, making edge-to-cloud (and conversely cloud-to-edge) data transfer expensive and slow. Instead of relying on cloud-based servers to do all the processing, “computing at the edge” performs most calculations directly on the device and only transfers relevant information back to the cloud (and vice versa) when strictly necessary. While computing at the edge reduces data transfer costs and time, this model also has drawbacks. For example, the need for IoT devices to be power-efficient runs contrary to the hefty amount of processing power that learning and inferencing demand. This is a problem that accelerators for AI edge computing can potentially address.
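As a rough sketch of what "only transferring relevant information" can mean in practice, the example below has an edge node reduce a batch of raw readings to a small summary plus any outliers. The function name `summarize_at_edge`, the z-score threshold and the sample readings are all assumptions made for illustration; a real deployment would choose its own criteria for what counts as relevant.

```python
from statistics import mean, pstdev


def summarize_at_edge(readings, z_threshold=2.0):
    """Process raw readings locally and return only what is worth uploading."""
    mu = mean(readings)
    sigma = pstdev(readings)
    # Keep only readings that deviate strongly from the local average.
    anomalies = [
        r for r in readings
        if sigma > 0 and abs(r - mu) / sigma > z_threshold
    ]
    return {"count": len(readings), "mean": mu, "anomalies": anomalies}


# Hypothetical temperature readings: nine normal samples and one spike.
readings = [20.0] * 9 + [80.0]
payload = summarize_at_edge(readings)
print(payload)
```

Here ten raw values are reduced to a count, a mean and a single anomalous reading before anything leaves the device. The trade-off described above still applies: the filtering itself costs processing power on the node.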

AI Accelerators

Both hardware- and software-based AI accelerators expedite machine learning. Hardware acceleration can target training, inferencing or both. In some instances, the hardware reduces the power requirement; in others, it improves processing capacity. Several main types of chips, or processing units, exist for hardware acceleration. These include central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), systems-on-chip (SoCs), application-specific integrated circuits (ASICs), vision processing units (VPUs) and neuromorphic ICs. In addition to hardware acceleration, solutions on the market also include software-based approaches, such as machine learning frameworks for improving AI software development and optimizing system performance.

CPUs and GPUs

AI workloads have traditionally run on CPUs. While CPUs are designed to be all-purpose, they are often inadequate for the massive calculations used in model generation and inferencing. In response, companies including ARM (with its DynamIQ product offering) and Samsung (with its Exynos 9 series) have started making AI-specific chips. While ARM and Samsung have chosen to stick with AI-specific CPUs, others are shifting toward GPUs.

Originating in the video gaming industry and built for processing massive datasets, GPUs are a good match for machine learning. Because GPUs have more processing units per chip and higher throughput, plus more parallel processing capability than CPUs, they cut down computation time significantly. In addition, a GPU’s single processing unit weighs less than the multiple units CPUs use, making GPUs a better fit for constrained IoT devices, which require small and nimble components. The companies that are making AI-specific GPUs include AMD (Radeon Instinct), NEC (SX-Aurora), NVIDIA (DGX) and Qualcomm (Adreno).


While CPUs and GPUs have considerable processing power at their disposal and are effective for accelerating learning and inferencing, they spend a lot of time and energy moving data between memory and processing. Since CPUs and GPUs are densely packed with circuits, they can often overheat and cause system failures. For remotely located IoT devices, the combination of high energy consumption and potential system failures is far from ideal. It makes sense to find a way to offload some tasks to more energy-efficient hardware.

Based on programmable logic, FPGAs are a type of IC that can be reconfigured by customers or designers in the field after production. While generally not as powerful as CPUs or GPUs, FPGAs offer fast processing for some calculations (such as multiplication, addition, integration, differentiation and exponentials) by computing inside the chip instead of transferring data. Although an FPGA offers more flexibility, it tends to be quite bulky, so miniaturization for IoT devices is a challenge for this type of chip. The major companies that offer AI-targeted inference chips include NVIDIA (TensorRT) and Xilinx. Also, Microsoft is using FPGA chips to accelerate inference, and Intel is currently expanding its FPGA portfolio.


SoCs can contain a combination of electronic components (microprocessors, microcontrollers, digital signal processors, on-chip memory, hardware accelerators, etc.). Due to the integration of the components onto a single semiconductor substrate, an SoC is more powerful than a microcontroller chip. In a smartphone, the SoC might integrate video, audio and image processing capabilities. ARM has developed its Machine Learning Processor and its Object Detection Processor – and these will be incorporated into SoCs in the future. HiSilicon, a Huawei-backed company, has licensed the IP from ARM to make SoCs that are seeing preliminary utilization in phone handsets and tablets. Also, HiSilicon is making the Ascend chips for Huawei. Another big player in the SoC space is Arteris, which is developing a network-on-chip interconnect fabric technology (FlexNoC) that many mobile and wireless companies are using. Because Arteris holds a dominant position in IP, it has a bird’s-eye view of the space. Other companies likely to soon be making a play in the AI SoC market include Intel (via its Movidius subsidiary), NXP, Renesas, Toshiba, Texas Instruments and STMicroelectronics.

ASICs, VPUs and Neuromorphic Chips

ASICs are built specifically to accelerate deep learning workloads, with examples including Google’s Edge TPU and Intel’s Nervana. A vision processing unit (VPU) is designed to accelerate machine vision tasks and run machine vision algorithms, such as convolutional neural networks (CNNs) – so VPU video processing capabilities differ from those of a GPU, which does not offer the same type of task-specific processing. Examples of VPUs include Intel’s Movidius Myriad chips, Google’s Pixel Visual Core, Microsoft’s HoloLens, Inuitive’s NU series and Mobileye’s EyeQ.
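To give a sense of the arithmetic such chips are built to speed up, here is a minimal pure-Python sketch of the sliding-window operation at the heart of a CNN layer (no padding, stride of one, and no kernel flip, as is conventional in deep learning). The function name `conv2d_valid`, the image and the kernel are illustrative assumptions; this shows the computation only, not how any particular VPU or ASIC implements it.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image and return the resulting feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value is a multiply-accumulate over one image patch.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


# Hypothetical 4x4 image containing a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A 2x2 kernel that responds to left-to-right increases in intensity.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Every output value is an independent run of multiply-accumulate operations, so the work parallelizes naturally. That regular structure is what makes this kind of workload a good target for dedicated hardware.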

Digital chips and analog chips have their respective deficiencies: digital circuitry is precise but gobbles energy, while analog circuitry keeps both latency and energy consumption low but lacks precision. Therefore, researchers are looking for ways to combine the technical advantages of digital and analog chips while sidestepping the weaknesses. Inspired by the human brain, neuromorphic chips are designed to adhere to what is essentially a digital architecture, but use analog circuitry for mixed-signal processing. IBM’s TrueNorth is a neuromorphic processor targeting sensor data pattern recognition and intelligence tasks. Also, Columbia University, Stanford University’s ‘Brains in Silicon’ project, and the DARPA-backed University of Michigan IC Lab are all working on various aspects of neuromorphic system implementation.

Machine Learning Frameworks

AI accelerators also include software. For example, machine learning frameworks, which can be interfaces, libraries or tools, help reduce the complexity associated with machine learning so that developers can build models and optimize performance more quickly and easily. Such frameworks are built for specific languages, like Python or Java. Some of the most popular open-source machine learning frameworks come from Amazon (AWS), Apache, Caffe2, Keras, Theano, Microsoft (Azure) and Google (TensorFlow). Also, some companies offer in-house platforms. For example, Intel’s OpenVINO toolkit is a software and hardware accelerator that optimizes inference with CNN models. In addition, Qualcomm’s Snapdragon is a mobile platform and a software accelerator, IBM has its Watson machine learning accelerator platform, and Huawei has recently launched its MindSpore AI framework.

First Steps with AI

Mouser now offers various items of hardware that can form the initial building blocks for AI implementation. Intel’s plug-and-play Neural Compute Stick 2 can aid engineers with early prototyping of deep neural networks. It relies on the company’s Movidius Myriad X VPU to deliver a compelling mix of power efficiency and performance – attaining 4 TOPS. Targeted at industrial computing, the highly compact AAEON UP AI Core processing module is based on the mini-PCI Express format. It also features an Intel Movidius VPU (this time the Myriad 2 2450 – with 512 MB of DDR memory, plus 12 VLIW programmable SHAVE cores and dedicated vision accelerators all built in). The Gumstix Aerocore 2 board employs an array of NVIDIA Jetson TX1 and TX2 CUDA cores to give it strong parallel processing capabilities, along with an ARM Cortex-M4 microcontroller and numerous peripherals. It is particularly well suited to object recognition, production line inspection and various other kinds of machine vision.
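A headline throughput figure such as the 4 TOPS quoted above (tera-operations per second) can be turned into a rough ceiling on inference rate with simple arithmetic. The sketch below assumes a hypothetical network that needs 2 billion operations per inference; that number is an illustrative assumption, not a measurement of any real model, and the result is a theoretical upper bound that ignores memory bandwidth, data transfer and utilization losses.

```python
def peak_inferences_per_second(tops, ops_per_inference):
    """Theoretical ceiling: peak operations per second divided by operations per inference."""
    return tops * 1e12 / ops_per_inference


PEAK_TOPS = 4              # peak throughput figure quoted for the accelerator
OPS_PER_INFERENCE = 2e9    # hypothetical model cost, assumed for illustration

upper_bound = peak_inferences_per_second(PEAK_TOPS, OPS_PER_INFERENCE)
print(f"At most {upper_bound:.0f} inferences per second")
```

Sustained rates in practice are typically well below such a ceiling, so the calculation is best used for comparing orders of magnitude rather than predicting delivered performance.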

Looking to the Future

With NVIDIA remaining dominant in industrial AI applications, most newcomers are focusing on the IoT AI space. GreenWave and Reduced Energy Microsystems are in the low-power chip arena, while Mythic and Syntiant are developing battery-powered processors. Similarly, Wiliot is making a Bluetooth chip that can be powered by ambient radio frequencies. In the massive parallel data processing space, there are Vathys, Graphcore, Cerebras and Wave Computing. Meanwhile, Hailo Technologies and Horizon Robotics are working on specialized chips for autonomous vehicles. In the deep learning space, BrainChip has made the first spiking neural processor, Thinci has rolled out a streaming graph processor, and Gyrfalcon is developing a deep learning processor with proprietary AI processing in memory (APiM) technology. Lastly, at Groq, the ex-Googlers who designed Google’s TPU are developing a chip with ultra-low latency. As the field of machine learning witnesses astonishing progress, many technical challenges remain for IoT edge computing – with hardware and software developers continuing to reach for a superior processing performance/energy efficiency balance.


Mouser Electronics is a leading worldwide authorised distributor of semiconductors and electronic components for over 800 industry-leading manufacturers. It specialises in the rapid introduction of new products and technologies for design engineers and buyers. Mouser Electronics’ extensive product offering includes semiconductors, interconnects, passives, and electromechanical components.

About the author

Mark Patrick joined Mouser Electronics in July 2014 having previously held senior marketing roles at RS Components. Prior to RS, Mark spent 8 years at Texas Instruments in Applications Support and Technical Sales roles and holds a first class Honours Degree in Electronic Engineering from Coventry University.
