In this six-article series from Mouser Electronics, we explore why AI is moving to the network edge and the technology that's making it possible. The first article examined why AI needs to move to the edge. Here, we dive into the hardware and tools that are making edge machine learning a reality.
Introduction
Artificial intelligence is primarily driven by machine learning models. Typically, these models are extremely processor-intensive, so they tend to run in data centres. However, as we saw in the first article, this causes issues for some AI applications. Running your ML model in the core of the network creates four issues:
- Latency. More and more AI applications, from self-driving vehicles to real-time facial recognition, need to work in real time. Real time here implies a maximum latency on the order of 1ms. Clearly, that's only possible if your ML models are running at the network edge.
- Not-spots. Many industrial AI applications have to run in environments with limited or no network access. Other times, local security policies may block access to outside networks. In both these cases, your only option is edge ML.
- Security and privacy. AI applications often handle sensitive data. This ranges from biometric data (e.g. face pictures) to sensitive medical data. Laws like GDPR mean it is often easier (and more secure) to handle this sort of data locally, rather than send it across the network.
- Costs. All major cloud providers offer specialist virtual machines designed to run complex ML models at scale. However, these instances are generally quite expensive. It is often cheaper to run your models at the edge instead.
So, edge machine learning is definitely desirable. But how do you actually go about doing it? First, we need to understand what it takes to create a machine learning model.
A typical ML workflow
There are three main forms of machine learning. These are supervised learning, unsupervised learning and reinforcement learning. The most common is probably supervised learning, so we will focus on this. In all cases, you are trying to teach a computer to identify interesting patterns in data. In supervised learning, you provide a set of labelled data. The computer learns to identify the data features that match with each label. You can then get the computer to correctly label new data it sees. As a concrete example, you might provide your ML model with a million photos containing cats. The model learns to identify cats and can now use inference to tell you whether a new photo also contains a cat.
There are seven steps involved in training your model (a minimal code sketch follows the list):
- Data ingestion. This consists of gathering all your raw labelled data in one location (typically a data lake in the cloud).
- Feature engineering. Here, you clean up your data to reduce the number of features. This includes removing features that only appear in some of the data, merging related features, and deleting irrelevant ones.
- Model selection. Nowadays, there are many thousands of different ML models available. To get some idea of the number, take a look at the Model Zoo. Each model is optimised for a particular use. Data scientists use their knowledge and experience to select a suitable model for each dataset.
- Training. This is the key step where your model is trained. Training is an incremental process that can take some time. You typically use 70-75% of your data for training. The rest is then used for evaluation and optimisation.
- Evaluation. At this stage, you test whether your model performs well. There are a number of different metrics for assessing the model, including the confusion matrix and the F1 score. What you must avoid is over-fitting your model. This happens when your model becomes too good at identifying the training data and loses the ability to generalise to new data.
- Hyperparameter tuning. This involves refining the model by tweaking the hyperparameters to improve accuracy. Common approaches include grid search, random search, and Bayesian optimisation.
- Deployment. The final step is to deploy your trained model. This is where things get interesting for edge ML. Traditionally, your model is deployed in a data centre or on a server. However, in edge ML, you are typically deploying it on a microcontroller.
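To make these steps concrete, here is a minimal sketch of the workflow using TensorFlow's Keras API, with the built-in MNIST dataset standing in for your own labelled data. The layer sizes and hyperparameters are illustrative only, not a recommendation.

```python
import tensorflow as tf

# Data ingestion: load labelled examples (images plus digit labels).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Feature engineering: scale pixel values into the range 0-1.
x_train, x_test = x_train / 255.0, x_test / 255.0

# Model selection: a small fully-connected network.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training: hold back 25% of the data for validation.
model.fit(x_train, y_train, epochs=5, validation_split=0.25)

# Evaluation: check performance on data the model has never seen.
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```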
What forms of model are suitable for edge ML?
There are several categories of ML models that are ideal for running at the edge. These include:
- Image recognition. This is important for applications including facial recognition, self-driving cars, and automated pick-and-place machines. Usually, image recognition relies on neural networks, which need processors or accelerators that can cope with highly parallel operations, such as Intel's Neural Compute Stick, used in this Facial Recognition project on Mouser's Open Source Project pages.
- Anomaly detection. Here you are trying to identify anomalies in your data. For instance, you might be listening to the sound of a pump engine in a mine. An impending issue, such as a failed bearing, will create an anomaly. Your model can detect this before it becomes a failure.
- Speech-to-text + NLP. Virtual assistants, like Alexa, rely on two technologies. Speech-to-text converts your voice into text, and natural language processing (NLP) then parses this text to extract the meaning. Alexa devices usually offload this functionality to the cloud; the only part run locally is identifying the wake word (e.g. "Alexa"). This reliance on the cloud adds latency and limits where virtual assistants can operate, which is exactly the sort of limitation edge ML addresses.
The hardware requirements for edge ML
So, now we come to the heart of this article: What are the requirements for deploying machine learning at the edge? Well, firstly, you need to choose suitable hardware. Secondly, you need a machine learning framework that is optimised to run on that hardware. Let’s start by looking at the hardware.
Processing power
As we saw in the last article, edge ML implies the need for significant processing power. Modern servers and desktop machines have sufficient power to run most models, but they are not optimised for edge operations. What is really needed is a processor designed for embedded applications but with power approaching that of a desktop. This rules out the simplest low-power microcontrollers and SoCs.
Power draw/efficiency
Many edge ML applications are embedded. That means you have to be aware of the power needs and overall efficiency of the system. This rules out multicore CPUs: even the most efficient Intel processors usually consume tens of watts. Instead, you have to look at options such as ARM processors or ATmega microcontrollers.
Architecture
Edge machine learning requires processors that are capable of running deep neural network models. These involve large numbers of parallel operations, which is why cloud instances for machine learning rely so heavily on GPUs. Moving to the edge forces a compromise: GPUs are extremely power-hungry, and while FPGAs are an option, they bring their own development complexity. In practice, most people end up using a processor optimised for mobile use.
Suitable MCUs
Taking all the above into account, your edge ML project will probably end up using an MCU based on an ARM Cortex M series processor. There are a number of these on the market currently, with varying processing power and capabilities. Here is a selection of development boards you can look at:
| Board | Processor | Core | Speed |
|---|---|---|---|
| NXP i.MX RT1050 EVK | MIMXRT1052DVL6A | Cortex-M7 | Up to 600MHz |
| Microchip SAM E54 Xplained Pro | ATSAME54P20A | Cortex-M4 | Up to 120MHz |
| Infineon XMC4700 Relax Kit | XMC4700-F144 | Cortex-M4 | 144MHz |
| SiLabs SLSTK3701A starter kit | EFM32 Giant Gecko 11 | Cortex-M4 | 72MHz |
| Arduino Nano 33 BLE Sense | nRF52840 | Cortex-M4 | 64MHz |
TensorFlow Lite: enabling ML at the edge
Now you have chosen your hardware, the next step is porting your ML model to the edge. Fortunately, TensorFlow Lite makes this relatively easy. TensorFlow was originally developed by Google. They describe it as “an end-to-end open-source platform for machine learning.” TensorFlow Lite is a version optimised for low-power devices, such as mobile phones and embedded MCUs.
TensorFlow Lite allows you to perform inference in your end-device using embedded machine learning models. Importantly, it is optimised for devices with limited resources. It generates small binaries, making it ideal for embedded applications. However, it also supports hardware acceleration, allowing it to offer real-time performance. This makes it the ideal framework for ML at the edge.
The limitations of TensorFlow Lite
TensorFlow Lite only supports the most common TensorFlow operations used in inference models. This allows it to reduce the footprint of both the binary and its core framework. It achieves this by simplifying TensorFlow models, eliding and fusing some operations and mapping the result to TensorFlow Lite operations. Some operations map directly. Others have strict usage requirements before they will map. However, not every TensorFlow operation has a counterpart in TensorFlow Lite. In some cases, it is possible to include the TensorFlow operator at the cost of increasing the binary size. But there is no support at all for some operators, such as `tf.depth_to_space`.
Another important observation is that not all TensorFlow Lite operations work on all data types. Normal `tf.float32`, `tf.int8` and `tf.uint8` are always supported, but some operations don't support `tf.float16` and `tf.string`. So, if your model has previously been optimised in TensorFlow, you may need to refactor it before converting it to TensorFlow Lite.
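If you do hit an unsupported operation, the converter can be told to fall back on a subset of full TensorFlow ops, at the cost of a larger binary. A minimal sketch, assuming you already have a Keras model in a variable called `model`:

```python
import tensorflow as tf

# Assume `model` is an existing tf.keras model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Allow selected full-TensorFlow ops as a fallback for operations that
# have no TensorFlow Lite builtin. This increases the binary size.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # prefer native TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TensorFlow ops
]
tflite_model = converter.convert()
```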
If you are hoping to create your own TensorFlow Lite models, you should fully understand the impact of these limitations. A great starting point is the relevant page on the TensorFlow Lite site.
How to deploy a TensorFlow Lite model
There are five steps to deploying a TensorFlow Lite model:
- Build a TensorFlow Model
- Convert your model to a TensorFlow Lite FlatBuffer
- Convert this FlatBuffer to a C byte array
- Integrate the C++ Library
- Deploy to your device
Let’s look at each step in a little more detail.
Build a TensorFlow model
As we saw above, creating a machine learning model from scratch is a seven-step process, including deployment. However, there are numerous pre-trained TensorFlow models available. One of the best sources is the Model Zoo. Here, you can download over a hundred suitable ML models. These include chatbots, object detection and speech recognition models. You can use pre-trained versions of these models, or you might choose to retrain them with your own data.
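As an illustration, a pre-trained image-classification model can be loaded in a couple of lines with Keras. MobileNetV2 here is just one example of a small, mobile-friendly network; any model from the Model Zoo could take its place.

```python
import tensorflow as tf

# Load MobileNetV2 with weights pre-trained on ImageNet. Set
# weights=None instead to train it from scratch on your own data.
model = tf.keras.applications.MobileNetV2(weights="imagenet")
model.summary()
```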
Convert to a FlatBuffer
FlatBuffers are an efficient serialised flat data structure. Unusually, they allow you to pack structured hierarchical data, while still maintaining direct access to all that data. You need to use the TensorFlow Lite Converter to convert your TensorFlow model into a FlatBuffer. The Converter also handles remapping TensorFlow operations to TensorFlow Lite equivalents.
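The basic conversion takes only a few lines. A sketch, again assuming a trained Keras model in `model`:

```python
import tensorflow as tf

# Convert the trained Keras model to a TensorFlow Lite FlatBuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the FlatBuffer to disk, ready for the next step.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```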
Convert to a C byte array
Generally, microcontrollers do not offer native filesystem support. This means you need to compile your model directly into your binary. To do this, you need to use standard tools to convert the TensorFlow Lite FlatBuffer into a C byte array. For instance, you can do this using the Linux xxd command.
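On Linux, `xxd -i model.tflite > model_data.cc` does the job. If you would rather stay in Python, a short script achieves the same result; the file and variable names here are illustrative.

```python
# Convert a TensorFlow Lite FlatBuffer into a C byte array, mimicking
# the output of `xxd -i model.tflite > model_data.cc`.
with open("model.tflite", "rb") as f:
    data = f.read()

with open("model_data.cc", "w") as f:
    f.write("const unsigned char model_data[] = {\n")
    for i in range(0, len(data), 12):
        row = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        f.write(f"  {row},\n")
    f.write("};\n")
    f.write(f"const unsigned int model_data_len = {len(data)};\n")
```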
Integrate the C++ library
You are now ready to build your binary. This will need to import data (e.g. from sensors), perform inference with the compiled ML model and utilise the results. One of the core requirements is to import and integrate the TensorFlow Lite C++ library. This provides all the functionality to interpret and run your ML model. Many MCUs offer this library via their SDKs or as pre-compiled middleware.
Deploy the binary
The final step is to deploy the model onto your chosen microcontroller. Clearly, this step depends on your choice of platform. Once deployed, you can test the performance of the model. If needed, you can use the Model Optimization Toolkit to improve the performance or reduce the size of the binary.
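Before flashing, it is also worth sanity-checking the converted model on your development machine using the Python interpreter. A minimal sketch, using a random dummy input in place of real sensor data:

```python
import numpy as np
import tensorflow as tf

# Load the converted model and run it on the desktop.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy input of the right shape; replace with real test data.
dummy = np.random.rand(*input_details[0]["shape"]).astype(np.float32)
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))
```

For size reduction, setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` during conversion enables post-training quantisation, which can shrink the model by up to 4x at some cost in accuracy.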
Conclusions
By now I hope you are convinced that edge machine learning is both desirable and feasible. This is a rapidly evolving area of ML, and edge machine learning models are getting ever more capable. You still need to train your original model and be aware of the limitations of TensorFlow Lite. However, nowadays it is pretty easy to convert machine learning inference models and run them on embedded devices. Next time I will show you a practical example of creating an edge machine learning model. I will also look in more detail at some of the hardware available from suppliers like Mouser.