Why the glittering future of Machine Learning is Tiny


By Ishara Neranjana — Associate Machine Learning Engineer

Machine Learning (ML) has advanced significantly over the past twenty years from a primarily academic field to a pervasive commercial technology that supports a variety of industries.

It enables developers to enhance business processes and employee efficiency through data-driven automation. This dynamic and vigorous discipline of computer science has influenced almost every digital product we use today, including social media, cell phones, automobiles, home appliances, and more.

There are countless use cases where ML would be invaluable, yet implementing it can be difficult: most cutting-edge machine learning applications demand substantial computational resources to host their inference. As a result, many machine learning applications are restricted to cloud deployments, where high-performance compute is readily accessible.

Tiny Machine Learning, better known as TinyML, grew out of the pursuit of ways to make ML inference feasible on smaller, more resource-constrained devices, so that the technology can grow in scope and open up new avenues for numerous applications.

What is TinyML?

TinyML is a fast-growing field of machine learning that focuses on applications performing on-device sensor data analytics on extremely low-power hardware. TinyML aims to push machine learning to its absolute limit by enabling battery-powered embedded devices and microcontrollers to run inference and respond instantly.

Figure 01: Positioning of TinyML with its main subdomains

The figure shown above illustrates how TinyML is positioned along with its related subdomains.

Why is TinyML evolving fast?

There are numerous factors contributing to TinyML’s rapid development. Inference on inexpensive embedded devices makes it scalable, and its low power consumption permits operation in off-grid, isolated areas. This makes TinyML an excellent candidate for local ML tasks that were once incredibly expensive, like distributed sensor networks and predictive maintenance systems in industrial manufacturing settings.

Applications for TinyML are limitless and keep growing as the field matures. The distinctive benefit of this approach derives mostly from placing ML next to the sensor, at the source of the data stream. TinyML thereby enables a wide range of novel applications that are not even possible with regular ML implementations, because of the bandwidth, latency, cost, reliability, and privacy constraints of a standard server-side inference architecture.

Whether you are aware of it or not, TinyML is probably a part of your daily life in some capacity. An example of a TinyML application in daily life is the audio wake-word detection model used inside Google and Android devices. In order to “turn on” when they hear the words “OK Google,” Android devices use a 14 KB speech detection ML model that runs on a digital signal processor (DSP). The same can be said for many other virtual assistants.

The general procedure of a wake word application is shown below in Figure 02.

First, the application captures audio, then extracts features from that sample, and finally feeds those features to a TensorFlow model to classify the input and respond to it. When the user says the pre-defined wake word, the application detects it and triggers some action, such as activating a virtual assistant or opening an app. A minimal sketch of this pipeline follows Figure 02.

Figure 02: Components for a wake-word application (source)
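
Below is a minimal Python sketch of that pipeline. The model file "wake_word.tflite", the toy spectrogram front end, the input shape, and the 0.8 threshold are all placeholder assumptions for illustration, not the model Google actually ships on its DSPs.

```python
import numpy as np
import tensorflow as tf

def extract_features(audio, frame_len=480, hop=160):
    # Toy log-magnitude spectrogram front end; real wake-word models
    # use tuned mel/MFCC features matched to their training data.
    frames = [audio[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(audio) - frame_len + 1, hop)]
    spectra = np.abs(np.fft.rfft(frames, axis=-1))
    return np.log(spectra + 1e-6).astype(np.float32)

# Placeholder model path; shapes must match the model you trained.
interpreter = tf.lite.Interpreter(model_path="wake_word.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def on_audio_window(audio_window):
    features = extract_features(audio_window)
    interpreter.set_tensor(inp["index"],
                           features[np.newaxis, ..., np.newaxis])
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    if scores.max() > 0.8:           # detection threshold (assumption)
        print("Wake word detected")  # e.g., hand off to the assistant
```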

Hardware used in TinyML Applications

TinyML is remarkable in that it sets out to run on decidedly underwhelming hardware. In many ways, the main objective is to carry out ML inference with the least amount of power possible. Unlike most other machine learning applications, TinyML does not rely on graphics processing units (GPUs), application-specific integrated circuits (ASICs), or microprocessors for computation.

There is specific hardware that supports TinyML well. The figure below shows the Raspberry Pi Pico, Arduino Nano 33 BLE, and STM32 boards.

Figure 03: Components used in TinyML applications

TinyML applications use less capable computational gear, such as microcontrollers (MCUs) and digital signal processors, to meet the challenge of staying under 1 mW of power usage. These systems can be expected to have clock rates in the tens of MHz, less than a few hundred KB of RAM, and comparable amounts of flash. Beyond that, a TinyML device typically carries sensors (such as a camera or microphone) and possibly BLE (Bluetooth Low Energy) connectivity.

TinyML Software — TensorFlow / PyTorch

The software that powers the ideas and tools underlying TinyML is, in many ways, its most crucial component. TensorFlow Lite for Microcontrollers (TFLite Micro) is generally the best-known and most developed ecosystem for TinyML development.

TFLite Micro focuses on microcontroller units (MCUs), and its goal is to perform ML on devices with scarce resources.

A sample TensorFlow Lite for Microcontrollers workflow is shown below in Figure 04.

Figure 04: The TensorFlow Lite Micro workflow (source)

PyTorch Mobile is another open-source machine learning framework, targeting mobile platforms, and is also compatible with TinyML.

Optimization techniques for compressing neural networks

Deep neural networks make possible state-of-the-art accuracy in visual recognition tasks like image classification and object detection. The current tendency is toward deeper and more densely connected architectures, even though modern networks already contain millions of learned connections.

This presents a problem for deploying cutting-edge artificial neural networks (ANNs) on devices with limited resources, like smartphones or embedded devices. Various methods exist for compressing neural network models to make them usable on such hardware; quantization, pruning, and knowledge distillation are used to reduce model complexity while preserving accuracy.

Quantization

This is the process of constraining an input from a continuous or otherwise large set of values (such as real numbers) to a discrete set (such as integers).
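
As a concrete illustration, here is a minimal NumPy sketch of affine (asymmetric) int8 quantization, the general scheme TensorFlow Lite uses for its integer models. The function names are our own, not a library API.

```python
import numpy as np

def quantize_int8(x):
    # Affine (asymmetric) quantization: map the observed float range
    # [min, max] onto the int8 range [-128, 127].
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original floats.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(weights)
print("max error:", np.abs(weights - dequantize(q, scale, zp)).max())
```

Each weight now occupies one byte instead of four, at the cost of a small, bounded rounding error.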

Pruning

This is the process of removing parameters from an existing neural network. It can entail removing individual weights or whole groups of elements, such as neurons. This approach shrinks the network and increases its efficiency while largely maintaining its accuracy.

Figure 05: Pruning neural networks (source)

Figure 05 shows the impact on the neural network after the pruning process.
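
A minimal sketch of unstructured magnitude pruning in NumPy is shown below; `magnitude_prune` is an illustrative name of ours. In practice, the TensorFlow Model Optimization Toolkit applies the same idea gradually during training (tfmot.sparsity.keras.prune_low_magnitude).

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    # Unstructured pruning: zero out the smallest-magnitude weights
    # until roughly `sparsity` fraction of them are removed.
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.randn(8, 8).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.75)
print("sparsity achieved:", 1 - mask.mean())
```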

Knowledge distillation

This is the process of transferring knowledge from a huge, cumbersome model (or an ensemble of models) to a single, more manageable model that can be deployed under real-world constraints.
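
A minimal sketch of the classic distillation loss (soft targets with a temperature, per Hinton et al., 2015) follows. The temperature value and the `alpha` blend are illustrative assumptions to be tuned per task.

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    # Soften both output distributions with a temperature, then push
    # the student toward the teacher's soft targets.
    soft_targets = tf.nn.softmax(teacher_logits / temperature)
    log_probs = tf.nn.log_softmax(student_logits / temperature)
    # Cross-entropy against soft targets, scaled by T^2 so gradients
    # keep a comparable magnitude across temperatures.
    return -tf.reduce_mean(
        tf.reduce_sum(soft_targets * log_probs, axis=-1)) * temperature ** 2

def total_loss(y_true, teacher_logits, student_logits, alpha=0.1):
    # Blend the usual hard-label loss with the distillation term.
    hard = tf.keras.losses.sparse_categorical_crossentropy(
        y_true, student_logits, from_logits=True)
    soft = distillation_loss(teacher_logits, student_logits)
    return alpha * tf.reduce_mean(hard) + (1 - alpha) * soft

# Random logits standing in for real teacher/student outputs:
t = tf.random.normal((32, 10))
s = tf.random.normal((32, 10))
y = tf.random.uniform((32,), maxval=10, dtype=tf.int32)
print(total_loss(y, t, s).numpy())
```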

Steps to deploy a TensorFlow Lite for Microcontrollers Application

TensorFlow Lite for Microcontrollers lets machine learning models run on microcontrollers and other devices with only a few kilobytes of RAM. On an Arm Cortex-M3, the core runtime is only 16 KB in size, yet it can execute a variety of simple models. It does not need standard C or C++ libraries, dynamic memory allocation, or operating system support.

TensorFlow Lite for Microcontrollers is written in C++11 and needs a 32-bit platform to run. It has undergone extensive testing with a variety of processors built on the Arm Cortex-M series architecture and has been ported to additional architectures, such as the ESP32. The framework is available as an Arduino library, can generate projects for development environments like Mbed, and is open source, so it can be included in any C++11 project.

These are the steps that are required to deploy and run a TensorFlow model on a microcontroller:

1. Train a model

Generate a small TensorFlow model that fits the target device and contains only supported operations. Then use the TensorFlow Lite converter to convert it into a format that can run on a microcontroller (see the sketch at the end of this step).

Next, convert the model to a C byte array using standard tools, so that it can be stored in read-only program memory on the device.

Finally, transfer the compiled model to the microcontroller. This is typically done over a USB connection or by flashing the model onto the microcontroller’s memory.
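
A minimal Python sketch of the conversion steps follows, assuming a stand-in Keras model. The file names and the g_model/g_model_len symbols are illustrative choices, not mandated by the framework.

```python
import tensorflow as tf

# A stand-in model; in practice this is your trained network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])

# Step 1a: convert to the FlatBuffer format that TFLite Micro consumes.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable quantization
tflite_model = converter.convert()
with open("model.tflite", "wb") as f:
    f.write(tflite_model)

# Step 1b: emit a C byte array (what `xxd -i model.tflite` would give)
# so the model can be stored in the MCU's read-only flash.
array = ", ".join(f"0x{b:02x}" for b in tflite_model)
with open("model_data.cc", "w") as f:
    f.write(f"const unsigned char g_model[] = {{ {array} }};\n")
    f.write(f"const unsigned int g_model_len = {len(tflite_model)};\n")
```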

2. Run inference on a device using the C++ library and process the results

TensorFlow Lite for Microcontrollers is designed specifically around the limitations of microcontroller development. If you are working on more capable hardware (for instance, an embedded Linux device like the Raspberry Pi), the standard TensorFlow Lite framework may be simpler to integrate.
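
On the MCU itself, inference goes through the TFLite Micro C++ API. Before flashing, though, the converted model can be sanity-checked on a host machine with the Python interpreter, as in this sketch (file name carried over from the previous step):

```python
import numpy as np
import tensorflow as tf

# Host-side sanity check of the converted model. On the device, the
# equivalent steps (load model, allocate tensors, invoke) are made
# through the TFLite Micro C++ API instead.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.zeros(inp["shape"], dtype=inp["dtype"])  # dummy input
interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
print("output:", interpreter.get_tensor(out["index"]))
```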

Wrapping up

TinyML is a promising technology that offers several advantages over traditional machine learning approaches. By enabling the deployment of machine learning models on low-power, resource-constrained devices, TinyML has the potential to unlock new applications and opportunities for machine learning in the Internet of Things and other domains.

To use TinyML, developers can leverage existing tools and frameworks, such as TensorFlow Lite, to train and deploy machine learning models on tiny devices. As the technology continues to evolve and improve, TinyML is likely to play an increasingly vital role in the future of machine learning and the broader field of artificial intelligence.
