Simple Neural Network on MCUs

Neil Tan
Feb 10, 2019 · 8 min read

Edge computing is one of those areas where we have the nails and are still looking for a hammer. In an earlier post, I wrote about Why Machine Learning on the Edge is critical. Pete Warden has also shared interesting insights in Why The Future of Machine Learning is Tiny. Many exciting technologies will emerge to accelerate development in this space. Today, we are going to look at how to deploy a neural network (NN) on a microcontroller (MCU) with uTensor.

Check out An End-to-End Tutorial Running Convolution Neural Network on MCU with uTensor by Dboy Liao

(📷: Azmi Semih OKAY on Unsplash)

uTensor (micro-tensor) is a workflow that converts ML models to C++ source files, ready to be imported into MCU projects. Why generate C++ source files? Because they are human-readable and can easily be edited for a given application. The process is as follows:

(Diagram: the uTensor workflow, from trained model to generated C++ sources)

In this tutorial, we will be using uTensor with Mbed and TensorFlow. It covers tool installation, training the neural network, generating the C++ files, setting up the Mbed project, and deployment. Although these instructions are written for macOS, they are applicable to other operating systems.

Requirements

  • Homebrew
  • GCC Arm cross-compiler
  • Python 3
  • Mercurial and Git
  • Mbed-CLI
  • utensor_cgen (the uTensor-cli)
  • CoolTerm

Installing the Requirements

First, install Homebrew and update it:

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew update

Next, install a working version of GCC-arm cross-compiler:

$ brew install https://raw.githubusercontent.com/osx-cross/homebrew-arm/0a6179693c15d8573360c94cee8a60bdf142f7b4/arm-gcc-bin.rb

If you already have Python installed, skip this step. Either way, do not work with your system Python directly; it is a good idea to use a virtual environment.

$ brew install python3

Mbed-CLI will need Mercurial and Git:

$ brew install mercurial git

Next, install Mbed-CLI. Check out the official documentation for more details.

$ pip3 install mbed-cli
$ mbed --version
1.10.1

Getting uTensor-cli:

$ pip3 install utensor_cgen

Finally, grab CoolTerm from Roger Meier’s website.

Setting Up the MCU Project

Create a new Mbed project:

$ mbed new my_uTensor
$ cd my_uTensor/
$ ls
mbed-os mbed-os.lib mbed_settings.py

We will need the uTensor runtime library. It contains all the function implementations that will be linked in at compile time.

$ mbed add https://github.com/uTensor/uTensor
$ ls
mbed-os mbed_settings.py uTensor.lib
mbed-os.lib uTensor

Finally, we will need some input data to feed to the neural network to verify that it works. For demonstration purposes, we will use a generated header file containing the pixel data of a hand-written digit 7.

The input-data file has been prepared for you. Download and place it in your project root:

$ wget https://gist.github.com/neil-tan/0e032be578181ec0e3d9a47e1e24d011/raw/888d098683318d030b3c4f6f4b375a64e7ad0017/input_data.h
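For reference, the downloaded header is just a constant C array of the flattened pixel values. A hypothetical excerpt of its shape (the real file holds all 784 values, and the exact type and name follow the gist):

```cpp
// input_data.h (illustrative excerpt): a 28x28 digit "7",
// flattened row-major into a constant array that can live in flash.
static const float input_data[784] = {
    0.0f, 0.0f, 0.0f, /* ... 778 more pixel values ... */ 0.0f, 0.0f, 0.0f,
};
```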

That’s it. We will revisit these files later.

Training the Model

mxnet Handwritten Digit Recognition

For simplicity, we will train a multi-layer perceptron (MLP) on the handwritten digit dataset, MNIST. The network architecture is shown above. It takes in a 28-by-28 greyscale image of a hand-written digit and flattens it into a linear 784-element input. The network consists of:

  • 1 input layer
  • 2 hidden layers (128 and 64 hidden units respectively) with ReLU activation functions
  • 1 output layer with softmax
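Under the hood, each of these layers is just a matrix multiply, a bias add, and (for the hidden layers) a ReLU. The following self-contained C++ sketch illustrates the forward pass; `dense` and `predict` are illustrative helper names, not uTensor APIs, and real weights would come from training:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One dense layer: y = act(W * x + b), with W stored row-major [out][in].
// ReLU is applied for hidden layers and skipped for the final logits.
std::vector<float> dense(const std::vector<float>& x,
                         const std::vector<float>& W,
                         const std::vector<float>& b,
                         bool relu) {
  const std::size_t in = x.size(), out = b.size();
  std::vector<float> y(out);
  for (std::size_t o = 0; o < out; ++o) {
    float acc = b[o];
    for (std::size_t i = 0; i < in; ++i) acc += W[o * in + i] * x[i];
    y[o] = relu ? std::max(acc, 0.0f) : acc;
  }
  return y;
}

// 784 -> 128 -> 64 -> 10, mirroring the architecture above.
int predict(const std::vector<float>& img,
            const std::vector<float>& W1, const std::vector<float>& b1,
            const std::vector<float>& W2, const std::vector<float>& b2,
            const std::vector<float>& W3, const std::vector<float>& b3) {
  auto h1 = dense(img, W1, b1, true);      // 128 hidden units
  auto h2 = dense(h1, W2, b2, true);       // 64 hidden units
  auto logits = dense(h2, W3, b3, false);  // 10 classes
  // ArgMax over the logits picks the digit; softmax would only rescale them.
  return static_cast<int>(
      std::max_element(logits.begin(), logits.end()) - logits.begin());
}
```

With trained weights plugged in, `predict` returns the digit 0 to 9; this is essentially what the generated model code does, only with quantized operators.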

The Jupyter notebook contains the code; alternatively, you may prefer a plain-old Python file: deep_mlp.py.

The script defines the MLP and its training parameters. Running it, you should see something like:

$ python3 deep_mlp.py
...
step 19000, training accuracy 0.92
step 20000, training accuracy 0.94
test accuracy 0.9274
saving checkpoint: chkps/mnist_model
Converted 6 variables to const ops.
written graph to: mnist_model/deep_mlp.pb
the output nodes: ['y_pred']

A protocol buffer containing the trained model will be saved to the file system:

$ ls mnist_model/
deep_mlp.pb

deep_mlp.pb is what we will supply to uTensor-cli for C++ code generation in the next step.

Generating the C++ Files

Here’s the fun part. Turning a graph, deep_mlp.pb, into C++ files:

  • deep_mlp.cpp
    Contains the implementation of the model
  • deep_mlp.hpp
    This is the interface to the embedded program; in this case, a function signature: get_deep_mlp_ctx(…)
    We shall see how to use this in main.cpp later.
  • deep_mlp_weight.hpp
    Contains the weights of the neural network

The command for generating the C++ files is:

$ utensor-cli convert mnist_model/deep_mlp.pb --output-nodes=y_pred
... Applying sort_by_execution_order
... Transforming graph: mnist_model/deep_mlp.pb
... Applying quantize_weights
... Applying quantize_nodes
... Applying sort_by_execution_order
... Graph transormation done
... Generate weight file: models/deep_mlp_weight.hpp
... Generate header file: models/deep_mlp.hpp
... Generate source file: models/deep_mlp.cpp

Specifying the output node helps uTensor-cli traverse the graph and apply optimisations. The output node's name is shown in the training output in the previous section; it depends on how the network is set up.
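Why quantize the weights at all? Conceptually (this is an illustrative sketch, not uTensor's exact code), each float weight is mapped onto a uint8 using a per-tensor min/max range, shrinking weight storage roughly 4x at the cost of a small rounding error:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Affine 8-bit quantization: map [min, max] linearly onto [0, 255].
// Assumes max > min (a non-degenerate weight tensor).
struct Quantized {
  std::vector<uint8_t> values;
  float min, max;
};

Quantized quantize(const std::vector<float>& w) {
  Quantized q;
  q.min = *std::min_element(w.begin(), w.end());
  q.max = *std::max_element(w.begin(), w.end());
  const float scale = (q.max - q.min) / 255.0f;
  for (float v : w)
    q.values.push_back(static_cast<uint8_t>(std::lround((v - q.min) / scale)));
  return q;
}

// Recover an approximation of the original float from its 8-bit code.
float dequantize(const Quantized& q, std::size_t i) {
  const float scale = (q.max - q.min) / 255.0f;
  return q.min + q.values[i] * scale;
}
```

The round-trip error is bounded by half a quantization step, (max - min)/510, which an MLP on MNIST tolerates easily.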

Compiling the Program

At this point, the project contains everything we need:

  • Mbed OS
  • uTensor library
  • Generated C++ model
  • Input data header file

All we need is a main.cpp to tie everything together:

The Context class is the playground where inference takes place. get_deep_mlp_ctx() is a generated function: it populates a Context object with the inference graph and takes a Tensor object as input. The Context object, now containing the inference graph, can be evaluated to produce an output tensor holding the inference result. The output tensor has the same name as the output node specified in your training script.

In this example, the static array defined in input_data.h is used as the input for inference. In practice, this would be buffered sensor data or any memory block containing the input data. The data is arranged in row-major layout in memory (the same as any C array). The application has to keep the input memory block valid for the duration of inference.
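The original post embeds the full main.cpp as a gist. As a rough sketch of its shape, based on the uTensor MNIST example of the time (class and function names such as WrappedRamTensor and S_TENSOR may differ across uTensor versions, so treat this as illustrative rather than authoritative):

```cpp
#include "mbed.h"
#include "models/deep_mlp.hpp"  // generated model interface
#include "input_data.h"         // flattened digit-7 image

Serial pc(USBTX, USBRX, 115200);  // matches the baud rate used later

int main(void) {
  printf("Simple MNIST end-to-end uTensor cli example (device)\r\n");

  Context ctx;  // the playground where inference takes place
  // Wrap the static input array (row-major, 1x784) in a Tensor.
  Tensor* input_x = new WrappedRamTensor<float>({1, 784}, (float*)input_data);
  get_deep_mlp_ctx(ctx, input_x);       // populate ctx with the graph
  S_TENSOR pred = ctx.get("y_pred:0");  // output tensor, named after the node
  ctx.eval();                           // run inference

  int pred_label = *(pred->read<int>(0, 0));
  printf("Predicted label: %d\r\n", pred_label);
  return 0;
}
```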

Now, we can compile the whole project by issuing:

$ mbed compile -m K66F -t GCC_ARM

Mbed-CLI needs to know which board it is compiling for, in this case the K66F. You may want to change this to the target name of your board. We are also using a custom build profile here to enable C++11 support. Expect to see a compilation message similar to this:

...
Compile [ 99.9%]: uTensor_util.cpp
Compile [100.0%]: quantization_utils.cpp
Link: my_uTensor
Elf2Bin: my_uTensor
| Module | .text | .data | .bss
|--------------------|-----------------|-------------|--------------
| CMSIS_5/CMSIS | 68(+68) | 0(+0) | 0(+0)
| [fill] | 505(+505) | 11(+11) | 22(+22)
| [lib]/c.a | 69431(+69431) | 2548(+2548) | 127(+127)
| [lib]/gcc.a | 7456(+7456) | 0(+0) | 0(+0)
| [lib]/m.a | 788(+788) | 0(+0) | 0(+0)
| [lib]/misc | 248(+248) | 8(+8) | 28(+28)
| [lib]/nosys.a | 32(+32) | 0(+0) | 0(+0)
| [lib]/stdc++.a | 173167(+173167) | 141(+141) | 5676(+5676)
| main.o | 4457(+4457) | 0(+0) | 105(+105)
| mbed-os/components | 16(+16) | 0(+0) | 0(+0)
| mbed-os/drivers | 844(+844) | 0(+0) | 0(+0)
| mbed-os/hal | 1472(+1472) | 4(+4) | 68(+68)
| mbed-os/platform | 3494(+3494) | 260(+260) | 221(+221)
| mbed-os/rtos | 8313(+8313) | 168(+168) | 6057(+6057)
| mbed-os/targets | 7035(+7035) | 12(+12) | 301(+301)
| models/deep_mlp1.o | 148762(+148762) | 0(+0) | 1(+1)
| uTensor/uTensor | 7995(+7995) | 0(+0) | 10(+10)
| Subtotals | 434083(+434083) | 3152(+3152) | 12616(+12616)
Total Static RAM memory (data + bss): 15768(+15768) bytes
Total Flash memory (text + data): 437235(+437235) bytes
Image: ./BUILD/K66F/GCC_ARM/my_uTensor.bin

Flashing the Binary

  • Connect your board
  • Locate the binary under ./BUILD/YOUR_TARGET_NAME/GCC_ARM/my_uTensor.bin
  • Drag and drop it onto the Mbed DAPLink mount point
  • Wait for the transfer to complete

Getting the Output

  • Fire up CoolTerm
  • Go to Options
  • Click Re-Scan Serial Ports
  • Set the Port to usbmodem1234 (the exact name may vary each time you reconnect the board)
  • Set the baud rate to 115200, matching the configuration in main.cpp
  • Click OK
  • Click Connect

Press the reset button on your board. You should see the following message:

Simple MNIST end-to-end uTensor cli example (device)
Predicted label: 7

Congratulations! You have successfully deployed a simple neural network on a microcontroller!

Closing Remarks

The operators supported by uTensor at the time of writing include:

  • Fully Connected Layer (MatMul & Add)
  • Convolutional Layer
  • Pooling
  • ReLU
  • Softmax
  • ArgMax
  • Dropout

I believe edge computing is a new paradigm for applied machine learning. Here, we are presented with unique opportunities and constraints. In many cases, machine learning models may have to be built from scratch, tailored to low power budgets and specific use-cases. We are constantly looking into ways to drive advancement in this field.

On behalf of the uTensor team, thank you for checking out this tutorial! My Twitter handle is @neil_the_1. You can also find us on the uTensor Slack.

Special thanks go to Kazami Hsieh, Dboy Liao, Michael Bartling, Jan Jongboom and everyone who has helped the project along the way!


Hackster Blog

Written by Neil Tan
Developer Evangelist at Arm, Software and Hardware Hacker, Creator of uTensor, Robotics and Physics Enthusiast

Hackster.io, an Avnet community, is the world’s largest network for hardware & software developers. With 1 million members and 17,000+ projects, beginners and professionals can learn and share how to build robotics, industrial automation systems, AI-powered machines, and more.
