TensorFlow Lite is Going to Space
A guest article by Jacob Manning (University of Pittsburgh)
Space Computing and CubeSats
The Space, High-performance and Resilient Computing (SHREC) Center is a national research consortium of the Industry-University Cooperative Research Centers (IUCRC) at the National Science Foundation (NSF). SHREC consists of more than 30 industry, government, and academic partners. The lead site for SHREC is located at the University of Pittsburgh.
One of the primary research areas of SHREC at Pitt is small satellites, including CubeSats (sized in “units”, or “U”, of 1,000 cm³ each). A 1U CubeSat is approximately the size of a softball. CubeSats and all spacecraft must tolerate harsh environmental conditions, including vibration during launch, large temperature swings, vacuum, and cosmic and solar radiation. Thus, electronic devices on spacecraft often feature slow, radiation-hardened (RadHard) components to improve system reliability in these harsh environments.
One of the most recent and significant achievements of SHREC is the research, development, verification, and deployment of its hybrid and reconfigurable flight computer, called CSP, which combines RadHard and commercial off-the-shelf (COTS) components, fixed- and reconfigurable-logic circuits, and fault-tolerant computing. This strategy aims to reduce overall system cost and significantly improve performance while maintaining a high degree of reliability. CSP (shown below) features a Xilinx Zynq-7020 processor, which includes an Artix-7 Field-Programmable Gate Array (FPGA) and a dual-core ARM Cortex-A9 CPU.
In collaboration with the Department of Defense (DoD) and its Space Test Program (STP), as well as the NASA Goddard Space Flight Center (GSFC), an experiment with two CSPs is featured on the STP-H5 mission. In February 2017, STP-H5 was launched on SpaceX CRS-10, and its CSP experiment has been operating successfully ever since, under control from a ground station at SHREC at Pitt.
Machine-Learning Space Applications
In an effort to demonstrate the capability of the CSP platform, a core area of research for SHREC is space-computing applications. With the rise in popularity and capability of machine learning (ML), SHREC began exploring ML applications for space in 2017.
As part of the STP-H5/CSP system, an Earth-observing camera is included for use by the dual CSP units for sensing and processing. The full-resolution images are 2448x2050 pixels, but downlink speeds to Earth are in the range of tens of kilobits per second, so these images take a great amount of time to downlink from space. Additionally, the downloaded images are often not interesting: because of the ISS’s day and night cycle, many were either entirely black or washed out, and therefore not useful. We saw this behavior as an opportunity to use machine learning for image classification on-board CSP. Our goal was to classify images on-board our system and to download interesting full-resolution images while discarding useless ones. As an experiment to test the utility of modern image-classification methods and to demonstrate the compute capability of CSP, we used TensorFlow to construct a convolutional neural network (CNN) to classify our images.
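To put the downlink cost in perspective, here is a quick back-of-the-envelope calculation. The 24-bit color depth and 50 kbit/s link rate are illustrative assumptions (the text above only says “tens of kilobits”):

```python
# Back-of-the-envelope downlink time for one full-resolution image.
# 24-bit uncompressed color and a 50 kbit/s link are assumptions for
# illustration, not measured mission parameters.
width, height = 2448, 2050
bits_per_pixel = 24
link_bps = 50_000  # "tens of kilobits" per second

image_bits = width * height * bits_per_pixel
seconds = image_bits / link_bps
print(f"{image_bits / 8 / 1e6:.1f} MB, ~{seconds / 60:.0f} min to downlink")
# → 15.1 MB, ~40 min to downlink
```

Even with compression, spending most of an hour of link time on an all-black night-side frame is clearly wasteful, which motivates filtering on-board.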
Over the last year, we have downloaded approximately 8,000 image thumbnails (489x410 pixel images) from our camera on STP-H5/CSP. The images were used to create a small dataset to train image-classification models. Our images depict one of five classes: black, clouds/water, distorted, land, or white (shown below). A black image indicates capture has occurred during the night phase of the ISS’s orbit. A white or distorted image may indicate incorrect camera exposure settings due to changes made by another experiment aboard the STP-H5 pallet.
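As a rough illustration of why two of these classes are easy to flag while the rest genuinely need a CNN, consider a mean-brightness check. The function name and thresholds below are hypothetical, not part of the flight software:

```python
# Sketch: an all-black (night) or washed-out (white) thumbnail can be
# caught with a simple mean-brightness threshold, but clouds/water,
# land, and distorted images all fall in the middle of the brightness
# range and need a real classifier. Thresholds are illustrative only.

def coarse_label(pixels, black_thresh=10, white_thresh=245):
    """pixels: flat list of 8-bit grayscale values for one thumbnail."""
    mean = sum(pixels) / len(pixels)
    if mean <= black_thresh:
        return "black"
    if mean >= white_thresh:
        return "white"
    return "needs-cnn"  # clouds/water, land, or distorted

print(coarse_label([2] * 100))    # → black
print(coarse_label([250] * 100))  # → white
print(coarse_label([90] * 100))   # → needs-cnn
```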
Transfer Learning with TensorFlow Hub
Because our STP-H5/CSP dataset is limited, we opted to use transfer learning to re-train deep CNNs pre-trained on ImageNet. This codelab demonstrates how to use transfer learning to re-train a model on an unseen dataset. Transfer learning was a crucial part of our training process as we did not have sufficient data, time, or compute to train deep CNNs from scratch.
Furthermore, TensorFlow Hub allowed us to re-train many CNN architectures and compare accuracy results rapidly. Using the same retraining code as that in the codelab, a TensorFlow Hub module is used as the starting point for transfer learning.
python retrain.py \
    --image_dir ~/flower_photos \
    --tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/2
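The idea behind this re-training can be sketched without any deep-learning framework: the pretrained network is kept frozen as a fixed feature extractor, and only a small classifier head is trained on the new dataset. Below is a toy NumPy illustration; all shapes, data, and names are made up for demonstration (in the real pipeline, the frozen part is the TF Hub feature_vector module):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pretrained feature extractor: fixed weights
# that are never updated during re-training (scaled for roughly
# unit-variance features).
W_frozen = rng.normal(size=(64, 16)) / np.sqrt(64)

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen projection + ReLU

# Tiny synthetic "new dataset": 200 samples with binary labels chosen
# so that a linear head on the frozen features can separate them.
X = rng.normal(size=(200, 64))
F = extract_features(X)
y = (F[:, 0] > np.median(F[:, 0])).astype(float)

# Transfer-learning step: train ONLY the head (w, b); W_frozen is fixed.
w, b, lr = np.zeros(16), 0.0, 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))  # sigmoid head
    grad = p - y                            # log-loss gradient w.r.t. logits
    w -= lr * (F.T @ grad) / len(y)
    b -= lr * grad.mean()

pred = (1.0 / (1.0 + np.exp(-(F @ w + b))) > 0.5).astype(float)
acc = (pred == y).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

Because only the small head is trained, this works with far less data, time, and compute than training the whole network from scratch, which is exactly why transfer learning suited our limited STP-H5/CSP dataset.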
We focused our study on four architectures: MobileNetV1, MobileNetV2, Inception-ResNetV2, and NASNet Mobile. All four performed well: each achieved greater than 93% top-1 accuracy and greater than 99% top-2 accuracy.
Why TensorFlow Lite?
Unlike x86-based systems, most ARM systems (like those on CSP) do not have a supported pre-built TensorFlow package (i.e., installation isn’t as simple as pip install tensorflow). Furthermore, on embedded systems with limited memory and compute, the Python frontend adds substantial overhead and makes inference slow.
Before the release of the TensorFlow Lite developer preview in November 2017, we were exploring the use of the full TensorFlow C++ library. To determine the feasibility of CNNs on embedded hardware we trained a simple LeNet-5 model for the MNIST dataset and benchmarked the performance of the TensorFlow C++ library on CSP. After the release of TensorFlow Lite, we benchmarked the performance of TensorFlow Lite on the same MNIST example.
The main drawback of vanilla TensorFlow C++ on CSP was high memory usage. TensorFlow Lite provides faster execution and lower memory usage compared to vanilla TensorFlow.
Freezing, Converting, and Running the Trained Model with TensorFlow Lite
After training and testing a model with TensorFlow, there are three steps to deploying the trained model with TensorFlow Lite: freezing the trained model, converting to the TensorFlow Lite model format, and writing a C++ program to execute the converted model.
Freezing a model refers to removing training operations from the TensorFlow graph and serializing the trained weights.
freeze_graph --input_graph=/tmp/mobilenet_v1_224.pb \
    --input_checkpoint=/tmp/checkpoints/mobilenet-10202.ckpt \
    --input_binary=true \
    --output_graph=/tmp/frozen_mobilenet_v1_224.pb
TensorFlow and TensorFlow Lite use different model formats (Protocol Buffers and FlatBuffers, respectively), so a frozen TensorFlow model must be converted to the TensorFlow Lite format, for example with the tflite_convert tool, before it can be run using TensorFlow Lite.

tflite_convert \
    --output_file=/tmp/mobilenet_v1_1.0_224.tflite \
    --graph_def_file=/tmp/mobilenet_v1_0.50_128/frozen_graph.pb \
    --input_arrays=input \
    --output_arrays=MobilenetV1/Predictions/Reshape_1
Finally, to run the TensorFlow Lite model, the model and input data must be loaded, and then the input data is fed through the model. The best demonstration of this process is the built-in example for ImageNet models. This code can be modified as necessary with different paths for data, models, etc., and demonstrates the boilerplate for interacting with the TensorFlow Lite API. Once finished, the TensorFlow Lite Makefile can be modified to compile the program (following the structure of the MINIMAL_BINARY example to link against the TensorFlow Lite library).
TensorFlow Lite on CSP
We deployed our trained MobileNetV1 models on CSP and benchmarked inference performance.
CSP ran the smallest MobileNetV1 variant (width multiplier 0.25 and image resolution 128x128 px) at 11 FPS (89 ms per image classification) while using only 8 MB of memory. The largest MobileNetV1 variant (width multiplier 1.0 and image resolution 224x224 px) ran at less than one FPS (1383 ms per image classification) while using 41 MB of memory. We found that the width multiplier affected execution time and memory usage more than the image resolution did.
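The throughput figures follow directly from the measured per-image latencies:

```python
# Convert the per-image latencies measured on CSP into frames per second.
small_ms = 89    # MobileNetV1, width multiplier 0.25, 128x128 input
large_ms = 1383  # MobileNetV1, width multiplier 1.0, 224x224 input

small_fps = 1000 / small_ms
large_fps = 1000 / large_ms
print(f"small: {small_fps:.1f} FPS, large: {large_fps:.2f} FPS")
# → small: 11.2 FPS, large: 0.72 FPS
```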
TensorFlow Lite makes state-of-the-art deep learning accessible to embedded, on-board space processing systems, such as CSP. Additionally, TensorFlow Hub simplifies the transfer learning process and allows quick prototyping of deep-learning models.
In the future, SHREC researchers plan to explore many embedded machine-learning applications, including semantic segmentation for space, hyperspectral image classification, swarm logic for drones and CubeSats, and generative compression. For more information about SHREC, please visit https://www.nsf-shrec.org. For more information about this specific research, the full paper is available online.