Introduction to deep learning on mobile

Key considerations when getting started with deep learning as a mobile app developer.

Published in NanoGiants · 6 min read · Jul 19, 2021

Technological progress and innovation have changed our everyday lives in countless ways. While for previous generations the TV was a life-changing innovation, today a child sees nothing special about taking a photo with a smartphone and sharing it on social media with people around the world. Since it evidently does not take long until we take improvements for granted, it is important for developers to keep pace with bleeding-edge technologies.

During recent decades, machine learning has become increasingly popular. Given the pace of its development, it appears to be the next technology capable of impressing people, and it will certainly shape many aspects of our everyday life in the future.

There is a wide range of use cases for machine learning in mobile apps. The technology can solve computer vision problems such as image classification, image segmentation or object recognition. A famous example is Google Lookout, an app that helps blind people discover their surroundings. Furthermore, the latest smartphone cameras make use of deep learning to reduce noise in pictures taken in low light, improve image quality and apply filters. Another large field of application is natural language processing, with text generation, keyword extraction or automatic translation of text. An example of audio processing is the app BirdNET, which recognizes birds from their sounds. Above all, deep learning is used to personalize content for users: it suggests interesting articles to read, products that fit the user’s needs, or their next favorite song.

Generally speaking, machine learning extracts insights from large amounts of data, making functionalities possible in mobile applications that were inconceivable just a few years ago.

Supervised Learning

The large field of machine learning can be divided into supervised, unsupervised and reinforcement learning. This article focuses only on supervised learning with multi-layer (deep) neural networks, because most mobile-related tasks use this approach and many beginner-friendly tutorials are available for it.

Supervised learning is divided into two phases. During the training phase, a neural network learns without being explicitly programmed: it iterates over a dataset of pairs of input data and expected output (the ground truth). For each pair, the neural network makes a prediction, compares it to the ground truth, calculates an error and uses backpropagation to optimize the network’s parameters accordingly. After many iterations, the trained neural network (also called a model) is used in applications to make predictions on real data; this phase is called inference. Finally, the model is frequently refined, retrained and redeployed based on insights from its performance in the real-life application.
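The training loop described above can be sketched in plain Python for the smallest possible "network" — a single weight fitting y = 2x. The dataset, learning rate and iteration count here are illustrative assumptions, not taken from any real application:

```python
# Minimal supervised-learning sketch: one parameter fitting y = 2x.
# Dataset, learning rate and epoch count are illustrative assumptions.

# Pairs of (input, ground truth)
dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0              # the single "network" parameter, initialized at 0
learning_rate = 0.05

for epoch in range(200):                 # many iterations over the dataset
    for x, y_true in dataset:
        y_pred = w * x                   # forward pass: make a prediction
        error = y_pred - y_true         # compare prediction to ground truth
        gradient = 2 * error * x         # backpropagation: d(error^2)/dw
        w -= learning_rate * gradient    # optimize the parameter

print(round(w, 3))  # -> 2.0, i.e. the model has learned y = 2x
```

A real network has millions of such parameters and the gradients flow backwards through many layers, but the train-compare-optimize cycle is the same.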

Restrictions of mobile devices

Deep learning tasks are usually performed on servers with powerful GPUs, which allow parallelizing the training on datasets such as ImageNet (more than 150 GB) and running large models quickly and efficiently. In comparison, mobile devices come with many hardware restrictions.

A smartphone’s battery life is limited, and apps that noticeably drain it are likely to be deleted by the user. The available computational power (e.g. Apple A14 Bionic, Qualcomm Snapdragon 888, Samsung Exynos 2100), RAM (e.g. iPhone 12 with 4 GB, Galaxy Z Fold2 with 12 GB) and amount of storage (up to 512 GB) have been increasing significantly in recent years. While this is impressive, most everyday apps do not need more computational power, and a slowdown of this trend is probable. More importantly, not every user in your target group has the latest device. Therefore, the overall application architecture, and often the neural network itself, needs to be adapted to these restrictions of mobile devices. Depending on your requirements, the available time and the budget, there are three common architecture patterns.

Architecture — Cloud only

The simplest solution is a client-server architecture in which both training and inference are performed on the server. The smartphone only sends the data to be analyzed (e.g. an image) and receives back the result (e.g. the classification label).
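On the client side this pattern boils down to a single request-response round trip. The sketch below illustrates it in Python; the endpoint URL and the JSON response format are assumptions for illustration, not a real API, and the HTTP transport is injected as a callable so the flow can be shown without a live server:

```python
import json

# Hypothetical endpoint; a real service would define its own URL and schema.
API_URL = "https://example.com/v1/classify"

def classify_image(image_bytes, post):
    """Send image bytes to the server and return the predicted label.

    `post` is any callable taking (url, data) and returning the response
    body as a JSON string, e.g. a thin wrapper around urllib.request.
    """
    response_body = post(API_URL, image_bytes)
    return json.loads(response_body)["label"]

# Stand-in transport that mimics the server's answer for demonstration:
fake_post = lambda url, data: json.dumps({"label": "cat"})
print(classify_image(b"\x89PNG...", fake_post))  # -> cat
```

The smartphone never sees the model itself: all the intelligence lives behind the endpoint.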

While the neural network for the server can be developed from scratch, there are also APIs providing purchasable, ready-to-use solutions for all kinds of problems, allowing applications to be created in a short amount of time. Examples for vision-related tasks are Microsoft Cognitive Services, Google Cloud Vision and Amazon Rekognition. The app TapTapSee uses this approach with the CloudSight API.

The main advantage of this architecture is its independence from the client. There are no special requirements for the smartphone (or any other client, such as a web browser or IoT device), as the heavy computing is done in the cloud. But there are also disadvantages: an internet connection is always necessary to use the app, and the connection quality determines how fast your application feels. With the expansion of 5G networks this might become less relevant in the future. Another aspect to consider is privacy and trust, because data (for example photos taken by users) does not stay on the user’s device.

Architecture — Smartphone and Cloud

Another approach is splitting the computational work between client and server. Training a neural network is very demanding on resources: large amounts of data need to be stored, and hyperparameter tuning as well as hundreds of iterations over the whole dataset are necessary until the model’s performance (e.g. accuracy, loss) is sufficient, even on powerful GPUs. So the training is done on the server, and the final trained model is bundled into the mobile app for inference on the device. The application Google Lookout is based on this architecture.

Various frameworks are available to implement inference on a smartphone. TensorFlow Lite is an open-source solution developed by the Google Brain team; it should not be confused with TensorFlow Mobile, their former approach. Facebook developed the open-source framework PyTorch Mobile, which is currently still in beta and appears to be the successor of Facebook’s Caffe2. Apple provides Core ML to integrate machine learning models into apps, but in contrast to its competitors it is not open source. With Google’s ML Kit there is also a ready-to-use toolkit of already trained models for specific machine learning functionalities that run on the device.

This architecture has the advantage of being independent of any internet connection (only metadata is sometimes transmitted to monitor the network’s performance on real-life tasks). Running the computation directly on the device produces faster results, and there are fewer privacy concerns.

However, the hardware restrictions of smartphones need to be considered: the model should be as small and performant as possible. A lot of research has already been done on optimizing neural networks for mobile. Without going into much detail, examples that can serve as a first orientation for network structures are the convolutional neural networks MobileNet, ShuffleNet, EfficientNet and SqueezeNet. Furthermore, most frameworks provide ready-to-use optimization techniques such as quantization, which maps continuous values (e.g. the weights in the network) to a finite range of discrete values to reduce their storage size.
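The core idea of quantization can be shown in a few lines of Python. This is only a sketch of the simplest symmetric scheme with made-up weight values; real frameworks such as TensorFlow Lite quantize per tensor or per channel and calibrate the ranges far more carefully:

```python
# Illustrative post-training quantization sketch: map float weights to
# 8-bit integers and back. Weight values here are made up for the example.
weights = [-0.7, -0.1, 0.0, 0.4, 1.3]

# Choose a scale so the largest weight maps onto the int8 range (-128..127)
max_abs = max(abs(w) for w in weights)
scale = max_abs / 127

quantized = [round(w / scale) for w in weights]  # stored as 1-byte integers
dequantized = [q * scale for q in quantized]     # reconstructed at inference

print(quantized)  # -> [-68, -10, 0, 39, 127]
# Each weight now needs 1 byte instead of 4, at the cost of a small
# rounding error bounded by half a quantization step (scale / 2):
print(max(abs(a - b) for a, b in zip(weights, dequantized)))
```

For a typical network this cuts the model size roughly by a factor of four while usually losing little accuracy, which is exactly why the mobile frameworks ship it as a standard option.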

Architecture — Smartphone only

Finally, it is also possible to perform both training and inference on the mobile device. This is rarely done and exceeds the scope of an introductory article on deep learning. However, an interesting example can be found on the TensorFlow Blog, where a model is partially trained on the device via transfer learning.

Summary

All in all, there are different approaches to benefiting from deep learning in a mobile app; the decision is always a tradeoff and, of course, depends on your previous experience with deep learning. This article barely scratched the surface, giving an overview of the topic.

Now it’s your turn to get started with your first deep learning app.
