Here at Zyl, we have been committed to running deep learning on mobile for more than two years now. We were here even before CoreML. And things were not always easy, to say the least… 🤕
Here are our top 3 “fun” facts from when we started embedding models on device (not so fun at the time):
- For a few weeks, our app weighed more than 150 MB on the App Store — 😱 people couldn’t download Zyl over a 4G network.
- We had a huge drop-off in our onboarding funnel, because we needed to process a lot of data before we could show a happy memory to each new user.
- Several users reported having dropped and broken their phone screen because their device was literally burning during data processing (ok, I made this one up, but you see the point).
Now you must be wondering: why didn’t they test the app or read articles to avoid these issues? Well, you can find a lot of content online, and anyone with a little Python skill might think it’s easy to build a « deep learning model » for their next business — especially two years ago. But there is a gap between a tutorial and production.
At Zyl we live on mobile, and we stay on mobile. Because we think your pictures are precious, they should never leave your phone. So, no remote datacenter for inference. Fortunately, mobile makers have put a lot of effort into shipping accelerated hardware that helps with these kinds of tasks.
I’ll guide you through how we tackled the challenge of running an AI platform on iOS. This is general guidance rather than copy/paste code.
#1 Become mobile compliant
You really should educate everyone working with data about the constraints of your platform.
Even experienced data scientists usually don’t understand the constraints of the production platform. (Yes, your Jupyter Notebook is not gonna be integrated « as is ».)
It’s like designers: you have web designers, print designers, mobile designers. One size does not fit all. The same applies to data scientists. You have to help them become « mobile compliant ».
You need everyone to understand how it works on device. Just to name a few constraints:
- Computation is slower
- You wouldn’t want to embed a 500 MB model
- Available computation time depends on how long the user keeps the app open, and that varies a lot
#2 Use what’s provided by Apple
Because Apple can work at a very low level on its own devices, you will not beat them when it comes to performance. So use the frameworks and tools they provide as much as possible, even if it means splitting your pipeline.
Say you made a model that can recognize John in a picture. You might have a few layers that look for John’s round face (1), then some that extract features (2), and finally you output the probability that John is in the picture (3). If you want to run this model efficiently on mobile, you will have to rethink its architecture: use Apple’s Vision framework to detect potential faces in your pictures (1), then feed those to a smaller version of your model whose job starts at the second step (2 & 3). Smaller models are usually faster models (and lighter too, which is a very good point on mobile if you want your app to be downloadable over cellular).
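The split can be sketched in plain Python, with stub functions standing in for Vision’s face detector and for the slimmed-down CoreML model (all names here are illustrative, not real APIs):

```python
def detect_faces(image):
    # Stand-in for step 1: the OS-provided, hardware-accelerated face
    # detector (on iOS, Vision's face detection requests). Returns crops.
    return image.get("face_crops", [])

def classify_face(crop):
    # Stand-in for steps 2 & 3: the small model that only sees face crops.
    return 0.92 if crop == "john" else 0.08

def is_john(image, threshold=0.5):
    # The expensive model never runs on images with no detected face.
    return any(classify_face(c) > threshold for c in detect_faces(image))
```

The win is twofold: the detector is essentially free (Apple ships it), and your own model shrinks because it no longer needs the face-finding layers.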
#3 Preprocessing can be harder than using the model
Python (and even more so its community and all the open-source libraries) is an amazing language: it lets data scientists do complicated things very easily.
When it comes to images (or worse, videos), preprocessing operations like extracting features or swapping image encodings can be done with a framework and a few lines of Python. It’s another story when you have to replicate those steps on mobile. You don’t have ready-to-use tools for this kind of task, and you might not want to import a huge framework like (just to name one) OpenCV.
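As an illustration, here is a pure-Python sketch of two common preprocessing steps: a nearest-neighbour resize and a normalization to [-1, 1]. Each is a one-liner with PIL or OpenCV, but on device you have to hand-roll them (or dig into Accelerate/vImage). This is a toy version, not production code:

```python
def resize_nearest(pixels, src_w, src_h, dst_w, dst_h):
    # Nearest-neighbour resize of a flat, row-major grayscale buffer.
    out = []
    for y in range(dst_h):
        sy = y * src_h // dst_h        # nearest source row
        for x in range(dst_w):
            sx = x * src_w // dst_w    # nearest source column
            out.append(pixels[sy * src_w + sx])
    return out

def normalize(pixels, mean=127.5, std=127.5):
    # Map 0..255 pixel values to roughly -1..1, as many models expect.
    return [(p - mean) / std for p in pixels]
```

Getting these low-level details to match the Python training pipeline exactly (interpolation mode, channel order, normalization constants) is where most of the “my model works in the notebook but not on device” bugs come from.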
#4 Size matters
Weight is a major metric when you talk about apps for one simple reason: over 150 MB, your app can only be downloaded over Wi-Fi (which creates a lot of friction and a drop in your App Store conversion rate).
And last but not least, when downloading a new app, users pay attention to the size (think about 8 or 16GB iPhones).
Models tend to be big enough for it to be an issue (hello 5MB models).
Good news for us: reducing the size of a model is a hot topic in the deep learning community. Lots of tools can help you shrink your model (sometimes by a big factor, and not necessarily with a big loss in accuracy).
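To give an idea of how such tools work, here is a toy sketch of linear 8-bit weight quantization: one byte per weight instead of four, roughly a 4x size cut. (Real tools, like the quantization utilities in coremltools, are more sophisticated; this just shows the principle.)

```python
def quantize_8bit(weights):
    # Map float weights linearly onto 0..255 and store them as raw bytes.
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0     # avoid division by zero if constant
    q = bytes(round((w - lo) / scale) for w in weights)
    return q, scale, lo

def dequantize_8bit(q, scale, lo):
    # Recover approximate float weights at inference time.
    return [b * scale + lo for b in q]
```

The reconstruction error per weight is bounded by the quantization step, which is why accuracy often barely moves.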
A good approach, when possible, is to split your model into subtasks. It’s better to have lots of small models than 2 big ones if you need to reuse them.
You can also extract the pre-trained weights of your model and download them after the app is installed. But you will have to explain this to your UX teammates:
Hey! New users will have to wait for an unknown amount of time (you know, because we can never assume the quality of a phone’s connectivity) before enjoying the actual benefits of our app.
Solutions do exist, but they make things more complicated. For instance, you can ship a smaller version of your model in the app package (usually with much worse accuracy), download the better version as soon as you can, and once it’s done, swap the two without your user noticing.
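A minimal sketch of that swap logic, in Python for readability (on iOS this would be Swift, and `ModelStore` plus both paths are made-up names): prefer the downloaded model once it fully exists on disk, and install upgrades via an atomic rename so a half-downloaded file is never used.

```python
import os
import tempfile

class ModelStore:
    def __init__(self, bundled_path, upgraded_path):
        self.bundled_path = bundled_path      # small model shipped in the app
        self.upgraded_path = upgraded_path    # better model, downloaded later

    def current_model_path(self):
        # Use the downloaded model only once it is completely on disk.
        if os.path.exists(self.upgraded_path):
            return self.upgraded_path
        return self.bundled_path

    def install_upgrade(self, data):
        # Write to a temp file first, then rename. os.replace is atomic,
        # so the user never sees a partially written model.
        directory = os.path.dirname(self.upgraded_path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self.upgraded_path)
```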
(You can also find tools that will help you with this task.)
#5 Execution time
At some point we were running multiple models on thousands of pictures (the average gallery holds approximately 4,800 media items, but many of our users have over 100k pictures…) and we realized that many users were leaving before the end of the onboarding.
Digging into this issue was easy enough: running all our models took around 1 minute per thousand pictures on an iPhone X (so about 5 minutes for an average gallery). It was clearly a mistake to think that users would diligently wait. 🤦♂️
But what can you do to fix this? Reducing the inference time of a model by a significant factor is a nice dream to have, and accelerating the hardware is not in our power.
We did not find a clear solution.
We had to live with it and mitigate it as much as possible. We worked a lot on 3 points:
1. The first thing we worked on was reducing the number of pictures that go through the models. To do so, we used the « fast access » data we had on each media item.
2. Then we built one of our biggest pieces of engineering, what we call the KTCD (Keep Things Cool Dispatcher). It’s an infrastructure embedded in our mobile app that allows us to:
- Run small batches of operations and maintain consistency between runs. Because you don’t know when your app might be interrupted, you need to be good at saving progress incrementally and retrying.
- Run components in the right order and handle priority.
- Optimize horizontally across the different components (reusing memory is a good idea)
- Keep your phone cool (hence the C in KTCD): it throttles the throughput depending on the kind of phone you are using. Users tend to find things suspicious when their phone starts burning while testing a new app.
3. And finally we try to be explicit with the user, with an engaging onboarding flow that keeps them interested while we run our processes.
Deep learning on mobile is a rocky path; you find yourself fighting with the OS more than you should. Here I covered our story of the iOS integration, but replace CoreML with TensorFlow and you have the equivalent on Android (that said, being able to run long-running background processes on an Android phone is definitely a huge convenience).
Despite all these issues, running an AI fully contained on a mobile phone is a major step toward general acceptance. Seeing it as a small assistant that stays in your pocket is rather different from an agent connected to a remote mastermind.
Happy to discuss the topic further, if you have anything to add to this list or questions, please reach out or comment below ✌️