Communication is Key: How to Create, Test and Deploy Machine Learning Models on Mobile
Developing ML models on mobile is very different from doing so on desktop. It takes a new approach to communication between mobile engineers and data scientists. Here’s how it’s done.
By Kristina Georgieva and Valeriia Vagner, Senior Engineers at BCG Digital Ventures
Creating and running machine learning (ML) models on different mobile devices is far more complex than running them in desktop environments, and requires answering a host of questions.
These can be split into three main areas, each with a series of secondary questions:
- How will the model be executed on the device?
  - What kind of library/technology is suitable?
  - Is it possible to convert the current model to the required format?
  - How will mobile engineers understand what to feed the model, and what result should they expect?
- How will the model perform on the device?
  - Does the model provide the expected result?
  - If not, why is it not performing as expected?
  - If additional algorithms are used on top of the ML model’s output, are they performing as expected, and have they been implemented correctly?
- Does the model fulfil performance and security requirements?
  - How big is the model?
  - Where will the model be stored on the device?
  - How will the model be distributed, and how will it receive updates?
So, many questions…
As a data scientist and a mobile engineer, we noticed that neither of us alone had the answers to all of these questions. It was therefore vital to create a common language and testing strategy that helped us communicate efficiently and answer all the relevant questions. Creating this framework let us work out the best way to answer them and, through that, implement ML models effectively on mobile devices. Here’s how we did it.
Communication is Key
When implementing ML models on mobile, data scientists and mobile engineers have different requirements: Mobile engineers need everything required to run the model on the mobile device, while data scientists need to understand exactly what’s happening with the model once it’s let out into the wild.
There are two important points that need to be effectively communicated and clear to both data scientist and mobile engineer: The parameters that will be used to access the model (like input and output node names for a neural network), and the order of input data (should data be fed as [minimum, maximum, median] or [median, minimum, maximum], for example).
The next consideration is how to build a thorough testing framework, one that checks whether the model is being used correctly on the device and evaluates any additional algorithms built around the model’s results, since these need to be re-implemented in the native application.
Finally, you need a feedback loop between mobile engineer and data scientist, so that the latter can gain a full understanding of what the model is actually doing in the real world.
Creating the Model
What does this mean in practice?
Although pair programming is one solution to the communication issue, it does not help when we want to experiment fast by updating models often. We can agree on parameters and the ordering of input data, while still allowing for fast experimentation, by shipping with each model a description of what it is and how it should be used, in a format our applications can understand.
The following JSON configuration sketch demonstrates the type of information that can be communicated about the model (the field names and values here are illustrative, not an exact schema):
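```json
{
  "model_file": "activity_classifier_v3.tflite",
  "model_version": 3,
  "input": {
    "node_name": "input_tensor",
    "raw_signal": "accelerometer",
    "buffer_size": 128,
    "features": ["minimum", "maximum", "median"]
  },
  "output": {
    "node_name": "output_tensor",
    "labels": ["idle", "walking", "running"]
  }
}
```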
With the above configuration, the mobile application can be configured to dynamically swap models, without the need for a mobile engineer to adapt the code and release a new version.
This allows mobile engineers to build more reliable apps. In our case, there were many input parameters that needed to be considered in order to run the model: what kind of raw information to gather, the size of the buffer, what kind of math operation to apply to the raw data, in which order to feed it to the model, what to expect from the model, how to interpret the result, and so on.
This was far too much information to hard-code into the client. But by adopting our “model configuration” approach, we solved several problems at once. The advantages:
- Easy to unit test and change. The mobile client just needs to read the information from the model configuration and use it at each step of running the model. The configuration already provides everything; you only need to map it. So you agree on the format of the configuration, implement the mapping (a sketch of this follows the list) and test it. There’s no need to change a host of parameters all over the code.
- Easy to distribute. The configuration file can be distributed together with the ML model, via the same channel.
- Easy to iterate. Everything is in place already. Want to try a new model? Just adjust the configuration. It’s as easy as that.
- Easy to communicate. As data scientists and mobile developers both use the model configuration, they can simply open the file and make changes together. You speak the same language here: you both understand the changes and how they will affect the final result.
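As a rough sketch of what that mapping step can look like on Android, assuming the illustrative JSON schema above (the class and field names here are ours, not a fixed API):

```kotlin
import org.json.JSONObject

// Typed view of the model configuration; the fields mirror the
// illustrative JSON schema sketched earlier in this article.
data class ModelConfig(
    val modelFile: String,
    val inputNodeName: String,
    val bufferSize: Int,
    val featureOrder: List<String>,
    val outputNodeName: String,
    val labels: List<String>
)

// Map the raw configuration JSON (shipped alongside the model file)
// into the typed object the rest of the app works with.
fun parseModelConfig(json: String): ModelConfig {
    val root = JSONObject(json)
    val input = root.getJSONObject("input")
    val output = root.getJSONObject("output")
    val features = input.getJSONArray("features")
    val labels = output.getJSONArray("labels")
    return ModelConfig(
        modelFile = root.getString("model_file"),
        inputNodeName = input.getString("node_name"),
        bufferSize = input.getInt("buffer_size"),
        featureOrder = List(features.length()) { features.getString(it) },
        outputNodeName = output.getString("node_name"),
        labels = List(labels.length()) { labels.getString(it) }
    )
}
```

Because the parser is a single pure function, it is trivial to unit test, and swapping in a new model means shipping a new configuration file rather than changing code.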
One of the most useful of these points is the ability to iterate with ease. This means providing the mobile engineer with the information needed to write an automated test that checks the model results are what we expect them to be. To write such a test, the mobile engineer needs to know the expected output for a specific set of inputs. The data scientist can provide this via a file containing both, for example a CSV along these lines (the columns and values here are illustrative):
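```csv
minimum,maximum,median,expected_label
0.03,0.21,0.11,idle
0.12,0.98,0.45,walking
0.30,1.45,0.88,running
```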
The same approach can be used to solve the fourth point above: testing the additional algorithms that need to be implemented around the model results. If you want to avoid re-implementing these algorithms natively, you could try:
- Inserting the additional functionality in the model file (after all, the model is a set of computations)
- Writing the algorithms in Kotlin/Native, which allows us to share the same code across various platforms
In our case, we needed to implement additional algorithms on mobile devices. The data scientist is responsible for creating an algorithm, and the mobile engineers for implementing it correctly on the device. To make sure each algorithm had been implemented correctly, we went with an approach that involved testing every implementation against the same static set of data.
This led us to a “test first” approach: you first implement the new version of the algorithm in a test environment, make sure the mobile engineer and the data scientist get the same results, and only implement it in the app once these have been confirmed. This approach also had an unexpected positive consequence: the mobile engineer started collaborating with the data scientist on designing appropriate algorithms!
Because the algorithms live in your testing environment, a good architecture lets you replace real data with static data and run quick experiments on the algorithm. This allows you to run different versions of the algorithm in parallel, compare them and share the results with your colleagues.
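Put together, such a parity test on Android might replay the static fixture through the native re-implementation and assert that it matches the data scientist’s reference results. A minimal sketch, where classify() is a hypothetical stand-in for the re-implemented algorithm:

```kotlin
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical stand-in for the natively re-implemented algorithm;
// in the real app this would be the production implementation under test.
fun classify(features: DoubleArray): String =
    when {
        features[1] > 1.0 -> "running"  // features[1] is the "maximum" feature
        features[1] > 0.5 -> "walking"
        else -> "idle"
    }

class AlgorithmParityTest {

    // Static fixture agreed with the data scientist: inputs in the agreed
    // [minimum, maximum, median] order, paired with the label their
    // reference implementation produced (values are illustrative).
    private val fixture = listOf(
        doubleArrayOf(0.03, 0.21, 0.11) to "idle",
        doubleArrayOf(0.12, 0.98, 0.45) to "walking"
    )

    @Test
    fun nativeImplementationMatchesReferenceResults() {
        fixture.forEach { (input, expectedLabel) ->
            assertEquals(expectedLabel, classify(input))
        }
    }
}
```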
There was only one negative in our approach: we implemented the same algorithm three times. On one occasion, the data scientists created an algorithm, and the iOS and Android engineers each implemented it in their respective clients. So the same logic was written three times, tested three times, and explained by the data scientist twice. The better approach would be to write the algorithm once, test it once and then distribute it to the mobile clients. Kotlin/Native is suitable for this task, as it can compile Kotlin code to the JVM (for Android) and to native code (an iOS framework).
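As an illustration of that route, the shared algorithm can live in a Kotlin Multiplatform commonMain source set. The moving average below is a stand-in for whatever post-processing wraps the model’s results, not the algorithm we actually shipped:

```kotlin
// commonMain source set: compiles to the JVM for Android and, via
// Kotlin/Native, to a framework that the iOS app can call directly.

// Stand-in post-processing algorithm: smooth a stream of raw model
// outputs with a simple moving average.
fun movingAverage(values: List<Double>, window: Int): List<Double> {
    require(window > 0) { "window must be positive" }
    return values.indices.map { i ->
        val start = maxOf(0, i - window + 1)
        values.subList(start, i + 1).average()
    }
}
```

Written once in common code, the same function can then be tested once against the data scientist’s fixtures and consumed by both clients.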
Lastly, this communication channel must not be the only one. It is important for the data scientist to also gain an understanding of what is happening on the mobile device. This can be done through either:
- Analytics: Events fired when certain actions take place on the mobile device
- Special logging: JSON files sent from the mobile device to a service, which stores them for the data scientist to analyze. This approach can be useful in experimental applications, where understanding the process taking place on the mobile device requires gathering more data than the average user would be happy with.
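For example, a single logged inference event might look something like this (the fields are illustrative; log whatever the data scientist needs to reconstruct the model’s behaviour):

```json
{
  "event": "model_inference",
  "model_version": 3,
  "input_features": [0.12, 0.98, 0.45],
  "raw_output": [0.07, 0.81, 0.12],
  "predicted_label": "walking",
  "timestamp": "2021-03-01T12:34:56Z"
}
```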
Deploying the Model
When we talk about deploying a model, we mean making the final model file and configuration available via a service, so that mobile devices can call an endpoint to download the latest version and dynamically begin to use it. For this, the following needs to be considered:
- What format of model file the mobile device can consume (for example, TensorFlow Lite for Android and Core ML for iOS)
- How the versioning of the models will work, and therefore how the mobile device will know that a model should be downloaded and replaced
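One simple answer to the versioning question is a small manifest endpoint the client polls on startup: if the version is newer than what is cached, the client fetches the new model and configuration. A hypothetical manifest (schema and URLs are ours):

```json
{
  "model_version": 4,
  "model_url": "https://models.example.com/activity_classifier_v4.tflite",
  "config_url": "https://models.example.com/activity_classifier_v4.json"
}
```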
There are various options for deploying the model. The most convenient approaches from the mobile client’s perspective are a custom service, which gives you full control over hosting and versioning, or Firebase, which provides model hosting and distribution out of the box.
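If you go the Firebase route on Android, a sketch of fetching the latest hosted model with the Firebase ML model downloader might look like this (the model name is hypothetical, and you would still distribute the configuration file alongside the model):

```kotlin
import com.google.firebase.ml.modeldownloader.CustomModel
import com.google.firebase.ml.modeldownloader.CustomModelDownloadConditions
import com.google.firebase.ml.modeldownloader.DownloadType
import com.google.firebase.ml.modeldownloader.FirebaseModelDownloader
import java.io.File

// Fetch the latest hosted model: serve the cached copy immediately if one
// exists, while checking for a newer version in the background.
fun fetchLatestModel(onReady: (File) -> Unit) {
    val conditions = CustomModelDownloadConditions.Builder()
        .requireWifi() // don't eat into the user's mobile data
        .build()
    FirebaseModelDownloader.getInstance()
        .getModel(
            "activity-classifier", // hypothetical hosted model name
            DownloadType.LOCAL_MODEL_UPDATE_IN_BACKGROUND,
            conditions
        )
        .addOnSuccessListener { model: CustomModel? ->
            model?.file?.let(onReady) // hand the .tflite file to the interpreter
        }
}
```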
Conclusion
Defining this format of ML configuration allowed us to improve communication in the team, iterate faster on the model and algorithms, and set up testing processes in a way that helps us adjust our approach to real-world scenarios. It can be used together with model distribution via a server, or with models stored locally on the devices. Skipping the server-distribution part might be justifiable for multiple reasons, but the configuration part made our lives so much easier. On future projects, we’ll definitely start with this approach from the very beginning.
Interested in working with us at BCGDV? Want to find out more? See our current vacancies.
Find us on Twitter @DV_Engineering.