On-device machine learning is a recent breakthrough that brings machine intelligence features to users’ personal devices. A few years ago, deep learning changed the paradigm for machine learning (ML), and with it we saw many new products emerge across verticals including health care, education, industrial applications, and finance. Most of these were made possible by the virtually unlimited compute of the cloud combined with access to vast amounts of data. However, that combination has also raised questions about privacy and data governance, and it has opened a greenfield for companies looking to bring the intelligence closer to the data, on users’ devices. On-device ML also offers other benefits, like real-time performance and personalization, making it an important and natural growth area for ML despite its challenges.
The industry is seeing a lot of innovation across the on-device ML stack, from new hardware to applications. For an enterprise starting fresh in this space, however, it is a struggle to architect a solution that scales across millions of devices with different versions and usage patterns. These challenges resemble the ones we faced pre-cloud, when distributing software to customers made it very difficult to upgrade and maintain the many versions in the field. Once again, we find ourselves facing that problem.
Here at Adobe, given our vast number of applications and the supported devices, this challenge has been even more daunting. Let’s start by walking through some of the challenges we’ve faced when implementing on-device ML in our offerings. The figure above shows some of the popular libraries and interfaces used for on-device ML.
The challenge of framework choices
One of our key challenges was to come up with a solution that spans several frameworks, each providing its own set of features and benefits. There is no common intermediate layer, and the ecosystem is still evolving: inference libraries are maturing, and technologies like TVM and NNAPI are still emerging.
To address these challenges, our approach was to build a solution that supports cross-platform work without losing agility. This solution is what we call the Adobe Sensei On-Device Framework. We created an abstraction layer that lets us experiment with evolving technology and iterate quickly. Cross-platform portability, platform-specific optimization, and performance were all key requirements, so we built a common C++ SDK that uses a specific hardware-optimization and ML-framework combination for each device. For example, on devices running Windows we bundled WinML, whereas on iOS we used Core ML. Applications build on this SDK to enable ML features, and the SDK encapsulates all the details of the underlying frameworks.
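The idea behind the abstraction layer can be sketched in a few lines of Python. This is purely illustrative: the real SDK is written in C++, and the class and backend names here are hypothetical stand-ins, not Adobe’s actual API.

```python
from abc import ABC, abstractmethod

class InferenceBackend(ABC):
    """Common interface every platform-specific backend implements."""
    @abstractmethod
    def load_model(self, path: str) -> None: ...
    @abstractmethod
    def predict(self, inputs: dict) -> dict: ...

class CoreMLBackend(InferenceBackend):
    def load_model(self, path):
        self.model_path = path  # in reality: hand the model to Core ML
    def predict(self, inputs):
        return {"backend": "coreml", **inputs}

class WinMLBackend(InferenceBackend):
    def load_model(self, path):
        self.model_path = path  # in reality: hand the model to WinML
    def predict(self, inputs):
        return {"backend": "winml", **inputs}

def select_backend(os_name: str) -> InferenceBackend:
    """Pick the best framework for the device; applications never see this choice."""
    registry = {"Darwin": CoreMLBackend, "Windows": WinMLBackend}
    if os_name not in registry:
        raise RuntimeError(f"no backend registered for {os_name}")
    return registry[os_name]()

backend = select_backend("Darwin")
backend.load_model("segmentation.mlmodel")
print(backend.predict({"image": "img0"})["backend"])  # prints "coreml"
```

Because applications only ever talk to the `InferenceBackend` interface, swapping the framework underneath a given platform does not require changing, or even rebuilding, the application.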
This has helped us build a solution that uses state-of-the-art ML while at the same time giving us the flexibility to change the underlying framework without needing to push new releases for applications. It’s definitely future-proofing us in an ever-changing landscape.
The challenge of model conversion
Most ML training and research is still done in PyTorch, Caffe, and TensorFlow, and the resulting models can reach hundreds of megabytes in size. Downloading and running such large models is not feasible on devices, so the models need to be converted into a device-friendly format like Core ML, TFLite, or WinML, which also brings the added bonus of better inference performance. However, these device frameworks support only a basic set of operators, and the moment you try to convert PyTorch or TensorFlow models, they start to fail at various levels: operators are missing, compilation breaks, exported models are still too large, and more.
We had to develop a conversion pipeline that automates parts of the process. However, many parts of the pipeline are still manual and need intervention at various points during conversion and compression. The open-source conversion tools out there don’t work for all frameworks and have limited support for compression.
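A stripped-down sketch of such a pipeline illustrates the shape of the problem. Everything here is hypothetical: the operator set, size budget, and step names are illustrative stand-ins for the real tooling, which the automated steps hand off to humans when they cannot proceed.

```python
# Hypothetical conversion-pipeline sketch: automated checks run in order,
# and any check may flag the model for manual intervention.
SUPPORTED_OPS = {"conv2d", "relu", "add", "matmul"}  # device runtimes expose a limited op set

def check_operators(model_ops):
    """Return the ops the target runtime cannot run; these need manual rewrites."""
    return sorted(set(model_ops) - SUPPORTED_OPS)

def convert(model_ops, size_mb, size_budget_mb=20):
    report = {"status": "ok", "manual_steps": []}
    missing = check_operators(model_ops)
    if missing:
        report["manual_steps"].append(f"rewrite unsupported ops: {missing}")
    if size_mb > size_budget_mb:
        report["manual_steps"].append("compress (quantize/prune) to fit size budget")
    if report["manual_steps"]:
        report["status"] = "needs-intervention"
    return report

# A 180 MB model with an exotic operator fails both checks:
print(convert(["conv2d", "relu", "grid_sample"], size_mb=180))
```

In practice, the “manual steps” are where most of the engineering time goes: rewriting a model to avoid an unsupported operator, or quantizing it, can change accuracy and must be validated per model.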
The challenge of device configuration
Another large challenge we faced had to do with device versioning. Users have different device preferences, ranging from the iPhone 6 to the iPhone 11, and from the Samsung Galaxy 7 to the Pixel 3. Many of these users never upgrade their devices and are running older versions of their OS.
Most of these frameworks target the latest versions of each operating system, but because our customers run a wide range of OS versions, we need to support all of them. We had to either build the frameworks for older versions or find alternate ways to provide compatibility across systems.
Similarly, not all Android vendors implement NNAPI the same way Google does, so TFLite cannot simply be used against the reference implementation. Therefore, we must create a pipeline for solving this problem at scale, a project that is still underway.
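One common way to handle this kind of fragmentation, sketched below in hypothetical Python, is to gate hardware acceleration per device configuration and fall back to CPU inference when a vendor’s NNAPI implementation is known to misbehave. The denylist, vendor names, and API-level threshold are illustrative, not Adobe’s actual data.

```python
# Hypothetical sketch: decide per device whether to trust the vendor's NNAPI
# delegate or fall back to plain CPU inference.
NNAPI_DENYLIST = {("vendor_x", 27)}  # (vendor, Android API level) pairs known to misbehave

def choose_delegate(vendor, api_level, min_api_for_nnapi=27):
    if api_level < min_api_for_nnapi:
        return "cpu"    # NNAPI unavailable or too old on this OS version
    if (vendor, api_level) in NNAPI_DENYLIST:
        return "cpu"    # vendor implementation diverges from the reference
    return "nnapi"

print(choose_delegate("vendor_y", 29))  # prints "nnapi"
print(choose_delegate("vendor_x", 27))  # prints "cpu"
```

The hard part at scale is populating and maintaining that denylist across thousands of device configurations, which is why this remains an ongoing project rather than a one-time fix.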
The challenge of project execution
We are seeing lots of interest from our product teams in integrating this SDK to deliver ML on-device, and a big win was how we executed the project. We created an internal open-source project, with different teams coming together to contribute. It was very important for people from different domains across the company, such as those working with images, videos, or documents, to come together and build something that can be reused and scaled.
This resulted in many features that are reusable across teams. Getting different teams on the same page and dealing with tradeoffs between features was a big win for all of us.
This is just a start
Still, we’ve only solved part of the equation. We are still in the middle of building a full-fledged conversion pipeline and pursuing research in several related areas. As we continue this journey, our goal is to make AI/ML features available across all surface types at scale. Stay tuned to see what we accomplish next!