A guest article posted by the VSCO Engineering team
At VSCO, we build creative tools, spaces, and connections driven by self-expression. Our app is a place where people can create and edit images and videos, discover tips and new ideas, and connect to a vibrant global community free of public likes, comments, and follower counts.
We use machine learning as a tool for personalizing and guiding each user’s creative process. VSCO has a catalog of over 160 presets, which are pre-configured editing adjustments that help our users transform their images, including emulations of classic film stocks.
However, our research suggested members were overwhelmed by the number of presets and stuck to using the few favorites they knew and liked instead of trying new presets.
Our challenge was to overcome decision fatigue by providing trusted guidance and encouraging discovery, while still leaving space for our users to be creative in how they edit their images. Our Imaging team thoughtfully curated presets that complement different types of images, allowing us to deliver a personalized recommendation for each photo. We decided to solve this problem by suggesting presets for images with on-device machine learning, using deep convolutional neural network (CNN) models: these models capture far more of the nuance in an image than traditional computer vision algorithms, making categorization both easier and faster.
With all this in mind, we developed the “For This Photo” feature, which uses on-device machine learning to identify what kind of photo someone is editing and then suggest relevant presets from a curated list. The feature has been so loved by our users that “For This Photo” is now the second most used category after “All”, which shows all presets.
Video of “For This Photo” in action
To understand how the “For This Photo” feature works, let’s walk through what happens when a user starts editing an image. When the image loads in the Edit View, inference is kicked off immediately and the model returns a category for the image. The category ID is then matched against the cached catalog and the list of presets for that category is retrieved. From that list, six presets are picked, a mix of free presets and presets that are part of VSCO membership, and shown to the user in the “For This Photo” section. Showing both free and paid presets was important to us in order to provide value to our non-members. The presets that are part of VSCO membership can still be previewed by non-members, giving them context on the value of VSCO membership. For members, it has the benefit of educating them on the types of images their member entitlement presets work well for.
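The selection step above can be sketched in a few lines of Python. This is a hypothetical illustration only: the catalog structure, category IDs, and preset names are made up and do not reflect VSCO’s actual data model.

```python
# Hypothetical sketch of the "For This Photo" selection step:
# given a category from the model, pick up to six presets from the
# cached catalog, mixing free and membership presets.

def suggest_presets(category_id, catalog, max_presets=6):
    """Return up to max_presets presets for a category, interleaving
    free and members-only presets so both appear in the row."""
    presets = catalog.get(category_id, [])
    free = [p for p in presets if not p["members_only"]]
    paid = [p for p in presets if p["members_only"]]
    picked = []
    while (free or paid) and len(picked) < max_presets:
        if free:
            picked.append(free.pop(0))
        if paid and len(picked) < max_presets:
            picked.append(paid.pop(0))
    return picked

# Illustrative catalog entry; preset names are placeholders.
catalog = {
    "nature": [
        {"name": "A4", "members_only": False},
        {"name": "AL3", "members_only": True},
        {"name": "C1", "members_only": False},
        {"name": "KE1", "members_only": True},
        {"name": "M5", "members_only": False},
        {"name": "AU5", "members_only": True},
        {"name": "B5", "members_only": False},
    ],
}

row = suggest_presets("nature", catalog)  # six presets, free and paid mixed
```

Interleaving rather than listing all free presets first is one simple way to guarantee both tiers are visible in the six slots.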
On-device ML ensures accessibility, speed, and privacy
From the get-go, we knew server-based machine learning was not an option for this feature. There were three main reasons we wanted this feature to use on-device ML: offline editing, speed, and privacy.
First, we did not want to limit our members’ creativity by offering this feature only when they were online. Inspiration can strike anywhere — someone could be taking and editing photos with limited network connectivity, in the middle of a desert, or high up on a mountain. A large percentage of our user base is outside the United States, so not everyone has access to high-speed internet at all times.
Second, we wanted to ensure editing would be fast. If we offered “For This Photo” with the ML model in the cloud, we’d have to upload users’ images to categorize them (which takes time, bandwidth, and data), and users would have to download the presets, making the process slow on a good connection and impossible on a poor one. Doing the ML on-device means that everything happens locally, quickly, with no connection required. This is crucial for helping users capture the moment and stay in the creative flow.
Third, the editing process is private. A server-side solution would require us to upload user photos while the users were still editing them, before they published them. We wanted to be cognizant of our users’ privacy in their creative process.
Since we knew we wanted to do on-device machine learning with our custom model, TensorFlow Lite was an obvious choice, given the ease of taking a model trained on the server and converting it to a phone-compatible model (.tflite format) using the TFLiteConverter.
Also, we had already seen TensorFlow and TensorFlow Serving succeed in production systems for our server-side ML. TensorFlow’s libraries are designed with running ML in production as a primary focus, so we felt TensorFlow Lite would be no different.
ML Kit made running inference on a TensorFlow Lite model straightforward, and incorporating it into our app was seamless. This enabled us to take the feature from prototype to production-ready quickly. ML Kit provided higher-level APIs for initializing and loading a model as well as for running inference on images, without our having to deal with the lower-level TensorFlow Lite C++ libraries directly, which made the development process much faster and left us more time to hone our model.
Overview of VSCO’s Machine Learning Stack
For our ML stack, we use TensorFlow for deep learning on images and Apache Spark for shallow learning on behavioral data.
In production, we have a cloud-based inference pipeline using TensorFlow Serving that runs every image uploaded to VSCO through various convolutional neural networks in real time. The inference results of these models then power product features like Related Images, Search, and sections in Discover such as Just For You. For on-device machine learning, we use the mobile-friendly parts of the TensorFlow ecosystem, ML Kit and TensorFlow Lite; we’ll get into the details of this part of our stack in the next section.
We also have a Spark-based recommendation engine that allows us to train models on large datasets from different sources in our data store: image metadata, behavioral events and relational data. We then use the results of these models to serve personalized recommendations in various forms, for example: user suggestions.
To power other parts of our ML pipeline, we use Elasticsearch for search and relevance features, Apache Kafka for log-based distributed streaming of data that serves as input for all our ML models, and Kubernetes for deploying all our micro-services. The languages we use include Python, C++, Go, Scala, and for on-device integrations, we use Java/Kotlin and Swift/Obj-C.
On-device ML: How “For This Photo” Works
Step One: Categorizing Images
To build a model that serves the “For This Photo” feature, we first needed to assign a category to each image, and then suggest presets designed to work well for that category. The diagram below depicts the process for categorizing an image:
We started with image data tagged by our in-house human curators. These curators are photography experts who have a front-row seat to our user behavior, so they know better than anyone what type of content is being shared and what trends are developing. They helped our engineering team come up with image categories for our model, which include art, portrait, vibrant, coastal, nature, architecture, light and shadow, monochrome, and many more. As the saying goes, 90% of machine learning is cleaning up your data, and this curation process ensured that the data our model was trained on was reliable.
Using the categorized dataset we created with our human curators, we trained a CNN model in TensorFlow based on the SqueezeNet architecture, which we chose for its small size without much loss in accuracy. We converted this trained model from TensorFlow’s SavedModel format to the TensorFlow Lite (.tflite) format using the TFLiteConverter for use on Android. Some of our initial bugs at this stage were caused by a mismatch between the version of the TFLiteConverter we used and the version of the TensorFlow Lite library that ML Kit referenced via Maven. The ML Kit team was very helpful in resolving these issues as we went along.
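The conversion step looks roughly like the following. This is a sketch only: it uses a tiny stand-in Keras model rather than the actual SqueezeNet classifier, and `tf.lite.TFLiteConverter.from_keras_model` in place of the SavedModel path for self-containment; the input size and category count are illustrative.

```python
import tensorflow as tf

# Tiny stand-in for the trained SqueezeNet classifier; the real model,
# its input size, and its number of categories are VSCO-internal.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# No quantization flags are set, so the converter emits a float model,
# matching the floating-point model discussed below.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()  # a .tflite flatbuffer as bytes

# Sanity-check that the converted model loads in the TFLite runtime.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```

The resulting bytes would normally be written to a `.tflite` file and bundled with the Android app.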
Once we had a model that could assign categories to images, we were able to bundle it into our app and run inference on images with it using ML Kit. Since we were using our own custom-trained model, we used ML Kit’s Custom Model API. For better accuracy, we decided to forgo the quantization step during model conversion and use a floating-point model in ML Kit. This posed some challenges because ML Kit assumes a quantized model by default, but with modest effort we were able to change some of the steps in model initialization to support a floating-point model.
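Once the floating-point model returns one score per category, mapping that output to a category ID is just an argmax over the scores. A minimal sketch, where the category list comes from the article but the scores and the helper function are illustrative:

```python
# Categories mentioned in the article; the full list is longer.
CATEGORIES = ["art", "portrait", "vibrant", "coastal", "nature",
              "architecture", "light and shadow", "monochrome"]

def top_category(scores, categories=CATEGORIES):
    """Map a float output tensor (one score per category) to the
    winning category via argmax."""
    if len(scores) != len(categories):
        raise ValueError("score/category length mismatch")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return categories[best]

# Made-up output of the classifier for a landscape shot.
scores = [0.01, 0.05, 0.02, 0.03, 0.72, 0.08, 0.05, 0.04]
category = top_category(scores)  # "nature"
```

The winning category ID is what gets matched against the cached preset catalog.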
Step Two: Suggesting Presets
The next challenge was to suggest presets based on these image categories. We collaborated with our in-house Imaging team, who had created these presets, to come up with lists of presets that work well for images in each category. This process included rigorous testing on many images in each category, analyzing how each preset affected various colors. In the end, we had a curated catalog mapping presets to each category.
These curated catalogs are ever-evolving as we add new presets to our membership offering. To make it easy to update these lists on the fly without users having to update their app, we decided to store the catalogs on the server and serve them through an API: a microservice written in Go that the mobile clients check in with periodically to make sure they have the latest version. The mobile clients cache the catalog and only fetch it again when a new version is available. However, this approach creates a “cold start” problem for users who never connect to the internet before trying the feature for the first time, since the app has no opportunity to talk to the API and download the catalogs. To solve this, we decided to ship a default version of the catalogs with the app. This allows all users to use the feature regardless of their internet connectivity, which was one of the goals of this feature to begin with.
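The caching-with-bundled-fallback scheme can be sketched as follows. This is a hypothetical illustration: the file paths, field names, and fetch callback are made up, not VSCO’s actual Go service or client code.

```python
import json
import os

# Default catalog shipped with the app (illustrative contents).
BUNDLED_CATALOG = {"version": 1, "categories": {"nature": ["A4", "AL3"]}}

class CatalogStore:
    """Cached preset catalog with a bundled fallback, so the feature
    works before the app ever reaches the network."""

    def __init__(self, cache_path):
        self.cache_path = cache_path

    def load(self):
        # Cold start: no cached copy yet, fall back to the bundled one.
        if os.path.exists(self.cache_path):
            with open(self.cache_path) as f:
                return json.load(f)
        return BUNDLED_CATALOG

    def refresh(self, fetch_latest):
        """fetch_latest(current_version) returns a newer catalog dict,
        or None when the cache is already up to date."""
        current = self.load()
        latest = fetch_latest(current["version"])
        if latest is not None and latest["version"] > current["version"]:
            with open(self.cache_path, "w") as f:
                json.dump(latest, f)
            return latest
        return current

import tempfile
store = CatalogStore(os.path.join(tempfile.mkdtemp(), "catalog.json"))
offline = store.load()  # works offline via the bundled catalog
updated = store.refresh(
    lambda v: {"version": 2, "categories": {"nature": ["A4", "AL3", "C1"]}}
)
```

Only a newer version triggers a write, which mirrors the periodic check-in described above.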
Results and Conclusion
With “For This Photo”, we accomplished our goal of making editing with presets easier to navigate. We believe that when our members can’t find value in new presets, their creative progress is hindered, and we wanted to do better. We wanted to help more users not just discover new presets, but zero in on the presets that best matched what they were working on.
We want to continue improving “For This Photo” to provide recommendations based on other image characteristics as well as a user’s community actions (e.g. follows, favorites, and reposts). We also want to provide greater context for those recommendations and to encourage our community of creators to discover and inspire each other.
As we look forward to the future of this feature, we are also reflecting on the past. We recognize that this feature and VSCO’s on-device ML capabilities would not have been possible without TensorFlow Lite and ML Kit. We are excited to continue to invest in this area and build more features leveraging this technology in the future.