Building a scalable machine vision pipeline
Kevin Jing | Pinterest engineering manager, Visual Discovery
Discovery on Pinterest is all about finding things you love, even if you don’t know at first what you’re looking for. The Visual Discovery engineering team at Pinterest is tasked with helping people continue to do just that, by building technology that understands the objects in a Pin’s image to get an idea of what a Pinner is looking for.
Over the last year, with just a few engineers, we’ve been building a large-scale, cost-effective machine vision pipeline and stack using widely available tools. We faced two main challenges in deploying a commercial visual search system at Pinterest:
- As a startup, we needed to control the development cost in the form of both human and computational resources. Feature computation can become expensive with a large and continuously growing image collection, and with engineers constantly experimenting with new features to deploy, it’s vital for our system to be both scalable and cost-effective.
- The success of a commercial application is measured by the benefit it brings to the user (e.g., improved user engagement) relative to the cost of development and maintenance. As a result, our development progress needed to be frequently validated through A/B experiments with live user traffic.
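One way to keep feature computation cost under control as the image collection and the set of experimental features both grow is to recompute a feature only when an image is new or its cached feature was built by an older feature version. The sketch below illustrates that idea; the names (`FeatureStore`, `compute_feature`, `FEATURE_VERSION`) are hypothetical and not part of Pinterest’s actual system.

```python
# Illustrative sketch of incremental feature computation: features are cached
# per image and only recomputed when missing or stale. In a real pipeline,
# compute_feature would run an expensive deep-network forward pass.

FEATURE_VERSION = 2  # bumped whenever engineers deploy a new feature

def compute_feature(image_id):
    # Stand-in for expensive feature extraction over the image bytes.
    return {"version": FEATURE_VERSION, "vector": [hash(image_id) % 97]}

class FeatureStore:
    def __init__(self):
        self.features = {}  # image_id -> {"version": int, "vector": list}

    def update(self, image_ids):
        """Recompute features only for new images or stale feature versions."""
        recomputed = 0
        for image_id in image_ids:
            cached = self.features.get(image_id)
            if cached is None or cached["version"] != FEATURE_VERSION:
                self.features[image_id] = compute_feature(image_id)
                recomputed += 1
        return recomputed
```

With this scheme, a daily run over the full collection only pays for the newly added images, which is what makes continuous experimentation affordable.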
Today we’re sharing some new technologies we’re experimenting with, as well as a white paper, accepted for publication at KDD 2015, that details our system architecture and insights from these experiments and makes the following contributions:
- We present a scalable and cost-effective implementation of a commercially deployed visual search engine using mostly open-source tools. The tradeoff between performance and development cost makes our architecture especially suitable for small- and medium-sized businesses.
- We conduct a comprehensive set of experiments using a combination of benchmark datasets and A/B testing on two Pinterest applications, Related Pins and an experiment with similar looks, with details below.
Experiment 1: Related Pin recommendations
It used to be that if a Pin had never before been saved on Pinterest, we weren’t able to provide Related Pins recommendations. This is because Related Pins were primarily generated by traversing the local “curation graph,” the tripartite user-board-image graph that has evolved organically through human curation. As a result, “long tail” Pins, or Pins that lie on the outskirts of this curation graph, have so few neighbors that graph-based approaches do not yield enough relevant recommendations. By augmenting the recommendation system with visual features, we are now able to recommend Pins for almost all Pins on Pinterest, as shown below.
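The augmentation described above can be sketched as a simple fallback: use graph neighbors when the curation graph has enough of them, and fall back to visual nearest neighbors for long-tail Pins. This is an illustrative toy (1-D features, brute-force search), not Pinterest’s actual implementation, and all names are hypothetical.

```python
# Toy sketch: graph-based related pins with a visual nearest-neighbor
# fallback for long-tail pins that have too few graph neighbors.

def graph_neighbors(pin_id, curation_graph):
    """Pins connected through boards; empty for long-tail pins."""
    return curation_graph.get(pin_id, [])

def visual_neighbors(pin_id, features, k=3):
    """k nearest pins by distance between (toy, scalar) visual features."""
    query = features[pin_id]
    others = [(abs(features[p] - query), p) for p in features if p != pin_id]
    return [p for _, p in sorted(others)[:k]]

def related_pins(pin_id, curation_graph, features, min_candidates=2):
    candidates = graph_neighbors(pin_id, curation_graph)
    if len(candidates) < min_candidates:
        # Long-tail case: the curation graph is too sparse, so rank by
        # visual similarity instead.
        candidates = visual_neighbors(pin_id, features)
    return candidates
```

A production system would replace the scalar features with deep visual descriptors and the brute-force scan with an approximate nearest-neighbor index.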
Figure 1. Before and after adding visual search to Related Pin recommendations.
Experiment 2: Enhanced product recommendations by object recognition
This experiment allowed us to show visually similar Pin recommendations based on specific objects in a Pin’s image. We’re starting off by experimenting with ways to surface object recognition that enable Pinners to click into the objects (e.g., bags, shoes) in an image. We use object recognition to detect products such as bags, shoes and skirts from a Pin’s image, and from these detected objects we extract visual features to generate product recommendations (“similar looks”). In the initial experiment, a Pinner would discover recommendations if there was a red dot on an object in the Pin (see below); clicking on the red dot loads a feed of Pins featuring visually similar objects. We’ve since evolved the red dot experiment to try other ways of surfacing visually similar recommendations for specific objects, and will have more to share later this year.
Figure 2. We apply object detection to localize products such as bags and shoes. In this prototype, Pinners click on objects of interest to view similar-looking products.
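The detect-then-retrieve flow above can be sketched end to end: detect product regions, extract a visual feature per region, and rank indexed products by feature distance. The detector and features below are mocked with toy stand-ins, and all names are hypothetical; a real system would use a trained detector and deep visual features.

```python
# Hedged sketch of the "similar looks" flow: object detection -> per-object
# feature extraction -> nearest-neighbor product retrieval per category.

def detect_objects(image):
    # Stand-in detector: returns (category, bounding_box) pairs. A real
    # detector would localize bags, shoes, skirts, etc. in the pixels.
    return image["annotations"]

def extract_feature(image, box):
    # Toy feature: sum of box coordinates. A real system would crop the
    # region and run it through a vision model.
    return float(sum(box))

def similar_looks(image, product_index, k=2):
    """For each detected object, return the k closest products by feature distance."""
    results = {}
    for category, box in detect_objects(image):
        query = extract_feature(image, box)
        candidates = product_index.get(category, {})
        ranked = sorted(candidates, key=lambda pid: abs(candidates[pid] - query))
        results[category] = ranked[:k]
    return results
```

Restricting retrieval to the detected object’s category keeps the candidate set small and the recommendations on-topic (a detected bag only retrieves bags).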
By sharing our implementation details and our experience launching these products, we hope to help visual search become more widely incorporated into today’s commercial applications.
With billions of Pins in the system curated by individuals, we have one of the largest and most richly annotated datasets online, and these experiments are a small sample of what’s possible at Pinterest. We’re building a world-class deep learning team and are working closely with members of the Berkeley Vision and Learning Center. We’ve been lucky enough to have some of them join us over the past few months.
If you’re interested in exploring these datasets and helping us build visual discovery and search technology, join our team!
Kevin Jing is an engineering manager on the Visual Discovery team. He previously founded Visual Graph, a company acquired by Pinterest in January 2014.
Acknowledgements: This work is a joint effort by members of the Visual Discovery team, David Liu, Jiajing Xu, Dmitry Kislyuk, Andrew Zhai, Jeff Donahue and our product manager Sarah Tavel. We’d like to thank the engineers from several other teams for their assistance in developing scalable search solutions. We’d also like to thank Jeff Donahue, Trevor Darrell and Eric Tzeng from the Berkeley Caffe team.