Engineering visual search inside Pinterest browser extensions

Kelei Xu | Pinterest engineer, Product Engineering

Pinterest is a visual discovery engine with 100B ideas saved by 150M people around the world. We recently launched three new ways to discover more ideas on Pinterest and from the world around you with Lens BETA, Shop the Look and Instant Ideas. Today we’re bringing that same visual discovery technology to the whole internet with the launch of visual search inside Pinterest browser extensions. For the first time, you can use our visual search technology outside of Pinterest across the web. Just hover over any image you see online and find related ideas and products without leaving the site you’re on. In this post, we’ll share how we built visual search into the Pinterest browser extension for Chrome.

Inception

The idea and initial prototypes for visual search inside Pinterest browser extensions started almost two years ago. Before we even launched visual search to Pinners, a couple of engineers and a designer brainstormed product ideas where we could apply our visual search technology. The browser extension was one of the first ideas we came up with and prototyped. We were excited about the concept, but decided to prioritize launching visual search within our own app first. Since then, we’ve launched new visual search features like real-time object detection, and made significant improvements to our technology, including improving our visual model, developing new state-of-the-art visual signals and increasing the number of objects we recognize. Now, we’re launching visual search for the whole web.

Serving visual search requests outside Pinterest

There are two ways to visually search using the Pinterest browser extension. After you download the Pinterest browser button for Chrome, just hover over an image, click the visual search icon (magnifying glass) and get related results. You can also get results for the entire visible web page by right-clicking on the page. Clicking the visual search icon triggers a flow where we take the URL of the image and render it in our visual search overlay. When you right-click on the page to search, we use Chrome’s captureVisibleTab API to take a screenshot of the visible area of the page. This allows us to visually search on things that aren’t static images, such as videos and GIFs. However, captureVisibleTab is only available to background scripts, not to the injected content scripts that handle all the UI. So we use Chrome’s message passing API to send the screenshot data URI from the background script to our content script, which resizes it and displays it as an image in our visual search overlay on the webpage. All of this happens in real time, in a fraction of a second.
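The screenshot hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the message shape, the `TabsLike` interface and the function names are all assumptions made for the example. `TabsLike` is a narrowed stand-in for the two `chrome.tabs` calls involved, so the flow can be run with a stub outside the browser.

```typescript
// Hypothetical message shape; the post does not describe the real one.
interface ScreenshotMessage {
  type: "visual-search-screenshot";
  dataUri: string;
}

// Narrowed stand-in for chrome.tabs.captureVisibleTab and
// chrome.tabs.sendMessage (an assumption made for testability).
interface TabsLike {
  captureVisibleTab(callback: (dataUri: string) => void): void;
  sendMessage(tabId: number, message: ScreenshotMessage): void;
}

// Background-script side: capture the visible tab, then forward the
// resulting data URI to the content script running in that tab.
function captureAndForward(tabs: TabsLike, tabId: number): void {
  tabs.captureVisibleTab((dataUri) => {
    tabs.sendMessage(tabId, { type: "visual-search-screenshot", dataUri });
  });
}

// Runtime check that an incoming message is a screenshot message.
function isScreenshotMessage(message: unknown): message is ScreenshotMessage {
  if (typeof message !== "object" || message === null) return false;
  const candidate = message as Partial<ScreenshotMessage>;
  return (
    candidate.type === "visual-search-screenshot" &&
    typeof candidate.dataUri === "string"
  );
}

// Content-script side: ignore unrelated messages, otherwise pass the
// screenshot to whatever renders the overlay. Returns true if handled.
function handleMessage(
  message: unknown,
  showInOverlay: (dataUri: string) => void
): boolean {
  if (!isScreenshotMessage(message)) return false;
  showInOverlay(message.dataUri);
  return true;
}
```

In a real extension, a thin wrapper around `chrome.tabs` would presumably play the role of `TabsLike`, and `handleMessage` would be called from a `chrome.runtime.onMessage` listener in the content script.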

To set up the visual search cropping selector interface, where you can move and resize the search box around anything in the image, we resize the image or screenshot to fit inside the available page height and be no greater than 50 percent of the available page width. We draw the resized image as the background of an HTML element and overlay it with a transparent canvas which contains the cropping selector. When we initially show the visual search overlay, we select about 90 percent of the image, animating the selector inwards from the edges so it’s apparent to the Pinner what’s going on.
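The sizing rules in this paragraph come down to a small amount of arithmetic, sketched below. The function names are made up for the example. Two details are assumptions rather than anything the post states: the image is never scaled up, and "about 90 percent" is read as 90 percent of each dimension, centered.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Rect {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Scale the image to fit within the page height and at most half the
// page width, preserving aspect ratio. Capping the scale at 1 (no
// upscaling) is an assumption.
function fitToPage(image: Size, page: Size): Size {
  const maxWidth = page.width * 0.5;
  const maxHeight = page.height;
  const scale = Math.min(maxWidth / image.width, maxHeight / image.height, 1);
  return {
    width: Math.round(image.width * scale),
    height: Math.round(image.height * scale),
  };
}

// Initial crop selector: a centered box covering `fraction` of each
// dimension of the displayed image (0.9 by default).
function initialSelection(display: Size, fraction: number = 0.9): Rect {
  const width = Math.round(display.width * fraction);
  const height = Math.round(display.height * fraction);
  return {
    top: Math.round((display.height - height) / 2),
    left: Math.round((display.width - width) / 2),
    width,
    height,
  };
}
```

For example, a 2000×1000 image on a 1200×800 page is limited by the half-width rule and displays at 600×300, with an initial selector of 540×270 inset 30 pixels from the left and 15 from the top.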

Behind the scenes, we draw the original image into a hidden canvas and convert it to a data:URI using the canvas’s toDataURL method. To reduce latency, we resize the image to the minimum size necessary for our visual models. After the Pinner finishes making their crop selection, we send the data:URI to our background script, along with the selector’s top, left, height and width values, so we know what to search and where to look. In our background script, we convert the data:URI into a blob and send all the data to our API via an XMLHttpRequest.
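To make the data:URI-to-blob step concrete, here is one way the decoding could look. It is a sketch under assumptions: the function names are invented, only base64-encoded data URIs are handled, and the base64 decoding is done by hand so the example runs outside a browser (in an extension, `atob` would be the more usual choice).

```typescript
interface DecodedImage {
  mimeType: string;
  bytes: Uint8Array;
}

const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode a standard base64 string into raw bytes.
function decodeBase64(base64: string): Uint8Array {
  const clean = base64.replace(/=+$/, ""); // drop trailing padding
  const bytes = new Uint8Array(Math.floor((clean.length * 6) / 8));
  let buffer = 0; // bits collected so far that are not yet emitted
  let bits = 0; // number of valid bits in `buffer`
  let index = 0;
  for (let i = 0; i < clean.length; i++) {
    const value = BASE64_ALPHABET.indexOf(clean.charAt(i));
    if (value < 0) throw new Error("Invalid base64 character: " + clean.charAt(i));
    buffer = (buffer << 6) | value;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      bytes[index++] = (buffer >> bits) & 0xff;
      buffer &= (1 << bits) - 1; // keep only the leftover bits
    }
  }
  return bytes;
}

// Split a base64 data URI into its MIME type and decoded bytes.
function parseDataUri(dataUri: string): DecodedImage {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUri);
  if (match === null) throw new Error("Expected a base64-encoded data URI");
  return { mimeType: match[1], bytes: decodeBase64(match[2]) };
}
```

In the browser, the result could then be wrapped with `new Blob([bytes], { type: mimeType })` and attached to the request together with the crop values.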

We always search for the initial selection on load, so there are some results (and hopefully some annotations) to work with. Search results come back from the API in the form of Pin objects. We render these as Pins in a familiar-looking Pinterest grid, which can be immediately saved or run through Search again, right there on the page. We’ve also added hovering Search buttons to images found on the page when someone clicks the browser button to help make visual search more discoverable.

API layer

On the API layer, we need to do two main things: upload the image from the client to a temporary S3 store and send the image to our visual search service. These used to be dependent, sequential tasks until one of our engineers parallelized them, which cut latency considerably.
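The gain from running the two tasks in parallel can be illustrated with a short sketch. The post does not say how the API is implemented, so everything here is hypothetical: `uploadToTempStore` and `searchByImage` stand in for the S3 upload and the visual search call, and are passed in as parameters so the example stays self-contained.

```typescript
interface Crop {
  top: number;
  left: number;
  width: number;
  height: number;
}

interface SearchResponse<Pin> {
  imageUrl: string;
  pins: Pin[];
}

// Start the upload and the search at the same time and wait for both.
// Total latency is roughly the slower of the two tasks rather than
// their sum, which is what the sequential version would cost.
async function handleVisualSearch<Pin>(
  image: Uint8Array,
  crop: Crop,
  uploadToTempStore: (image: Uint8Array) => Promise<string>,
  searchByImage: (image: Uint8Array, crop: Crop) => Promise<Pin[]>
): Promise<SearchResponse<Pin>> {
  const [imageUrl, pins] = await Promise.all([
    uploadToTempStore(image),
    searchByImage(image, crop),
  ]);
  return { imageUrl, pins };
}
```

This only works because the search can run on the raw image bytes and does not need the uploaded link, which is presumably the dependency that was removed.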

We temporarily store the image for performance reasons. On the initial search, we upload the raw image to the API along with the crop coordinates, and the API sends back a link to the stored image. For the second and subsequent searches, we send this link back to the API along with the new top, left, height and width values, so we don’t have to keep sending the raw image data, which would be wasteful.
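On the client side, this amounts to remembering the link after the first response and switching request shapes from then on. The sketch below is illustrative only: the `SearchSession` class, the request shapes and the field names are assumptions, not the actual API contract.

```typescript
interface Crop {
  top: number;
  left: number;
  width: number;
  height: number;
}

// Hypothetical request shapes: the first search carries the raw image,
// later searches carry only the link returned by the API.
type SearchRequest =
  | { kind: "initial"; image: Uint8Array; crop: Crop }
  | { kind: "repeat"; imageUrl: string; crop: Crop };

class SearchSession {
  private readonly image: Uint8Array;
  private imageUrl: string | null = null;

  constructor(image: Uint8Array) {
    this.image = image;
  }

  // Build the next request: raw image until a link is known, then the link.
  nextRequest(crop: Crop): SearchRequest {
    if (this.imageUrl === null) {
      return { kind: "initial", image: this.image, crop };
    }
    return { kind: "repeat", imageUrl: this.imageUrl, crop };
  }

  // Record the link to the temporarily stored image from the API response.
  rememberImageUrl(imageUrl: string): void {
    this.imageUrl = imageUrl;
  }
}
```

Each time the Pinner moves or resizes the selector, only the four crop values and a short link need to go over the wire.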

Future plans

With this update, you can now use Pinterest visual discovery technology to find ideas in our app, across the web and out in the world. And this is just the beginning. Here are just a few of the things on the roadmap:

  • We’ll bring real-time object detection to browser extensions to parallel the visual search experience in our app. This enables Pinners to simply tap on objects we identify and get results, instead of manually identifying and pinpointing objects within images.
  • We want to expand beyond visually similar search results to show you how to bring ideas to life, similar to our approach with Lens results. For example, if the input image is an avocado, we want to show you more than other avocados, including health benefits, recipes and how to grow them.
  • We’ll bring visual search to all our browser extensions. To start, we’re rolling out visual search inside the Pinterest browser button for Chrome to Pinners globally today.

If you’re interested in solving computer vision challenges like these, join us!

Acknowledgements: Albert Pereta, Andrew Zhai, Christina Lin, Dmitry Kislyuk, Kelei Xu, Kent Brewster, Naveen Gavini, Patrik Goethe, Steven Walling, Steven Ramkumar, Tiffany Chao, Tonio Alucema