Instagram co-founder on the power of search, and co-engineering inside the Facebook empire

Derrick Harris
> S C A L E

--

On Wednesday, Instagram unveiled details of its new search features and search architecture, which is based on the Unicorn technology developed by parent company Facebook. You can read the details here in its blog post.

In this interview with Scale, Instagram Co-founder Mike Krieger talks about why search is so important to Instagram and why the move to Unicorn is such a big deal. He also discusses how to successfully work on technology integrations within large companies like Facebook, and he touches on the potential of deep learning, which is now making its way into Instagram’s search algorithms.

SCALE: At a high level, what’s the big improvement for users from moving to a Unicorn-based search architecture? More accurate, or more flexible, searches?

MIKE KRIEGER: It’s a combination of the two. I think of these things as verticals: We have our user search, our hashtag search and our places search — which is new as of our launch that we did in late June — and we also actually use the same search backend to show, even within a tag, what are the best photos representative of that tag.

So, as an example, the U.S. women’s soccer team parade was last week, and using the same Unicorn backend we were able to say which photos are the most interesting ones from the event. We showed that it was everything from Hope Solo taking a photo at the parade, to people on the ground covering it from sports sites.

Switching to Unicorn, and all the work that the search team has done in the first half of the year, really unlocked huge new types of searches, like places, and then within each of those types of searches it made them far better in terms of their ranking and their relevance. And that is what’s borne out in the data, which is that people are actually finding what they’re looking for, which is something that just wasn’t as true before.

“We can dream up the products we always wanted to build but were held up by not having the right technology before.”

Can Instagram users go as far as searching only by followers, or even more granular? Or is that type of info just used for ranking right now?

Right now, we don’t expose that as something you can specify — “by people I follow,” for example. But that’s a signal we can use for things like top posts, and you can imagine in the future even having the ability to drill into things just by people I follow, or people that are somehow in my network and so on.

The really cool thing is that we got a product out that starts having some of these possibilities, but the big thing is all that Unicorn unlocks with regard to flexibility of how you can search and query. We’re really well set up for the future in terms of all our stuff that matters is in there, and we can dream up the products we always wanted to build but were held up by not having the right technology before.

The Instagram search architecture. Source: Instagram

Does Unicorn work internally, too, for business intelligence or analytics?

It’s a little bit of both. On the business intelligence and internal analytics side, the same data source that gets piped into Unicorn also gets piped into Hive, which we use for a lot of the more-offline-type queries that we might be interested in. But one internal thing that’s pretty cool with Unicorn, just from a team dynamic and product development side, is that because trying a different query doesn’t involve re-indexing the data, you can just try it out using the really flexible search syntax that we use internally.

It took product-development ideas from, say, “Yeah, it would be cool if we could get a list of every photo you’ve liked and rank it by how close they are in your core community, but it’s going to be like two weeks for us to re-index the data and if it’s not the right product then we’ve wasted all this time,” to “I’m just going to try this out. Oh, this is cool data, now we can start thinking about what product makes sense there.”

Lowering that iteration time on ideas kind of flips the bit from previously, when we might just dismiss the idea because it was such a big bet to go build it.
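
To make the re-indexing point concrete, here is a minimal Python sketch. It is not Unicorn's API or its internal query syntax, which the interview does not describe. It assumes a hypothetical in-memory index (`PhotoIndex`) that stores each photo's raw signals once, so that a new ranking idea is just a different function passed in at query time. The names `Photo`, `PhotoIndex` and `closeness`, and all of the data, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Photo:
    photo_id: int
    owner: str
    liked_by: frozenset  # users who liked this photo


class PhotoIndex:
    """Hypothetical index: built once, queried with any ranking function."""

    def __init__(self, photos: List[Photo]):
        # Inverted index from user -> photos that user liked.
        self._liked: Dict[str, List[Photo]] = {}
        for photo in photos:
            for user in photo.liked_by:
                self._liked.setdefault(user, []).append(photo)

    def liked_photos(self, user: str, rank: Callable[[Photo], float]) -> List[Photo]:
        # Ranking happens at query time, so trying a new idea means
        # changing `rank`, not rebuilding the index.
        return sorted(self._liked.get(user, []), key=rank, reverse=True)


# Assumed closeness scores between the searching user and photo owners.
closeness = {"ana": 0.9, "bo": 0.4, "cy": 0.1}

photos = [
    Photo(1, "cy", frozenset({"me", "ana"})),
    Photo(2, "ana", frozenset({"me"})),
    Photo(3, "bo", frozenset({"me", "cy"})),
]
index = PhotoIndex(photos)

# Two different product ideas, one index, no re-indexing in between.
by_closeness = index.liked_photos("me", rank=lambda p: closeness.get(p.owner, 0.0))
by_popularity = index.liked_photos("me", rank=lambda p: len(p.liked_by))

print([p.photo_id for p in by_closeness])   # [2, 3, 1]
print([p.photo_id for p in by_popularity])  # [1, 3, 2]
```

The design choice being illustrated is only the separation of stored signals from ranking logic; a production system would differ in almost every other respect.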

Can you walk me through how media search in Instagram differs from user or hashtag search?

The way I think about the first few kinds of search — locations, hashtags and users — is those are all pointers to what really matters on Instagram, which is the content, the moments, the photos and videos. With those, you’re really just ranking … when I type the word “John,” which John should show up is a really personal experience and it’s really all about finding the accounts.

But then within a hashtag or a location (and in the future you can imagine other things, maybe a topic that’s trending) we want to be able to say, for example, “Alright, we know you’re interested in the Stanford Dish. What are the most amazing things we can show you that are happening at the Stanford Dish recently?”

What’s interesting with Unicorn is we can iterate on that daily, or even try out quick tests and say that for 1 percent of people it will be about the photos that are taken by people who are closest to you in your network, depending on who you follow. Or the photos that were liked by the most people, and we don’t just want likes for all time because we want stuff that’s fresh and relevant. So maybe it’s photos that were liked recently, and all of those different algorithmic tweaks.

It’s interesting, we can start looking into things like “Is this a verified account?” or “Is this a video?” because we think this page should be mostly about videos. Maybe it’s a concert by Taylor Swift, and we think videos of concerts are way cooler than just photos. We can focus on that and try that out and see how we like it, and how our users like it, as well.
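
The kinds of tweaks described here (favoring recent likes over all-time likes, boosting videos, and trying a change on 1 percent of people) can be sketched in a few lines of Python. This is an illustration under assumptions, not Instagram's ranking code: the exponential half-life decay, the hash-based bucketing, the `video_boost` factor and every function name below are hypothetical choices made for the example.

```python
import hashlib
import math
from typing import List


def in_experiment(user_id: str, experiment: str, percent: float) -> bool:
    """Deterministically place roughly `percent` percent of users in a test."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10000  # stable bucket in 0..9999
    return bucket < percent * 100  # percent=1.0 -> buckets 0..99


def recent_like_score(like_ages_hours: List[float], half_life_hours: float = 24.0) -> float:
    """Sum of likes, each discounted by how long ago it happened."""
    decay = math.log(2) / half_life_hours
    return sum(math.exp(-decay * age) for age in like_ages_hours)


def rank_score(like_ages_hours: List[float], is_video: bool, user_id: str,
               video_boost: float = 1.5) -> float:
    score = recent_like_score(like_ages_hours)
    # Only users in the hypothetical 1 percent test see videos boosted.
    if is_video and in_experiment(user_id, "boost_videos", percent=1.0):
        score *= video_boost
    return score


# Three likes in the last hour outrank ten likes from two weeks ago.
fresh = recent_like_score([0.2, 0.5, 0.9])
stale = recent_like_score([336.0] * 10)
print(fresh > stale)
```

Hashing the user ID together with the experiment name keeps each person in the same group across requests, while different experiments get independent 1 percent slices.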

“Every week we add millions of people to the platform … The haystack gets bigger and bigger, and you’re still looking for one needle.”

Where does search rank in terms of importance for the Instagram experience, and in terms of new products or features?

I think about this in two ways: There’s stuff that looks like search, and stuff that doesn’t look like search but is actually search.

Stuff that looks like search is a little more intuitive and obvious. And that’s one where even though we’ve improved and made huge strides in how relevant our results are when you search for people, it’s something that will continue to be important because every week we add millions of people to the platform and we say, “How are we going to make those folks easy to find?”

The haystack gets bigger and bigger, and you’re still looking for one needle. That can be anything from making sure you’re able to find a friend who’s joined Instagram, to knowing that you’re really interested in tennis or you’re really interested in a particular place in the world, and over time getting better at making sure we’re surfacing those things.

That’s something we’ll absolutely continue to work on through the end of the year and beyond.

And then there’s the parts that don’t really look like search but are search, and we just talked about one of them, which is media search. We’ve been traditionally only chronological when you go to a hashtag page or a place page, and we only introduced the ability to be able to see top posts in the last month, when we launched the new search and explore revamp. I’m really excited about where that can take our product, which gets 80-million-plus photos and videos a day, and search and explore helps surface interesting trends around where those photos are being taken or what those photos are about using trending tags.

Let’s say we slice even within something pretty big, like July 4th: If you want to have a 1-minute-long experience about what’s going on around the United States on July 4, how do we use the same search technology to find stuff that you’re going to love? The most interesting videos of fireworks, the best photos from your friends, the best photos from, maybe, the White House if it posted photos of fireworks above the White House. The White House actually posted photos of it lit up after the recent Supreme Court gay marriage ruling, and that was something our search algorithms were able to surface within media search.

So the parts that are very explicit searches, we’ll continue to make those really relevant, which becomes really important as we keep growing. And then the parts I think of as implicit searches, like if I tack on a hashtag, that’s actually a search of our backend even if the person tacking on the hashtag isn’t thinking, “Oh, I’m going to query Unicorn.” They’re obviously not doing that, but what we can do there is give them the best possible technological experience for a tag, a place, a set of accounts or whatever it might be.

Media search in Instagram. Source: Instagram

Working effectively with the Facebook mothership

How regularly does the Instagram engineering team work with the Facebook engineering team? Was Unicorn a fairly novel integration effort?

It’s becoming more common, but this is definitely the biggest one that we’ve taken on. The way we looked at this was we had a big project at the end of 2013, and into most of 2014, which was to move our infrastructure from being inside Amazon’s datacenters and into Facebook’s datacenters. But even that move wasn’t so much about integrating deeply into Facebook — it was more about just running inside the Facebook datacenter, but still using our own technology, the stack we were on before.

But what it unlocked for the end of 2014, and this year and beyond, is picking the integrations that make the most sense. The ones that make the most sense to us are the ones where whatever we’re using that’s open source is not really fitting the bill, or it takes a lot of our team’s time to operate. Because we have a pretty small infrastructure team, and I’d rather them be focused on things that are really high-leverage and unique to Instagram, whereas search is something that Facebook has definitely solved at scale.

“Instagram would have taken several years to build anything of that equivalent power. So it’s actually pretty amazing how quickly we can integrate that, given the overall development time it would have taken.”

The other criterion is that it’s a product, project or technology that Facebook itself has thought of as a service. And what I mean by that is that multiple teams inside Facebook are already used to using Unicorn; it’s not so purpose-built that it can only do one thing. The team is already used to thinking “How would this apply to user search at Facebook or how would it apply to Facebook ads?” and for us it’s “How would this apply to Instagram?”

That’s how we’ve been picking these integrations, and I hope to do more of them. It’s not like you can flip a switch and they’re done. This was definitely a several-months project to be able to unlock that functionality, but when I think of it, Instagram would have taken several years to build anything of that equivalent power. So it’s actually pretty amazing how quickly we can integrate that, given the overall development time it would have taken.

Do you end up working really closely with the Facebook engineering team on this sort of project, or do they play more of a support function?

This is a learning experience, and we’ve iterated our way to finding a setup that we like that makes sense. The thing that we’ve found is most important is having a couple of people on the team that we’re collaborating with at Facebook who know really deeply about Instagram, and for them to be our points of contact.

In this case, we had two engineers who were up in Seattle, where some of the search development happens, and they were, for the first half of the year, effectively on the Instagram team. It’s really important to get them included in our team all-hands, in our events, in our happy hours — just have them feel like they’re brought into what we’re trying to go for as Instagram. That has worked well.

What’s worked less well is the vaguer, “Oh, we’ll collaborate with your team. Whoever has some time on that team will come help out.” That’s led to some less-effective integrations.

Having those champions on their team is huge, so we’ll have that going forward for every project we do as something that we’ve learned.

“Coming in late and saying, ‘This is really important to us,’ and it’s like the tenth most-important thing to them is not usually a recipe for success.”

I can see how the latter scenario would happen. Everyone in every company is busy with their own stuff most of the time.

That’s another thing, is having the empathy to realize that we have our roadmaps and we’re planned out for at least the next six months — if not the whole year — but that’s true for all the other teams at Facebook, too. So if you come a month before they’re trying to ship something new and say, “Hey, we have this great idea. Can you just help us out and build this thing?” they’ll often say, “Look, we’d love to, but we’re kind of doing this other stuff, as well.”

If I had to summarize the integration pieces, one is having points of contact on their team. The other is coming to other teams early enough in the process where they can really incorporate that into their roadmap, and they have in their head and on paper how your project ranks with their priorities. Because coming in late and saying, “This is really important to us,” and it’s like the tenth most-important thing to them is not usually a recipe for success.

“I like to nerd out and read the papers and get excited about the future, too.”

Dabbling in deep learning

I noticed that deep learning networks are a component of the new search experience. Is that something you’re investing in internally, or do you leverage Facebook’s expertise there?

This is one area where it’s really nice being inside Facebook; they’ve hired Yann LeCun, who basically invented a lot of the new convolutional neural network stuff. That’s been really cool. That’s a team we keep in touch with and we’re always interested in what they’re doing.

Obviously, we’re hyper-visual and object recognition is something that they’re interested in, as well. We’re not using it a ton yet. There’s places where it shines and places where it’s still a little bit iffy, but we’re keeping an eye out and in the future it could be really interesting.

The example we like to play with internally is Halloween. It would be interesting if you were to see who are all the people posting Halloween costumes, and what’s the most popular costume. That would be an interesting use case if we had really great neural networks and machine learning. But right now we’re mostly using the textual content of people’s captions and their accounts.
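
As a rough illustration of what a text-only approach to the Halloween question could look like, the Python sketch below counts costume hashtags in captions. It is not Instagram's pipeline; the regular expression, the `count_costumes` helper, the list of costume tags and the sample captions are all assumptions made for the example.

```python
import re
from collections import Counter
from typing import Iterable, Set

HASHTAG = re.compile(r"#(\w+)")


def count_costumes(captions: Iterable[str], costume_tags: Set[str]) -> Counter:
    """Count known costume hashtags in captions tagged #halloween."""
    counts: Counter = Counter()
    for caption in captions:
        tags = {tag.lower() for tag in HASHTAG.findall(caption)}
        # Only count captions that mark themselves as Halloween posts.
        if "halloween" in tags:
            counts.update(tags & costume_tags)
    return counts


captions = [
    "Best night ever #Halloween #witch",
    "#halloween #pirate crew",
    "Trick or treat! #Halloween #Witch #candy",
    "Beach day #pirate",  # not a Halloween post, so it is ignored
]
top = count_costumes(captions, {"witch", "pirate", "zombie"})
print(top.most_common(1))  # [('witch', 2)]
```

The limitation is visible in the sketch: a costume is only counted when the poster names it in the caption, which is the gap that image recognition would have to fill.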

So the metadata is still more important than the actual images in terms of classifying them?

Yeah, exactly. It’s definitely less ambiguous right now. But I like to nerd out and read the papers and get excited about the future, too.
