# Teaching a neural net to be interesting

If you’re reading this, doubtless you’ve seen this post on Google’s new YouTube thumbnail selection model. It came across my Twitter feed, then Slack, and even showed up in texts from friends who know I am an engineer at Neon. “Look at what Google does!” they told me, breathlessly. I had one thought: it was pretty damn cool. In light of this, I figured it was time to share a bit about how Neon does what we do.

Google and Neon, like every company that uses machine learning and deep learning, need a model (to do the predicting) and an objective function (to tell the model how it’s performing). In the case of Neon and Google, these are coupled with some fancy calculus (gradients, mostly) that nudges the model to keep improving as it trains.
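To make the model / objective function / calculus trio a little more concrete, here is a toy sketch in Python. It is not Neon’s or Google’s code: the one-feature linear model, the mean squared error objective, the made-up data, and names like `predict`, `objective`, and `train` are all assumptions chosen for illustration. The “fancy calculus” here is plain gradient descent.

```python
# Toy illustration only: a one-feature linear model, NOT a real thumbnail model.

def predict(w, b, x):
    """The model: does the predicting."""
    return w * x + b

def objective(w, b, data):
    """The objective function: mean squared error (lower is better)."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)

def gradient(w, b, data):
    """The calculus: derivatives of the objective with respect to w and b."""
    n = len(data)
    dw = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
    db = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
    return dw, db

def train(data, lr=0.05, steps=500):
    """Gradient descent: repeatedly step against the gradient."""
    w, b = 0.0, 0.0
    history = []  # objective value before each update
    for _ in range(steps):
        history.append(objective(w, b, data))
        dw, db = gradient(w, b, data)
        w -= lr * dw
        b -= lr * db
    return w, b, history

# Hypothetical data generated from y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, history = train(data)
print(f"w={w:.3f}, b={b:.3f}, first loss={history[0]:.3f}, last loss={history[-1]:.2e}")
```

A real deep network has millions of parameters instead of two, but the loop is the same idea: predict, score the prediction with the objective, and use the gradient to adjust.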

Though our methods are broadly similar, the interesting bit is that Neon and Google have different objective functions. Google wants to take a video and predict which frames are the most like those selected by a video uploader. Neon takes a different tack and, given images, predicts which will be perceived as the most interesting by a potential viewer.

Predicting which images are the “most interesting” sounds pretty simple, but it’s actually incredibly complex. How can you even measure interestingness? You might be inclined to think, “simple — have a bunch of people rate a bunch of images on a scale of 1 to 10.” There’s a problem, though. Let’s imagine someone is rating some images.

Computer:

“Rate these on a scale of 1 to 10”

Image 1:

You:

“Ugh I hate Brussels sprouts. 1/10”

Image 2:

You:

“Oh doughnuts. Compared to Brussels sprouts, they’re the best! 9/10”

Computer:

Image 3:

You:

“Beyoncé!!! Yasss! 10/10!”

Clearly, if doughnuts are a 9 out of 10, Beyoncé is something like 42 out of 10. Doughnuts are definitely awesome, but they’re way less awesome than Beyoncé.

This is an exaggeration, but it illustrates the problem well: when people are asked to rate images (or anything, really) one after another, their ratings are biased and tend to depend on the order in which things are presented. This is a problem. We want the truth!

I won’t bore you with the details, but the short answer is that instead of relying on 1-to-10 ratings, we work with image rankings and use math to directly measure how good each ranked image is.
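For the curious, here is one common way to turn “which of these two is better?” answers into a single score per image. To be clear, this is an assumption for illustration and not a description of what Neon actually does: it is a minimal sketch of the Bradley-Terry model, and the function name `bradley_terry` and the comparison data are made up.

```python
from collections import defaultdict

def bradley_terry(comparisons, iterations=200):
    """Estimate a score per item from (winner, loser) pairs.

    Sketch of the standard iterative update for the Bradley-Terry model.
    Assumes every item wins at least one comparison; otherwise its score
    collapses to zero.
    """
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # how often each pair was compared
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i
            )
            updated[i] = wins[i] / denom if denom else scores[i]
        total = sum(updated.values())
        scores = {item: s / total for item, s in updated.items()}  # normalize to sum to 1
    return scores

# Hypothetical answers from viewers shown two images at a time.
comparisons = (
    [("beyonce", "doughnut")] * 4 + [("doughnut", "beyonce")] * 1
    + [("doughnut", "sprouts")] * 4 + [("sprouts", "doughnut")] * 1
    + [("beyonce", "sprouts")] * 5
)
scores = bradley_terry(comparisons)
for item, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{item}: {score:.3f}")
```

Because each answer only compares two images side by side, nobody has to decide whether Beyoncé is a 10 or a 42. The scale falls out of the comparisons.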

The next thing to figure out is what a deep neural network model needs to know in order to predict how interesting an image is. Believe it or not, deep neural nets need to be trained to do things, just like kids need to be trained not to eat things they find on the ground. Again, I won’t bore you with the details (besides, they’re top secret), but in short, it involves image rankings plus what we know of how the brain perceives.

Let’s go back a minute. That part was important. We’re not training our models to find the images they think people will select as a thumbnail. We’re actually trying to figure out which images people will perceive as the most interesting.