Automating BuzzFeed

How BuzzFeed is using automation processes to ensure that every piece of content has a fair chance of reaching its target audiences.

How are AI and automation used at BuzzFeed to inform publishing decisions across their many, many pages? At a meeting with the GEN Study Tour group in New York City, Gilad Lotan, VP head of data science at BuzzFeed, let us take a peek at some of their internal dashboards, learn about their A/B testing process, and find out how to stop a chili dog recipe from getting automatically posted on BuzzFeed Animals.

Data to drive decisions

BuzzFeed has invested a lot in what they call distributed platforms, including Twitter, Facebook, Snapchat, Instagram, YouTube, and Pinterest. The publisher has 300m subscribers on YouTube, over 100 Facebook pages, and BuzzFeed.com enjoys 200m+ monthly unique visitors.

These different platforms are not only used to drive content views, but are also seen as places to experiment, with the lessons learned on one platform applied to the others.

‘This only works if you really collect data about audiences, right? So what we’ve built is this state of the art collection pipeline and tech. We’ve really invested in the tech internally which helps us basically understand every piece of content that we publish anywhere on and off site: what is happening with it, how people are engaging with it, and how [consumption] is changing over time in different slices of whatever data is available’, said Lotan.

This data is then used internally to make decisions and ensure that every piece of content is given a fair opportunity to reach the right audience.

Humans learn from dashboards

BuzzFeed has a number of internal dashboards that consume the data collected from their pipeline.

One of the simpler ones is called El Dashboard. It lets anyone at BuzzFeed see performance measures for content posted on all platforms, including items that have been adapted and republished. For example, El Dashboard can track a video that has been created for BuzzFeed’s food vertical Tasty, as well as a modified version that features in a YouTube compilation video. The viewer can then compare the performance measures for both. This helps the team understand the core components of the content: where it was promoted, where it was published, and what the audience reaction was.

El Dashboard

Dashbird is a more visual and opinionated dashboard, which calculates a metric called social lift. This is the ratio between viral views (views outside of BuzzFeed’s promotion and control) and seed views (views from BuzzFeed’s own promotion on their site and on social networks). Dashbird allows the viewer to see when content was published and on what pages, helping them analyse what works and whether benchmarks are being met.

Dashbird
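The social lift ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name, the example view counts, and the choice to divide viral views by seed views (rather than the other way round) are assumptions, not details confirmed by BuzzFeed.

```python
def social_lift(viral_views: int, seed_views: int) -> float:
    """Return viral views per seed view (assumed direction of the ratio)."""
    if seed_views <= 0:
        raise ValueError("seed_views must be positive")
    return viral_views / seed_views


# Hypothetical numbers: 10,000 views from BuzzFeed's own promotion,
# 30,000 views from sharing outside BuzzFeed's control.
lift = social_lift(viral_views=30_000, seed_views=10_000)
print(f"social lift: {lift:.1f}")
```

Under this reading, a value above 1 would mean the audience spread the content further than BuzzFeed's own promotion did.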

‘Everyone can access this in the company. Everyone has access to all the performance measures, which I think is really powerful. I know some media companies don’t want their writers to know, but I think it’s important to understand what the audiences are reacting to. And this is so ingrained in the culture here’, said Lotan.

Automation, automation, automation

BuzzFeed has a set of machine-learning models to help guide the publishing process across their many, many pages.

1. Reuse, translate, recycle

‘We think a lot about adapting content internationally: how do we identify content that is likely to perform well if we were to translate it?’, said Lotan.

BuzzFeed has collected a lot of historical data over the years. Using heuristics and logistic regression, the model they have built can predict the type of article that might do well in other languages. For example, if the team were trying to identify what kind of content to adapt from the English-speaking to the Portuguese market, the model would go through the data, including performance split by country, all historical articles, and all articles that have been translated from English to Portuguese in the past. A ‘hotness score’ would then be generated from the output of the logistic regression. The higher the hotness score, the more likely it is that the article will be a success.

Hotness scores
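As a rough illustration of how a logistic regression can produce a score like this, the sketch below trains a tiny model on made-up data and uses the predicted probability as the hotness score. The two features, the training rows, and every function name are hypothetical. BuzzFeed's actual features, heuristics, and training setup are not described in the talk.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def hotness(x, weights, bias) -> float:
    """Predicted probability that a translated article performs well."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features per English article:
# [share of its views coming from Brazil, normalised social lift]
rows = [[0.05, 0.2], [0.10, 0.3], [0.40, 0.8],
        [0.50, 0.9], [0.08, 0.7], [0.45, 0.4]]
# 1 = a past Portuguese translation did well, 0 = it did not (made-up labels)
labels = [0, 0, 1, 1, 0, 1]

weights, bias = train(rows, labels)
high = hotness([0.50, 0.9], weights, bias)
low = hotness([0.05, 0.2], weights, bias)
print(f"hotness high={high:.2f} low={low:.2f}")
```

Candidates would then be sorted by this score, with the top few passed to editors as recommendations.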

A Slack pager bot then notifies editors of any recommendations, and it’s up to the editors to choose whether to go ahead or not.

2. Automating survival of the fittest

A/B testing is another practice that requires a bit of automation at BuzzFeed. An A/B testing tool is built into their CMS (content management system), and in any article writers can choose whether they want to test any combination of titles and images. The results of the test are then sent to Slack, where writers can see which headlines and images work well and which don’t.
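The talk did not cover how a winning variant is chosen, so the following is only one common way to compare two headlines: a two-proportion z-test on click-through rates. The function name, the counts, and the 1.96 cut-off (roughly a 95% confidence level) are illustrative assumptions, not BuzzFeed's method.

```python
import math


def two_proportion_z(clicks_a: int, views_a: int,
                     clicks_b: int, views_b: int) -> float:
    """z-statistic for the difference in click-through rate (B minus A)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (rate_b - rate_a) / std_err


# Hypothetical test: headline A got 100 clicks from 1,000 views,
# headline B got 150 clicks from 1,000 views.
z = two_proportion_z(100, 1000, 150, 1000)
b_wins = z > 1.96  # assumed significance threshold
print(f"z = {z:.2f}, headline B wins: {b_wins}")
```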

3. Machines learning from human behaviour

Feed ranker is an internal service which leverages machine learning to rank and feature BuzzFeed’s most engaging content on their homefeed.

A multi-armed bandit algorithm takes all the new content that’s being published and then tries it out in prominent places for a certain period of time. It learns about performance and how people engage with the content, including clicks, time spent on an article, completion rate, and shares. Based on what the algorithm learns, it adapts the placement of the content on the homefeed.

‘It’s very dynamic and reinforced by the actual audience’, said Lotan.
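To make the idea concrete, here is a minimal sketch of one of the simplest bandit strategies, epsilon-greedy. Which strategy Feed ranker actually uses was not stated, and the class name, the item names, the click probabilities, and the use of clicks as the only reward signal are all assumptions made for the example.

```python
import random


class EpsilonGreedyBandit:
    """Mostly show the best-performing item, but keep exploring the others."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {arm: 0 for arm in arms}
        self.mean_reward = {arm: 0.0 for arm in arms}

    def choose(self):
        # Explore: with probability epsilon, try a random item.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.pulls))
        # Exploit: otherwise show the item with the best average so far.
        return max(self.mean_reward, key=self.mean_reward.get)

    def update(self, arm, reward):
        # Incremental running average of the observed reward.
        self.pulls[arm] += 1
        self.mean_reward[arm] += (reward - self.mean_reward[arm]) / self.pulls[arm]


# Hypothetical click probabilities for three new pieces of content.
true_click_rate = {"quiz": 0.02, "recipe": 0.30, "news": 0.10}
bandit = EpsilonGreedyBandit(true_click_rate, epsilon=0.1, seed=42)
audience = random.Random(7)  # simulated readers

for _ in range(5000):
    item = bandit.choose()
    clicked = audience.random() < true_click_rate[item]
    bandit.update(item, 1.0 if clicked else 0.0)

most_shown = max(bandit.pulls, key=bandit.pulls.get)
print("most shown item:", most_shown)
```

A production system would presumably blend several signals (time spent, completion rate, shares) into the reward rather than relying on clicks alone.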

4. Automated cooking inspiration

Word2vec leverages neural networks to understand word associations. BuzzFeed has created a variant called recipe2vec, which is used for BuzzFeed’s food vertical Tasty to add ‘related recipes’ to a recipe page.

Recipe2vec is an internal system that surfaces similar recipes by learning vectorized representations for words in a corpus of recipes. It doesn’t just put recipes together that use the same ingredients, but ones that have similar preparation methods or flavour profiles.

For example, rather than grouping all pork-based meals together, the related recipes links on the Garlic Herb-stuffed Pork Chops recipe page might also include Fajita-stuffed chicken!
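Once every recipe has a vector, finding related recipes comes down to a nearest-neighbour lookup. The sketch below uses cosine similarity over tiny hand-written vectors. In a real system the vectors would be learned from the recipe corpus and would have far more dimensions. The recipe names beyond the two mentioned above, the numbers, and the function names are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional embeddings, for illustration only.
embeddings = {
    "garlic herb-stuffed pork chops": [0.9, 0.8, 0.1],
    "fajita-stuffed chicken": [0.8, 0.9, 0.2],
    "pulled pork sandwich": [0.2, 0.3, 0.9],
    "chocolate mousse": [0.1, 0.0, 0.3],
}


def related(recipe: str, k: int = 2):
    """Return the k recipes whose vectors are closest to the given recipe."""
    target = embeddings[recipe]
    others = [name for name in embeddings if name != recipe]
    others.sort(key=lambda name: cosine(target, embeddings[name]), reverse=True)
    return others[:k]


print(related("garlic herb-stuffed pork chops", k=1))
```

With these toy vectors the stuffed chicken ranks above the other pork dish, which mirrors the behaviour described above: similarity of preparation can outweigh a shared main ingredient.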

5. Neural networks for publishing decisions

BuzzFeed’s internal tool Social Mission Control automatically determines which Facebook page a specific piece of content should be posted to in order to promote it.

‘Some of these pages are really massive with millions and millions of followers, so we use a variety of historical data to inform these decisions’, said Lotan.

How does this automated curation work?

BuzzFeed uses a bag-of-words classifier, which represents each piece of content by the words it contains and uses those words to sort it into a category. (Classifiers in a neural network group things together by shared characteristics.)

Once classifiers are trained with words on all topics covered by BuzzFeed, they can start identifying places on the different pages where there’s an opportunity to promote certain content. It is then up to the human editor to accept or reject the recommendation. If BuzzFeed doesn’t have anyone in charge of curating a certain page or the page is less important, this process is fully automated.
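The sketch below shows the bag-of-words idea with a deliberately simple stand-in model, a multinomial naive Bayes classifier, rather than the neural network BuzzFeed describes. The page names, training texts, and class name are invented for the example.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count words, ignoring order and case."""
    return Counter(text.lower().split())


class PageClassifier:
    """Naive Bayes over word counts (a stand-in, not BuzzFeed's model)."""

    def __init__(self):
        self.word_counts = {}        # page -> Counter of words seen
        self.doc_counts = Counter()  # page -> number of training texts
        self.vocab = set()

    def train(self, page: str, text: str) -> None:
        bag = bag_of_words(text)
        self.word_counts.setdefault(page, Counter()).update(bag)
        self.doc_counts[page] += 1
        self.vocab.update(bag)

    def predict(self, text: str) -> str:
        bag = bag_of_words(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for page, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[page] / total_docs)
            for word, n in bag.items():
                # Laplace smoothing so unseen words do not zero out a page.
                p = (counts[word] + 1) / (total_words + len(self.vocab))
                score += n * math.log(p)
            scores[page] = score
        return max(scores, key=scores.get)


clf = PageClassifier()
clf.train("BuzzFeed Animals", "adorable puppy dog cuddles kitten")
clf.train("BuzzFeed Animals", "fluffy dog rescued adorable")
clf.train("Tasty", "easy chicken recipe garlic")
clf.train("Tasty", "cheese pasta recipe dinner")

suggestion = clf.predict("adorable fluffy kitten")
print("suggested page:", suggestion)
```

In the workflow described above, a suggestion like this would go to a human editor to accept or reject, or be posted automatically on pages without a dedicated curator.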

Relevance and republishing

BuzzFeed is continuing to add more ‘state of the art technology’ to these publishing systems, to make them as effective as possible. One example is optimising relevancy. On BuzzFeed Animals, for example, you’re likely to come across the words ‘dog’ and ‘adorable’ far more than once, so articles containing these words can be automatically matched to this page. But how can they make sure that an edible chili dog doesn’t find itself among all the adorable, fluffy dogs? Neural networks are trained to understand not only the words but their relationship to each other. The words in the chili dog recipe wouldn’t be associated with the words in the piece about the animal, meaning the relevancy score would be low, and the chili dog would stay well away.
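One simple way to picture a relevancy score of this kind is to average the word vectors of an article and compare the result with a profile of the page. Everything in the sketch below is hypothetical: the two-dimensional vectors are hand-written so that ‘dog’ sits between an animal direction and a food direction, and the 0.8 threshold is arbitrary. BuzzFeed's actual networks learn these relationships from data.

```python
import math

# Made-up word vectors: [animal-ness, food-ness]
word_vectors = {
    "adorable": [0.9, 0.0], "fluffy": [0.9, 0.1], "puppy": [1.0, 0.0],
    "dog": [0.6, 0.4],  # ambiguous on its own
    "chili": [0.0, 1.0], "recipe": [0.0, 1.0],
    "mustard": [0.0, 0.9], "bun": [0.1, 0.9],
}


def mean_vector(text: str):
    """Average the vectors of the known words in a text."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def relevancy(article: str, page_profile) -> float:
    """Cosine similarity between an article and a page profile."""
    a = mean_vector(article)
    dot = sum(x * y for x, y in zip(a, page_profile))
    norms = math.hypot(*a) * math.hypot(*page_profile)
    return dot / norms


animals_page = mean_vector("adorable fluffy dog puppy")
THRESHOLD = 0.8  # arbitrary cut-off for this example

puppy_score = relevancy("adorable fluffy dog", animals_page)
chili_score = relevancy("chili dog recipe mustard bun", animals_page)
print(f"puppy story: {puppy_score:.2f}, chili dog recipe: {chili_score:.2f}")
```

Both texts contain the word ‘dog’, but the surrounding words pull the recipe away from the Animals profile, so only the puppy story clears the threshold.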

Recommendations for evergreen content are also automated so if editors need to publish fully baked content right now, they have quick access to relevant stories.

‘Reusing and adding multipliers to our content gives BuzzFeed a big advantage in participating in this multi-platform ecosystem. We’ve seen some promising results. For some first pages where the process was fully automated, we’ve seen as good results, if not slightly better’, said Lotan.

‘We’ve seen growth in the number of items that are published across these pages. And so we’re more effectively distributing our content to the right audiences this way. Many times, this content is just forgotten, because we have so many items published per day. When you have many places where you can publish these items, these tools become really impactful in action’.

At the end of the day, human creators still have full control and can override the decisions made by the machine, said Lotan. According to him, the humans working at BuzzFeed don’t feel like the machines are taking over their jobs; rather, the tools are taking away the most boring bits, giving them more time to strategise.


Gilad Lotan is the VP head of data at BuzzFeed. He’s also an adjunct professor at NYU. Prior to this, he worked at betaworks and Microsoft, among others.