1,082 YouTube videos and an algorithm under scrutiny: Inside our Ontario election research project

ALEXA PAVLIUC
CBC Digital Labs
6 min read · Jun 4, 2018

What are Canadians seeing when they watch videos about the Ontario election on YouTube?

This was the question we posed after poring over the results of an exhaustive research study of YouTube’s recommendation algorithm.

For background, I’m a Research Analyst at CBC examining how Canadians go through shared national experiences with their public broadcaster. I gauge our audience’s interaction and loyalty on platforms that CBC doesn’t control, including social media and Google searches.

From internal research to CBC News story

My YouTube study started as an internal research project with Tara Kimura, a producer in CBC’s Digital Products department. Together, we discovered that YouTube videos from an account titled Steeper33 were highly recommended alongside those from traditional media sources, including CBC, TVO, CityNews and CTV. What was unusual about these videos was that they often paired clickbait headlines with snippets from Question Period.

We brought the story to senior CBC reporter Mike Wise, who looked into who might be behind the Steeper33 account. The result: this article, “How a 9/11 Truther may be influencing which Ontario election videos you see,” on CBC Toronto.

Steeper33 has posted videos of Question Period as well as others that support conspiracy theories, including U.S. “deep state” operatives, the Bilderberg group and the destruction of the World Trade Centre.

So, when you search for content about the election and click on a traditional news source (like a CityNews Toronto video in the following example), you are recommended these headlines:

Recommendations for the CityNews Toronto video “Kathleen Wynne could lose her seat in next election: poll”.

It’s environments like these, and their effects on Canadians, that I work to better understand.

Before the Ontario election, I primarily analyzed CBC’s reach on Twitter and Facebook. The game changed when I read an article about Guillaume Chaslot, an ex-YouTube developer who worked on the platform’s recommendation algorithm. He left the company over conflicting opinions on the direction the algorithm should take and told The Guardian that, “the recommendation algorithm is not optimising for what is truthful, or balanced, or healthy for democracy.” It was his work, and its applicability to my research on elections and CBC, that made me decide to add YouTube to the list of platforms I was researching.

Chaslot has since developed his own open-source program that lets anyone with a little Python knowledge tap into YouTube’s recommendations and discover empirically what the video platform recommends for any topic. It has been applied in several research projects, including one from the Algorithmic Media Observatory that had findings similar to my own.

YouTube responded to our story on CBC.ca, saying they’ve improved the algorithm significantly since Chaslot left the company. In an email to Wise, they said, “We made algorithmic changes to better surface clearly-labeled authoritative news sources in search results, particularly around breaking news events.”

Methodology: capture the news environment

I approach all events in the same way: I capture the general media landscape of a specific news event, and then I see where CBC is situated in that environment. This approach lets me view CBC’s presence on the internet through a more holistic lens and provide impartial insight into CBC’s digital performance. It is also how I examined the environment of recommended videos on YouTube around the Ontario election, including those from CBC.

To collect the YouTube recommendations, I used the Python scraper Chaslot shared on GitHub, which assumes no prior search history. Every Tuesday and Thursday in March and April, I scraped the 25 most recommended videos for each of the following search terms:

  • Doug Ford
  • Kathleen Wynne
  • Andrea Horwath
  • Ontario Election
  • Ontario Minimum Wage
  • Ontario PC Election Results (only collected in March)

I chose the latter three terms because they were related to the party leader names on Google Trends, and therefore reflected people’s real search habits.
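As a sketch, the twice-weekly collection schedule can be expressed in a few lines of Python. The `fetch_top_recommendations` stub below is hypothetical and only marks where Chaslot's scraper did the real work:

```python
from datetime import date, timedelta

# The six search terms (the last was only collected in March).
SEARCH_TERMS = [
    "Doug Ford",
    "Kathleen Wynne",
    "Andrea Horwath",
    "Ontario Election",
    "Ontario Minimum Wage",
    "Ontario PC Election Results",
]

def collection_dates(start, end):
    """Yield every Tuesday and Thursday between start and end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() in (1, 3):  # Monday is 0, so 1 = Tuesday, 3 = Thursday
            yield day
        day += timedelta(days=1)

def fetch_top_recommendations(term, n=25):
    """Hypothetical stand-in for the real scraper, which follows
    YouTube's recommendations from a history-free search for `term`
    and keeps the n most-recommended videos."""
    raise NotImplementedError

# Full schedule for March and April 2018.
schedule = list(collection_dates(date(2018, 3, 1), date(2018, 4, 30)))
```

Running each term on each date in `schedule` yields the roughly twice-weekly snapshots described above.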

After collecting data for about two months, I combined the results for each search term into its own spreadsheet. Then I merged all six spreadsheets into one, adding a new column to record which search term produced each recommendation. In total, we collected 1,082 recommended videos.
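The merge step can be sketched with Python's standard `csv` module. The sheet contents below are invented for illustration; the real sheets held the scraped recommendations:

```python
import csv
import io

# Hypothetical per-term result sheets: one CSV per search term,
# each row describing one recommended video.
per_term_sheets = {
    "Doug Ford": "video_id,channel\nabc123,Steeper33\n",
    "Kathleen Wynne": "video_id,channel\ndef456,CBC News\n",
}

def merge_sheets(sheets):
    """Combine per-term CSV sheets into one table, adding a
    'search_term' column so each row records which query produced it."""
    combined = []
    for term, raw in sheets.items():
        for row in csv.DictReader(io.StringIO(raw)):
            row["search_term"] = term
            combined.append(row)
    return combined

rows = merge_sheets(per_term_sheets)
```

The same pattern scales from these two toy sheets to the six real ones.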

Network visualizations bring data to life

A directed network visualization contains “sources” and “targets.” For my purposes, a source is a YouTube page name (like Steeper33 or CBC News), and a target is a search term (like “Andrea Horwath”). So in the following visualization, each YouTube page points to the search terms for which its videos are recommended. This is the language that Gephi, a network visualization software, speaks. Those relationships between pages and search terms became this network visualization:

Network Visualization of YouTube search terms and recommendations.

Network visualizations bring people’s collective habits to life. They show us who is interacting with whom in a way that spreadsheets can’t. The structure of the graph (placement of YouTube pages in relation to each other) is determined by an algorithm called Force Atlas 2, further taking human bias out of the picture.
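Under the hood, the input Gephi needs is just a directed edge list. A minimal sketch, assuming the (channel, search term) pairs come from the merged spreadsheet and using invented example data, collapses duplicate recommendations into edge weights:

```python
import csv
import io

# Hypothetical (channel, search term) pairs: each recommendation
# becomes one directed edge from a YouTube page to a search term.
recommendations = [
    ("Steeper33", "Doug Ford"),
    ("Steeper33", "Kathleen Wynne"),
    ("CBC News", "Ontario Election"),
    ("Steeper33", "Doug Ford"),
]

def to_gephi_edge_list(pairs):
    """Write a weighted, directed edge list in the Source,Target,Weight
    CSV layout Gephi's spreadsheet importer accepts; repeated edges
    collapse into a higher weight."""
    weights = {}
    for source, target in pairs:
        weights[(source, target)] = weights.get((source, target), 0) + 1
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buf.getvalue()

edge_csv = to_gephi_edge_list(recommendations)
```

Importing a file like this into Gephi and running Force Atlas 2 produces a layout like the one shown above.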

When we visualized what YouTube page was recommended to which search term, we found that Steeper33 was right in the middle:

Network Visualization of YouTube search terms and recommendations, highlighting Steeper33.

Not only were his videos recommended for five of the six search terms (they never appeared for “Ontario minimum wage”), but some were recommended upwards of 11 times on a given day. Of the 20 most often recommended videos, half were by Steeper33.
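Counting which videos turn up most often is straightforward with `collections.Counter`; the video labels below are invented for illustration:

```python
from collections import Counter

# Hypothetical video IDs from the merged recommendation log; in the
# real data set, each of the 1,082 rows named one recommended video.
recommended_videos = [
    "steeper_clip_1", "steeper_clip_1", "steeper_clip_1",
    "cbc_debate", "cbc_debate",
    "ctv_panel",
]

def top_recommended(videos, n=20):
    """Rank videos by how often they appeared as a recommendation."""
    return Counter(videos).most_common(n)

ranking = top_recommended(recommended_videos)
```

Applying this to the full log is what surfaced Steeper33's dominance of the top 20.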

Types of fake news and information disorders

Since the U.S. election, there has been deep scrutiny of fake news, particularly on online platforms such as Facebook, Twitter and YouTube. I wasn’t expecting Russian disinformation campaigns, and that’s not what I found. As I’ve learned, there are multiple types of fake news and information disorders, each with a varying degree of untruth. Here are the Ethical Journalism Network’s definitions:

  • Fake News: Information deliberately fabricated and published with the intention to deceive and mislead others into believing falsehoods or doubting verifiable facts.
  • Disinformation: Information that is false and deliberately created to harm a person, social group, organization or country.
  • Misinformation: Information that is false, but not created with the intention of causing harm.
  • Malinformation: Information that is based on reality, used to inflict harm on a person, organization or country.

What would you say this video from Steeper33 is about?

Or this?

Politics aside, these are clips from Ontario Question Period and a CTV News broadcast. They’re also two of the most highly recommended videos you’ll see if you search for Ontario election-related terms on YouTube. So if you search for “Doug Ford” or “Kathleen Wynne,” you’ll see traditional news videos, but you’ll also see misleading headlines like those above, often multiple times.

There’s a discrepancy between the content and the headline, which makes it malinformation according to the Ethical Journalism Network’s definition.

Research like this is the reason I was drawn to data science in the first place. I wanted to enter a field where curiosity was king and where breadcrumbs of knowledge could grow into loaves of insight. When data gives you a trail to follow, let it lead the way.

Want to interact with the network visualization on your own? Simply download this folder from Google Drive, unzip it, and open the HTML file in Firefox!

Social Data Science PhD Student at the Oxford Internet Institute. Not made by Amazon.