Using BERTopic to Analyze Qatar World Cup Twitter Data: Part 1

DamenC
5 min read · Mar 11, 2023
Photo by Rhett Lewis on Unsplash

The Qatar World Cup was full of surprises! From Saudi Arabia shocking the world by upsetting Argentina to Morocco's historic run to the semifinals, you must have heard about or witnessed those moments during the soccer craze. In this post, I will use BERTopic to analyze tweets posted during the 2022 World Cup. Let's see which World Cup topics were the most popular and whether we can make sense of them.

Prepare the Data

First, we need to collect data from social media. This time, I will use text data from Twitter as our study object. To scrape tweets, we will use snscrape, a scraper for social networking services such as Facebook, Twitter, and Reddit. To install the development version:

pip install git+https://github.com/JustAnotherArchivist/snscrape.git

With this scraper, we can retrieve the following from Twitter: users, user profiles, hashtags, searches (live tweets, top tweets, and users), tweets (single or surrounding thread), list posts, communities, and trends. To get started, import snscrape's Twitter module along with pandas:

# Get tweets using SNSCRAPE 
import snscrape.modules.twitter as sntwitter
import pandas as pd

Then, let's get some tweets written in English that contain the search term "world cup", posted from Nov. 20 to Dec. 18, 2022. Note: since we…
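The search described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this post: the helper names build_query, tweet_to_row, and collect_tweets are hypothetical, as is the limit of 1000 tweets. It assumes that Twitter's until: operator excludes the given day (so one day is added to the end date) and that the development version of snscrape exposes the tweet text as rawContent.

```python
from datetime import date, timedelta
from itertools import islice


def build_query(term: str, start: date, end: date, lang: str = "en") -> str:
    """Build a Twitter search query string for a term, language, and date range.

    Assumption: `until:` is exclusive, so one day is added to include `end`.
    """
    until = end + timedelta(days=1)
    return f"{term} lang:{lang} since:{start.isoformat()} until:{until.isoformat()}"


def tweet_to_row(tweet) -> dict:
    """Keep only the fields needed for topic modeling.

    Assumption: the tweet object has `rawContent` (development version of
    snscrape); older releases may name this attribute differently.
    """
    return {
        "date": tweet.date,
        "id": tweet.id,
        "username": tweet.user.username,
        "text": tweet.rawContent,
    }


def collect_tweets(query: str, limit: int = 1000) -> list:
    """Scrape up to `limit` tweets matching `query` and return them as dicts."""
    # Imported here so the helpers above can be used without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    items = sntwitter.TwitterSearchScraper(query).get_items()
    return [tweet_to_row(t) for t in islice(items, limit)]


query = build_query("world cup", date(2022, 11, 20), date(2022, 12, 18))
print(query)
```

The resulting list of dictionaries can then be loaded into pandas with something like pd.DataFrame(collect_tweets(query)). Using islice keeps the scrape bounded, since get_items() yields tweets lazily and would otherwise continue until the search results run out.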

DamenC

MS in Informatics; MA in Area Studies. Interested in using AI for sustainable development of nature and human societies. Contact me at: s2226089@u.tsukuba.ac.jp