Twitter Analysis: Looking at the Connection of Words Related to K-Pop

Belle Do
INST414: Data Science Techniques
3 min read · May 12, 2022

The K-Pop fandom has dominated social media. Even if you do not follow K-Pop media yourself, you have likely come across K-Pop-related posts on social networks. This is especially true on Twitter, where I have personally seen tons of K-Pop-related replies on tweets that have nothing to do with K-Pop.

For this analysis, I wanted to see which words most often co-occur in tweets returned by a Twitter search for “Kpop”. I thought it would be interesting to see what the current K-Pop discussion looks like online. I also follow the Korean entertainment industry myself and was curious whether I would recognize the results of this social network analysis. More practically, understanding the “popularity” of certain words could help those, such as influencers or brands, who want to engage with and market to the K-Pop community online.

To start, I needed access to the Twitter developer API, so I applied for it and obtained a key, secret, and token. Alongside the Twitter API, I used Tweepy (a Python library for Twitter), Pandas, and NetworkX, the last of these to visualize the network of word pair occurrences. I set my search term to “kpop”, searched tweets going back one month, and collected the 1,000 most recent. Next, I cleaned the tweets by converting them to lowercase and removing stop words (commonly used words), using the Natural Language Toolkit (NLTK) in Python.
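The cleaning step can be sketched roughly as follows. This is a minimal illustration rather than the code used for the analysis: the small `STOPWORDS` set is a stand-in for NLTK's English stop word list (which needs a one-time download), and the URL removal, the `rt` entry, the `clean_tweet` name, and the sample tweet are all assumptions added for the example.

```python
import re

# Illustrative stand-in for NLTK's English stop word list
# (nltk.corpus.stopwords.words("english")); "rt" is an added assumption
# to drop the retweet marker.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
             "in", "on", "for", "this", "that", "it", "rt"}

URL_PATTERN = re.compile(r"https?://\S+")   # assumption: links are dropped
TOKEN_PATTERN = re.compile(r"\w+")          # Unicode-aware, so Hangul is kept


def clean_tweet(text, stopwords=STOPWORDS):
    """Lowercase a tweet, strip URLs and punctuation, and remove stop words."""
    text = URL_PATTERN.sub(" ", text.lower())
    tokens = TOKEN_PATTERN.findall(text)
    return [token for token in tokens if token not in stopwords]


# Made-up sample tweet, not taken from the collected data.
sample = "RT The new KPOP debut is amazing! https://t.co/abc123"
cleaned = clean_tweet(sample)
print(cleaned)
```

Each tweet becomes a list of lowercase tokens, which is the shape the bigram step expects.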

To find which words occur together with “K-Pop”, I used the bigrams function in the Natural Language Toolkit (NLTK) in Python. This function generates the pairs of adjacent words in the collected tweets. To keep track of these word pairs and count their frequency, I created a dictionary where the keys were bigrams and the values were their counts. I then put the 15 most frequent bigrams into a pandas data frame.
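A rough sketch of the counting step is shown below. To keep it dependency-free it pairs adjacent tokens with `zip`, which yields the same pairs as `nltk.bigrams`, and it uses `collections.Counter` as the dictionary of counts. The `count_bigrams` name and the toy token lists are hypothetical, not the collected tweets.

```python
from collections import Counter


def count_bigrams(token_lists):
    """Count adjacent word pairs across a list of tokenized tweets."""
    counts = Counter()
    for tokens in token_lists:
        # zip(tokens, tokens[1:]) produces the same pairs as nltk.bigrams(tokens)
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Made-up tokenized tweets for illustration only.
tweets = [
    ["kpop", "debut", "concept"],
    ["new", "kpop", "debut"],
    ["kpop", "members"],
]

bigram_counts = count_bigrams(tweets)
top_bigrams = bigram_counts.most_common(15)  # top 15, as in the analysis
print(top_bigrams[0])
```

The resulting list of `(bigram, count)` tuples can be handed to pandas with `pd.DataFrame(top_bigrams, columns=["bigram", "count"])`.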

After creating this data frame, I used the NetworkX package to create a graph and visualize the most frequent “kpop”-related bigrams on Twitter. The graph showed a cluster of words that mostly paired with the word “kpop”: concept, members, debut, and knetizens. As mentioned earlier, I follow K-Pop, and after running this analysis I was pleased to recognize these words.
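One way the graph step might look is sketched here, assuming each bigram becomes an edge weighted by its count. The function names are hypothetical, and the counts in `toy_counts` are invented for illustration; only the four words come from the results above. Drawing needs matplotlib, so it is kept in a separate function.

```python
import networkx as nx


def build_bigram_graph(bigram_counts):
    """Build an undirected graph with one weighted edge per bigram."""
    graph = nx.Graph()
    for (first, second), count in bigram_counts:
        graph.add_edge(first, second, weight=count)
    return graph


def draw_bigram_graph(graph):
    """Draw the graph with a spring layout (requires matplotlib)."""
    import matplotlib.pyplot as plt

    pos = nx.spring_layout(graph, seed=42)  # fixed seed for a repeatable layout
    nx.draw_networkx(graph, pos, node_color="lightblue", font_size=10)
    plt.axis("off")
    plt.show()


# Invented counts; the words are the ones reported in the article.
toy_counts = [
    (("kpop", "concept"), 5),
    (("kpop", "members"), 4),
    (("kpop", "debut"), 3),
    (("kpop", "knetizens"), 2),
]

graph = build_bigram_graph(toy_counts)
print(sorted(graph.neighbors("kpop")))
```

Calling `draw_bigram_graph(graph)` would show “kpop” at the center of the cluster, since every edge in this toy data touches it.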

The main complication was understanding how to use the Twitter API. After reading the documentation, however, I better understood how to access the endpoints it provides.
