My and ChatGPT’s Analysis of a TikTok Network
Motivations
In the digital age, understanding the dynamics of online social networks can offer valuable insights for a variety of stakeholders, from marketers to sociologists. By analyzing the web of connections between entities, whether people, websites, or articles, we can uncover patterns and significant nodes that define a network’s structure and how influence flows through it. This is why I chose to examine TikTok influencer data. TikTok is one of the newest social media platforms to become widely popular, so there should be abundant data from which to build a network of influencers.
Defining the Question and Stakeholder
The question I wanted to answer was how closely TikTok influencers are related to each other based on their popularity in terms of likes, views, and follower counts. I used these three statistics to determine how popular TikTok influencers connect to one another as a network.
A stakeholder might ask, “Which users are the most influential within a specific online community or platform?” This question is of paramount interest to various stakeholders, including marketers, content creators, and platform administrators. For marketers and brands, identifying these influential users can inform targeted advertising and partnership decisions that leverage the influencers’ reach and impact for promotional activities. Content creators seek to understand the traits of these influential nodes so they can emulate their strategies and increase their own visibility and influence. Meanwhile, platform administrators may use this information to enhance user engagement, monitor how content spreads, and maintain a healthy digital ecosystem. Another example of a stakeholder is a digital marketing agency specializing in influencer marketing, which seeks to identify top influencers for campaign partnerships to maximize brand exposure and engagement.
Data Overview
The data comprises TikTok influencer profiles, with fields such as username, follower count, total likes, and video views. This dataset is central to the analysis because it provides a quantitative measure of each influencer’s reach and engagement, which is essential for identifying key nodes within the network.
Data Collection
This data was found on Kaggle.com. It is part of a collection of CSV files containing 2022 social media data for Instagram, YouTube, and TikTok; the TikTok file used here is from September 2022. https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels?select=social+media+influencers+-+Tiktok+sep+2022.csv
Network Graph Construction
In the analysis, each node represents a TikTok influencer. Edges signify a relationship between influencers, defined by similarities in likes, views, and follower counts. This structure allowed me to visualize and quantify connectivity and the flow of influence within the network. When constructing the graph, I used the modularity class to partition the nodes into communities and arranged the graph with the Yifan Hu layout. In the graph, purple edges are based on likes, green edges on followers, and orange edges on views.
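The text does not spell out how “similar” is measured, so the following is only a minimal sketch under an assumed rule: after min-max normalization, two influencers are linked on a metric when their values for that metric differ by at most a hypothetical threshold (`THRESHOLD = 0.05`). Each edge records the metric it is based on, mirroring the per-metric edge colors. The function name, threshold, and toy usernames are illustrative, not taken from the author’s code.

```python
from itertools import combinations

# Hypothetical rule: link two influencers on a metric when their
# min-max-normalized values differ by at most THRESHOLD (assumed value).
THRESHOLD = 0.05
METRICS = ("likes", "followers", "views")


def build_similarity_edges(profiles, threshold=THRESHOLD):
    """Return (user_a, user_b, metric) triples, one per similar metric.

    `profiles` maps username -> {metric: normalized value in [0, 1]}.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        for metric in METRICS:
            if abs(profiles[a][metric] - profiles[b][metric]) <= threshold:
                edges.append((a, b, metric))
    return edges


# Toy, hypothetical profiles with already-normalized values.
toy = {
    "user_a": {"likes": 0.90, "followers": 0.20, "views": 0.50},
    "user_b": {"likes": 0.88, "followers": 0.70, "views": 0.52},
    "user_c": {"likes": 0.10, "followers": 0.68, "views": 0.95},
}

for edge in build_similarity_edges(toy):
    print(edge)
```

Triples like these could be saved as an edge list with Source and Target columns, plus the metric as an extra column, for import into Gephi. Comparing every pair is quadratic, which is manageable for a list of about a thousand profiles.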
Defining “Importance” and Identifying Key Nodes
Importance in this context refers to an influencer’s ability to affect the network through high engagement rates, extensive reach (follower count), and content virality (likes and views). Metrics such as degree centrality (the number of direct connections), betweenness centrality (an influencer’s role as a connector between other nodes), and PageRank (overall influence within the network) can be used to identify key influencers.
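Gephi can compute these measures itself, but a small plain-Python sketch makes the three definitions concrete. It assumes an undirected, unweighted graph stored as an adjacency dictionary: degree centrality is normalized by n − 1, betweenness uses Brandes’ algorithm normalized by the number of node pairs, and PageRank uses power iteration with an assumed damping factor of 0.85. The toy graph is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Share of the other n - 1 nodes each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph (normalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Every unordered pair was counted twice, so divide by (n - 1)(n - 2).
    scale = 1 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: c * scale for v, c in bc.items()}


def pagerank(adj, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph."""
    n = len(adj)
    rank = {v: 1 / n for v in adj}
    for _ in range(max_iter):
        # Rank held by isolated nodes is spread evenly over all nodes.
        dangling = sum(rank[v] for v in adj if not adj[v])
        new = {
            v: (1 - damping) / n
            + damping * (sum(rank[u] / len(adj[u]) for u in adj[v]) + dangling / n)
            for v in adj
        }
        if sum(abs(new[v] - rank[v]) for v in adj) < tol:
            return new
        rank = new
    return rank


# Toy graph: "hub" links to a, b, c; c also links to d.
adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}
print(degree_centrality(adj))
print(betweenness_centrality(adj))
print(pagerank(adj))
```

In this toy graph, "hub" leads on all three measures, while "c" scores on betweenness only because it is the sole bridge to "d".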
Important Notes: Based on preliminary analysis, three influencers emerged as particularly significant:
Khaby.lame: Highest follower count, acting as a major content distribution node.
Centralcee: Exceptional engagement rates based on likes, indicating high content virality.
Mr.Beast: High engagement based upon views.
Outliers: Several nodes were outliers in the dataset. These nodes sit on the outskirts of the graph and likely represent influencers who may not be consistently popular but have gone viral in the past.
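One hypothetical way to make “on the outskirts of the graph” measurable is to flag nodes with very few connections, since a force-directed layout such as Yifan Hu tends to push weakly connected nodes outward. The sketch assumes an adjacency dictionary and an illustrative cutoff of at most one neighbor; the original analysis does not state a cutoff.

```python
def peripheral_nodes(adj, max_degree=1):
    """Return nodes with at most `max_degree` neighbors (hypothetical cutoff)."""
    return sorted(v for v, nbrs in adj.items() if len(nbrs) <= max_degree)


# Toy graph: a hub with leaves a and b, a short chain c-d, and an isolated e.
adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
    "e": set(),
}
print(peripheral_nodes(adj))
```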
Analysis and Findings
Using Gephi, I quantified the influence and connectivity of the nodes, uncovering patterns of engagement and influence flow. This analysis revealed a subset of influencers who are not only popular in terms of follower count but also serve as central hubs of interconnection within the network.
Modularity (0.279): Modularity values range from -0.5 to 1. A modularity score of 0.279 suggests a moderate level of structure within the network, with identifiable communities or modules. While not extremely high, it is large enough to suggest that the network is not random and that some clusters of nodes are more densely interconnected with each other than with the rest of the network. This implies that the TikTok influencer network contains groups of influencers who are likely to share similar popularity levels in likes, views, and followers.
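Modularity compares the fraction of edges falling inside communities with the fraction expected if edges were placed at random while preserving each node’s degree. For an undirected graph with m edges it can be written as Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c and d_c is the total degree of its nodes. The sketch below scores a given partition with that formula; unlike Gephi’s Modularity statistic, it does not search for the best partition. The toy graph is hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted simple graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))**2], where m is the
    number of edges, L_c the edges inside c, and d_c the total degree of c.
    """
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


# Two triangles joined by a single bridge edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
print(round(modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}]), 3))  # 0.357
print(modularity(edges, [{"a", "b", "c", "d", "e", "f"}]))  # 0.0
```

Putting every node in one community always gives Q = 0, which is why a positive score such as 0.279 indicates more within-group linking than chance alone would produce.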
Density (0.195): Network density is the ratio of the number of edges in the network to the number of possible edges. A density of 0.195 is relatively high, since large networks usually have a much lower density. It indicates a well-connected network overall, with many influencers connected to many others through similar popularity metrics. This could point to a community in which content and influence have the potential to spread widely and quickly.
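For an undirected graph with n nodes and m edges, there are n(n − 1)/2 possible edges, so density is D = 2m / (n(n − 1)); a directed graph has n(n − 1) possible edges, giving D = m / (n(n − 1)). A minimal sketch with hypothetical toy numbers:

```python
def density(num_nodes, num_edges, directed=False):
    """Fraction of possible edges that are actually present."""
    possible = num_nodes * (num_nodes - 1)  # ordered pairs of distinct nodes
    if not directed:
        possible //= 2  # each undirected edge covers one unordered pair
    return num_edges / possible if possible else 0.0


# 5 nodes and 4 edges: 4 of the 10 possible undirected edges exist.
print(density(5, 4))  # 0.4
print(density(5, 4, directed=True))  # 0.2
```

Read the other way, a density of 0.195 means roughly one in five possible pairs of influencers is linked.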
Data Cleaning and Common Bugs
Data cleaning involved removing duplicate entries, handling missing data points, and normalizing fields for consistency. A common issue was inconsistent number formatting: counts such as views were stored with an ‘M’ or ‘K’ suffix, so I removed the suffix from each value and converted the abbreviated values to full integers. I then used min-max scaling to normalize the columns I was interested in to a range of 0 to 1.
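A minimal plain-Python sketch of these cleaning steps, assuming each row is a dictionary of raw strings. The ‘K’ or ‘M’ suffix has to become a multiplier rather than simply being dropped; otherwise “1.2M” and “1.2K” would end up as the same number. Dropping rows with missing counts is a hypothetical choice here (imputation would be an alternative), and the field names and sample rows are illustrative, not the Kaggle file’s actual column headers.

```python
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(raw):
    """Convert strings like '1.2M', '850K', or '4,500' to an integer."""
    text = raw.strip().upper().replace(",", "")
    if not text:
        return None  # blank cell: treat as missing
    multiplier = SUFFIXES.get(text[-1], 1)
    if text[-1] in SUFFIXES:
        text = text[:-1]
    return round(float(text) * multiplier)  # round() avoids float truncation


def min_max_scale(values):
    """Rescale values linearly so the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clean_rows(rows, fields=("followers", "likes", "views")):
    seen, cleaned = set(), []
    for row in rows:
        if row["username"] in seen:
            continue  # drop duplicate profiles
        parsed = {f: parse_count(row[f]) for f in fields}
        if None in parsed.values():
            continue  # hypothetical choice: drop rows with missing counts
        seen.add(row["username"])
        cleaned.append({"username": row["username"], **parsed})
    if not cleaned:
        return cleaned
    for f in fields:
        scaled = min_max_scale([r[f] for r in cleaned])
        for r, s in zip(cleaned, scaled):
            r[f] = s
    return cleaned


# Hypothetical raw rows: one duplicate and one with a missing like count.
rows = [
    {"username": "user_a", "followers": "1.2M", "likes": "30M", "views": "850K"},
    {"username": "user_b", "followers": "600K", "likes": "12M", "views": "2.1M"},
    {"username": "user_a", "followers": "1.2M", "likes": "30M", "views": "850K"},
    {"username": "user_c", "followers": "90K", "likes": "", "views": "40K"},
    {"username": "user_d", "followers": "150K", "likes": "4,500", "views": "1M"},
]
for row in clean_rows(rows):
    print(row)
```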
Limitations and Bias
This analysis acknowledges several limitations: the dynamic nature of social media influence, potential data collection bias toward more visible influencers, and the exclusion of qualitative measures of influence. Future work could incorporate sentiment analysis and deeper engagement metrics to refine our understanding of influence. Furthermore, my analysis is based solely on the popularity metrics of these TikTok influencers. It does not consider the type of content these creators produce or interactions such as creators liking each other’s posts. Because the analysis uses only follower, like, and view counts, other potentially relevant factors were left out for the sake of time.
Conclusion
My exploration of the TikTok influencer network offers valuable insights for stakeholders navigating the intricate web of digital influence. By identifying key nodes and understanding their roles within the network, stakeholders can develop strategies for effective engagement and for leveraging influence.
GitHub: https://github.com/Dante4k43/INST414_Module_2_Assignment-.git