Popular Content On Instagram

Alice Miles
INST414: Data Science Techniques
5 min read · Apr 14, 2024

Introduction

Influencers on every social platform compete for users’ attention. Each has their own areas of expertise and interests that followers find captivating. On platforms such as TikTok, YouTube, and Instagram, they are all trying to grow their following and reach the top of these fast-growing, competitive spaces. In this article, we look at how network data from Instagram can be used to analyze top influencers and identify which content types are the most popular among them.

Stakeholder and Question

The question I want to answer with network data is: which content types are the most popular among influencers? The stakeholders asking this question are influencers who are on Instagram or trying to make it there. The answer informs their content strategy, such as what kind of content to make in order to stand out among other influencers.

Data Description

The data that can answer this question is a Kaggle dataset of Instagram influencers collected in September 2022. For each influencer, it includes their name, account name, category (the field they are known for and the kind of content they make, such as modeling, sports, or music), number of followers, home country, average audience engagement, and the country most of their audience is from.

Data Collection

I downloaded this dataset from Kaggle. The source offers a dataset for each of several social media platforms, so I compared their columns and values to determine which one was most relevant to my question. The Instagram CSV file contained all the information I needed, so I imported it directly into my working environment. There were some empty values, but most values were present, so I did not think the gaps affected the dataset much and left them as they were.
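As a rough illustration of checking for empty values before deciding to leave them in place, here is a minimal sketch using only Python's standard library. The sample rows and the column names (`name`, `category`, `followers`, `country`) are assumptions made for the example, not the dataset's actual headers or values.

```python
import csv
import io

# Hypothetical sample standing in for the Kaggle CSV; the column names
# and rows are assumptions, not the dataset's actual contents.
SAMPLE_CSV = """name,category,followers,country
Influencer A,Cinema & Actors/actresses,10M,United States
Influencer B,Lifestyle,5M,
Influencer C,,8M,India
Influencer D,Sports with a ball,12M,Brazil
"""

def count_missing(rows):
    """Return {column: number of empty cells} for a list of dict rows."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] = missing.get(column, 0) + 1
            else:
                missing.setdefault(column, 0)
    return missing

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
for column, n in missing.items():
    print(f"{column}: {n} missing of {len(rows)} rows")
```

Seeing the count per column makes it easier to judge whether the gaps fall in a column the analysis depends on, such as the content type.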

Node

In this network graph, each node represents an Instagram influencer, since each influencer has a unique account that captivates a certain audience. The edges, which represent the relationships between nodes, connect influencers who share a content type.

Importance

“Importance” in my graph means how many connections a node has to other nodes, because this highlights the most significant nodes and the relationships between them. Based on degree centrality, three important nodes in this network are Zendaya, Tom Holland, and Millie Bobby Brown, and all three fall into the cinema and actors/actresses content type.

Exploratory Data Analysis

From the graph analysis I completed, I can answer my question of which content types are the most popular among influencers. The network graph shows which nodes are the most important, which points toward the most popular content type. The bar graph shows the number of influencers who create each content type. According to the bar graph, the most popular content type among influencers is cinema and actors/actresses, followed by lifestyle, and then sports with a ball.

To analyze the data, I first read the CSV file, which contains the fields I needed, such as content type and the influencer’s name and account name. Then I added nodes and edges to create a network graph. The nodes represent the influencers, and an edge connects two influencers who share a content type.
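The graph construction described above can be sketched as follows. This is a minimal plain-Python version under stated assumptions: the names and content types are invented, and the adjacency dictionary stands in for whatever graph library the original code used.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (name, content type) pairs; not rows from the actual dataset.
influencers = [
    ("Actor A", "Cinema"),
    ("Actor B", "Cinema"),
    ("Actor C", "Cinema"),
    ("Blogger A", "Lifestyle"),
    ("Blogger B", "Lifestyle"),
    ("Athlete A", "Sports"),
]

def build_graph(pairs):
    """Return an adjacency dict linking influencers with the same content type."""
    by_type = defaultdict(list)
    for name, content_type in pairs:
        by_type[content_type].append(name)

    graph = {name: set() for name, _ in pairs}  # every influencer is a node
    for names in by_type.values():
        for a, b in combinations(names, 2):     # every same-type pair is an edge
            graph[a].add(b)
            graph[b].add(a)
    return graph

graph = build_graph(influencers)
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
print(len(graph), "nodes,", edge_count, "edges")
```

A content type shared by k influencers contributes k(k-1)/2 edges, so the largest categories account for most of the edges in a graph built this way.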

After that, I counted the number of influencers who make each content type and displayed the counts in a bar graph, which shows which content types are the most popular among influencers.
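The counting step can be sketched with `collections.Counter`. The list of content types below is made up for illustration, and the text bars are a stand-in for the plotted bar graph, which would need a plotting library.

```python
from collections import Counter

# Hypothetical content-type column; not values from the actual dataset.
content_types = ["Cinema", "Lifestyle", "Cinema", "Sports", "Cinema", "Lifestyle", ""]

# Skip empty cells so missing values do not form a category of their own.
counts = Counter(t for t in content_types if t)

# Print one text bar per content type, most common first.
for content_type, n in counts.most_common():
    print(f"{content_type:<10} {'#' * n} {n}")
```

Filtering out empty strings is a choice made for this sketch. If missing values are left in the data, it is worth deciding explicitly whether they should appear as a bar.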

Lastly, I calculated degree centrality and PageRank centrality and printed the top three most central nodes for each. Degree centrality is based on the number of edges a node has, so the higher the degree centrality, the more central the node. By this measure, Zendaya, Tom Holland, and Millie Bobby Brown are the top three most central nodes. PageRank scores a node as important when the nodes linked to it are themselves important. By this measure, the top three nodes are instead Dhanashree Verma, Selena Gomez, and SUGA of BTS.
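To make the two measures concrete, here is a small plain-Python sketch of both, run on a toy graph built the same way as described above. It assumes each influencer has exactly one content type. The names, the damping factor of 0.85, and the fixed iteration count are choices made for this example and are not taken from the original code.

```python
from itertools import combinations

# Hypothetical groups; assumes each influencer has exactly one content type.
groups = {
    "Cinema": ["Actor A", "Actor B", "Actor C"],
    "Lifestyle": ["Blogger A", "Blogger B"],
}

# Link every pair of influencers that share a content type.
graph = {name: set() for names in groups.values() for name in names}
for names in groups.values():
    for a, b in combinations(names, 2):
        graph[a].add(b)
        graph[b].add(a)

def degree_centrality(g):
    """Number of neighbors divided by the number of other nodes (n - 1)."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}

def pagerank(g, damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes no node is isolated."""
    n = len(g)
    rank = {node: 1 / n for node in g}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[nbr] / len(g[nbr]) for nbr in g[node])
            for node in g
        }
    return rank

degree = degree_centrality(graph)
rank = pagerank(graph)
for node in graph:
    print(f"{node:<10} degree={degree[node]:.2f} pagerank={rank[node]:.3f}")
```

In this toy graph, every influencer in the same content type has the same degree centrality, and PageRank comes out equal for all nodes because each group is fully connected internally and disconnected from the others. If the real graph has a similar shape, a "top three" list under either measure is picking among ties, which is worth keeping in mind when comparing the two rankings.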

Cleaning

This data had some missing values, but since most of the dataset was complete, they did not noticeably affect it, so I left the data as it was. When I displayed the data, I included all the columns, but when I built the graphs, I used only the content type and influencer name columns.

Limitations

Even though this analysis answers the question, like any dataset this one is not perfect and comes with limitations. One limitation is that the Kaggle dataset covers only a sample of Instagram influencers and content types, so it does not capture the full spectrum of content that circulates on Instagram. Another is that categorizing content types is subjective: one person might place an influencer in one category while another person places the same influencer in a different one. Lastly, the data might have a sampling bias that overrepresents some content types, potentially skewing the distribution.

GitHub

https://github.com/achan520/Module-2-Assignment

Resources

Jas, R. (2022, December 27). Social Media Influencers in 2022. Kaggle. https://www.kaggle.com/datasets/ramjasmaurya/top-1000-social-media-channels?resource=download&select=social%2Bmedia%2Binfluencers%2B-%2Binstagram%2Bsep-2022.csv
