Data-Driven Exploration of Social Media Influencers.

Published in

INST414: Data Science Techniques

5 min read · Mar 29, 2024

The marketing industry has changed radically in the digital age, and social media influencers now play an important role in shaping consumer behavior and driving brand engagement. Because influencers hold considerable sway over their followers, marketers navigating digital advertising need to understand influencer dynamics and audience impact. In this article, I cluster a social media influencer dataset and characterize the resulting groups. By examining variables including influence score, follower count, engagement rate, and country, I seek to identify distinct groups of influencers that share characteristics and patterns of audience interaction. For people in the marketing sector, the results can guide decisions about influencer partnerships and campaign targeting. By analyzing influencer clusters, marketers can discover untapped opportunities, customize their outreach strategies, and build genuine relationships with a variety of target demographics. In that sense, this piece is meant as a starting point for marketers who want to use influencer marketing more effectively.

One question this data can help answer is: What are the top groups of influencers with similar patterns of audience engagement that a particular company might be interested in partnering with? Potential stakeholders asking this question include Marketing Managers, Brand Executives, Influencer Specialists, and Data Analysts/Scientists. These stakeholders create and carry out marketing strategies, seek out potential partners and negotiate on behalf of brands, and identify patterns in data to support marketing teams, so they have a direct interest in the answer. The answer will inform decisions such as influencer collaboration strategies and the tailoring of brand campaigns.

The data that could answer this question describes top influencers: their names and countries of origin, the types of content they post, and some of the brands they endorse. It is relevant because these are the attributes the stakeholders need in order to compare potential partners. The data was obtained from Kaggle, and I used Python libraries to select the subset needed for a precise and relevant analysis.

Key features of influencer activity and audience engagement in the dataset include influence score, followers, average likes, and total likes. The influence score gauges an influencer’s overall reach and impact, and the follower count indicates the size of their audience. The average number of likes per post shows the typical level of engagement and how well the content connects with the audience. Total likes combine all of the engagement an influencer has received, measuring their overall effect over time. Using these features, I computed the Euclidean distance between influencers to evaluate their similarity. With this metric, we can find groups of influencers with comparable patterns of audience engagement, which helps marketers choose influencers who fit their brand goals and customize campaigns efficiently.
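As a rough illustration of the distance computation described above, here is a minimal Python sketch. The rows, the names "A", "B", and "C", and the column names are made up for the example and are not taken from the Kaggle dataset. The standardization step is also my assumption: the four features sit on very different scales, so without some rescaling the largest column would dominate the Euclidean distance.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real dataset's columns and values may differ.
influencers = pd.DataFrame(
    {
        "name": ["A", "B", "C"],
        "influence_score": [92.0, 91.0, 70.0],
        "followers": [475.8e6, 366.2e6, 40.0e6],
        "avg_likes": [8.7e6, 8.3e6, 0.5e6],
        "total_likes": [29.0e9, 57.4e9, 1.2e9],
    }
).set_index("name")


def standardize(df):
    """Rescale each column to zero mean and unit variance (an assumed step)."""
    return (df - df.mean()) / df.std(ddof=0)


def euclidean_distance_matrix(df):
    """Return the pairwise Euclidean distances between the rows of df."""
    values = df.to_numpy(dtype=float)
    # Broadcasting yields every row-to-row difference in one array.
    diffs = values[:, None, :] - values[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    return pd.DataFrame(dist, index=df.index, columns=df.index)


distances = euclidean_distance_matrix(standardize(influencers))
print(distances.round(2))
```

Smaller values in the resulting matrix mean two influencers have more similar profiles across the four features.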

I chose k=4 for the KMeans clustering algorithm based on how the influencers were distributed across the dataset. The goal was a number of clusters that is informative and supports a useful interpretation of the data. I wanted to avoid having too few clusters or many disconnected ones, and instead produce a manageable number of clusters that split the influencers adequately. I also noticed significant differences in the number of influencers per cluster; choosing k=4 helped balance the cluster sizes and produced a more informative result.
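One way to check the kind of balance discussed above is to fit KMeans for several values of k and compare the cluster sizes. The sketch below uses scikit-learn on synthetic stand-in data, since the article does not list its code here. The data, the `cluster_sizes` helper, and the range of k values are all assumptions for illustration, not the procedure actually used.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the four engagement features (40 rows, 4 columns).
rng = np.random.default_rng(0)
centers = np.array(
    [[0, 0, 0, 0], [5, 5, 5, 5], [0, 5, 0, 5], [10, 0, 10, 0]], dtype=float
)
features = np.vstack([c + rng.normal(scale=0.5, size=(10, 4)) for c in centers])

# Put all features on a comparable scale before clustering.
scaled = StandardScaler().fit_transform(features)


def cluster_sizes(data, k, seed=0):
    """Fit KMeans with k clusters and return how many rows land in each."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(data)
    return np.bincount(labels, minlength=k)


# Compare how evenly the rows are split for a few candidate values of k.
for k in range(2, 7):
    print(k, cluster_sizes(scaled, k))

sizes_k4 = cluster_sizes(scaled, 4)
```

Very uneven sizes, such as a cluster with a single member, are worth inspecting by hand, because they often point to an outlier rather than a meaningful group.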

Clustering divides the dataset into discrete groups, each containing influencers with comparable traits. The clusters can be interpreted as follows:

Cluster 0: Influencers with Diverse Engagement Levels

Influencers in this cluster span a broad spectrum of engagement levels. It features prominent figures such as Selena Gomez and Leo Messi, along with other celebrities such as Beyoncé and Khloe Kardashian. Their engagement patterns are a mix of high, medium, and low engagement metrics, and their levels of influence vary as well.

Cluster 1: Influencers with High Engagement

Influencers in Cluster 1 have consistently high levels of engagement across a variety of measures. This cluster includes influencers with enormous followings, substantial likes, frequent posting activity, and high influence scores, such as Kim Kardashian, Ariana Grande, and The Rock.

Cluster 2: Distinct Profiles of Engagement

This cluster contains influencers with a variety of engagement profiles. It includes Miley Cyrus, Nike, National Geographic, Taylor Swift, and Kendall Jenner, who span a wide range of influence and engagement measures. The group mixes well-known people, companies, and brands with different approaches to audience interaction.

Cluster 3: Individual Highly-Involved Influencer

Kylie Jenner is the only influencer in Cluster 3. Her substantial engagement and influence metrics set her apart from the rest of the dataset as a unique high-engagement influencer.

My analysis reveals four major groups of influencers with different engagement profiles: Singular High-Engagement, Low-Engagement, Medium-Engagement, and High-Engagement. Each cluster represents influencers with distinct qualities, such as influence scores, posting frequency, and audience engagement levels. By understanding the differences between clusters, marketers can target particular demographic segments more effectively with their advertising.

I tried several Python techniques, including string manipulation and conversion routines, but could not convert the string representations of numbers with suffixes such as ‘k’, ‘m’, and ‘b’ into numerical values. After these attempts failed, I turned to Excel. Using Excel’s formula functions, I replaced each suffix with its numeric multiplier (‘k’ with 1,000, ‘m’ with 1,000,000, and ‘b’ with 1,000,000,000). Once the data was prepared in Excel, I read it into Python with Pandas for further analysis. This is the process I used to clean the data and extract the information needed for the analysis.

A significant limitation of the analysis was the lack of country data for some influencers in the dataset. This made it impossible to classify these influencers by nationality, which may have affected the accuracy of the clustering. To mitigate this, I created an “Unknown” country category for influencers without country information. This made it possible to include all influencers in the analysis, but it may have distorted the interpretation of country-specific trends or patterns.
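For readers who would rather stay in Python, the suffix conversion and the “Unknown” country category described above could also be handled in pandas. The following is a minimal sketch, not the Excel workflow actually used for this analysis. The column names, the sample values, and the `parse_abbreviated_number` helper are hypothetical.

```python
import pandas as pd

# Multiplier for each suffix, matching the replacements described above.
MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}


def parse_abbreviated_number(text):
    """Convert strings such as '3.3k' or '1.2b' into plain floats."""
    cleaned = str(text).strip().lower().replace(",", "")
    if not cleaned:
        return float("nan")
    suffix = cleaned[-1]
    if suffix in MULTIPLIERS:
        return float(cleaned[:-1]) * MULTIPLIERS[suffix]
    # No recognized suffix: treat the text as an ordinary number.
    return float(cleaned)


# Hypothetical sample rows; the real dataset's columns may be named differently.
raw = pd.DataFrame(
    {
        "followers": ["475.8m", "3.3k", "1.2b", "950"],
        "country": ["Portugal", None, "United States", None],
    }
)

cleaned = raw.assign(
    followers=raw["followers"].map(parse_abbreviated_number),
    # Missing countries go into a single "Unknown" category.
    country=raw["country"].fillna("Unknown"),
)
print(cleaned)
```

Lowercasing the text before checking the suffix lets the helper accept both ‘M’ and ‘m’, which is a design choice of this sketch rather than a known property of the dataset.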

The GitHub repository for this article is available here: https://github.com/EwuraImpraim/MODULE-4

Written by ewuraimpraim