Inside the Ring: Unraveling WWE’s Storylines and Fan Pulse through Data-Driven Insights

Ali Sakhi Khan
13 min read · Jan 12, 2024
Photo Credit: PWMania

Growing up, WWE was an integral part of my life, from mimicking The Rock’s moves to being captivated by characters like Kane and Triple H. As I matured, I realized my love wasn’t just for the action but for the compelling stories these characters told. When I attended the Royal Rumble last year, the main event featuring two WWE giants was exciting, but what truly enthralled me was the intricate storyline. Witnessing Sami Zayn’s inner conflict within Roman Reigns’ faction added depth to the narrative. Months of loyalty tests led to a pivotal moment where Sami, torn between friendship and allegiance, defied expectations and left the group. This is the essence of WWE: a grand narrative, akin to a TV show, where interlocking storylines unfold over years.

For years, fans felt unheard by WWE’s script decisions, but ‘The Bloodline’ storyline marked a turning point where the company seemed to heed fan sentiments. Inspired by this, I had a revelation in the world of data science — why not use it to recommend WWE storylines based on social media sentiment analysis? Why not decode the fan chatter and contribute to scripting WWE narratives?

Collecting data

The initial step involved scraping fan sentiments from online forums, but the challenge was finding a platform with diverse WWE-related discussions. WrestleTalk, a YouTube channel covering a broad range of wrestling topics, proved ideal because its general subject matter draws discussion across many themes. To overcome scraping issues on YouTube, I switched to Web Scraper and collected a year’s worth of WrestleTalk video links. Using the YouTube API, my team and I then gathered and stored the comments along with author IDs, totaling over 10,000 comments.
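To make the collection step concrete, here is a minimal sketch of how one page of comments could be flattened into rows. It assumes the page has the shape of a YouTube Data API v3 `commentThreads.list` response; the function names and the output column names are hypothetical, and the API call itself (which needs a key and network access) is left out.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse


def video_id_from_link(link: str) -> Optional[str]:
    """Return the video ID from a standard watch URL, or None if there is none."""
    values = parse_qs(urlparse(link).query).get("v")
    return values[0] if values else None


def rows_from_comment_page(page: Dict) -> List[Dict]:
    """Flatten one commentThreads.list response page into one row per top-level comment."""
    rows = []
    for item in page.get("items", []):
        thread = item["snippet"]
        comment = thread["topLevelComment"]["snippet"]
        rows.append({
            "video_id": thread.get("videoId"),
            # authorChannelId can be missing for some comments, so fall back to None
            "author_id": comment.get("authorChannelId", {}).get("value"),
            "author_name": comment.get("authorDisplayName"),
            "text": comment.get("textDisplay"),
        })
    return rows
```

Keeping the parsing separate from the network call makes it easy to test against a saved response before spending any API quota.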

Simultaneously, my team members and I compiled wrestler information by scraping www.thesmackdownhotel.com. The project was split between Python for data processing and Snowflake for SQL queries (where I faced some hiccups), covering sentiment analysis, lift analysis, and merging with wrestler details. This endeavor not only yielded valuable insights but also, on a personal front, gave me an opportunity to learn the Snowflake platform.

Cleaning and Preprocessing

The initial phase involved refining the YouTube comments dataframe. I removed common English stop words to focus on substantive content, then tokenized the comments, breaking them into individual words for analysis. Some punctuation lingered after stop-word removal, so I applied regular expressions to strip quotation marks and similar characters. This refinement allowed me to calculate word frequencies, offering a bird’s-eye view of prevalent fan discussions. In addition, the VADER (Valence Aware Dictionary and sEntiment Reasoner) tool assigned a sentiment score to each comment. Each comment was then categorized as positive, negative, or neutral and stored in a new dataframe.
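The cleaning steps above can be sketched roughly as follows. This is an illustration, not the project’s actual code: the stop-word list is a tiny stand-in, the function names are hypothetical, and the ±0.05 cutoffs are the ones commonly suggested for VADER’s compound score (which cutoffs were used in this project is not recorded here). The VADER scoring call itself is omitted so the snippet runs with the standard library alone.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "in", "it"}


def tokenize(comment):
    """Lowercase, strip apostrophes and punctuation, split into words, drop stop words."""
    lowered = re.sub(r"['’]", "", comment.lower())
    cleaned = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def word_frequencies(comments):
    """Count how often each remaining word appears across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(tokenize(comment))
    return counts


def label_sentiment(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

One design point worth noting: VADER takes punctuation and capitalization into account, so it is usually safer to score the raw comment text and use the cleaned tokens only for the word-frequency counts.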

Data preparation was the most demanding part of this analysis: it required carefully merging several tables and creating multiple derived columns. Despite the complexity, it was a valuable learning experience. Working through less familiar parts of Python expanded my skill set, deepened my understanding of data manipulation, and reinforced the importance of a systematic approach to handling diverse datasets.

Analysis

Popular wrestlers

Vince McMahon, WWE’s longtime chairman, claims the top spot on the leaderboard for the highest number of mentions, making him the most discussed figure among people who engage with WrestleTalk’s YouTube videos. This underscores his prominence within the community, although mention counts alone do not show whether the discussion is favorable.

Moreover, Roman Reigns, the reigning Undisputed WWE Universal Champion, stands out as the most mentioned among the current WWE wrestlers. This bodes well for the company, as the champion serves as the company’s face, ensuring extensive exposure and representation for WWE.

Adding to the positive outlook, Cody Rhodes, identified by the WWE as the next face of the company and slated to challenge Roman Reigns for the title, emerges as the second most popular wrestler among the current roster. This aligns with WWE’s strategic vision, suggesting that their plans are indeed on the right track.

The next step involves gauging the fans’ sentiment towards Cody Rhodes. Are the prevailing opinions positive or negative? This analysis will provide valuable insights into how the audience perceives the anticipated next champion. But, before we get there, let’s explore the data some more.
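For readers curious how a mention leaderboard like this can be built, here is a minimal sketch. It counts the number of comments that contain each wrestler’s full name at least once; the function name is hypothetical and the matching rule is an assumption, not necessarily the one used in the project.

```python
import re
from collections import Counter


def count_mentions(comments, names):
    """Count how many comments mention each name at least once (case-insensitive)."""
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in names
    }
    counts = Counter()
    for comment in comments:
        for name, pattern in patterns.items():
            if pattern.search(comment):
                counts[name] += 1
    return counts
```

Calling `count_mentions(comments, names).most_common(5)` would give a top-five list. Full-name matching misses nicknames and first-name-only references such as "Cody" or "Roman", so counts produced this way are best read as a lower bound.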

Gender popularity

The disparity in mentions between male and female wrestlers caught my attention, prompting me to delve into the popularity dynamics within each category.

Focusing on male wrestlers, I excluded Vince McMahon from the analysis to gauge the current roster’s popularity. The audience’s fervent support for Sami Zayn and LA Knight during live events is palpable, evident in the resounding cheers and chants that accompany their performances. Consequently, their inclusion in the top 5 male wrestlers comes as no surprise. What adds an intriguing twist is the presence of Luke Gallows in the rankings. Despite minimal appearances in shows and limited company promotion, Gallows has managed to secure a spot in the top 5. This anomaly prompts an exploration of fan sentiment towards him through a sentiment analysis.

Shifting attention to the Women’s Division, Jade Cargill emerges as a standout figure. Having previously garnered significant popularity in WWE’s rival promotion, AEW, her transition to WWE naturally positions her as one of the most talked-about female wrestlers. In a contrasting scenario, the sentiment analysis for Charlotte Flair becomes a point of interest. Amidst fan complaints about her forced dominance, exploring the overall sentiment will provide valuable insights into how audiences perceive her within the current wrestling landscape.

Leveraging Wrestlers’ Birthplaces for Enhanced Fan Connection

Understanding a wrestler’s origin holds significant importance in the WWE. With weekly shows and special events spanning the United States and overseas, pinpointing each wrestler’s home state becomes a strategic tool in event planning. Aligning events with wrestlers’ hometowns not only cultivates a sense of connection and belonging for fans but also serves as a catalyst for scripting compelling storylines. For example, Sami Zayn clinching a championship in his Canadian hometown would likely draw an overwhelmingly positive response from the crowd.

In the context of this analysis, where North Carolina and California emerge as the leading producers of active wrestlers, WWE can strategically plan major events around their most popular North Carolina-based wrestler, Charlotte Flair, and the California-based superstar, Jey Uso. By leveraging information about wrestlers’ birthplaces, WWE not only enhances fan engagement but also crafts immersive experiences that resonate with the diverse origins of their roster. The result is a more personalized and connected audience experience, enriching the overall appeal of WWE events.

Note: Access to the authors’ locations would further support the analysis. For instance, knowing the location of authors from North Carolina could reveal how frequently they mention Charlotte Flair or other wrestlers from the state. High or significant mentions suggest that organizing events in those locations would boost engagement.
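The state-level reasoning in this section can be sketched in a few lines. The roster format (a list of dicts with `name` and `state` keys), the function names, and the placeholder data in the usage note are all assumptions for illustration.

```python
from collections import Counter


def wrestlers_per_state(roster):
    """Count active wrestlers by home state."""
    return Counter(wrestler["state"] for wrestler in roster)


def top_wrestler_by_state(roster, mentions):
    """For each state, pick the wrestler with the most mentions."""
    best = {}
    for wrestler in roster:
        state, name = wrestler["state"], wrestler["name"]
        score = mentions.get(name, 0)
        # Keep the first wrestler seen on ties
        if state not in best or score > best[state][1]:
            best[state] = (name, score)
    return best
```

With a made-up roster such as `[{"name": "Wrestler A", "state": "North Carolina"}, {"name": "Wrestler B", "state": "California"}]` and a mentions dictionary from the earlier counting step, the two functions together give the leading states and a natural headliner for each.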

Lift analysis

I decided to focus on lift values because they take every assessed pair of wrestlers into account. Lift compares how often two wrestlers are mentioned together with how often that would happen if their mentions were unrelated. Wrestlers with high lift exhibit a strong association, indicating that the mention of one wrestler often coincides with the mention of the other. This association could suggest two possibilities. Firstly, there might be a desire among the audience to see these wrestlers compete against each other. Secondly, it could imply a preference for them to team up or that they are already part of a tag team.

This observation prompts us to explore two key aspects. Firstly, we should investigate whether sentences containing the names of both wrestlers convey a positive or negative sentiment. Secondly, we should explore the content of these sentences to discern what is being expressed about the wrestlers.
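For reference, lift for a pair of wrestlers A and B is usually computed as P(A and B) / (P(A) × P(B)), which simplifies to N × n_AB / (n_A × n_B) when counting over N comments. The sketch below applies that standard formula; the function name and the simple substring matching are assumptions, not the project’s exact code.

```python
from itertools import combinations


def lift_table(comments, names):
    """Return {(a, b): lift} for every pair of names mentioned at least once each."""
    total = len(comments)
    # For each comment, the set of names it mentions (case-insensitive substring match)
    mentioned = [
        {name for name in names if name.lower() in comment.lower()}
        for comment in comments
    ]
    single = {name: sum(name in found for found in mentioned) for name in names}
    lifts = {}
    for a, b in combinations(names, 2):
        both = sum(a in found and b in found for found in mentioned)
        if single[a] and single[b]:
            lifts[(a, b)] = total * both / (single[a] * single[b])
    return lifts
```

A lift above 1 means the pair is mentioned together more often than independent mentions would predict, a lift near 1 means no particular association, and a lift below 1 means the two are rarely discussed together.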

Based on fan comments, there is a notably high positive sentiment towards Dragon Lee and Wes Lee, as well as Jimmy Uso and Jey Uso. Without delving into the specific comments, it’s apparent that Dragon Lee and Wes Lee are held in high regard, potentially stemming from a match they had eight months ago. Fans seem either excited about the match or responded positively to their wrestling dynamics. The Usos, Jimmy and Jey, being a tag team, garner significant appreciation from fans, who evidently enjoy their performances together.

In contrast, Joe Gacy and Joe Coffey received more negative comments when wrestling each other. This suggests that WWE might consider avoiding matchups between them or, alternatively, analyze fan feedback to enhance their characters or storylines.

In the case of Cody Rhodes and Roman Reigns, fans express more positive sentiments. While the planned match between Cody Rhodes and Roman Reigns has been anticipated for a long time, a thorough analysis of fan comments is needed to understand their sentiments more precisely. Despite the positive sentiment, the pair isn’t garnering as much discussion as desired given they are two of the most popular wrestlers in the company. To maintain relevance, WWE should consider crafting a compelling storyline that generates more fan discussions about them.

It’s worth noting that individuals like Adam Pearce and Michael Cole, who hold managerial or commentary roles, respectively, are intentionally excluded from this analysis, as the primary focus is on wrestlers.

To understand how fans feel about the most popular male and female wrestlers, I conducted a sentiment analysis on the comments that mention them. The sentiment counts reveal that LA Knight has received a broadly favorable reception, with a fairly balanced mix of positive and negative comments. In contrast, Sami Zayn stands out with a predominantly positive sentiment, signifying widespread positivity among fans. Luke Gallows, however, experiences a more varied reception, with a noteworthy presence of both positive and negative sentiments. Lastly, although Charlotte Flair has a lower overall comment count, the sentiment analysis reflects a consistently positive perception, indicating a generally favorable opinion among fans.

Most promising talent

The positive-to-negative sentiment ratio provides compelling insights into the online reception of less-discussed performers. Standout figures like Nick Aldis boast an impressive positive-to-negative sentiment ratio of 15.0, indicating a remarkably favorable opinion among fans. A closer examination of fan comments would help us determine whether Nick Aldis should remain the general manager of WWE’s show SmackDown or transition to a full-time wrestling role.

Liv Morgan and Becky Lynch follow suit with ratios of 10.0 and 9.0, respectively, underscoring their robust positive sentiment. Particularly noteworthy is Liv Morgan, who, despite recent absence due to injury, garners substantial positivity. This suggests an opportune moment for WWE to reintroduce her as a surprise entrant in their upcoming event, The Royal Rumble, potentially capitalizing on her overwhelmingly positive reception.

Finn Bálor maintains a commendable ratio of 8.5, signifying a positive fan response, while Shinsuke Nakamura’s ratio of 6.0 reflects a relatively positive sentiment. This analysis serves as a strategic cue for WWE, indicating potential storyline advancements for these wrestlers given their high positive-to-negative sentiment ratios. Notably, this segment of the analysis holds immense value, shedding light on wrestlers with fewer mentions but elevated positive sentiment ratios, suggesting untapped potential and fan admiration that WWE could leverage for enhanced utilization in future storylines.
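The ratio itself is simple to compute; the sketch below shows one way to do it, along with a minimum-comment filter. The function names and the `min_comments` parameter are hypothetical additions, not part of the original analysis.

```python
def sentiment_ratio(positive, negative):
    """Positive-to-negative ratio, or None when there are no negative comments."""
    if negative == 0:
        return None
    return positive / negative


def rank_by_ratio(counts, min_comments=0):
    """Rank wrestlers by ratio; counts maps name -> (positive, negative)."""
    ranked = []
    for name, (positive, negative) in counts.items():
        ratio = sentiment_ratio(positive, negative)
        if ratio is not None and positive + negative >= min_comments:
            ranked.append((name, ratio))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

The filter matters because ratios built on few comments are volatile: a ratio of 15.0 could come from just 15 positive comments and 1 negative one, so a single extra negative comment would halve it.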

Snowflake

Conducting the analysis on Snowflake was a section of the project that I was particularly excited about as I had the opportunity to learn and explore a new platform. Despite obtaining certification on Snowflake and successfully constructing databases with meticulously organized schemas and tables to ensure data integrity, I encountered challenges in crafting accurate SQL queries. Unfortunately, the output from these queries proved perplexing, yielding disproportionately high results that lacked coherence.

Since I’ve lost access to my Snowflake account, I’ve decided to document my initial code and analysis here until I’m able to retrieve my account and correct the code to get accurate results. This also gives me the chance to revisit and refine my approach, gaining insights from this experience. While this section may not contribute to the current analysis, I welcome any feedback or suggestions you may have, which can be shared in the comments section.

I organized a WWE database featuring three distinct schemas: Comments, WWE_Authors, and WWE_Info. The Comments schema held the Comments_Sentiments table, which stored the comments sourced from WrestleTalk, while author details went into the Authors table, with Author_ID and Author_Name columns. The goal of this structure was to establish clarity in author-comment relationships, laying the foundation for seamless scalability in the future. Furthermore, the WWE_Info schema contained the WWE_INFO and WWE_WRESTLER_ATTRIBUTES tables, the latter capturing the words associated with wrestlers.
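As a rough illustration of this layout, the sketch below rebuilds a flattened version of it in SQLite through Python’s built-in `sqlite3` module. It is not the original Snowflake code: SQLite has no schemas, every column other than Author_ID and Author_Name is an assumption, and the rows are placeholders. It also demonstrates join fan-out, one common way SQL counts become inflated; whether that is what affected the queries in this project is not established.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id TEXT PRIMARY KEY, author_name TEXT);
CREATE TABLE comments_sentiments (
    comment_id INTEGER PRIMARY KEY,
    author_id  TEXT REFERENCES authors(author_id),
    comment    TEXT,
    sentiment  TEXT
);
CREATE TABLE wwe_info (wrestler TEXT PRIMARY KEY);
CREATE TABLE wwe_wrestler_attributes (wrestler TEXT, attribute TEXT);
""")

# Placeholder rows, not real project data
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [("a1", "fan_one"), ("a2", "fan_two")])
conn.executemany(
    "INSERT INTO comments_sentiments (author_id, comment, sentiment) VALUES (?, ?, ?)",
    [("a1", "Wrestler A is great", "positive"),
     ("a2", "Wrestler A was boring", "negative"),
     ("a2", "Wrestler B won", "neutral")],
)
conn.executemany("INSERT INTO wwe_info VALUES (?)",
                 [("Wrestler A",), ("Wrestler B",)])
conn.executemany("INSERT INTO wwe_wrestler_attributes VALUES (?, ?)",
                 [("Wrestler A", "Wise"), ("Wrestler A", "Epic"), ("Wrestler A", "Loyal")])

# Joining a second one-to-many table multiplies rows, so COUNT(*) overstates
# mentions while COUNT(DISTINCT comment_id) does not.
MENTIONS_SQL = """
SELECT w.wrestler,
       COUNT(*)                     AS row_count,
       COUNT(DISTINCT c.comment_id) AS mentions
FROM wwe_info AS w
JOIN comments_sentiments AS c
  ON c.comment LIKE '%' || w.wrestler || '%'
LEFT JOIN wwe_wrestler_attributes AS a
  ON a.wrestler = w.wrestler
GROUP BY w.wrestler
ORDER BY w.wrestler
"""
results = conn.execute(MENTIONS_SQL).fetchall()
```

Here "Wrestler A" appears in two comments but has three attribute rows, so the joined result contains six rows for that wrestler. Comparing `COUNT(*)` against `COUNT(DISTINCT ...)` is a quick sanity check when totals look too high.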

WWE Database on Snowflake
Wrestler mentions

In my analysis, I delved into wrestler information, uncovering Xia Li as the most mentioned female wrestler with 3981 mentions. On the Men’s side, LA Knight took the lead with 2144 mentions. Interestingly, Xia Li and LA Knight not only garnered attention but also received substantial positive feedback (2418 and 1265, respectively) along with a notable share of negative comments (1090 and 583). These insights underscore the significant impact and popularity of Xia Li and LA Knight in the wrestling community.

Wrestler sentiments based on the comments

In my analysis, it’s evident that Xia Li and LA Knight have attracted the most positive comments, with 2418 and 1265 respectively, and the most negative comments, with 1090 and 583 respectively. This could be attributed to their popularity, considering they received the highest number of mentions.

Positive to Negative Sentiment Ratio

However, when examining the positive-to-negative sentiment ratio, Shinsuke Nakamura from the Men’s Division (10.5:1) and Alexa Bliss from the Women’s Division (6.6:1) stood out. Their high ratios suggest potential for engaging storylines, highlighting WWE’s opportunity to feature them more prominently.

Creating the attribute table
Attribute association

I made a list of adjectives commonly attributed to WWE wrestlers. The analysis revealed that Xia Li is positively associated with “Determined” and “Dynamic,” hinting at a potential engaging underdog storyline. On the flip side, her negative association with “Loyal” suggests an intriguing storyline involving broken trust, offering viewers a fresh perspective.

For LA Knight, standout positive attributes include “Wise” and “Epic,” indicating a strategic emphasis on refining his mic skills to convey wisdom and incorporating memorable catchphrases. However, it’s worth noting that LA Knight is negatively associated with “Youthful,” aligning with his age, being over 40. To maintain a positive public image, WWE’s communication strategy should delicately navigate discussions about his age, focusing on his strengths and persona.

Finally, Triple H garners the most positive association with “Wise,” indicating widespread perception of his adept management of the company and wrestlers. This emphasizes the belief that he excels in crafting intelligent storylines. The negligible number of words negatively linked to Triple H underscores his overwhelmingly positive impact within the wrestling community.
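One simple way to build such an attribute table is sketched below. It treats an adjective as positively (or negatively) associated with a wrestler when it appears in a positive (or negative) comment that also mentions that wrestler. This interpretation is an assumption, the function name is hypothetical, and the original Snowflake queries may have worked differently.

```python
def attribute_associations(rows, wrestler, attributes):
    """Count attribute words in positive and negative comments mentioning a wrestler.

    rows is an iterable of (comment_text, sentiment_label) pairs.
    """
    table = {attr: {"positive": 0, "negative": 0} for attr in attributes}
    for text, sentiment in rows:
        lowered = text.lower()
        # Skip comments about other wrestlers and neutral comments
        if wrestler.lower() not in lowered or sentiment not in ("positive", "negative"):
            continue
        for attr in attributes:
            if attr.lower() in lowered:
                table[attr][sentiment] += 1
    return table
```

A caveat with this approach: a comment-level sentiment label does not say which word caused it, so a negative comment containing "loyal" might be criticizing something else entirely. Reading a sample of the matching comments is a useful check before drawing storyline conclusions.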

In essence, my comprehensive analysis provides actionable insights into individual wrestlers’ strengths, potential storylines, and audience perceptions, guiding strategic decisions for WWE.

Recommendations for The WWE

Cody Rhodes for the Win: WWE’s plan for Cody Rhodes to end Roman Reigns’ long-running title reign and become the face of the company resonates positively among the fans. It is advisable to adhere to this plan and actively promote Cody Rhodes as the future cornerstone of WWE.

North Carolina as a Major WWE Event Hub: With a significant number of wrestlers hailing from North Carolina, and considering it’s Vince McMahon’s home state, hosting a major WWE PPV like SummerSlam or WrestleMania in North Carolina would be fitting. Aligning a major storyline with the home state’s favorites could add a compelling layer to the event.

Sami Zayn and LA Knight Momentum: Given the substantial popularity of Sami Zayn and LA Knight, WWE should continue supporting and building hype around them. Consideration should be given to elevating their status, possibly by securing titles such as the United States Championship or the Intercontinental Championship. If the positive audience response persists, gradual progression towards main event title contention would be warranted.

Luke Gallows, the Underdog?: Despite mixed sentiment comments, Luke Gallows has garnered notable positivity. WWE should conduct a deeper analysis to understand fan preferences and capitalize on these aspects. There’s potential to include him in storylines that align with what fans appreciate about him, providing an opportunity for further utilization.

Harnessing High Potential Wrestlers: Wrestlers with impressive positive-to-negative ratios like Liv Morgan, Finn Bálor, and Shinsuke Nakamura signify strong fan regard. WWE should strategically push these talents into major storylines to bolster their characters and establish them as formidable forces. Additionally, closely monitoring Nick Aldis and evaluating fan responses could pave the way for an in-ring comeback, adding intrigue to the action.

Next Steps:

This analysis serves as a foundation for a more intricate examination. The subsequent steps could involve exploring the attributes associated with each wrestler and determining whether these attributes are perceived positively or negatively by fans. This deeper understanding would help scriptwriters craft characters that align with fan preferences and decide whether each character should be presented as a hero or a villain.

Beyond WWE, this project holds substantial potential for scalability across various reality shows like The Bachelor, The Traitors, and Hell’s Kitchen. Applying similar sentiment analysis techniques to understand viewer preferences could reshape contestant selection. For instance, knowing which aspects of a show viewers enjoy could guide the choice of contestants toward audience preferences. The versatility of this approach opens doors for applications in diverse reality shows, with plenty of room for refinement and adaptation.

Weblinks:

WrestleTalk: https://www.youtube.com/@WrestleTalk

The Smackdown Hotel: https://www.thesmackdownhotel.com/wrestlers/#sort=name&sortdir=asc&attr.ct39.value=wwe&page=1

Github: https://github.com/alisakhikhan/Unraveling-WWE-Storylines-and-Fan-Pulse-through-Data-Driven-Insights
