Data Visualization with FRIENDS

Erika Ronquillo
Analytics Vidhya


Introduction

Those who have known me for a long time know that I love watching Friends. Not only is it funny and entertaining, but there is also something comforting about having these characters that I know so well appear on my TV screen. I often find myself reciting the lines along with the show, which sometimes gives others in the room a quick laugh.

When I discovered the dataset for the Friends TV show in the Tidy Tuesday challenge on GitHub, I was super excited to explore the data to see what would come up. Using R, I was able to find some interesting insights while also making some cool visualizations with ggplot2. Let’s dive in.

The Characters

While each of the main characters seemed to have a fairly equal amount of screen time, I was curious to see if being in more scenes led to a higher number of lines. Below is a scatterplot showing each character’s scene count against their line count.

image by quilloanalytics

As we can see, more scenes do not necessarily lead to more lines. Chandler was in the most scenes but did not have the highest number of lines. Rachel, on the other hand, had the most lines but only the second-highest number of scenes. Interestingly, Phoebe had significantly fewer scenes and lines than the rest of the cast.
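The original analysis was done in R, so the following is only a minimal Python sketch of how scene and line counts like the ones above could be tallied. The row layout (speaker, season, episode, scene, text) and the sample rows are assumptions made for illustration, not the actual dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: (speaker, season, episode, scene, text).
# The real dataset's layout may differ; this is illustrative only.
rows = [
    ("Rachel", 1, 1, 1, "Hi!"),
    ("Rachel", 1, 1, 1, "I need coffee."),
    ("Chandler", 1, 1, 1, "Could this be any earlier?"),
    ("Chandler", 1, 1, 2, "Hello."),
    ("Rachel", 1, 1, 2, "Hey."),
    ("Rachel", 1, 2, 1, "Morning."),
]


def scene_and_line_counts(rows):
    """Return {speaker: (distinct_scene_count, line_count)}."""
    line_counts = defaultdict(int)
    scenes_seen = defaultdict(set)
    for speaker, season, episode, scene, _text in rows:
        line_counts[speaker] += 1
        # A scene number repeats across episodes, so identify a scene
        # by the full (season, episode, scene) triple.
        scenes_seen[speaker].add((season, episode, scene))
    return {s: (len(scenes_seen[s]), line_counts[s]) for s in line_counts}


counts = scene_and_line_counts(rows)
for speaker, (n_scenes, n_lines) in sorted(counts.items()):
    print(speaker, n_scenes, n_lines)
```

Each (scenes, lines) pair would then become one point on the scatterplot.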

While we love all the main characters, the show would not have been the same without the many awesome guest stars that appeared throughout the series. Some guest stars returned for several episodes while others showed up only once. Similar to my previous question, I wondered if a higher number of episodes led to a higher number of lines. Below is another scatterplot to analyze this question.

image by quilloanalytics

As we can see above, Gunther was in the most episodes yet had the second-lowest number of lines. The adorable Mike had the highest number of lines amongst the guest stars but was in about the same number of episodes as the rest of the group.

Since it is a comedy, we can assume that the scripts contained mostly positive sentiments. To look into this further, I tokenized each of the lines into individual words. Those words were then matched against the “bing” sentiment lexicon, which labels each word as either positive or negative.

image by quilloanalytics

Looks like all of the characters had positive sentiments, with Joey having the biggest range. Rachel had the highest average sentiment, while Monica had the lowest average sentiment.
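To make the tokenize-and-match step concrete, here is a rough Python sketch (the original work was done in R). The tiny lexicon below is a hypothetical stand-in for the real “bing” word list, and scoring each character as positive matches minus negative matches is an assumption, since the exact aggregation behind the chart is not stated here.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a sentiment lexicon: word -> label.
# The real "bing" lexicon is far larger; these entries are illustrative.
LEXICON = {
    "love": "positive",
    "great": "positive",
    "happy": "positive",
    "hate": "negative",
    "terrible": "negative",
    "sad": "negative",
}


def tokenize(line):
    """Lowercase a line and split it into word tokens."""
    return re.findall(r"[a-z']+", line.lower())


def net_sentiment(lines):
    """Return {speaker: positive matches minus negative matches}.

    Speakers whose lines contain no lexicon words are left out.
    """
    scores = defaultdict(int)
    for speaker, text in lines:
        for word in tokenize(text):
            label = LEXICON.get(word)
            if label == "positive":
                scores[speaker] += 1
            elif label == "negative":
                scores[speaker] -= 1
    return dict(scores)


# Made-up sample lines, not quotes from the dataset.
sample = [
    ("Joey", "I love this sandwich, it is great!"),
    ("Monica", "I hate a messy kitchen."),
    ("Joey", "That is terrible news."),
]
scores = net_sentiment(sample)
print(scores)
```

Words that are not in the lexicon are simply ignored, so only a fraction of each script contributes to a character's score.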

The Episodes

Before on-demand streaming became an everyday thing, people had to actually be at home at the same time a new episode aired on TV. Below is a line chart showing the number of views in the United States over the entire run of the series.

image by quilloanalytics

The show averaged 25 million views per episode over the series. There were two events where the number of views reached over 50 million. The first was when two back-to-back episodes aired after Super Bowl XXX in 1996. The second was the series finale, which finally ended the anticipation of the will-they-won’t-they storyline of Ross and Rachel.

Lastly, I was curious to see if there was any relationship between the number of views and the IMDb rating.

image by quilloanalytics

The show had pretty high ratings throughout the series. While there does not seem to be a pattern in ratings by season, most of the episodes seem to hover between a rating of 8 and 9. The two highest-rated episodes were the series finale, as well as the hilarious episode where most of the friends find out about Monica and Chandler’s relationship.
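The chart answers this question visually. One way to put a number on it is a Pearson correlation coefficient; this is not something the original analysis reports, just a small Python sketch of the idea. The views and ratings below are made-up illustrative values, not figures from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-episode values (views in millions, rating out of 10).
views = [21.5, 24.0, 25.3, 29.8, 52.5]
ratings = [8.1, 8.3, 8.5, 8.4, 9.7]
print(round(pearson(views, ratings), 2))
```

A value near 1 or -1 would point to a strong linear relationship, while a value near 0 would suggest that views and ratings move largely independently.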

Conclusion

Even though I’ve seen the show many, many times, I was still able to learn something new from the data.

In case you are interested in looking at the code behind the charts, you can access the entire rmd file here: https://github.com/erikar39/Friends-EDA
