A Data-Driven Look at MKBHD’s YouTube Success

Discover what analytics reveal about content trends and audience engagement in tech video blogging

Vardaan Kakar
6 min read · Jun 18, 2024

In the world of tech YouTube, Marques Brownlee, better known as MKBHD, stands out as a leading authority. His engaging videos on the latest gadgets and tech trends have garnered millions of subscribers. But what lies beyond the sleek visuals and insightful commentary? In this article, I look at MKBHD’s content through a data science lens, analysing his YouTube video transcripts to uncover patterns and trends in his content over the years.

Source: MKBHD

Intro

MKBHD publishes his reviews and critiques primarily on his YouTube channel, where he has amassed over 19 million subscribers since his first tech video, way back in 2008.
He has now posted over 1.6k videos, which together have more than 4.3 billion views.

Idea

YouTube provides transcripts for videos on the platform. These transcripts are either uploaded by the YouTubers themselves or auto-generated as captions (which are often gibberish, but we make do with what we have).

Our goal is to analyse these transcripts and uncover underlying relationships in the data.

Overview

All available transcripts, along with the stats of MKBHD’s videos, were extracted using the YouTube Data API v3 and stored in a DataFrame. NLTK, SentenceTransformers, spaCy, UMAP, and HDBSCAN were then used to analyse these transcripts for multiple use cases:

  1. Views vs Duration of Video
  2. Clustering Videos Based on Topic
  3. Average Views/Likes for these Clusters
  4. Views vs Average Speech Speed
  5. Sentiment Analysis

Project

1. Views vs Duration of Video

It is often said that the YouTube algorithm promotes videos that are around 5–10 minutes long. The durations of the videos on the channel are shown below:

Views vs Video Duration

As can be seen, he has recently been uploading longer videos. This has not hurt his views, partly because his subscriber base has grown over the same period. On average, the longer videos posted in the last 2–3 years are performing better than the shorter videos uploaded previously.

This suggests that duration alone does not determine views, and that content quality also plays a major role in garnering them.
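A comparison like the one above could be sketched as follows. This is a minimal illustration, not the code used for the article: it assumes each video is a dict with a `duration` string in the ISO 8601 form that the YouTube Data API returns (for example `PT12M34S`) and a `views` count. The bucket boundaries and the function names are my own choices.

```python
import re
from collections import defaultdict
from statistics import mean

# ISO 8601 duration such as "PT1H2M3S". Durations with a day component
# (e.g. "P1DT2H") are deliberately not handled in this sketch.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def duration_seconds(iso):
    """Convert an ISO 8601 duration string to a number of seconds."""
    match = _DURATION.match(iso)
    if match is None:
        raise ValueError(f"unsupported duration: {iso!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def bucket(seconds):
    """Assign a video to a duration bucket (boundaries are an assumption)."""
    minutes = seconds / 60
    if minutes < 5:
        return "<5 min"
    if minutes < 10:
        return "5-10 min"
    if minutes < 20:
        return "10-20 min"
    return "20+ min"


def mean_views_by_bucket(videos):
    """Average the view counts of the videos that fall in each bucket."""
    grouped = defaultdict(list)
    for video in videos:
        grouped[bucket(duration_seconds(video["duration"]))].append(video["views"])
    return {name: mean(views) for name, views in grouped.items()}
```

Called on a list such as `[{"duration": "PT7M10S", "views": 1000}, ...]` (made-up numbers), it returns one average per bucket. Splitting the same calculation by upload year would help separate the effect of duration from the effect of a growing subscriber base.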

2. Clustering Videos Based on Topic

MKBHD has videos ranging from phone reviews to camera reviews, and even a series on cars. I ran HDBSCAN on the embeddings of each transcript, which yielded 14 groups (including the outlier group labelled -1) for the parameters I specified.

Video Clustering Based on Transcripts

The navy blue dots on the bottom left depict videos about car reviews, whereas the orange cluster on the top left is about audio equipment, and the orangish-red group towards the bottom right represents Android features.
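A pipeline of this kind might look roughly like the sketch below. The article names SentenceTransformers, UMAP, and HDBSCAN but not the model or parameters, so the model name (`all-MiniLM-L6-v2`), `n_neighbors`, `n_components`, and `min_cluster_size` here are all assumptions. The third-party imports sit inside the function so the helper below it can be used without those packages installed.

```python
from collections import Counter


def cluster_transcripts(transcripts, min_cluster_size=10, random_state=42):
    """Embed, reduce, and cluster transcripts; returns one label per transcript.

    Requires the sentence-transformers, umap-learn, and hdbscan packages.
    All parameter values are illustrative assumptions.
    """
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    # Assumed model. Models like this truncate inputs beyond a token limit,
    # so a long transcript is only partly seen unless it is chunked first.
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(transcripts, show_progress_bar=False)

    # Reduce dimensionality before density-based clustering.
    reduced = umap.UMAP(
        n_neighbors=15,
        n_components=5,
        metric="cosine",
        random_state=random_state,
    ).fit_transform(embeddings)

    # HDBSCAN assigns the label -1 to points it treats as noise (outliers).
    clusterer = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size, metric="euclidean")
    return clusterer.fit_predict(reduced).tolist()


def cluster_sizes(labels):
    """Count how many transcripts landed in each cluster, sorted by label."""
    return dict(sorted(Counter(labels).items()))
```

The number of clusters is not set directly. It falls out of `min_cluster_size` and the UMAP settings, which is why a different choice of parameters would give a different number of groups.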

I merged all transcripts in the same cluster to form 14 super transcripts, on which I ran the TF-IDF algorithm to extract keywords for each group:

-1: nitrogen, ions, hangout, electrolyte
0: vaporware, regenerative, tow, armrest, interiors
1: easel, haswell, socket, overclocking, acer
2: xlr, teleporting, zeros, zippers, evf
3: speakerphone, electrocardiogram, flux, shower
4: polls, upsets, totals, semifinals, fakery
5: postures, accordion, watertight
6: wasteful, lappable, demensity, splotching
7: jailbreaking, peppermint, nexus5, lime, jailbreak
8: 8a, unblur, bathtub, 4a
9: coiled, bluebuds, mpow, sweatproof, audiophile, m70x
10: 10r, 11r, ingress, graphene, meteorite
11: qi2, 1plus, hamburger, emphasized
12: ltea, smudges, ip58, mediocre, metering, 50x
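The keyword extraction step can be illustrated with a small, dependency-free TF-IDF. This is a sketch under my own assumptions (a simple lowercase tokenizer and the plain `log(N / df)` weighting), not necessarily the variant used for the list above. A library implementation would apply different smoothing and normalisation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_keywords(super_transcripts, k=5):
    """Return the k highest-scoring TF-IDF terms for each cluster.

    super_transcripts maps a cluster id to the merged text of that cluster.
    """
    counts = {cid: Counter(tokenize(text)) for cid, text in super_transcripts.items()}
    n_docs = len(counts)

    # Document frequency: in how many clusters does each term appear?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    keywords = {}
    for cid, term_counts in counts.items():
        total = sum(term_counts.values())
        scores = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in term_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[cid] = ranked[:k]
    return keywords
```

With this weighting, a term that appears in every cluster scores zero, while a term confined to a single cluster gets the largest boost. That would be consistent with rare words such as ‘bathtub’ or ‘peppermint’ surfacing as keywords.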

Based on the video clusters as well as keywords, here are the topics I could come up with, for each cluster:

-1: Outliers (What's on my phone (20xx), DALL-E, 5G)
0: Car reviews
1: MacBooks
2: Tech gear
3: Smart watches
4: Smartphone Cameras
5: Foldable phones
6: iPads
7: Android features and announcements
8: Google Pixel series
9: Wearable audio devices
10: iPhones and iOS
11: OnePlus and OxygenOS
12: Android smartphones

I just realised that cluster 1, which is about MacBooks, also has a video titled ‘The #1 Most Overpriced Tech in 2023’, and that cluster 12, about Android phones, has ‘mediocre’ as a keyword.

3. Average Views/Likes for these Clusters

Identifying the content your audience likes, and capitalising on it, is a skill most YouTubers would kill to have.

Avg Counts vs Topics

The scatter plot depicts the average view and like counts for each cluster. The two clusters that stand out are 5 (Foldable phones) and 10 (iPhones and iOS), followed by 0 (Car reviews), 6 (iPads), 8 (Google Pixel series) and 11 (OnePlus).

This suggests that viewers prefer videos about advancements in mobile technology.
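The per-cluster averages behind a plot like this can be computed with a simple group-by. The sketch below assumes each video is a dict with hypothetical `cluster`, `views`, and `likes` keys. It also reports the cluster size, since an average over a handful of videos is far noisier than one over a hundred.

```python
from collections import defaultdict
from statistics import mean


def cluster_averages(videos):
    """Compute mean views and likes, plus the video count, for each cluster."""
    grouped = defaultdict(list)
    for video in videos:
        grouped[video["cluster"]].append(video)

    return {
        cid: {
            "avg_views": mean(v["views"] for v in rows),
            "avg_likes": mean(v["likes"] for v in rows),
            "n": len(rows),
        }
        for cid, rows in sorted(grouped.items())
    }
```

If the stats already live in a pandas DataFrame, a `groupby` on the cluster column followed by `mean` would give the same numbers.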

4. Views vs Average Speech Speed

YouTubers often experiment with how they present their content. They modulate the clarity of spoken words, speed of speaking, vocabulary etc. The graphs below show MKBHD’s Views vs Words Spoken Per Minute.

Views vs WPM
Views vs WPM through the years

Words per minute is calculated as the number of individual words in the transcript divided by the total duration of spoken words in the video, in minutes.
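As a rough sketch of that calculation, assume each transcript segment is a dict with `text`, `start`, and `duration` fields (times in seconds). That segment shape is an assumption about how the transcripts are stored. Caption segments can overlap in time, so this version measures spoken time as the union of the segment intervals rather than the plain sum of durations. That is my own choice and may differ from the article's code.

```python
def spoken_seconds(segments):
    """Total time covered by the segments, counting overlaps only once."""
    intervals = sorted((s["start"], s["start"] + s["duration"]) for s in segments)
    total = 0.0
    current_start = current_end = None
    for start, end in intervals:
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def words_per_minute(segments):
    """Words in the transcript divided by the spoken time in minutes."""
    words = sum(len(s["text"].split()) for s in segments)
    seconds = spoken_seconds(segments)
    if seconds == 0:
        raise ValueError("transcript has no spoken time")
    return words / (seconds / 60)
```

Dividing by spoken time rather than total video length keeps silent stretches, such as B-roll or music, from dragging the rate down.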

These graphs show that he has consistently maintained a speech rate of about 100–120 wpm, but has recently been experimenting with higher speeds, in the range of 190–200 wpm, which is well above his usual rate.

We can conclude that speech rate does not significantly affect views on this channel (but it may affect views on other channels, like that of Eminem, who has clocked speech reaching 450 wpm).

5. Sentiment Analysis

Being a tech reviewer, MKBHD has to have fair and unbiased opinions on the latest gadgets, along with providing his audience the necessary direction to make informed decisions.

Sentiment Analysis

Here, the lower the negative score, the fewer negative words are used on the channel. As is evident from the graphs, he maintains neutrality and generally gives positive reviews.
What’s curious is that a recent video titled ‘The Worst Product I’ve Ever Reviewed… For Now’, about the Humane AI Pin, sent the community into shock, as the review was one of the most negative he has given in recent times. Marques even acknowledged the promise of the product and said he liked the idea for the future. The video has a negative score of 0.1 and a positive score of 0.26, which is pretty standard. This suggests that the video content itself may not be too harsh, and that netizens came on too strong and blew it out of proportion.
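Separate negative and positive scores like these resemble the output of NLTK's VADER analyser, although the article does not say which scorer was used. The sketch below therefore treats VADER as an assumption. It needs the `nltk` package and a one-time `nltk.download("vader_lexicon")`, and the import is kept inside the function so the averaging helper works without them.

```python
from statistics import mean


def score_transcript(text):
    """Score one transcript with NLTK's VADER (an assumed choice of scorer).

    Returns a dict with 'neg', 'neu', 'pos' and 'compound' keys.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores(text)


def mean_scores(scores):
    """Average the neg/neu/pos scores over a collection of videos."""
    keys = ("neg", "neu", "pos")
    return {key: mean(s[key] for s in scores) for key in keys}
```

Comparing a single video's scores with the channel-wide result of `mean_scores` is one way to check whether a review that felt harsh is actually unusual by this measure. A lexicon-based scorer only counts word polarity, so a measured but damning verdict can still score as mild.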

Conclusion

Data is indeed one of the most desirable commodities in recent times. This has become even more evident in the business of influencing and content creation. Creators can work with previous data to come up with more efficient and engaging content, which can, in turn, result in higher reach and engagement. Content creation is a continuous process, and data plays a critical role in keeping it efficient and up to date.

Future Work

YouTube also has the ‘most replayed’ feature where you can view the most and least popular parts of the content.

This project focuses on the overall content and stats of videos, but it can serve as a base for analysing the same features for these smaller parts of videos. This can in turn help YouTubers fix recurring issues in the less popular parts, and apply the qualities of the most replayed snippets across their content, at scale.

Other stats, like average view duration, can also be extracted and compared across these clusters to analyse which topics viewers interact with the most.


Source code

P.S. This project used around 1,120 videos. Transcripts were not available for some videos, so those videos were not considered.
