How does the network of YouTube music videos drive attention?

Published in ACM CSCW, Oct 30, 2019

This blog summarizes our CSCW 2019 paper “Estimating Attention Flow in Online Video Networks” by Siqi Wu, Marian-Andrei Rizoiu, and Lexing Xie.

Many online platforms present algorithmic suggestions to help users explore an enormous content space. Much effort has gone into generating more accurate recommendations, but relatively little is known about how recommender systems affect overall attention: for example, their effects on item popularity ranking, the estimated strength of item-to-item links, and global patterns in the attention items gain from being recommended. This work aims to answer such questions through a large-scale study of the video network built by the YouTube recommender systems.

Recommender systems are among the top drivers of traffic on YouTube, if not the top one. For example, one report reveals that 70 percent of watch time on YouTube is driven by the recommender systems. We also find evidence that the recommender systems generate views.

Fig. 1 visualizes the videos within 3 hops of Adele’s video “Hello”, collected by following the recommended list shown in the right-hand panel of each video’s webpage. Node color corresponds to a video’s cumulative views, and node size to its in-degree. We observe that popular videos sit at the center and link to each other.
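
A minimal sketch of how such a neighborhood can be collected, assuming the recommendation lists have already been crawled into a dict that maps each video ID to the IDs on its list. The video IDs and the function names `within_hops` and `in_degree` are hypothetical. A breadth-first search bounded at 3 hops gives the reachable set, and in-degree is counted only over edges that stay inside that set.

```python
from collections import deque


def within_hops(graph, source, max_hops):
    """Breadth-first search from `source`; returns {video: hop distance}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def in_degree(graph, nodes):
    """In-degree of each node, counting only edges between `nodes`."""
    deg = {v: 0 for v in nodes}
    for u in nodes:
        for v in graph.get(u, []):
            if v in deg:
                deg[v] += 1
    return deg


# Toy (hypothetical) recommendation lists.
graph = {"hello": ["v1", "v2"], "v1": ["v3", "hello"], "v2": ["v3"],
         "v3": ["v4"], "v4": ["v5"]}
reach = within_hops(graph, "hello", 3)   # v5 is 4 hops away, so excluded
sizes = in_degree(graph, reach)          # would drive node size in a plot
```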

Fig. 1. Videos reachable within 3 hops, starting from “Adele-Hello”.

Fig. 2 shows the recommendation network among 6 of Adele’s videos. The network is directed: each edge shows how users can navigate from one video to another by following a recommendation link. Conceptually, the links act as conduits through which user attention flows. When “Hello” was released, it broke the YouTube debut record by attracting 28M views in its first 24 hours. At the same time, we observe a traffic spike in all of her other videos, even in the 3 videos that “Hello” does not point to directly.

Fig. 2. Observing the effects of recommendation network on video popularity.

The recommendation network on YouTube

We curate the VEVO music graph dataset, which contains 60,740 music videos from the complete set of VEVO artists active in 6 major English-speaking countries. VEVO is an ecosystem of its own that attracts tremendous attention: 94 of the 100 most-viewed videos of all time on YouTube are music videos, and 64 of those are distributed via VEVO. For each video, we collect its metadata (e.g., title, description, uploader), its view count time series, and its recommendation relations with other videos over 63 days.

One notable detail is that we do not track individual users, because YouTube heavily personalizes its recommendation results. Instead, we track the relevant list returned by the YouTube Data API. Unlike the ever-changing recommended list shown on the video webpage, the relevant list is non-personalized and consistent across requests. For completeness, we also quantify the relation between the relevant list and the recommended list (detailed in the paper). We find that videos ranked higher on the relevant list are more likely to be ranked higher on the recommended list, and vice versa.
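
The paper’s own measure is not reproduced here. As one illustrative way to check whether positions on the two lists agree (our choice of technique, not necessarily the paper’s), Kendall’s tau-a can be computed over the videos that appear in both lists. The lists below are hypothetical, and each list is assumed to contain no duplicate IDs.

```python
from itertools import combinations


def kendall_tau(relevant, recommended):
    """Kendall's tau-a over videos present in both ranked lists.

    +1 means identical relative order, -1 means fully reversed order.
    Assumes each list holds unique video IDs, best-ranked first.
    """
    pos_rel = {v: i for i, v in enumerate(relevant)}
    pos_rec = {v: i for i, v in enumerate(recommended)}
    shared = [v for v in relevant if v in pos_rec]
    concordant = discordant = 0
    for a, b in combinations(shared, 2):
        sign = (pos_rel[a] - pos_rel[b]) * (pos_rec[a] - pos_rec[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(shared) * (len(shared) - 1) // 2
    return (concordant - discordant) / n_pairs if n_pairs else 0.0


# Hypothetical lists: only "c" and "d" swap places, so 5 of 6 pairs agree.
tau = kendall_tau(["a", "b", "c", "d"], ["a", "b", "d", "c", "x"])
```

A value close to +1 would be consistent with the observation that higher positions on one list tend to be higher positions on the other.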

Measuring the recommendation network

We characterize the recommendation network using the bow-tie structure, which partitions a directed network into 5 components. The 3 principal components are (a) the largest strongly connected component (LSCC); (b) the IN component, which contains nodes that can reach the LSCC but cannot be reached from it; and (c) the OUT component, which contains nodes that can be reached from the LSCC but cannot reach back to it.

Fig. 3(left) visualizes the bow-tie structure of our VEVO network. If we resize each component by the total daily views of the videos inside it, we obtain the attention bow-tie shown in Fig. 3(right). The bow-tie is a useful conceptual description because directed edges exist only from the IN to the LSCC (and from the LSCC to the OUT), not the other way around, so attention in the network can flow in only one direction: from the IN to the LSCC, then to the OUT. We observe that the core LSCC component (23.1% of the videos) captures most of the attention (82.6% of the views).
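
The resizing step amounts to comparing each component’s share of videos with its share of views. A minimal sketch, assuming component labels (as a bow-tie decomposition would produce) and daily view counts are given as dicts keyed by video ID; the function name and toy numbers are hypothetical.

```python
def attention_bow_tie(labels, daily_views):
    """Return {component: (share_of_videos, share_of_daily_views)}."""
    total_videos = len(labels)
    total_views = sum(daily_views[v] for v in labels)
    shares = {}
    for comp in set(labels.values()):
        members = [v for v, c in labels.items() if c == comp]
        views = sum(daily_views[v] for v in members)
        shares[comp] = (len(members) / total_videos, views / total_views)
    return shares


# Toy example: half the videos sit in the LSCC but draw 80% of the views.
labels = {"a": "LSCC", "b": "LSCC", "c": "IN", "d": "OUT"}
views = {"a": 600, "b": 200, "c": 100, "d": 100}
shares = attention_bow_tie(labels, views)
```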

Fig. 3. (left) The bow-tie structure of the VEVO network. (right) The attention bow-tie.

Estimating attention flow in the recommendation network

We also investigate using network information to predict video popularity. One important observation is that most viewing dynamics exhibit a 7-day seasonality, so we use a seasonal naive model, an autoregressive (AR) model, and a recurrent neural network (RNN) with LSTM units as baselines. Building on the AR model, we propose the ARNet model, which explicitly assigns a weight to each recommendation link. Each weight is constrained to lie between 0 and 1 and can be interpreted as the probability that a generic user clicks on the target video, i.e., the estimated attention flow over that recommendation link. Our experiments show that ARNet consistently outperforms the baselines.
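
A minimal sketch of the idea under stated assumptions, not the paper’s implementation: we assume the prediction for target video v on day t is ŷ_v[t] = Σ_τ α_τ · y_v[t−τ] + Σ_u β_u · y_u[t], i.e., its own views over the previous p days plus the same-day views of each video u linking to it, with every link weight β_u kept in [0, 1]. The paper’s exact specification (lag order, which time step the neighbor term uses, training objective) may differ. The weights are fitted here by projected coordinate descent on squared error, a stand-in optimizer. All function names are hypothetical, and the seasonal naive baseline is included for comparison.

```python
def seasonal_naive(series, season=7):
    """Predict day t with the value observed one season earlier (t - season)."""
    return [series[t - season] for t in range(season, len(series))]


def build_design(target, in_neighbors, p):
    """Rows of [p lagged target views] + [same-day views of each in-neighbor].

    ASSUMPTION: the network term uses neighbor views on the same day t.
    """
    rows, ys = [], []
    for t in range(p, len(target)):
        lags = [target[t - lag] for lag in range(1, p + 1)]
        network = [series[t] for series in in_neighbors]
        rows.append(lags + network)
        ys.append(target[t])
    return rows, ys


def fit_box_least_squares(rows, ys, lower, upper, sweeps=1000):
    """Coordinate descent for least squares with per-coefficient bounds."""
    n_feat = len(rows[0])
    w = [0.0] * n_feat
    resid = list(ys)  # residual y - Xw, starting from w = 0
    col_sq = [sum(r[j] ** 2 for r in rows) for j in range(n_feat)]
    for _ in range(sweeps):
        for j in range(n_feat):
            if col_sq[j] == 0:
                continue
            # Exact 1-D minimizer for coefficient j, clipped to its bounds.
            rho = sum(r[j] * e for r, e in zip(rows, resid)) + col_sq[j] * w[j]
            new = min(max(rho / col_sq[j], lower[j]), upper[j])
            delta = new - w[j]
            if delta != 0.0:
                for i, r in enumerate(rows):
                    resid[i] -= r[j] * delta
                w[j] = new
    return w


def fit_arnet(target, in_neighbors, p=7, sweeps=1000):
    """Return (alphas, betas); each beta is a link weight in [0, 1]."""
    rows, ys = build_design(target, in_neighbors, p)
    inf = float("inf")
    lower = [-inf] * p + [0.0] * len(in_neighbors)
    upper = [inf] * p + [1.0] * len(in_neighbors)
    w = fit_box_least_squares(rows, ys, lower, upper, sweeps)
    return w[:p], w[p:]
```

On synthetic data generated as y[t] = 0.5·y[t−1] + 0.3·u[t], `fit_arnet(y, [u], p=1)` recovers α ≈ 0.5 and β ≈ 0.3; when the unconstrained fit would exceed 1, the clipping keeps β at the boundary.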

The ARNet model lets us estimate the network contribution for both videos and artists. We ask: which artists would be affected most if the recommender systems were turned off? We compute each artist’s popularity percentile using (a) the observed views and (b) the estimated views without the network contribution, and visualize the percentile change in Fig. 4(left). The outliers (red circles) denote artists who gain much more popularity through the network than others in their cohort. The network can help some artists massively increase their relative popularity (by as much as 26%, for American rapper J-Kwon in the 4th bin). We take a closer look at these outliers in the scatter plot of Fig. 4(right) and notice that a group of indie and hip hop artists rely more on the recommendation network to become popular.
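
A minimal sketch of the percentile comparison, assuming we already have, per artist, the observed views and the network contribution estimated by an ARNet-style model (the summed β-weighted views arriving over incoming links). The percentile definition here (share of other artists with strictly fewer views) is our assumption; the paper’s definition and its cohort bins may differ. Artist names and numbers are hypothetical.

```python
def percentile_ranks(views):
    """Percent of *other* artists with strictly fewer views (needs >= 2 artists)."""
    values = list(views.values())
    n = len(values)
    return {a: 100.0 * sum(v < x for v in values) / (n - 1)
            for a, x in views.items()}


def percentile_change(observed, network_contribution):
    """Percentile with observed views minus percentile without the network."""
    without = {a: observed[a] - network_contribution[a] for a in observed}
    with_net, without_net = percentile_ranks(observed), percentile_ranks(without)
    return {a: with_net[a] - without_net[a] for a in observed}


observed = {"A": 1000, "B": 800, "C": 600, "D": 400, "E": 200}
contribution = {"A": 100, "B": 80, "C": 60, "D": 320, "E": 20}
change = percentile_change(observed, contribution)
# D gains 25.0 percentile points through the network; E loses 25.0.
```

Artist D, which draws most of its views through recommendation links, is the kind of case that would show up as an outlier in such a comparison.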

Fig. 4. (left) Boxplot of artists’ popularity percentile changes when the recommendation network is added. (right) A closer look at the artists identified as outliers in (left).

Concluding remarks

This work presents a large-scale study of the video network built by the YouTube recommender systems. We curate a new dataset of 60,740 VEVO videos, representing some of the most popular music clips and artists, and construct the depersonalized YouTube recommendation network. We present measurements of its global component structure and the temporal persistence of its links. We propose a model that leverages network information to predict video popularity and outperforms the baselines; it also lets us estimate the amount of attention flowing over each recommendation link. To the best of our knowledge, this is the first work that links the structure of a video recommendation network to the attention consumed by the videos in it.

This blog post is by Siqi Wu, with edits from Lexing Xie. Siqi Wu is a PhD candidate in the Computational Media Lab at the Australian National University.

Paper citation: Siqi Wu, Marian-Andrei Rizoiu, and Lexing Xie. 2019. Estimating Attention Flow in Online Video Networks. Proceedings of the ACM on Human-Computer Interaction. 3, CSCW, Article 183 (November 2019), 25 pages. https://doi.org/10.1145/3359285 [paper|code|data]

Written by Siqi Wu