“Trust the process?” How online sports communities are shaped by the offline context
This blog summarizes our CSCW 2018 paper “‘This is Why We Play’: Characterizing Online Fan Communities of the NBA Teams” by Jason Shuo Zhang, Chenhao Tan, and Qin Lv.
(Cross-posted to Towards Data Science)
Many online communities exist not only in the virtual world but are also deeply embedded in the offline context, attracting people with similar offline interests. An important research question is to what extent, and how, online communities relate to that offline context.
Professional sports provide an interesting case because online fan communities, in a way, exist only as a result of offline sports teams and games. Such connections also highlight the need to combine multiple data sources to understand how fan behavior on social media correlates with ongoing events related to the topic that interests them.
Here we share an interesting story from Reddit. On Feb 7th, 2018, the Phoenix Suns lost to the San Antonio Spurs by 48 points, which tied the worst loss in franchise history. Naturally, Suns fans were very disappointed. However, instead of giving up on the team or abandoning the team subreddit, they turned the sports subreddit into a science subreddit and started discussing the Sun in the solar system! Many funny titles were posted that night, like this one: “Did you know, the Sun contains 99.86% of all the mass in the solar system?”
Fascinated by such user behavior in online fan communities, in our new CSCW paper, we construct a large-scale dataset from Reddit which combines 1.5M posts and 43M comments in NBA-related communities (30 team subreddits and /r/NBA) with offline statistics that document team performance in the NBA. We study the interaction between team performance and fan behavior both at the game level and at the season level.
User activity is highly connected with the structure of the NBA season.
First, we find that the structure of the NBA season drives user activity in NBA-related subreddits. Measuring user activity as the number of comments posted in /r/NBA per month, we see that activity drops sharply during every offseason (July to mid-October), when no games are played. During the regular season (late October to the following March), user activity increases steadily. Activity peaks in May and June, when the championship games take place.
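The monthly activity measure described above can be sketched in a few lines. This is only an illustration, not the paper's pipeline: the function name `comments_per_month` is hypothetical, and it assumes each comment comes with a Unix timestamp in UTC (as in Reddit's `created_utc` field).

```python
from collections import Counter
from datetime import datetime, timezone


def comments_per_month(timestamps):
    """Count comments per (year, month) from Unix timestamps in UTC.

    `timestamps` is an assumed input format, not the dataset's actual schema.
    """
    counts = Counter()
    for ts in timestamps:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[(dt.year, dt.month)] += 1
    # Sort chronologically so the result can be plotted directly.
    return dict(sorted(counts.items()))
```

Plotting the returned counts over several years is one way to see the offseason dips and the May and June peaks.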
Online fan activity mirrors offline game playing.
An important function of NBA-related subreddits is to support game-related discussion. In practice, each game has a game thread in the corresponding team subreddit, and fans are encouraged to post game-related comments there. Comparing the average proportion of comments made in each team subreddit by hour on game days during the 2017 season (normalized relative to each game’s starting hour), we observe that game time is the most active time in the team subreddits. User activity begins to increase an hour before the game starts and peaks in the second hour. This trend aligns closely with a typical NBA game, which lasts around 2.5 hours.
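As a rough sketch of the normalization described above, the snippet below bins comments by their hour offset from tip-off and converts the counts to proportions, then averages across games. The function names, the 6-hour window, and the timestamp inputs are assumptions made for illustration; the paper's exact procedure may differ.

```python
from collections import Counter


def hourly_proportions(comment_times, game_start, window=(-6, 6)):
    """Share of a game day's comments in each hour relative to game start.

    Times are Unix timestamps in seconds. Offset 0 is the first hour of the
    game, offset -1 is the hour before tip-off. The window is an assumption.
    """
    lo, hi = window
    counts = Counter()
    for t in comment_times:
        offset = int((t - game_start) // 3600)  # floor division handles negatives
        if lo <= offset <= hi:
            counts[offset] += 1
    total = sum(counts.values())
    return {h: (counts[h] / total if total else 0.0) for h in range(lo, hi + 1)}


def average_over_games(per_game):
    """Average per-game proportion dicts hour by hour (same window assumed)."""
    if not per_game:
        return {}
    hours = sorted(per_game[0])
    return {h: sum(g[h] for g in per_game) / len(per_game) for h in hours}
```

Aligning on the starting hour before averaging is what lets games that tip off at different local times be compared on the same axis.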
What do fans talk about?
Next, we use topic modeling to analyze the content of user comments in these NBA-related subreddits. The five topics with the highest weights can be summarized as: “personal opinion,” “game strategy,” “season prospects,” “future,” and “game stats.” We will return to this later to look at how team performance shapes the discussion.
How does team performance at the game level affect fan activity?
One may hypothesize that winning a game would trigger a higher level of activity in the team subreddit. However, we find that this is not the case for all teams. Our hierarchical regression analyses suggest that fans of top teams tend to be more active after losses, while fans of bottom teams show the opposite trend. Using the top-3 and bottom-3 teams in the 2017 and 2016 NBA seasons, consistent patterns arise:
- For all of these teams, the average number of comments on game days is significantly higher than on non-game days;
- The average number of comments on losing days is higher than on winning days for all the top teams, but the result is reversed for the bottom teams.
The contrast between the top and bottom teams suggests that “surprise” may be a considerable factor. For instance, in the 2017 season, the average winning percentage of the top three teams is above 75%. Fans of the top teams may get used to their teams winning, in which case a loss becomes a surprising event. In comparison, the average winning percentage of the bottom three teams is below 30%, so it is invigorating for these fans to watch their teams win. The “surprise” brings extra excitement, which can stimulate more comments in the corresponding game threads.
For the details of all the hierarchical regression analyses mentioned in this blog, please refer to our paper.
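The descriptive comparison in the two bullet points above (average comments on game days versus non-game days, and on winning versus losing days) can be sketched as follows. This is a simple group mean, not the hierarchical regression used in the paper; the function name and the `'win'`, `'loss'`, `'no_game'` labels are hypothetical.

```python
from statistics import mean


def mean_comments_by_day_type(days):
    """Average daily comment count per day type for one team subreddit.

    `days` is an iterable of (day_type, n_comments) pairs, where day_type is
    an assumed label such as 'win', 'loss', or 'no_game'.
    """
    groups = {}
    for day_type, n_comments in days:
        groups.setdefault(day_type, []).append(n_comments)
    return {day_type: mean(values) for day_type, values in groups.items()}
```

Running this separately for a top team and a bottom team would show whether the `'loss'` average sits above or below the `'win'` average, though any claim of significance needs a proper statistical test.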
How does team performance at the season level relate to fan loyalty in team subreddits?
Our results suggest that team performance (estimated by the Elo rating provided by FiveThirtyEight.com) has a statistically significant negative impact on both season-level and monthly user retention rates. Again using the top-3 and bottom-3 teams in the 2017 and 2016 seasons, we consistently find that bottom teams have much higher user retention rates than top teams, both between seasons and between months.
The famous “bandwagon” phenomenon in professional sports may help explain this observation: some fans “jump on the bandwagon” by starting to follow whichever team is performing best and winning championships at the moment. As soon as that team starts to perform poorly, they jump off and hop on the bandwagon of a different team that is doing better. In comparison, terrible team performance can serve as a loyalty filter. After a period of poor performance, only the die-hard fans stay active and optimistic. They keep supporting the team no matter how badly it performs.
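To make the retention measure concrete, here is a minimal sketch. It assumes retention is the fraction of users active in one period (a season or a month) who are also active in the next; that definition and the function name are assumptions for illustration, so see the paper for the exact one.

```python
def retention_rate(active_before, active_after):
    """Fraction of users active in one period who are also active in the next.

    Both arguments are iterables of user names. Returns 0.0 when nobody was
    active in the earlier period.
    """
    before = set(active_before)
    if not before:
        return 0.0
    return len(before & set(active_after)) / len(before)
```

Note that users who join in the later period do not affect the rate, which is why a team that attracts many short-lived bandwagon fans can have high activity but low retention.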
How does team performance at the season level affect topics of discussion in team subreddits?
Our results show that team performance also drives what fans talk about. The “season prospects” and “future” topics are highly correlated with team performance: fans of better teams discuss “season prospects” more, while fans of worse teams talk more about the “future.” If we plot the topic weights of “future” against “season prospects” for all 30 NBA teams in the 2017 and 2016 seasons, the top-3 teams are consistently in the lower right corner (high “season prospects,” low “future”), and the bottom-3 teams are all in the upper left corner (low “season prospects,” high “future”).
Our results echo an earlier finding in sports management: framing the future is an important strategy for fans of poorly performing teams to maintain a positive identity in the recent absence of success. Together with our earlier observation about team performance and fan loyalty, this suggests that a team’s struggles can provide a great opportunity to develop a deep attachment with loyal fans. Certain fans may be willing to persevere with their team through almost anything, including years of defeat, in order to see themselves as die-hard fans. By doing so, they may feel they have earned greater affective significance within the fan community when the team becomes successful in the future.
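One simple way to check a relationship like this is a Pearson correlation between a per-team performance measure and a per-team topic weight. The sketch below is a generic implementation offered as an illustration only; the paper relies on regression analyses, and the function name and inputs here are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    For example, xs could be per-team performance scores and ys the matching
    per-team weights of one topic (both hypothetical inputs).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(sxx * syy)
```

Under the pattern described above, one would expect a positive value for “season prospects” and a negative value for “future.”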
Dataset and more interesting observations
The dataset for this paper is available at http://jasondarkblue.com/. The paper also discusses further observations. For example, we find that a team’s market value, average age, number of all-star players, and playing style all play significant roles in fan activity.
And “this is why we play!”