NBA and Its Hidden Social Data

Holden Bridge
9 min read · Dec 5, 2018

Holden Bridge, Tyler Brosius, Charlie Livaudais

NBA Twitter

NBA superstars are larger-than-life characters, and their following extends beyond the court. Twitter has provided professional athletes with an outlet to talk trash, express their political opinions, and grow their fanbases by enormous numbers. We wanted to take a deeper look at the following:

  1. The relationship between salary, Twitter followers, and other NBA statistics
  2. Whether we can predict the salary and number of Twitter followers for a given player using machine learning
  3. How NBA players are currently using Twitter

Data

We got our data from Kaggle’s public data library. Our data set is titled Social Power NBA and can be found here.

Top 10 Twitter Followers

Once we found the players with the most Twitter followers, we wanted to see whether those names also appeared on the list of highest-paid players.

Top 10 Salary Players

Six names appeared on both top 10 lists. We then decided to take a closer look to see whether this correlation persisted across the entire NBA.

Predicting Salary and Twitter Followers using Linear Regression:

The goal here is to use linear regression to create an interactive prediction model.

The first step in our process was determining what statistics we wanted to include in our model. We made scatterplots for many variables with the hopes of finding significant positive relationships.

EXPLORATORY DATA ANALYSIS:

  • When first thinking about what puts a player among the highest paid in the league, we considered points, assists, rebounds, and age as the statistics most likely to affect a player’s salary. We then made scatterplots of salary against each of these variables. All four showed a strong positive correlation, which can be seen below.

1. Points Per Game

Salary vs. Points

2. Assists Per Game

Salary vs. Assists

3. Rebounds Per Game

Salary vs. Rebounds

4. Age of the Player

Salary vs. Age

Using these results, we built a model to predict a player’s salary based on just these four variables. We used Python’s statsmodels to fit the linear regression model seen below.

Linear Regression Model For Salary

As you can see, age, points, and rebounds are all significant, with P-values that round to 0. Assists is the only variable that could be questioned, with a higher P-value of 0.092. However, considering the popularity and importance of assists as a stat in the NBA, we decided to keep it in our model.

How to interpret these numbers:

Each variable (age, assists, rebounds, points) is given a coefficient, and that coefficient is multiplied by the value entered for that variable. The products are then summed and the model’s intercept is added. For example, player #1 is 25 years old and averages 18 points, 6 assists, and 5.3 rebounds per game. That player’s salary, in millions of dollars, is then calculated as follows:

Salary = (25 * 0.4821) + (18 * 0.4857) + (6 * 0.3045) + (5.3 * 0.4655) - 12.4806
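As a minimal sketch, the same arithmetic can be written as a small Python function. The coefficients and intercept are the ones printed in the formula above; the function and variable names are our own choices for illustration, not taken from the project code. With these rounded coefficients the example comes out to roughly 12.61 million dollars, slightly different from the 12.58 quoted later in the article, presumably because the original calculation used unrounded coefficients.

```python
# Coefficients and intercept as printed in the article (salary in millions of dollars).
# The names below are illustrative assumptions, not the authors' identifiers.
SALARY_COEFFICIENTS = {
    "age": 0.4821,
    "points": 0.4857,
    "assists": 0.3045,
    "rebounds": 0.4655,
}
SALARY_INTERCEPT = -12.4806


def predict_salary(age, points, assists, rebounds):
    """Return the predicted salary in millions of dollars."""
    stats = {"age": age, "points": points, "assists": assists, "rebounds": rebounds}
    # Multiply each stat by its coefficient, sum the products, then add the intercept.
    weighted_sum = sum(SALARY_COEFFICIENTS[name] * value for name, value in stats.items())
    return weighted_sum + SALARY_INTERCEPT


example_salary = predict_salary(age=25, points=18, assists=6, rebounds=5.3)
print(f"Predicted salary: ${example_salary:.2f} million")
```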

WHERE DOES TWITTER COME INTO THIS?

Originally, we wanted to use the same statistics we used to predict a player’s salary to also predict their Twitter followers. However, this did not work, because none of the variables was a significant predictor of a player’s Twitter followers. We did, however, make one discovery…

Twitter Followers vs. Salary

A player’s Twitter followers had a positive correlation with that player’s salary.

Linear Regression Model for Twitter Followers

Calculating the Twitter Followers:

  1. We have the user input values for the four variables: age, points, assists, and rebounds.
  2. We use our first model to calculate that player’s salary.
  3. We then use that calculated salary as the input variable in our second model.

Example cont.: Age: 25, Points: 18, Assists: 6, Rebounds: 5.3

Estimated Salary = $12.58 million

Using Model #2: Twitter Followers = (12.58 * 0.2195) - 0.9612

Estimated Twitter Followers = 1.80 million
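The two-step calculation above can be sketched in Python as follows. The coefficients are the ones printed in the article for both models, with salary and followers both expressed in millions; the function names are hypothetical and are not taken from the calculator linked in this post.

```python
def predict_salary(age, points, assists, rebounds):
    """Model #1: salary in millions of dollars, using the printed coefficients."""
    return (0.4821 * age + 0.4857 * points + 0.3045 * assists
            + 0.4655 * rebounds - 12.4806)


def predict_followers(salary_millions):
    """Model #2: Twitter followers in millions, given salary in millions."""
    return 0.2195 * salary_millions - 0.9612


def predict_player(age, points, assists, rebounds):
    """Chain the two models: stats -> salary -> followers."""
    salary = predict_salary(age, points, assists, rebounds)
    followers = predict_followers(salary)
    return salary, followers


salary, followers = predict_player(age=25, points=18, assists=6, rebounds=5.3)
print(f"Salary: ${salary:.2f} million, followers: {followers:.2f} million")

# Plugging in the article's quoted salary of 12.58 reproduces its follower estimate.
print(f"Followers from $12.58 million: {predict_followers(12.58):.2f} million")
```

One property of this chained formula is worth keeping in mind: for a predicted salary below roughly $4.4 million, the second model returns a negative follower count, so a tool built on it might reasonably clamp the result at zero.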

Predicting an NBA player’s Twitter following is very difficult, as evidenced by the low R-squared value. Our approach is to use a player’s stats (points, assists, rebounds, and age) to calculate their salary with our first model, and then to convert that calculated salary into an estimate of the player’s Twitter followers with our second model. We created a tool that lets a user input a hypothetical NBA player’s stats and calculates that player’s salary and Twitter followers.

Click Here to see the NBA Twitter Followers Calculator

Trying to calculate a player’s Twitter followers presented us with many challenges. But could we use machine learning to improve our solution?

Predicting Salary and Twitter Followers using Machine Learning:

Before using any machine learning models to predict salaries and Twitter followers, we created a correlation heat map to help us identify which variables to drop and which to keep when building the model.

Heat Map
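A heat map like this one visualizes pairwise Pearson correlations. As a rough sketch of the idea behind using it for feature selection, the snippet below computes each column's correlation with a target column and ranks the columns by absolute value. The column names and numbers are placeholder values chosen for illustration only; they are not taken from the data set, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Sort feature names by absolute correlation with the target column."""
    scores = {
        name: pearson(values, columns[target])
        for name, values in columns.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Placeholder values, NOT from the Social Power NBA data set.
toy_columns = {
    "salary": [1, 2, 3, 4],
    "points": [2, 4, 6, 8],    # perfectly correlated with salary
    "noise": [1, -1, -1, 1],   # uncorrelated with salary
}
ranking = rank_by_correlation(toy_columns, target="salary")
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

Columns near the bottom of such a ranking are candidates to drop, although correlation only captures linear relationships, so this is a screening aid rather than a final decision.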

Machine Learning for Salary

From the information we gathered from the heat map and model testing, we made the decision to drop all of the statistics except for those that are included in the image below when trying to predict salary.

An example of the cleaned data set for machine learning

These stats appeared to have a positive correlation with salary, so it made sense to include them. All of the statistics we considered for linear regression are included here, but we added more to give the models more information.

We also decided that the best way to predict the salaries of these players from their stats was to make salary brackets, with each bracket spanning a range of 5 million dollars. Without doing this, it would be virtually impossible to get accurate predictions from the provided data set, because we would be predicting exact salary values. Bracketing allowed us to see how well our model worked within a reasonable salary range.

After running different types of models, we found that the random forest classifier was consistently the most accurate. We ran the model multiple times, and it hovered between 50% and 60% accuracy almost every time. Although this is a significant improvement over random guessing, there are a few reasons why it is difficult to improve our model further. Primarily, we have a limited data set that does not include all of the players in the NBA and covers only one year, so the model had little data for training and testing.

Another reason for the inaccuracy is that NBA players do not always perform in line with their salaries. Rookie contracts last up to four years in the NBA, and if a player develops significantly in that time period, they could be greatly outperforming their current salary. Similarly, injuries could cause players with high salaries to underperform statistically. It is reasonable to say that our model predicts the salary a player’s statistics would be expected to earn, which may not match the player’s actual contract.
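The bracketing step described above turns a regression problem into a classification problem. Here is a minimal sketch of how a salary could be mapped to a 5-million-dollar bracket label; the function names and the label format are assumptions for illustration, and the exact bracket boundaries used in the project are not stated in this post.

```python
BRACKET_WIDTH_MILLIONS = 5


def salary_bracket(salary_millions):
    """Return the bracket index: 0 for [0, 5), 1 for [5, 10), and so on."""
    return int(salary_millions // BRACKET_WIDTH_MILLIONS)


def bracket_label(index):
    """Human-readable label for a bracket index."""
    low = index * BRACKET_WIDTH_MILLIONS
    high = low + BRACKET_WIDTH_MILLIONS
    return f"${low}M to ${high}M"


for salary in (4.99, 5.0, 12.58):
    index = salary_bracket(salary)
    print(f"${salary} million -> bracket {index} ({bracket_label(index)})")
```

A classifier such as a random forest would then be trained to predict the bracket index from the player's stats, and accuracy is the share of players placed in the correct bracket.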

Machine Learning for Twitter Followers

Unfortunately, our data was not conducive to using machine learning to predict Twitter followers. Our data set contained too few players with a high volume of Twitter followers and too many with a very low volume. As with the salary model, we had to put the players in brackets in order to predict followers; however, the data was so skewed that the lowest bracket contained almost all of the data points. Because of this, the results did not give us any meaningful feedback for predicting Twitter followers.
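To see why a skewed bracket distribution makes results uninformative, consider a baseline that always predicts the most common bracket. The sketch below uses hypothetical labels, not the project's data: when 18 of 20 players fall in the lowest bracket, this trivial baseline already scores 90% accuracy without learning anything about followers.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy achieved by always predicting the most common label."""
    counts = Counter(labels)
    _, most_common_count = counts.most_common(1)[0]
    return most_common_count / len(labels)


# Hypothetical follower-bracket labels, NOT taken from the data set:
# 18 players in the lowest bracket, 2 players in higher brackets.
hypothetical_labels = [0] * 18 + [1, 3]
baseline = majority_baseline_accuracy(hypothetical_labels)
print(f"Majority-class baseline accuracy: {baseline:.0%}")
```

Any model trained on labels like these has to beat that baseline to show it has learned something, which is difficult when the higher brackets contain only a handful of examples.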

Patterns in Current NBA Twitter:

Since we found that it was difficult to predict Twitter followers from statistics, we wanted to analyze the characteristics of NBA players’ Twitter accounts that had successfully acquired a large following. As a case study, we analyzed the accounts of the ten most followed current NBA players, who are shown below.

Top Ten Most Followed NBA Players with Followers in Tens of Millions

After determining the players that we wanted to analyze, we tried to hypothesize what attracted people to follow them. Surely NBA stardom is a preeminent factor, which we cannot really attach to a statistic, but it does not explain the complete Top 10 list. Dwight Howard is in the waning years of his career and has struggled to stay relevant lately. Pau Gasol has become mostly a role player in the last few years, and while Blake Griffin still has great games, he has battled injuries for most of the last few seasons. In addition, Carmelo Anthony is on his way out of the NBA. So what else could cause the popularity of these players?

We pulled tweets from the last 1000 days for each player and decided to analyze how much they tweet, how old their account is, how many friends they have, and how many followers they have. Finally, we ran a sentiment analysis on each of their tweets to see whether sentiment had any bearing on how many followers they had.

One of our thoughts was that a potential factor in how many followers the players had would be how active they actually were on Twitter. As you can see, the graph of quantity of tweets in the last 1000 days does generally mirror the followers trend, aside from Dwight Howard and Pau Gasol. However, as we discussed earlier, these are two of the main outliers in terms of level of play currently in the NBA, so their increased presence on Twitter could show why they are still highly followed.

Another factor we wanted to look at was the age of each account, because older accounts have had more time to amass followers. However, the results do not tell us much, other than that Pau Gasol has amassed a large following in less time than the others.

We also looked at friend count, which on Twitter is the number of accounts a user follows. It really only stands out for Dwight Howard, but again it could help explain his large following relative to his impact in actual games. More friends mean that he probably has more fan interactions, which spreads his popularity.

Lastly, we ran a sentiment analysis on each player’s tweets from the last 1000 days and calculated the mean sentiment score for each player. We could not include emojis or links in the sentiment analysis, so it may be a bit flawed. There was little correlation between sentiment and followers within this subgroup; however, every player had a positive mean sentiment score, which could hint at a relationship outside of this subset.
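The per-player averaging described above can be sketched as follows. This post does not say which sentiment library was used, so the scorer here is a deliberately tiny stand-in lexicon that exists only to make the example runnable; the example tweets are made up, and all names are hypothetical. The part that mirrors the text is the cleaning step (dropping links and emojis) followed by taking the mean over one player's tweets.

```python
import re
from statistics import mean

URL_PATTERN = re.compile(r"https?://\S+")

# Stand-in lexicon for illustration only; a real analysis would use a sentiment library.
TOY_LEXICON = {"great": 1.0, "love": 1.0, "win": 0.5, "tough": -0.5, "loss": -1.0}


def clean_tweet(text):
    """Drop links and non-ASCII characters, then collapse whitespace.

    Dropping non-ASCII removes emojis, but it also strips accented letters,
    which is one reason a cleaning step like this can distort the scores.
    """
    without_links = URL_PATTERN.sub("", text)
    ascii_only = without_links.encode("ascii", errors="ignore").decode("ascii")
    return " ".join(ascii_only.split())


def toy_score(text):
    """Average lexicon score of known words in a tweet; 0.0 if none are known."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[word] for word in words if word in TOY_LEXICON]
    return mean(scores) if scores else 0.0


def mean_sentiment(tweets):
    """Mean per-tweet score across one player's (non-empty) list of tweets."""
    return mean(toy_score(clean_tweet(tweet)) for tweet in tweets)


# Made-up tweets for illustration, not pulled from any player's account.
hypothetical_tweets = [
    "Great win tonight! \U0001F3C0 https://example.com/a",
    "Love this team",
]
print(f"Mean sentiment: {mean_sentiment(hypothetical_tweets):.3f}")
```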

Overall, it is hard to pin down the exact reason for the large Twitter following of these players. However, the data gave us loads of valuable information and insight into the world of NBA Twitter. Each player except for Pau Gasol is near the top of the NBA pay scale, and they are all very active on Twitter. Stardom, coupled with accessibility and engagement with fans, has given these players huge social media followings, and the trend will only continue to grow in the future.

Link to our code for this project
