A look at the NBA: Who are the Top Candidates to win League MVP as we enter the back half of the 2024 NBA Season?

Shea Versey
INST414: Data Science Techniques
5 min read · Feb 10, 2024

Which NBA players are likely to finish as finalists for, or as the winner of, the 2024 NBA MVP award? This is a question I believe can be answered with data. Most NBA teams have played roughly 50 games of an 82-game season. Using data, particularly per-game statistics for each player in the league, we can gain insight into likely winners based on just over half of the NBA season. The specific stakeholder asking this question is a sports bettor. Many people, including myself, enjoy sports betting and know there is money to be made by placing smart, educated bets. For people interested in betting on NBA futures, this analysis will help them decide which player to bet on to win the MVP award, as well as some less coveted honors such as the scoring title or the assists title.

The data best suited to answer this question is NBA player and NBA team statistical data. Player data should include all stats from the box score, especially the most important ones: points per game, rebounds per game, and assists per game. These stats are relevant because points, rebounds, and assists are typically the most important metrics that NBA voters use when voting for MVP. Team stats don't need to be as specific; a win-loss record will suffice, because typically the only team stat that affects the MVP award is the overall record of each candidate's team. Ranking players on these metrics will allow us to make educated predictions about which players have the best shot at winning certain awards.

I was able to access two datasets from Kaggle that seem to fit my needs for analyzing player data. I searched Kaggle's datasets for "NBA stats" and there were plenty available. The one I selected is called "NBA Stats (1947-present)" and it contains twenty-one CSV files ranging from basic to advanced stats for players and teams in the NBA. For this exploratory analysis I am using the following CSV file: 'Player Per Game.csv'. Unfortunately, despite having four CSV files of team statistics, the dataset did not offer the current win-loss records of NBA teams. However, still on Kaggle, I found a different dataset that used simulations to predict the final standings of the 2024 NBA regular season. These files are Eastern Conference and Western Conference.

Taking a look at the player per-game CSV file, there are lots of columns that are unnecessary for the analysis I am forming. I will discuss later how I cleaned this data and narrowed down the dataset to match the needs of this analysis. As I stated before, the most important individual stats for an NBA player in terms of MVP voting are points per game, assists per game, and rebounds per game. At times defensive stats such as steals per game and blocks per game become voting factors as well. Using the player per-game statistics, I will rank the dataset multiple times, once by each of the statistical categories I just mentioned.

Typically, MVP winners in modern NBA history are players ranking in the top 10 in points per game, considering it is the most important stat in the entire sport. Therefore, in this analysis points per game is the most important factor. However, being first in points per game is by no means a guarantee of winning the MVP award. Often, players who score among the top 10 in points per game but also rank highly in rebounds, assists, or defensive stats will be on the short list of MVP finalists that season.

Lastly, I will take a look at the projected team record data to determine which teams are likely to make the playoffs. This is important because no player on a team that missed the playoffs has won the award since Kareem Abdul-Jabbar in 1976. Therefore, any player on a team that is expected to miss the playoffs will be considered to have practically zero chance of winning the award. Note that this only applies to the MVP award; other awards such as the scoring title depend solely on which player has the highest points per game that season, regardless of team success.

It is important to take all these factors and data points into consideration because the award can seem subjective at times and is surely not based on any one statistic. Every year a select group of voters weighs all of this data when deciding who to vote for, which means we need to do the same when deciding which player to bet our money on.
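The ranking approach described above can be sketched in pandas. This is only a minimal illustration, not the exact code from my repository: the column names (`tm`, `pts_per_game`, `trb_per_game`, `ast_per_game`) are assumptions about how the per-game file is laid out, the function name `mvp_shortlist` is hypothetical, and the demo rows are made-up placeholders rather than real player stats.

```python
import pandas as pd

# Assumed column names; adjust them to match the actual CSV headers.
STAT_COLS = ["pts_per_game", "trb_per_game", "ast_per_game"]


def mvp_shortlist(players, eliminated_teams, top_n=10):
    """Rank players in each category, then keep top-N scorers on playoff-bound teams."""
    ranked = players.copy()
    for col in STAT_COLS:
        # Rank 1 is the league leader; method="min" gives tied players the same rank.
        ranked[col + "_rank"] = ranked[col].rank(ascending=False, method="min")

    rank_cols = [col + "_rank" for col in STAT_COLS]
    # Count how many categories each player is top-N in.
    ranked["top_n_categories"] = (ranked[rank_cols] <= top_n).sum(axis=1)

    in_top_scorers = ranked["pts_per_game_rank"] <= top_n
    playoff_bound = ~ranked["tm"].isin(list(eliminated_teams))
    shortlist = ranked[in_top_scorers & playoff_bound]
    return shortlist.sort_values(
        ["top_n_categories", "pts_per_game"], ascending=[False, False]
    ).reset_index(drop=True)


if __name__ == "__main__":
    # Placeholder rows for illustration only (not real statistics).
    demo = pd.DataFrame(
        {
            "player": ["Player A", "Player B", "Player C", "Player D"],
            "tm": ["AAA", "BBB", "CCC", "DDD"],
            "pts_per_game": [30.0, 28.0, 27.0, 20.0],
            "trb_per_game": [5.0, 11.0, 12.0, 4.0],
            "ast_per_game": [9.0, 3.0, 8.0, 10.0],
        }
    )
    print(mvp_shortlist(demo, eliminated_teams={"BBB"}, top_n=2))
```

Sorting by the number of top-N categories first, and points per game second, mirrors the idea that scoring is the main factor but well-rounded players move up the short list.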

To clean up this data I narrowed down the columns in the datasets to include only the data necessary to answer the question. Aside from player name, position, and team, I removed all columns except points per game, assists per game, rebounds per game, steals per game, and blocks per game. I then filtered the dataset to include only data from this season, 2024. The dataset goes back to the 1940s, so I needed to remove everything that does not pertain to the current season. This was a very clean dataset that is updated regularly. I didn't run into any bugs when cleaning or manipulating the data, and I don't expect others to encounter issues either. The same goes for the team standings data. There were two files, one for the Eastern Conference and one for the Western Conference. For both files I kept just 4 columns: Conference, 'W', 'L', and 'W/L%'.
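Here is a rough sketch of that cleaning step for the player file. The column names (`season`, `player`, `pos`, `tm`, and the `*_per_game` fields) are my assumption about the headers in 'Player Per Game.csv', and `clean_player_stats` is just an illustrative helper name, so check them against the file before running.

```python
import pandas as pd

# Assumed headers for 'Player Per Game.csv'; verify against the actual file.
KEEP_COLS = [
    "player", "pos", "tm",
    "pts_per_game", "ast_per_game", "trb_per_game",
    "stl_per_game", "blk_per_game",
]


def clean_player_stats(raw, season=2024):
    """Keep only the chosen season and the columns needed for the analysis."""
    current = raw[raw["season"] == season]
    return current[KEEP_COLS].reset_index(drop=True)


# Usage (uncomment once the CSV is in the working directory):
# raw = pd.read_csv("Player Per Game.csv")
# players_2024 = clean_player_stats(raw)
```

Filtering by season first and selecting columns second keeps the function easy to reuse for another year by changing the `season` argument.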

This shows compact tables of the points, rebounds, and assists leaders, built from a previously filtered data frame that includes only the top 20 players in points per game.
This shows the 7 teams from each conference projected to miss the playoffs in the 2024 NBA season, ordered from most to least likely to miss out.
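For readers who want to rebuild the second table, a minimal sketch is below. It assumes each conference's standings frame has the 'W/L%' column mentioned earlier plus a team-name column that I call `Team` here (a hypothetical name); `projected_to_miss` is likewise an illustrative helper, and the demo rows are placeholders rather than real projections.

```python
import pandas as pd


def projected_to_miss(standings, n_miss=7):
    """Return the n_miss teams with the lowest projected win percentage,
    ordered from most to least likely to miss the playoffs."""
    return standings.sort_values("W/L%", ascending=True).head(n_miss).reset_index(drop=True)


if __name__ == "__main__":
    # Placeholder standings for illustration only.
    demo = pd.DataFrame(
        {
            "Team": ["Team A", "Team B", "Team C", "Team D"],
            "W": [60, 45, 30, 20],
            "L": [22, 37, 52, 62],
            "W/L%": [0.732, 0.549, 0.366, 0.244],
        }
    )
    print(projected_to_miss(demo, n_miss=2))
```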

As you can see from the figures above, certain players find themselves in the top 5 of multiple statistical categories. Right away you see names like Luka Doncic, Joel Embiid, and Nikola Jokic as top candidates for MVP based on the tables.

There are several limitations to this data analysis. One major limitation is a new NBA rule that requires players to play at least 65 regular season games to be eligible for end-of-year awards. Because we can't predict injuries or other factors that could prevent players from meeting that threshold, our ability to be completely accurate is limited. Another issue that could impact the analysis is that I didn't run the standings simulation myself, which predicts the teams that will make the playoffs. I do believe it will be quite accurate considering about 60% of the season has already passed, but unexpected things can happen in sports, and the playoff picture will keep changing up until the final day of the regular season. That, of course, limits our ability to predict who voters are going to choose 3 months from now.
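While injuries can't be predicted, we can at least check whether a player is still mathematically able to reach the 65-game mark. The sketch below is a simplification that only counts games played and ignores any finer details of the rule; the function names are hypothetical and the example numbers are made up.

```python
def max_possible_games(games_played, team_games_played, season_length=82):
    """Most games a player can finish with if he plays every remaining team game."""
    remaining = season_length - team_games_played
    return games_played + remaining


def can_still_qualify(games_played, team_games_played, season_length=82, threshold=65):
    """True if the player can still reach the games-played threshold."""
    return max_possible_games(games_played, team_games_played, season_length) >= threshold


if __name__ == "__main__":
    # A player with 40 games on a team that has played 50 can finish with at most 72.
    print(max_possible_games(40, 50), can_still_qualify(40, 50))
    # A player with 30 games on the same team tops out at 62, below the threshold.
    print(max_possible_games(30, 50), can_still_qualify(30, 50))
```

A bettor could run this check on each candidate before placing a futures bet, since a player who has already missed too many games cannot win regardless of his stats.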

Link to my GitHub Repository: https://github.com/skversey/INST414-Module-Assignment-1
