The Plan

It’s been a while. We were gaining momentum on our database and all of a sudden I fell off the writing train. Here’s why:

The NBA seemed to change their backend, and nba_py could no longer support our data pull. I thought about how to continue but hit a wall. I didn’t want to build a database or app that we couldn’t update, and other project ideas (like building an nba-reference scraping package or daily fantasy basketball predictions) didn’t interest me enough to build anything worth sharing.

A lot has changed since then. Our world is in quarantine and the NBA season is suspended (and rightfully so; go wash your hands!). Instead of squandering my time in isolation, I’m going to use it to write more and hopefully teach you something about analytics and coding while focusing on our common love of basketball. Also, we have a new source for pulling data.

Introducing NBA_API

nba_api is similar to nba_py in that it gives us access to vast amounts of NBA data from stats.nba.com in an easy-to-use Python package. There is one major difference: IT WORKS!

Installing the package into your virtual environment is easy:

pip install nba_api

If you need a refresher on virtual environments, you can check out our tutorial here. Also, check out nba_api’s GitHub and read and work through the examples: basics, getting stats, finding games, and play-by-play data.

Pulling Multiple Seasons with NBA_API

We won’t go through this code in detail because it’s pretty similar to the prior pulls that we performed with nba_py. A few notes:

The games dataframe contains dates that could potentially have NBA games. It is created by a script that takes a CSV containing the start date and end date of each NBA season from 2000 through 2018 and returns every date in between. This saves us from attempting pulls for dates that we know have no regular season games. The script is included in our nba_api folder as get_game_dates.ipynb.
After pulling all the data for a specific day, we update the pulled column in the games dataframe to “done” and save the dataframe. If we get an error, timeout, or lose our connection during the pull, we can rerun the entire script and it will only pull the dates that are not yet marked “done”.
You can see at the top of the script that we add various random sleep periods between pulls. We don’t want to overwhelm the NBA servers and we don’t want to get banned. Be nice, people!
We have not pulled any data for the current season. This is your project, so feel free to add that to the initial pull, but later we’re going to write a separate update script that we’ll automate to keep our database current. We’ll use that script to pull 2019–20 season data.
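The notes above can be sketched in a few lines of plain Python. This is not the script from our repo, just a minimal illustration of the same three ideas: expand season start and end dates into every possible game date, mark each date “done” and save right after it is pulled, and sleep a random amount between pulls. The column names (start_date, end_date, game_date, pulled), the function names, and the fetch and save callables are all assumptions made up for this example; in the real pull, fetch would call nba_api and save would write the dataframe to disk.

```python
import csv
import io
import random
import time
from datetime import date, timedelta


def season_dates(start, end):
    """Return every calendar date from start through end, inclusive."""
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]


def build_games_table(seasons_csv):
    """Turn rows of (season, start_date, end_date) into one row per possible game date."""
    rows = []
    for season in csv.DictReader(seasons_csv):
        start = date.fromisoformat(season["start_date"])
        end = date.fromisoformat(season["end_date"])
        for day in season_dates(start, end):
            rows.append({"game_date": day.isoformat(), "pulled": ""})
    return rows


def pull_pending(games, fetch, save, min_sleep=1.0, max_sleep=5.0):
    """Fetch every date not yet marked done, saving progress after each one."""
    for row in games:
        if row["pulled"] == "done":
            continue  # already pulled on an earlier run, so skip it
        fetch(row["game_date"])  # may raise on a timeout or lost connection
        row["pulled"] = "done"
        save(games)  # checkpoint so a rerun resumes from the next date
        time.sleep(random.uniform(min_sleep, max_sleep))  # be nice to the servers


# Demo with a made-up three-day "season" and a stand-in fetch that only records the date.
seasons = io.StringIO("season,start_date,end_date\ndemo,2018-10-16,2018-10-18\n")
games = build_games_table(seasons)
games[0]["pulled"] = "done"  # pretend an earlier run already pulled the first date
fetched = []
pull_pending(games, fetch=fetched.append, save=lambda table: None, min_sleep=0, max_sleep=0)
print(fetched)  # ['2018-10-17', '2018-10-18']
```

Because the table is saved after every date, a crash costs at most one day of data, and rerunning the script picks up where it stopped.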

To use the script, download it from our GitHub here, change the save locations to folders on your computer where you want to keep your flat files, and run the file. If you need a refresher on how to use GitHub, you can check out this tutorial we made earlier in the series. Anyway, here are a few images of the script to make sure you’re running the right file:

Image: Header of our base pull file

Full disclosure: there are probably better and faster ways to pull all of this data. If you don’t want team data, you can just pull game logs for each player, which should be faster, and there may be other ways to optimize. I just wanted something that would get us up and running. Leave comments below on ways to make this better!
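If you want to try the player game log route, here is a rough sketch of what it might look like. It assumes nba_api’s PlayerGameLog endpoint and the “2018-19” style season strings that stats.nba.com uses; the helper names season_label and pull_player_logs are made up for this example, and I haven’t timed this against our base pull, so treat the speed claim as a guess.

```python
import random
import time


def season_label(start_year):
    """Format a season as a stats.nba.com style string, e.g. 2018 -> '2018-19'."""
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def pull_player_logs(player_id, start_years):
    """Return one DataFrame of game logs per requested season for a single player."""
    # Imported here so season_label can be used without nba_api installed.
    from nba_api.stats.endpoints import playergamelog

    frames = []
    for year in start_years:
        log = playergamelog.PlayerGameLog(player_id=player_id, season=season_label(year))
        frames.append(log.get_data_frames()[0])
        time.sleep(random.uniform(1, 3))  # pause between requests
    return frames
```

To look up a player_id, nba_api’s static players module has find_players_by_full_name, which returns a list of matching players with their ids. You would then call something like pull_player_logs(player_id, range(2000, 2019)) for each player you care about.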

Wrapping Up

Thanks for reading and following along. I promise we’ll get moving quickly now that we have a reliable data source. Next we’ll recreate our database, automate a script to download new data and update our database, and then we’ll jump into analyses, creating web-apps, and different ways of sharing this data with the world.

As always, smash that clap button to let me know you were here and leave comments on any issues you’re having or things you’d like me to cover.

Written by Dan Watson