Creating and Embedding Interactive Scatter Plots using Plotly Express

Michael Black
Published in Geek Culture
4 min read · Jun 15, 2021

Going step-by-step using Plotly Express, Python, and Pandas to plot the NBA’s All-Time Leading Playoff Scorers.

Creating visually appealing graphs is an important part of conveying your message when working with data. Graphs with easily distinguishable points/lines, helpful legends, and interactive features like zooming and hovering (to name just two) are much more informative and user-friendly than those without.

While there are several ways to do this, one that I find both simple and powerful is Plotly Express. In this article, I will create (and embed) interactive graphs of the NBA’s All-Time Leading Playoff Scorers using Plotly, Python, and Pandas.

Let’s look at a couple of quick examples.

Step 1: Retrieving the Data

I am not going to show the code I used to call the API and create my DataFrame in this article, but I highly encourage you to visit my GitHub repository and view the notebook yourself. It is well commented and easy to follow.

You can also download the Excel file containing all of the NBA Playoff data I use in this article and then follow along with the code provided step-by-step.

Step 2: Loading and Organizing Data

Importing our libraries:

Loading our Excel file into a Pandas DataFrame:

Creating our DataFrame and renaming column ‘0’ to ‘Player Name’.
Interactive DataTable for all players who have played in the NBA postseason
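As a minimal sketch of the loading and renaming steps (the original code lives in the GitHub notebook and is not reproduced here), something like the following should work. The file name is hypothetical, and the three rows are placeholders rather than real statistics, used only so the snippet runs without the Excel file.

```python
import pandas as pd

# Placeholder rows standing in for the Excel file; these are NOT real statistics.
raw_DF = pd.DataFrame({
    0: ["Player A", "Player B", "Player C"],
    "Playoff Games Played": [100, 50, 20],
    "Playoff Points Scored": [2500, 600, 150],
})
# With the real file, this would be something like:
# raw_DF = pd.read_excel("nba_playoff_data.xlsx")  # hypothetical file name

# Rename the unnamed first column. Depending on how the file was written,
# its label may be the integer 0 or the string "0"; keys that are absent are ignored.
playoff_data_DF = raw_DF.rename(columns={0: "Player Name", "0": "Player Name"})

print(playoff_data_DF.columns.tolist())
```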

I also want to add a fourth column — ‘Playoff PPG’ (Points Per Game):

Adding column ‘Playoff PPG’
DataTable sorted by ‘Playoff Points Scored’
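The new column is simply total points divided by games played. Here is a small sketch of that calculation followed by the sort, again on placeholder rows (not real statistics). Rounding to two decimals is my assumption, chosen to match the precision of the PPG values quoted later in the article.

```python
import pandas as pd

# Placeholder rows; NOT real statistics.
playoff_data_DF = pd.DataFrame({
    "Player Name": ["Player A", "Player B", "Player C"],
    "Playoff Games Played": [20, 100, 50],
    "Playoff Points Scored": [150, 2500, 600],
})

# Points per game = total playoff points / playoff games played.
playoff_data_DF["Playoff PPG"] = (
    playoff_data_DF["Playoff Points Scored"] / playoff_data_DF["Playoff Games Played"]
).round(2)  # two-decimal rounding is an assumption

# Sort so the highest scorers come first.
playoff_data_DF = playoff_data_DF.sort_values(
    "Playoff Points Scored", ascending=False
).reset_index(drop=True)

print(playoff_data_DF)
```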

Step 3: Time to Graph

Use Plotly to create a scatter plot:

Creating a simple Plotly scatter plot
  • data_frame — set equal to my DataFrame: ‘playoff_data_DF’
  • x — values for the x-axis: ‘Playoff Games Played’, a column in the DataFrame.
  • y — values for the y-axis: ‘Playoff Points Scored’, a column in the DataFrame.
  • title — setting a title for the plot.
https://datapane.com/u/wmblack23/reports/nba-playoff-scoring-all-players

Not bad!

By hovering over any point on the graph, we can see the number of playoff points and games corresponding to the point. This is a useful feature — but would be much more useful if we could also see the NAME of the player when we hover over a point. *Hint, hint.*

We can also click and drag (creating a rectangle) over any region of the graph to zoom in. This is especially helpful on a graph such as this one, with so many points clustered together.

Adding Visual Features

Adding visual features to our scatter plot
  • size — we specify the column ‘Playoff Points Scored’ to tell our graph to make the points larger for players with higher totals of playoff points and smaller for those with fewer.
  • template — Plotly has a handful of template options to design your plot (included in GitHub repo). I prefer a white background.
  • color — we specify ‘Playoff PPG’ for our color variable. This means points with similar ‘Playoff PPG’ values get similar colors.
  • size_max — the maximum size of any point on the plot.
  • hover_name — now, when we hover over a point, we will see the ‘Player Name’ as the first value.
  • hover_data — including ‘Playoff PPG’ in our hover_data so that we can see it along with ‘Player Name’ and our x and y-values.
  • I chose ‘Jet’ as my color scale (the color_continuous_scale argument). Plotly’s documentation lists the other built-in color scales.
https://datapane.com/u/wmblack23/reports/nba-playoff-scoring-adding-features

By setting color equal to ‘Playoff PPG’ and size equal to ‘Playoff Points Scored’, we can make easy observations without even needing to use our zoom and hover features.

Top 200 — Additional Added Features

Adding statistical features to our scatter plot
  • trendline — ‘ols’: Ordinary Least Squares regression line. This allows us to make further observations about our data. Which players are scoring above, below, or at the expected number of points given the number of playoff games they have played? Try hovering over the regression line as well!
  • add_hline — adding a horizontal line representing the median number of playoff points scored by the Top 200 scorers.
  • add_vline — adding a vertical line representing the median number of playoff games played by the Top 200 scorers.
https://datapane.com/u/wmblack23/reports/top200-regression-and-more

The two median lines (add_hline and add_vline), like the trendline above, allow us to make additional observations about the players:

  • Dennis Rodman, in the bottom-right quadrant, has played in more playoff games than the median for this list but averaged only 6.4 PPG, so he falls well below the median line for points scored.
  • Likewise, in the upper-left, Allen Iverson has scored more points than the Top 200 median (at 29.73 PPG) while playing significantly fewer games than the median, so he falls to the left of the median line for games played.

Those in the top-right quadrant, players who both played in more games and scored more points than the Top 200 medians, are all current or future top-tier NBA greats, especially those who also fall above the regression line.

In fact, if we want to take a closer look at who makes up this super-elite group of NBA Playoff scorers, we can do so with the following:

The most elite group of playoff performers in NBA history

These are the players who played in more playoff games (> 118) and scored more playoff points (> 1959) than the medians of the all-time Top 200 playoff scorers AND outscored the prediction of our OLS trendline.
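One way to sketch this filter is shown below. It is not necessarily how the notebook does it: I fit the straight line with numpy.polyfit (a degree-1 least-squares fit, standing in for Plotly's OLS trendline), compute the cut-offs as medians of the data, and use placeholder rows rather than real statistics.

```python
import numpy as np
import pandas as pd

# Placeholder rows; NOT real statistics.
top200_DF = pd.DataFrame({
    "Player Name": ["Player A", "Player B", "Player C", "Player D", "Player E", "Player F"],
    "Playoff Games Played": [200, 150, 120, 80, 60, 40],
    "Playoff Points Scored": [5000, 2400, 3000, 2200, 900, 800],
})

games = top200_DF["Playoff Games Played"]
points = top200_DF["Playoff Points Scored"]

# Degree-1 least-squares fit: points ~ slope * games + intercept.
slope, intercept = np.polyfit(games.to_numpy(), points.to_numpy(), deg=1)
predicted_points = slope * games + intercept

# Top-right quadrant (above both medians) AND above the fitted line.
elite_DF = top200_DF[
    (games > games.median())
    & (points > points.median())
    & (points > predicted_points)
]

print(elite_DF["Player Name"].tolist())
```

With the placeholder rows, Player B clears both medians but sits below the fitted line, so only Player A and Player C remain.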

If you are interested in learning how to embed your tables and visualizations using DataPane as I have done here, I would encourage you to check out this helpful article.

Embedding with DataPane is a really handy and easy-to-use feature that helps make your data more accessible to your readers.

Thank you for reading!

Check out the GitHub link at the top of the article for more.
