Fitting it in: Adjusting Team Metrics for Schedule Strength

Co-Author: Elissa Lerner

While we wait to see how the NET nets out for the first time this Selection Sunday, one thing is certain: basketball heads will be chattering away about the difficulty of program schedules. You might have heard that it’s not whether you win or lose, it’s how you play the game. But that’s not entirely true — it also matters whom you play.

Intuitively, this makes sense: playing well against a strong team seems like it should matter more than playing well against a weak team. But when you hear about teams playing amazing defense, what does that really mean? Did that rating take the competition into statistically rigorous consideration? How might you figure out the significance of a team’s performance in a given season — that is, their performance in the context of all other team performance?

Let’s give it a shot.

How to adjust for schedule

A raw team ranking orders teams by a metric (e.g., offensive efficiency) aggregated over all of their games in a season. But we wanted to show how a team’s rank in a given metric shifts when you account for their schedule of opponents. (After all, how highly should a team’s offensive efficiency be ranked if they only play the weakest defensive teams in the country?) To make this kind of schedule adjustment, we turned to BigQuery and Colab (if you’d like to follow along, open the Colab link and run it in Playground mode, then click Connect to run the queries).

We started by querying for game-level data for all D-I games in the past few seasons. We computed pace, efficiency, and the Four Factors using the “d1_tm_game_stats” view we built, which covers team and opponent box score-level data, along with possession estimates, for each game.
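For readers who want to see the arithmetic, here is a minimal Python sketch of the per-game metrics a view like this might compute. It is not the SQL behind d1_tm_game_stats: the field names are hypothetical, the possession estimate (FGA - OREB + TOV + 0.475 × FTA, averaged over both teams) is one common formula rather than necessarily the one we used, and defining free-throw rate as FTA/FGA is also an assumption.

```python
from dataclasses import dataclass

# Assumption: KenPom-style free-throw weight; 0.44 is another common choice.
FTA_WEIGHT = 0.475


@dataclass
class BoxScore:
    """One side's box score for a single game (hypothetical field names)."""
    pts: int
    fgm: int
    fga: int
    fg3m: int
    fta: int
    oreb: int
    dreb: int
    tov: int


def possessions(b: BoxScore) -> float:
    """Estimate possessions from one side's box score."""
    return b.fga - b.oreb + b.tov + FTA_WEIGHT * b.fta


def game_metrics(tm: BoxScore, opp: BoxScore, minutes: float = 40.0) -> dict:
    """Pace, efficiency, and offensive Four Factors for `tm` in one game."""
    # Both teams get (nearly) the same number of possessions, so average the two estimates.
    poss = (possessions(tm) + possessions(opp)) / 2
    return {
        "poss": poss,
        "pace": poss * 40.0 / minutes,                 # possessions per 40 minutes (handles overtime)
        "off_eff": 100.0 * tm.pts / poss,              # points scored per 100 possessions
        "def_eff": 100.0 * opp.pts / poss,             # points allowed per 100 possessions
        "efg_pct": (tm.fgm + 0.5 * tm.fg3m) / tm.fga,  # made threes count 1.5x
        "tov_pct": tm.tov / poss,                      # turnovers per possession
        "orb_pct": tm.oreb / (tm.oreb + opp.dreb),     # share of own misses rebounded
        "ft_rate": tm.fta / tm.fga,                    # assumption: FTA/FGA (some use FTM/FGA)
    }
```

Passing `minutes=45.0` would normalize pace for a game with one overtime period.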

Screenshot of BigQuery

We then ran a similar query to aggregate these metrics to the team-season level, and this became our base collection of team metrics for analysis. You can also find these “raw” advanced stats on sites like KenPom, Sports Reference, and TeamRankings.
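The aggregation step could look like the pandas sketch below (the original used SQL, and the column names here are hypothetical). One design choice worth noting: it sums counting stats across games and recomputes ratios from the totals, so every possession carries equal weight, rather than averaging per-game percentages. This convention is our assumption, not a description of the query we ran.

```python
import pandas as pd


def season_totals(games: pd.DataFrame) -> pd.DataFrame:
    """Aggregate game-level rows (one per team per game) to one row per team-season.

    Assumes hypothetical columns: season, team, pts, opp_pts, fgm, fga, fg3m, poss.
    """
    # Sum the counting stats over every game a team played in the season
    totals = games.groupby(["season", "team"], as_index=False)[
        ["pts", "opp_pts", "fgm", "fga", "fg3m", "poss"]
    ].sum()
    # Recompute rates from season totals so each possession counts equally
    totals["off_eff"] = 100 * totals["pts"] / totals["poss"]
    totals["def_eff"] = 100 * totals["opp_pts"] / totals["poss"]
    totals["net_eff"] = totals["off_eff"] - totals["def_eff"]
    totals["efg_pct"] = (totals["fgm"] + 0.5 * totals["fg3m"]) / totals["fga"]
    return totals
```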

From there, we exported a few CSVs and read them into Colab for public-sharing purposes (otherwise, we’d connect directly to BigQuery, as shown here). We used the pandas melt function to reshape the data to one row per team per stat, which would help us later on.
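A minimal sketch of that reshape with `DataFrame.melt`, using placeholder teams and made-up values: the columns that identify a team-season stay as identifiers, and every other column becomes a (stat, value) pair.

```python
import pandas as pd

# Placeholder wide table: one row per team-season, one column per stat (values are made up)
team_stats = pd.DataFrame({
    "season": [2019, 2019],
    "team": ["Team A", "Team B"],
    "off_eff": [110.0, 104.0],
    "efg_pct": [0.54, 0.50],
})

long_stats = team_stats.melt(
    id_vars=["season", "team"],  # columns that identify a team-season
    var_name="stat",             # former column names land here
    value_name="value",          # former cell values land here
)
print(long_stats)
```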

But we wanted to go further and get adjusted versions of each of the team stats. Our main idea behind “schedule adjustment” is that each stat of interest is a function of three things: a team’s ability, their opponent’s ability in a given game, and home-court advantage. Using a very loose model representation, you could think of it as:

game_stat ~ intercept + tm_rating + opp_rating + home_advantage + (error)

We took the stat value from every game in a season, which included matchups of various teams and opponents. We then created dummy (indicator) variables for the team and opponent fields: each team gets one “tm” column and one “opp” column, set to 1 when that team is on that side of the matchup and 0 otherwise. We then fit a ridge regression model using scikit-learn and used the coefficients as team, opponent, and home-advantage estimates. Because every game enters the same fit, the regression handles the opponent and site adjustment for us automatically.
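Here is a sketch of that setup for one season and one stat using scikit-learn’s `Ridge`. The function name, the column names, and the +1/0/-1 home encoding are our assumptions for illustration, not the original notebook code, and the penalty `alpha` would need tuning on real data.

```python
from __future__ import annotations

import pandas as pd
from sklearn.linear_model import Ridge


def fit_adjusted_ratings(games: pd.DataFrame, stat: str, alpha: float = 1.0) -> tuple[pd.DataFrame, float]:
    """Schedule-adjust one stat for one season with ridge regression (hypothetical helper).

    `games` has one row per team per game with hypothetical columns `team`, `opp`,
    `home` (assumed encoding: +1 home, 0 neutral, -1 away) and the stat column.
    """
    # One indicator column per team in the "tm" role and one per team in the "opp" role
    tm_dummies = pd.get_dummies(games["team"], prefix="tm", dtype=float)
    opp_dummies = pd.get_dummies(games["opp"], prefix="opp", dtype=float)
    X = pd.concat([tm_dummies, opp_dummies, games[["home"]].astype(float)], axis=1)
    y = games[stat].to_numpy(dtype=float)

    # alpha shrinks every coefficient toward 0; scikit-learn leaves the intercept unpenalized
    model = Ridge(alpha=alpha, fit_intercept=True)
    model.fit(X.to_numpy(), y)
    coefs = pd.Series(model.coef_, index=X.columns)

    teams = sorted(set(games["team"]) | set(games["opp"]))
    ratings = pd.DataFrame({
        "team": teams,
        # Predicted stat for this team vs. a zero-coefficient opponent on a neutral floor
        f"adj_{stat}": [model.intercept_ + coefs.get(f"tm_{t}", 0.0) for t in teams],
        # The same stat when this team is the opponent (the other side of the ball)
        f"adj_opp_{stat}": [model.intercept_ + coefs.get(f"opp_{t}", 0.0) for t in teams],
    })
    return ratings, float(coefs["home"])
```

Because the ratings are fit jointly, a team that only faces weak defenses gets little credit for those matchups: the weak opponents’ large `opp_` coefficients absorb the easy scoring. With the +1/0/-1 encoding, the `home` coefficient is half the gap between playing at home and on the road.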

For each stat, our procedure generated a team “rating” that accounts for the strength of the opponent on the opposite side of the ball in the same stat (e.g., tm_efg_pct is adjusted for the opponent’s opp_efg_pct), as well as home advantage. This is more thorough and statistically sound than the typical “stat vs. opponent season average” adjustment found in many other analyses: the regression estimates every team’s rating jointly, so each opponent’s strength is itself adjusted for that opponent’s schedule instead of being taken from its raw season average.

Snippet of ridge regression in Colab

(We liked ridge regression for this work because it extends ordinary least squares with a penalty on the size of the coefficients. That penalty helps handle multicollinearity and “shrinks” coefficients toward zero, particularly when sample sizes are small, a form of regularization that gives more sensible estimates. The procedure finds the team ratings that “best fit” the game results under the model above without going to extremes, and it gathers them in a “melted” data frame with one row per team-season per stat, much like the season-stat table generated above.)
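One way to write the model and the ridge objective in symbols (the notation is ours): for game row $i$, $t(i)$ is the team, $o(i)$ the opponent, $s_i$ the site indicator (assumed $+1$ home, $0$ neutral, $-1$ away), $\theta_t$ team $t$’s rating in the stat, $\delta_t$ team $t$’s rating on the opposite side of the ball, $\eta$ the home advantage, and $\alpha$ the penalty strength. As in scikit-learn’s `Ridge`, the intercept $\beta_0$ is left unpenalized.

```latex
y_i = \beta_0 + \theta_{t(i)} + \delta_{o(i)} + \eta\, s_i + \varepsilon_i

(\hat\beta_0, \hat\theta, \hat\delta, \hat\eta)
  = \arg\min_{\beta_0,\,\theta,\,\delta,\,\eta}
    \sum_i \bigl( y_i - \beta_0 - \theta_{t(i)} - \delta_{o(i)} - \eta\, s_i \bigr)^2
    + \alpha \Bigl( \sum_t \theta_t^2 + \sum_t \delta_t^2 + \eta^2 \Bigr)
```

With $\alpha = 0$ this is ordinary least squares; larger $\alpha$ pulls ratings of teams with few informative games toward the league average.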

We joined the raw and adjusted tables together for each season-team-stat combo, standardized each stat, then ranked each team by each stat. Altogether, for each team-season and each metric, we ended up with three values, computed once for the raw version and once for the adjusted version: “stat” on its natural scale, “rtg” on a 0–100 scale, and “rk” as the D-I ranking.
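One way to produce the “rtg” and “rk” columns is sketched below. The post doesn’t specify how the 0–100 scale was built, so mapping the within-season z-score through the normal CDF (a percentile-style scale) is an assumption, as is the hypothetical `LOWER_IS_BETTER` set used to flip stats where smaller values are better.

```python
import math

import pandas as pd

# Assumption: stats where a smaller value is better (hypothetical names)
LOWER_IS_BETTER = {"def_eff", "tov_pct"}


def add_rtg_and_rank(long_stats: pd.DataFrame) -> pd.DataFrame:
    """Add a 0-100 rating and a rank within each season-stat group.

    Expects columns season, team, stat, value. The z-score -> normal CDF mapping
    is an assumed scale, not necessarily the one used in the original analysis.
    """
    out = long_stats.copy()
    grp = out.groupby(["season", "stat"])["value"]
    z = (out["value"] - grp.transform("mean")) / grp.transform("std")
    # Flip the sign for lower-is-better stats so a higher z is always better
    z = z.where(~out["stat"].isin(LOWER_IS_BETTER), -z)
    out["rtg"] = z.map(lambda v: 50.0 * (1.0 + math.erf(v / math.sqrt(2.0))))
    # Rank 1 = best rating within the season-stat group
    out["rk"] = out.groupby(["season", "stat"])["rtg"].rank(ascending=False, method="min").astype(int)
    return out


def combine(raw: pd.DataFrame, adj: pd.DataFrame) -> pd.DataFrame:
    """Join raw and adjusted versions on season-team-stat, suffixing value/rtg/rk."""
    keys = ["season", "team", "stat"]
    return add_rtg_and_rank(raw).merge(add_rtg_and_rank(adj), on=keys, suffixes=("_raw", "_adj"))
```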

Onto the analysis!

What your schedule says about you

Generally speaking, we’re interested in schedule-adjusted metrics, which you can scour through in this Data Studio dashboard. The findings align with what you’ve been hearing and seeing all season: Gonzaga and Virginia are dominating (at least in part) by virtue of their offensive efficiency, while Texas Tech, Virginia, and Duke are tough on defense. This makes sense — most season rankings adjust for schedule difficulty to some degree.

But when you isolate the schedule adjustment, you can see just how much the difficulty of a team’s schedule affects their metric rankings.

We generated an interactive plot of every team’s raw and adjusted version of each metric to show the impact of schedule adjustment. It uses Plotly’s scatter plot functionality along with team colors and names, so you can hover over each point to see each school’s raw vs. adjusted ratings and ranks. Raw ratings are on the x-axis, and adjusted ratings are on the y-axis. Up and to the right is better for every stat, even those where lower numbers are better (the axes are reversed for these).

Here’s net efficiency plotted (note: all plots show data from games through March 11):

Scatter plot of net efficiency, raw vs. adjusted

The gray line (which is y=x) shows where raw and adjusted ratings would be equal. Most teams fall relatively close to that line for most stats, since metrics don’t move too much for many teams when adjusting for schedule in this way. But some teams have adjusted ratings and rankings that are rather different from their raw ones, which suggests their competition (and home/road schedule) has a substantial effect on their performance in a particular metric.

For example, Abilene Christian University has a raw net efficiency rank of 21, which puts it far to the right in the chart. But once net efficiency is adjusted for schedule, its rank falls to 145.

Location of Abilene Christian on net efficiency plot (raw vs. adjusted)

Meanwhile, Kansas has a raw rank of 91, but once adjusted for schedule, they’re up to 20. What’s going on?

Location of Kansas on net efficiency plot (raw vs. adjusted)

This is schedule adjustment doing its job. ACU, though enjoying a 25–6 overall record this year, plays in the Southland Conference (a mid-major). Kansas, despite having one of its worst seasons in recent memory, is still in the Big 12, and therefore gets a boost by virtue of the company it keeps. Major-conference teams tend to have tougher schedules and benefit from schedule-adjusted metrics. Just look at Louisville: ranked 63rd in net efficiency, but 16th when adjusted for their brutal schedule (Tennessee, Michigan State, Kentucky, and Indiana out of conference, then UNC twice, UVA twice, Duke, and the rest of the ACC during conference play).
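Finding the teams that move most is a short computation once raw and adjusted ranks sit side by side. The sketch below uses only the three net efficiency ranks quoted above; on a full joined table (with hypothetical columns `rk_raw` and `rk_adj`), the same sort would surface the biggest movers, i.e. the teams furthest from the y = x line in rank terms.

```python
import pandas as pd

# Net efficiency ranks quoted in the text (games through March 11)
ranks = pd.DataFrame({
    "team": ["Abilene Christian", "Kansas", "Louisville"],
    "rk_raw": [21, 91, 63],
    "rk_adj": [145, 20, 16],
})

# Positive change = the team climbs once schedule is accounted for
ranks["rk_change"] = ranks["rk_raw"] - ranks["rk_adj"]

# Largest absolute moves first
movers = ranks.sort_values("rk_change", key=lambda s: s.abs(), ascending=False)
print(movers.to_string(index=False))
```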

Now that we’ve got a grasp of how schedule adjustment works, let’s look beyond the basic adjusted efficiency stats, which others (most notably KenPom) have made readily available. We went further and created adjusted versions of the Four Factors, which are much harder to find publicly. Like efficiency, adjusting the Four Factors can reveal some important insights. For instance, consider this plot of each team’s adjusted vs. raw effective field goal percentage (eFG%), which measures how well a team shoots from the field, counting each made three-pointer as 1.5 made field goals.

Effective Field Goal Percentage (eFG%), raw vs. adjusted

Notice two different shades of blue in close proximity towards the top right?

Duke and UNC plot locations for eFG%

Duke and North Carolina are nearly on top of each other on this plot (nicely evocative of the mere ten miles between the rivals in real life, we think). Raw eFG% has Duke ranked 76th and UNC 78th, which may seem a bit low for these high-profile teams. But when you adjust for their hard schedules (in this case, based on opponents’ defensive eFG%), they move almost in unison up to 16th and 17th, respectively. Each of them played a relatively strong set of shooting defenses, both out of conference and in the ACC (including each other, twice). Schedule adjustment takes them from looking like merely good shooting teams to the elite ones that they are.

Adjusting for schedule helps metrics better reflect a team’s quality in those facets of the game than raw metrics do. Moreover, a team’s schedule-adjusted metrics are generally more predictive of future performance than their raw metrics. That said, schedule isn’t everything. No matter how much we isolate for strength of schedule, part of the fun of March is seeing which teams surprise everyone. Try filtering the Data Studio dashboard of Four Factors to compare schedule-adjusted UMBC with Virginia from last season:

Adjusted Four Factors for UMBC and Virginia through Selection Sunday 2018

Hmm, how’d that work out, again? Oh, right.

On schedule for Sunday

Schedules aren’t the only way to weight and rank teams, but they are helpful. When you combine schedule strength with Four Factors, you can develop a pretty thorough picture of the season’s landscape. We’re looking forward to seeing how these shake out in the coming weeks!