March Madness Post-Mortem, and why Sharing the Ball isn’t always good

Published in

re-HOOP*PER-rate

5 min read · Apr 12, 2018

It’s been a while since March Madness, and what Madness it was! My bracket was torn to shreds pretty quickly, though the neural network I trained did predict Buffalo over Arizona, and really liked Nevada to upset Cincinnati. I said before that if I were to bet on any game, it would be a potential Nevada over Cincinnati 2nd round upset, and after seeing the Nevada Wolfpack come back from 20 points down to upset Cincinnati… well, to paraphrase the great Rasheed Wallace, numbers don’t lie! (At least, not when they repeatedly identified Cincinnati as the team most likely to be upset)

While the Tourney was going on, I decided to run a Principal Component Analysis (PCA) to see which of the 14 Advanced Statistics I analyzed were most relevant in predicting a team’s NCAA tournament success. PCA projects the dataset onto a chosen number of new dimensions (the principal components), picked so that they retain as much of the variance in the original data as possible. The magnitude of each original statistic’s contribution to a component gives a sense of which statistics are the most “important”, in the sense that they carry information that is also spread across the other statistics.

In Python, I did:

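from sklearn.decomposition import PCA
# Keep only the top 3 principal components
pca = PCA(n_components=3)
# Fit on the training features and project them onto those 3 components
X3D = pca.fit_transform(XTrain)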

to convert the original dataset into a new representation with only 3 dimensions. I checked how well this new 3-dimensional dataset captured the original data by printing the explained variance ratio of each component:

print(pca.explained_variance_ratio_)
[0.49246765 0.14785902 0.11346942]

Adding up these ratios, we get about 75%. Not bad for reducing 14 dimensions down to 3!
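As a quick check on that figure, the three printed ratios can be summed directly. This is only a sketch that reuses the printed values rather than rerunning the PCA, and the variable names are chosen here for illustration.

```python
# Explained variance ratios copied from the printed output above.
ratios = [0.49246765, 0.14785902, 0.11346942]

# Running total: how much variance is kept after each added component.
cumulative = []
total = 0.0
for r in ratios:
    total += r
    cumulative.append(total)

for i, c in enumerate(cumulative, start=1):
    print(f"first {i} component(s): {c:.1%} of variance")
```

The last line of output is the cumulative share for all 3 components, which is where the "about 75%" comes from.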

To see which of the original 14 stats contribute most to each of these 3 dimensions, I printed out the components of the new PCA representation:

pca.components_
array([[-1.29458010e-01,  6.95485825e-02, -9.83184325e-04,
        -8.05737044e-02, -4.33195551e-03, -1.46212499e-01,
        -1.01218939e-01, -8.34025780e-04,  1.36409696e-04,
         9.10941095e-04, -3.09267288e-01,  1.41108473e-02,
        -4.74401428e-01, -4.44151076e-03, -7.86984776e-01],
       [-2.80822347e-02,  3.44688827e-02,  1.91145602e-03,
        -1.06263980e-01, -1.24243865e-02,  9.26066611e-01,
         4.90970113e-02,  1.68262277e-03,  1.79876657e-03,
        -1.09215205e-03,  2.28980781e-01, -7.21053163e-02,
        -2.39997384e-01,  3.03746720e-03, -1.06380175e-01],
       [-1.00796963e-01, -1.74515007e-01,  3.63456693e-03,
        -2.20267108e-01, -6.99973454e-02, -3.07469576e-01,
         6.89514344e-02,  3.54759919e-03,  2.67396701e-03,
         1.88325307e-04,  8.11275093e-01,  1.67915195e-01,
        -3.44898208e-01,  7.61347422e-03, -3.55936511e-02]])

The most relevant statistic for the first component (with a magnitude of almost 0.8) is the last entry in the first row of the output, which corresponds to good old fashioned wins and losses. The next 4 most important factors are Offensive Rating, Strength of Schedule, Assist Rate, and Offensive Rebounding Rate. Looking at the next most important component (the second row of the output), Assist Rate is the most important statistic, with a magnitude above 0.9. In the third row, Offensive Rating is the biggest contributor, with Assist Rate (yet again) among the larger contributors. It seems offensive statistics manage to “capture” much of the information in other Advanced Statistics.
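One way to read a row of pca.components_ without eyeballing the exponents is to sort its entries by absolute value. Here is a minimal sketch using the first row printed above. It reports column positions only, because mapping positions to statistic names depends on the column order of XTrain, which isn't shown in this post. The helper name top_loadings is made up for illustration.

```python
def top_loadings(row, k=5):
    """Return (index, loading) pairs for the k largest-magnitude entries."""
    order = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    return [(i, row[i]) for i in order[:k]]

# First row of pca.components_, copied from the output above.
first_component = [
    -1.29458010e-01, 6.95485825e-02, -9.83184325e-04,
    -8.05737044e-02, -4.33195551e-03, -1.46212499e-01,
    -1.01218939e-01, -8.34025780e-04, 1.36409696e-04,
    9.10941095e-04, -3.09267288e-01, 1.41108473e-02,
    -4.74401428e-01, -4.44151076e-03, -7.86984776e-01,
]

# Print the five biggest contributors by zero-based column position.
for index, loading in top_loadings(first_component):
    print(f"column {index}: {loading:+.3f}")
```

The sign of a loading only gives a direction along the component, so the ranking uses the magnitude. The same helper works on the second and third rows.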

Curious about seeing Assist Rate appear repeatedly among the most important statistics, I took a look at the performance of the teams that ranked HIGHEST in Assist Rate in the NCAA Tournament over the past 9 seasons. Old sportscasters always like to say teams that “share the ball” and are “unselfish” perform better in the post-season, but is that really true?

I took a look at teams that:

Were in the Top 15 in Assist Rate
Had a Top 6 seed (making them the favorites in at least their first game)
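That filter can be written as a simple predicate. The sketch below is illustrative only: the field names and the three sample rows are made-up placeholders to show the shape of the code, not the data behind this analysis.

```python
from dataclasses import dataclass

@dataclass
class TeamSeason:
    name: str               # placeholder label, not a real team
    assist_rate_rank: int   # national rank in Assist Rate (1 = highest)
    seed: int               # NCAA Tournament seed, 1 through 16
    wins: int               # tournament wins that year
    losses: int             # tournament losses that year

def fits_profile(t, max_rank=15, max_seed=6):
    """True for a Top 15 Assist Rate team that also earned a Top 6 seed."""
    return t.assist_rate_rank <= max_rank and t.seed <= max_seed

# Made-up rows, purely to exercise the filter.
sample = [
    TeamSeason("Team A", 3, 2, 1, 1),    # fits both criteria
    TeamSeason("Team B", 40, 1, 4, 1),   # high seed, but not a top passing team
    TeamSeason("Team C", 10, 11, 2, 1),  # top passing team, but a low seed
]

selected = [t for t in sample if fits_profile(t)]
# Combined tournament record of the teams that fit the profile.
record = (sum(t.wins for t in selected), sum(t.losses for t in selected))
print([t.name for t in selected], record)
```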

What I found was a bit shocking: the 49 teams that fit this profile had a combined record of 25–24 in the NCAA Tournament from 2010 to 2018. A number of those teams were from the Big Ten Conference (Michigan State, in particular, consistently had a high assist rate); if we take out the Big Ten teams, the top passing teams who received high seeds had a record of only 18–18. In other words, they lost as often as they won. Keep in mind these are top seeded teams! They are generally expected to win, which means bets on their opponents pay out at favorable odds. So if you went to Vegas every single year from 2010 to 2018 and simply bet on these highly seeded, Top 15 assist rate teams to lose every game, you would be way ahead right now. In fact, you would have predicted notorious, come-from-nowhere upsets such as 15th seeded Florida Gulf Coast over 2nd seeded Georgetown in 2013 and 15th seeded Middle Tennessee State over 2nd seeded Michigan State in 2016.
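To put a rough number on that betting claim: a 25–24 record means that betting against these teams in every game wins 24 bets and loses 25. The sketch below assumes flat one-unit stakes and a single average payout across all games. Both are simplifying assumptions made for this example, since actual moneylines vary by game and aren't part of this analysis.

```python
from fractions import Fraction

favorite_wins, favorite_losses = 25, 24  # the 25-24 record quoted above

def net_units(decimal_odds, bets_won=favorite_losses, bets_lost=favorite_wins):
    """Net profit in units from flat 1-unit bets against these teams.

    decimal_odds is the total return per unit staked on a winning bet,
    so a winning bet nets (decimal_odds - 1) and a losing bet nets -1.
    """
    return bets_won * (decimal_odds - 1) - bets_lost

# Break-even average decimal odds: total bets divided by winning bets.
break_even = Fraction(favorite_wins + favorite_losses, favorite_losses)
print(float(break_even))

# Hypothetical average payout of 3.0, chosen only to show the calculation.
print(net_units(Fraction(3)))
```

Under these assumptions, any average payout above roughly 2.04 in decimal odds (about +104 in American odds) comes out ahead. In games where these teams were favored, the other side pays better than even money, so the bar is not a high one.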

Second seeded teams that are in the top 15 in assist rate fare particularly poorly in March: their record from 2010 to now is a measly 6–6, far below their expected result of making it to the Elite Eight. In 2018 alone, second seeded UNC and third seeded Michigan State were both upset in the second round of the tournament. It seems that the conventional wisdom about “sharing the ball” doesn’t actually apply come March! A few other observations about these teams:

Top 15 Assist Rate teams that received Top 6 seeds had an average seed of 3.5, which means they would be expected to reach the Sweet 16
Instead, they have an average WAS score of -1.1, corresponding to an expected loss in the second round
Second seeded teams had an average WAS score of -2.4, somewhere between a second round upset and a first round upset, performing much worse than their expected appearance in the Elite Eight
Only 2 teams that fit this profile have made it to the Sweet 16 in the last 9 years: Top seeded Syracuse in 2010 and 4th seed Michigan State in 2014. Both teams promptly lost.
Only 2 of the 49 teams that fit this profile have performed above their seeding in the Big Dance: the aforementioned Michigan State team in 2014, and 6th seeded Xavier in 2015. Both teams only won one more game than expected.
Speaking of Michigan State, they appeared on this list 3 times, more than any other team. Back when Tom Izzo led the Spartans to the National Championship in 2000, they were ranked a lowly 102nd in Assist Rate.
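The "expected" finishes quoted in the list above follow from assuming the better seed wins every game. A small sketch of that mapping is below. It is one reading of the expectations used in this list, not the definition of the WAS score, which isn't spelled out in this post, and the function name is made up for illustration.

```python
def expected_wins(seed):
    """Tournament wins for a seed if the better seed always wins."""
    if not 1 <= seed <= 16:
        raise ValueError("seed must be between 1 and 16")
    if seed == 1:
        return 4   # reaches the Final Four, where 1 seeds meet each other
    if seed == 2:
        return 3   # wins in the Sweet 16, loses in the Elite Eight
    if seed <= 4:
        return 2   # loses in the Sweet 16
    if seed <= 8:
        return 1   # loses in the second round
    return 0       # loses in the first round

ROUND_REACHED = {0: "first round", 1: "second round", 2: "Sweet 16",
                 3: "Elite Eight", 4: "Final Four"}

for seed in (2, 4, 6):
    print(seed, ROUND_REACHED[expected_wins(seed)])
```

Comparing a team's actual win total against expected_wins(seed) gives a simple "games above or below seed" number in the same spirit as the comparisons in the list.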

What is it about teams that share the ball so much that makes them vulnerable to upsets? It may be that teams which have a high assist rate are forced into that position because they don’t have individual star players that can take over in close games. The lack of a clear go-to scoring option may cause teams to pass the ball around the perimeter, without anyone taking the initiative to make a play (a common sight every March). That’s a recipe for disaster in the close games that have come to define the NCAA Tournament.

It may also be because teams that pass at such a high rate rely on an offense that is built around repeated “gimmicks” like lots of hand-off screens, or lots of repeated backdoor cuts. For a top seeded team, there is usually plenty of game film for their opponents to analyze and determine exactly how to counter those gimmicks. Lower seeded teams that have a high assist rate perform much better in the tournament, possibly because there isn’t as much game film of their offensive schemes for opponents to study, allowing their clever plays to be shocking surprises. In short, assist heavy schemes like the Princeton offense are only effective for the Princetons, Holy Crosses, and Richmonds of the world, and not for the Michigan States, UNCs, and Georgetowns.


Written by reHOOPerate