Do stars matter?

A preliminary study of recruiting services ratings

Anyone who follows college football knows that success is partly driven by recruiting the top athletes from high school and junior college. Every February, fan forums and blogs come alive with gossip regarding student-athlete choices, comparisons of rival schools’ recruiting classes and evaluations of the staff’s success in bringing in top talent. Most conversation centers around the notion of stars and ratings.

Scout.com, 247sports.com and ESPN all produce ratings of student-athletes. These ratings are typically 1–5 stars, with a numeric equivalent on a 100-point scale. Few would argue that lower ratings are better than higher ratings. Yet very few forum posts, SBNation articles or articles in mainstream sports media make an effort to explain how the ratings are derived. The numbers assigned, though seemingly arbitrary, are taken as fact by fans and media alike, and used to rank recruiting classes and predict future success.

The debate is age old, and entire blog posts are dedicated to the subject. VolNation, a popular blog covering the University of Tennessee, posted the following in a February 1, 2010 article, “How many stars does it take to…”:

…since 2007 the SEC has averaged 5 teams in the Top 10 of Recruiting Classes (based on Rivals numbers). The recruiting numbers mean a lot….

Given the recent history of Tennessee football, we can safely say that having top recruiting classes there meant little. But the question remains: does having a high average rating across the rating services matter? Can a small class with highly rated players compete with a large class with a slightly lower rating? Am I worse off recruiting JUCO players? Can a class of 3-stars with “system” guys beat a class of 4-stars?

I’m thinking of a number…

Before evaluating the quality of the rating services, it’s probably worth figuring out how they do it. What makes a player a 97 versus a 95? On this topic, the rating services are particularly vague, leading the author to believe the value is almost entirely arbitrary. 247's official documentation tells us that a 98–100 rating means the athlete is a five-star prospect:

98–100: One of the top 30 players in the nation. This player has excellent pro-potential and should emerge as one of the best in the country before the end of his career.

No math. No statistics. Just a rank ordering based on some arbitrary or unstated criteria. Other rating services provide similar explanations. A Washington State SBNation article goes through the methodology at length for each recruiting service. Most services simply take a board of experts, rank order the recruits and assign numbers based on their ranking — hardly a scientific approach.

Recruiting narratives.

So going to the source (without going to a primary source) doesn’t answer our question. But it’s still pretty evident that these services have some value; every year Alabama pulls in the “top” class, and every year they are somewhere in the national championship conversation.

The model

To determine what a point is worth, I gathered the 247Sports composite rating for every Power 5 conference team starting in 2004, and every end-of-year Sagarin rating for each team in my study starting in 2008. The Sagarin rating is a surrogate for team success that incorporates strength of schedule as well as winning percentage, and a continuous rating is easier to model than an ordinal ranking. I also added some other metrics that I thought might be significant, including the count of JUCO players recruited and the number of prior national championships a team has won (a surrogate for prestige). The final data set contained 384 observations and covered the period 2008–2014.
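
For reference, here is a minimal sketch of how a data set like this could be assembled, assuming the 247Sports composites and Sagarin ratings have already been exported to two hypothetical CSV files (recruiting.csv and sagarin.csv) with the columns shown in the comments; none of these file or column names come from the original analysis.

```python
import pandas as pd

# Hypothetical exports (names are illustrative, not from the original analysis):
#   recruiting.csv: team, class_year, composite_avg, class_size, juco_count
#   sagarin.csv:    team, season, sagarin  (end-of-year Sagarin rating)
recruiting = pd.read_csv("recruiting.csv")
sagarin = pd.read_csv("sagarin.csv")

# Attach last season's Sagarin rating to each team-season as a predictor.
sagarin = sagarin.sort_values(["team", "season"])
sagarin["sagarin_prev"] = sagarin.groupby("team")["sagarin"].shift(1)

# The class signed in a given year is that season's freshman class, so merging
# on class_year == season gives the freshman composite for each team-season.
data = sagarin.merge(
    recruiting.rename(columns={"class_year": "season"}),
    on=["team", "season"],
    how="inner",
)

# Restrict to the modeling window described above (the post reports 384 rows).
data = data[data["season"].between(2008, 2014)]
```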

I had a hunch that a high Sagarin score was related to the 247 composite score for the freshman class. A quick plot shows a clear linear relationship:

Plot of Sagarin rating by 247 Freshman Class Composite Score. Teams with high Sagarin ratings the previous year are in lighter shades of blue.
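
A hedged matplotlib sketch of a plot like the one described above, reusing the (assumed) column names from the earlier data-assembly sketch:

```python
import matplotlib.pyplot as plt

# Scatter of end-of-year Sagarin rating against the freshman class composite,
# shaded by the prior year's Sagarin rating (lighter blue = stronger last year).
fig, ax = plt.subplots(figsize=(7, 5))
points = ax.scatter(
    data["composite_avg"],        # 247 freshman class composite score
    data["sagarin"],              # current-year Sagarin rating
    c=data["sagarin_prev"],       # prior-year Sagarin rating drives the shading
    cmap="Blues_r",
    edgecolor="black",
    alpha=0.8,
)
ax.set_xlabel("247 freshman class composite score")
ax.set_ylabel("End-of-year Sagarin rating")
fig.colorbar(points, ax=ax, label="Prior-year Sagarin rating")
plt.show()
```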

I then ran a stepwise linear regression, a common data-mining technique for finding the strongest model, with the Sagarin rating as the endogenous (dependent, outcome) variable. The exogenous (independent, predictor) variables included the following (a rough sketch of the selection procedure appears after the list):
- Prior year Sagarin rating
- Average composite rating (1 variable for each of the last 4 classes)
- Average class size
- # of JUCO recruits
- # of national championships
- The conference
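
Stepwise procedures vary, and the exact variant used here isn’t stated, so the following is only an illustrative backward-elimination sketch using statsmodels, dropping terms until no single removal improves the AIC. It assumes the data frame from the earlier sketch has been extended with the additional predictor columns (so_composite, jr_composite, sr_composite, avg_class_size, natl_titles, conference), all of which are hypothetical names.

```python
import statsmodels.formula.api as smf

# Candidate predictors; column names are assumed extensions of the earlier
# data-assembly sketch, not names from the original analysis.
candidates = [
    "sagarin_prev",        # prior-year Sagarin rating
    "composite_avg",       # freshman class composite
    "so_composite", "jr_composite", "sr_composite",   # older classes
    "avg_class_size", "juco_count", "natl_titles",
    "C(conference)",       # conference as a categorical term
]

def fit(terms):
    return smf.ols("sagarin ~ " + " + ".join(terms), data=data).fit()

# Greedy backward elimination: keep dropping terms while a removal lowers the AIC.
terms = list(candidates)
current = fit(terms)
improved = True
while improved and len(terms) > 1:
    improved = False
    for term in list(terms):
        trial = fit([t for t in terms if t != term])
        if trial.aic < current.aic:
            terms.remove(term)
            current, improved = trial, True
            break

print(current.summary())
```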

In the final model, last year’s Sagarin rating, the freshman recruiting composite average and the average class size were significant factors in the current-year Sagarin rating. The number of JUCO players recruited and the number of national championships the school had previously won were dropped, as were the ratings for the other classes (sophomore, junior, senior and fifth-year senior).

What’s a point worth?

The best model yielded an R-squared value of .4755. Interpreted loosely, the model explained roughly 48% of the variance in the current-year Sagarin rating.

Final linear model. Sagarin score is predicted by last year’s score, 247 composite score for the freshman class, and the average class size over the past 4 years.

This tells us that a 1-point increase in the freshman class composite score is worth about .81 Sagarin points. Moving from a class of mid 3-stars (85) to low 4-stars (90), then, we can expect roughly 4 more Sagarin points. Because the Sagarin ratings are (mostly) normally distributed, a 4-point move could mean a bump from #4 to #1 or from #7 to #4 for a top team, but towards the middle of the pack it could mean a jump of 10 or more places.
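
The back-of-the-envelope arithmetic above, using only the ~.81 coefficient reported in the text (the other coefficients of the final model aren’t reproduced here):

```python
# Marginal effect from the final model: ~0.81 Sagarin points per composite point.
COMPOSITE_COEF = 0.81

def expected_sagarin_gain(old_composite, new_composite):
    """Expected change in Sagarin rating, holding the other predictors fixed."""
    return COMPOSITE_COEF * (new_composite - old_composite)

print(expected_sagarin_gain(85, 90))  # mid 3-star -> low 4-star class: ~4 points
```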

Looking at the residuals, the differences between what our model predicts and the actual results, we can get an intuitive sense of some of the factors behind the remaining half or so of the variance.

Studentized residuals of the final model. Extreme values are labeled by year and team.
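
One way to reproduce a diagnostic like this, assuming the fitted model (`current`) and data frame from the earlier sketches, is statsmodels’ outlier_test(), which returns externally studentized residuals:

```python
# Externally studentized residuals for the fitted model from the stepwise sketch.
outliers = current.outlier_test()   # columns: student_resid, unadj_p, bonf(p)

# Attach team/season labels so the extreme seasons can be identified, roughly
# how the labeled points in the figure would be found.
labeled = data.loc[outliers.index, ["team", "season"]].join(outliers)
print(labeled.sort_values("student_resid").head(5))   # biggest under-performers
print(labeled.sort_values("student_resid").tail(5))   # biggest over-performers
```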

On the upside, our model misses the influence of a single great player. Not surprisingly, the extreme values include 2010 Auburn, which had Cam Newton (only a 4-star JUCO player when he signed with Auburn), and 2013 Florida State (Jameis Winston), both teams with “outlier” players. There’s also a case to be made for good coaching, with 2010 Stanford and 2014 TCU both recruiting in the mid 3-star range and getting outstanding results. There are also teams that won with creative coaching, running modified no-huddle spread offenses (Missouri and Auburn). So perhaps there is space for recruiting “average” rated players and getting above-average results. And most sports writers chalk it up to good coaching: David Shaw at Stanford, Gary Patterson at TCU, Gus Malzahn at Auburn.

On the bottom we find dysfunctional teams, coaching turnover and potentially bad coaching. 2010 Ohio State and 2014 Vanderbilt come to mind as teams with drastic coaching changes and dysfunction mixed together. Auburn in 2011 and 2012 showed us Gene Chizik without Cam Newton, and the results were not pretty. 2010 Texas was a catastrophe.


Bottom line: In aggregate, recruiting a highly rated class is better than recruiting an average one.

The rest I may or may not choose to leave to Las Vegas. I wrote this initially in response to a statement posed to me by a co-worker: that coaching matters much more than who you recruit, and that a great coach recruiting for a system matters more than pulling in a ton of top talent. I think this is still the case in certain circumstances. Even a 10-point swing in recruiting class ratings only gets you about 8 more Sagarin points.

For a team like Louisville, for instance, which finished around #30 in Sagarin’s 2014 ranking, getting closer to #10 would have required out-recruiting Alabama. So perhaps hiring a great coach or recruiting a superior player is a better strategy than going all out to bring up that average class rating. But we lack measures of great coaching, and this is the best measure of great players we have.

Finally, the study and commentary here are not meant to be exhaustive. There are many ways to measure success (wins, top-25 finishes) and more complicated models we could have chosen (for those interested in making the playoff, for instance, we might have modeled a top-4 0/1 variable using logistic regression).
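
As an illustration of that last point, here is a hedged sketch of what the logistic-regression alternative might look like, constructing a hypothetical 0/1 “finished in the Sagarin top 4” indicator and reusing the assumed column names from the earlier sketches:

```python
import statsmodels.formula.api as smf

# Hypothetical outcome: did the team finish in the top 4 of that season's
# Sagarin ratings? (One possible stand-in for "making the playoff".)
data["top4"] = (
    data.groupby("season")["sagarin"].rank(ascending=False) <= 4
).astype(int)

# Same flavor of predictors as the linear model, but a binary outcome.
playoff_model = smf.logit(
    "top4 ~ sagarin_prev + composite_avg + avg_class_size", data=data
).fit()
print(playoff_model.summary())
```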

Feedback is always welcome. Reach me at mberns @ chicagobooth dot edu.
