Game On: How NIL is Changing the Recruiting Game for High School Athletes

Robert Mepham
10 min read · Apr 10, 2023

The college sports world is experiencing a groundbreaking change as student-athletes now have the opportunity to benefit from their Name, Image, and Likeness (NIL) rights. The introduction of NIL Scorecards has further opened doors for athletes to earn substantial sums of money.

Haley and Hanna Cavinder both signed NIL endorsements with companies such as the WWE (promo shown above) and Boost Mobile.

Take, for instance, the Cavinder Twins, whose TikTok popularity landed them a $500,000 deal with Boost Mobile. Bryce Young, before he had even started a game at quarterback for Alabama, had already signed an NIL contract worth nearly $1 million with Dr. Pepper.

Heisman Trophy winner Bryce Young turned his stardom as Alabama’s starting QB into a series of nationwide Dr. Pepper commercials.

Livvy Dunne, arguably the face of NIL in college athletics, also seized the moment and inked multiple six-figure NIL deals with brands such as Forever 21, American Eagle, and Vuori. With the aid of these revolutionary tools and tactics, the earning potential for college athletes has soared to new heights.

Olivia Dunne (LSU Gymnastics) has amassed over 11 million followers across her social media empire. She continues to leverage that following, alongside her athletic career at LSU, into millions of dollars in NIL endorsements.

If you’ve ever wanted to understand the world of NIL from a data science perspective, including how high school athletes can use machine learning to better understand their worth when negotiating their first NIL deals, this article is for you!

The National Collegiate Athletic Association (NCAA) has a long history of opposing athlete pay, citing the concept of amateurism and the preservation of the collegiate model of sports. The NCAA has argued that paying college athletes would ruin the integrity of the games and lead to an imbalance in competition between schools. However, this stance has been challenged by athletes, coaches, and advocates who argue that college athletes should be fairly compensated for their contributions to the multi-billion dollar industry of college sports.

One of the earliest attempts to get athletes paid was in 1951, when the University of Oklahoma football team threatened to boycott the Orange Bowl over demands for better pay and treatment of athletes. The protest was successful, and the athletes received more financial support and improved conditions. However, the NCAA continued to oppose the idea of paying athletes and strengthened its rules against compensation.

Over the years, several high-profile cases have emerged of athletes, coaches, and schools circumventing NCAA rules to provide financial benefits to athletes. One such case was the University of Southern California football program, which was hit with severe sanctions in 2010 after it was revealed that star running back Reggie Bush had received improper benefits. The scandal resulted in USC being stripped of its 2004 national championship and being banned from postseason play for two years.

In recent years, the issue of athlete pay has gained significant attention and momentum as the NCAA’s revenue has grown and the exploitation of college athletes has become more apparent. The landmark case of O’Bannon v. NCAA in 2014 challenged the NCAA’s rules against compensation, with former college athletes arguing that they were being unfairly exploited. The case ultimately resulted in a ruling that allowed schools to offer cost-of-attendance stipends to athletes, but it did not address the larger issue of athletes being compensated for their likeness.

Ed O’Bannon (UCLA Basketball) filed a lawsuit against the NCAA and its licensing company, claiming that the NCAA was unfairly licensing the names, images, and likenesses of current and former athletes without proper compensation. The case was appealed all the way to the US Supreme Court, which declined to hear it.

The issue of name, image, and likeness (NIL) rights gained momentum in the past decade, as athletes sought the right to profit off their own image and brand. In 2021, the NCAA finally allowed athletes to profit off their NIL rights, following pressure from states enacting their own laws to permit such compensation, including California, Florida, and Colorado. The decision by the NCAA allows athletes to sign endorsement deals and receive compensation for autographs, social media content, and appearances.

Now that we know how we got to this new era of NIL-infused collegiate sports, let’s pivot back to the data science side of things. It seems rational that a high school or college athlete would want to know their fair market value in terms of what their NIL would be worth at different schools, much like how professional athletes test the waters of free agency to see how much they are valued by different teams.

This desire to estimate a player’s NIL valuation lends itself perfectly to a fairly straightforward machine learning approach. Since the value we are trying to predict is quantitative, we want an algorithm that can not only predict the dependent variable accurately, but can also weight feature importance in a way that tells us which components of a given input feature set are most important in driving the predicted outcome (more on feature importance later).

XGBoost (eXtreme Gradient Boosting) is a powerful machine learning algorithm that has proven to be highly effective in a wide range of applications. XGBoost is a scalable, fast, and accurate algorithm that can handle complex and non-linear relationships in the data, making it an ideal choice for modeling the complex factors that contribute to an athlete’s NIL valuation. With its ability to handle missing values, feature interactions, and high-dimensional data, XGBoost can deliver highly accurate and reliable predictions that can help teams and organizations make data-driven decisions about how to value and compensate their college athletes fairly.

A simplistic diagram of an XGBoost algorithm driven by a gradient-boosted tree approach.

Now that we know the approach we plan on taking, we need to decide what data to collect, which athletes to train the model on, and how we’re going to validate the model so that it’s actually deployable and not undermined by overfitting.

If we look at the publicly available data on athletes’ NIL deals right now, it’s actually quite sparse. This makes sense because, as with many celebrity endorsements, athletes aren’t exactly eager to reveal how much money they’re making to endorse certain products. However, there is some information out there, including databases such as On3 that collect and aggregate athletes’ NIL valuations.

This NIL scorecard by On3 reveals not only a player’s NIL valuation, but also some of the relevant data that goes into predicting an athlete’s NIL value.

It seems reasonable to assume that the social following that an athlete has might play a role in predicting how much a company would be willing to pay them to endorse their product. Therefore, our algorithms will be based on a fairly small feature set (11 features, 137 when dummied) that includes things like an athlete’s Instagram followers, TikTok followers, which school they plan on attending, their recruitment rankings from the three largest prep sports databases (if available), and some other fringe predictors.
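To make the jump from 11 features to 137 concrete: dummying (one-hot encoding) replaces each categorical column, such as school, with one 0/1 column per distinct value. The sketch below is a minimal pure-Python illustration of that idea. The column names, schools, and follower counts are invented for the example and are not taken from the actual dataset.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


# Invented rows for illustration only (not real athletes or follower counts).
athletes = [
    {"instagram_followers": 12000, "tiktok_followers": 30000, "school": "School A"},
    {"instagram_followers": 4500, "tiktok_followers": 9000, "school": "School B"},
    {"instagram_followers": 800, "tiktok_followers": 1500, "school": "School C"},
]

encoded = one_hot(athletes, "school")
print(len(athletes[0]), "columns before,", len(encoded[0]), "columns after")
```

With a column like school that takes on many distinct values, a handful of raw features can expand into well over a hundred model inputs.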

We collected this data on over 1,100 athletes, including current NCAA and high school athletes who have readily available NIL valuations to use as the dependent (target) variable.

We then used some pretty standard training procedures, including an 80/20 train/test split, a grid search to optimize over a dozen hyperparameters, and 10-fold cross-validation to limit overfitting as much as possible in such a small dataset.
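For readers who want to see the mechanics of that procedure, here is a minimal standard-library sketch of the bookkeeping involved: an 80/20 split, 10 folds carved out of the training portion, and a grid of hyperparameter combinations. It is not the code used for this article. The helper names are hypothetical, the grid has only three parameters for brevity (the names match common XGBoost hyperparameters, but the values are assumptions), and the model fitting itself is left out.

```python
import itertools
import random


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of them as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def expand_grid(param_grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(param_grid)
    for values in itertools.product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


# Assumed grid for illustration; the article's actual grid is not published.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [200, 500],
}

train_idx, test_idx = train_test_split_indices(1100)
candidates = list(expand_grid(param_grid))
n_fits = len(candidates) * 10  # one fit per candidate per fold
print(len(train_idx), len(test_idx), len(candidates), n_fits)
```

The test rows are set aside before any tuning happens, so the final error estimate comes from athletes the model never saw during the grid search.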

Before we get into the takeaways and use-cases for our algorithms, we want to first showcase how accurate they are. Overall, our broadest XGBoost algorithm had a test set RMSE of $1,700. That’s an insanely accurate prediction capability considering we’re talking about athletes whose annual NIL rights often reach the high five-figure range, with several of the top athletes commanding six or seven figures in endorsement deals.
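For anyone unfamiliar with the metric, RMSE (root mean squared error) is the square root of the average squared gap between predicted and actual values, so it is expressed in dollars here. A minimal sketch follows; the valuations in it are invented purely to show the calculation.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target (dollars here)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented valuations for illustration only.
actual_values = [10000, 25000, 40000]
predicted_values = [11000, 24000, 43000]
print(round(rmse(actual_values, predicted_values), 2))
```

Because the errors are squared before averaging, a few large misses on top-earning athletes weigh on RMSE much more than many small misses do.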

We did recognize that the dataset was a bit stratified. Think about it: college football players are going to command a lot more in NIL value than college lacrosse players because their market reach is so much broader. For that reason, we trained separate models for sports whose NIL valuation scales are drastically different from the rest.

These included the following sports:

  • College Football
  • Men’s College Basketball
  • Women’s College Basketball
  • Women’s Gymnastics

Those models should be used in conjunction with the overall model to get a more accurate forecast of an athlete’s worth relative to the overall market for athletes in a given sport.

(The worst test set RMSE for any of the models was ~$8,900.)

Not only do our XGBoost models allow us to predict an athlete’s NIL valuation with staggering accuracy, but they also let us dig a step deeper to see which specific components of an athlete’s brand drive their NIL valuation the most.

The concept of feature importance is seen across most ML algorithms. This can range from simple variable coefficients in regression algorithms to more complicated neuron-weights and back-propagation vectors in neural networks. The signs and magnitudes of these importance indicators tell the relative importance of each predictor to the output of the algorithm.

In the case of our tuned XGBoost algorithms, we can use the F score of each predictor variable (a count of how often the model splits on that variable) to see which components of an athlete’s profile are the biggest drivers of their NIL value.
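As a rough illustration of what an F score measures: in XGBoost’s default importance plot, a feature’s F score is the number of times that feature is used to split the data across all trees in the model (the "weight" importance type). The sketch below assumes that definition and applies it to two tiny hand-written trees. The tree structure, feature names, and leaf values are hypothetical and do not come from the trained models.

```python
from collections import Counter


def count_splits(node, counts):
    """Walk one tree and tally the feature used at every split node."""
    if "feature" not in node:  # leaf nodes carry only a prediction value
        return
    counts[node["feature"]] += 1
    count_splits(node["left"], counts)
    count_splits(node["right"], counts)


# Two tiny hand-written trees (hypothetical structure, not a trained model).
trees = [
    {
        "feature": "composite_score",
        "left": {
            "feature": "instagram_followers",
            "left": {"value": 1.0},
            "right": {"value": 2.0},
        },
        "right": {"value": 5.0},
    },
    {
        "feature": "instagram_followers",
        "left": {"value": 0.5},
        "right": {
            "feature": "tiktok_followers",
            "left": {"value": 1.5},
            "right": {"value": 3.0},
        },
    },
]

f_scores = Counter()
for tree in trees:
    count_splits(tree, f_scores)
print(f_scores.most_common())
```

A split count says how often the model relies on a feature, not how much each split improves the prediction, which is why it is best read as a naive measure of importance.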

The School You Go To Matters:

Not All Social Media Followers Are Built The Same:

The typical college athlete has almost twice as many TikTok followers as Instagram followers, and only one-sixth the audience on Twitter that they have on Instagram. This means that reach across different platforms is relative. Just because someone has 100k followers on TikTok does not mean their influence is anywhere close to that of someone who has 100k on Instagram or Twitter.

F scores for the top-10 most important features in predicting a college athlete’s NIL value in 2023

As you can see, apart from an athlete’s recruit composite score (i.e. a representation of how highly recruited an athlete is), an athlete’s social media following on the three major platforms is the next most important thing in determining NIL value. Using F-score in the most naive sense of feature importance, you can see that an athlete’s audience on Instagram is almost twice as valuable as their audience on TikTok, despite it often being smaller in size. Furthermore, Twitter, while often being overlooked from a monetization standpoint, actually provides more of an indicator of NIL value than TikTok does.

While this does not mean Twitter is more lucrative to advertisers, a larger audience there often correlates with an ability to grow a following on the other major platforms as well, increasing overall value.

There is a notable difference in F-scores between genders, as it appears social following is more important in determining NIL value among female collegiate athletes than it is with male athletes.

This could be for a few reasons. Firstly, it should be no surprise that, as things currently stand, men’s sports, particularly college football and basketball, outdraw women’s sports from a media-value perspective. This means that top-tier male collegiate athletes will get more screen time in national broadcasts than their female counterparts, simply because of the current nature of TV deals and market demand.

Secondly, this could also be seen as a symptom of the future-earnings value of male athletes compared to female athletes. Many male athletes in the top-earning sports have expected career earnings that far outpace the projected earnings of their top-earning female counterparts. This difference in expected career earnings could explain the lesser focus on current social media following for male athletes, as sponsors might see their marketing campaigns with these male athletes as a sort of investment/networking venture in hopes of securing an early sponsorship with the next nationally recognized superstar like LeBron, Patrick Mahomes, or Steph Curry.

It should be noted that when top male sports are excluded from the model, the relative importance of current social media reach is approximately equal across genders, indicating that for sports without this future earnings discrepancy, there appears to be a relatively equal emphasis on current social media reach for both male and female athletes.

In conclusion, the introduction of Name, Image, and Likeness (NIL) has revolutionized college sports, allowing student-athletes to capitalize on their talent and hard work. However, it can be difficult for athletes to navigate the NIL marketplace and understand their true value. That’s where NIL Scorecards come in. With the help of machine learning algorithms and the most up-to-date NIL marketplace data, these proprietary tools can accurately predict any high school or college athlete’s relative NIL valuation based on their social following and sport/school of choice. This technology empowers athletes to make informed decisions and maximize their earning potential, creating a level playing field for all.

Sample output leveraging our machine learning algorithms.

If you want free access to the information that other services hide behind a paywall when it comes to NIL-valuations for prospective college athletes, feel free to reach out to us on Twitter (@hoop_power).

Share this article with any athlete or parent you know who could benefit from a better understanding of why NIL should be at the forefront of any athlete’s collegiate-athletics decision.

And if you are interested in what your NIL valuation is, drop me a DM on Twitter (@hoop_power). I’ll ask for some information and give you a comprehensive NIL scorecard like the one you see above for Bryce Young. This can serve as a great starting point for comparing your college options, as well as a good tool to have in your back pocket when entering into negotiations with your first NIL sponsor.

Robert Mepham

Data Scientist | Amateur Sports Analyst | Leveraging AI In The World of Sports