Analyzing MLB Draft Trends Based on University

Jaron Richman
INST414: Data Science Techniques
2 min read · May 2, 2024

When choosing where to play college baseball, players tend to prefer schools whose athletes are drafted at a high rate and sign for large bonuses. Historically, the top schools on both counts have been in the SEC and ACC, but other schools have been producing high-level talent recently. I decided to look at which schools are producing the most picks, and which are getting their picks the most money on average.

I collected my data with the baseballr package in RStudio. I scraped the 2021, 2022, and 2023 draft data and combined them into one dataset. Some schools were listed multiple times under different spellings (Penn State appeared as both ‘Penn State’ and ‘Penn State U’), so those had to be consolidated. I also had to convert the signing bonus column to a numeric variable so I could do calculations on it later. I then grouped the data by school and calculated the number of draft picks each school produced, as well as their average signing bonus.

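library(baseballr)  # provides get_draft_mlb()
library(dplyr)      # provides full_join()

# Pull each draft year separately
draft2023 <- get_draft_mlb(2023)
draft2022 <- get_draft_mlb(2022)
draft2021 <- get_draft_mlb(2021)

# With no `by` argument, full_join() matches on all shared columns
drafts_22_23 <- full_join(draft2022, draft2023)
drafts_21_23 <- full_join(draft2021, drafts_22_23)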
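The cleaning steps described above (merging alternate school spellings and making the signing bonus numeric) were done in R. For illustration only, here is a minimal Python sketch of the same logic. The field names `school` and `signing_bonus`, the helper names, and the sample rows are all assumptions made for this example, not the actual columns returned by baseballr; only the Penn State spelling pair comes from the article.

```python
# Hypothetical alias table: the Penn State pair comes from the article;
# any other entries would have to be found by inspecting the data.
SCHOOL_ALIASES = {"Penn State U": "Penn State"}


def normalize_school(name):
    """Map a known alternate spelling to one canonical school name."""
    cleaned = name.strip()
    return SCHOOL_ALIASES.get(cleaned, cleaned)


def parse_bonus(raw):
    """Convert a signing bonus such as '$1,250,000' to a float.

    Returns None when the value is missing or not numeric, for example
    for a pick who did not sign.
    """
    if raw is None:
        return None
    text = str(raw).replace("$", "").replace(",", "").strip()
    try:
        return float(text)
    except ValueError:
        return None


# Made-up rows for illustration only, not real draft data.
rows = [
    {"school": "Penn State U", "signing_bonus": "150000"},
    {"school": "Penn State", "signing_bonus": "$200,000"},
    {"school": "Example College", "signing_bonus": ""},
]

cleaned_rows = [
    {
        "school": normalize_school(r["school"]),
        "signing_bonus": parse_bonus(r["signing_bonus"]),
    }
    for r in rows
]
```

Keeping missing bonuses as `None` rather than 0 is a deliberate choice in this sketch, so that unsigned picks do not drag down a school's average later.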

Once all of my data wrangling was complete, I created a Python notebook to complete my analysis.
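As a rough sketch of what the per-school summary could look like in Python, the example below counts picks per school and averages their signing bonuses using only the standard library. The notebook itself may well use pandas instead. The function name `summarize_by_school`, the field names, and the sample picks are assumptions for illustration, and averaging over signed picks only is a choice made here, not necessarily the one used in the article.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_school(picks):
    """Return one summary dict per school, sorted by number of picks (descending)."""
    bonuses_by_school = defaultdict(list)
    for pick in picks:
        bonuses_by_school[pick["school"]].append(pick["signing_bonus"])

    summary = []
    for school, bonuses in bonuses_by_school.items():
        # Assumption: picks without a recorded bonus are counted as picks
        # but left out of the average.
        signed = [b for b in bonuses if b is not None]
        summary.append({
            "school": school,
            "num_picks": len(bonuses),
            "avg_bonus": mean(signed) if signed else None,
        })
    return sorted(summary, key=lambda s: s["num_picks"], reverse=True)


# Made-up picks for illustration only, not real draft data.
picks = [
    {"school": "School A", "signing_bonus": 100000.0},
    {"school": "School A", "signing_bonus": 300000.0},
    {"school": "School B", "signing_bonus": 500000.0},
    {"school": "School B", "signing_bonus": None},
    {"school": "School B", "signing_bonus": None},
]

# Ranking 1: schools by number of picks.
by_picks = summarize_by_school(picks)

# Ranking 2: schools by average bonus, skipping schools with no signed picks.
by_bonus = sorted(
    (s for s in by_picks if s["avg_bonus"] is not None),
    key=lambda s: s["avg_bonus"],
    reverse=True,
)
```

The two rankings answer the two questions from the introduction separately: which schools produce the most picks, and which schools' picks sign for the most money on average.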

My analysis is limited by the fact that it only includes the last three years of draft data. More years of data would give more insight. Additionally, SEC schools typically get the best recruits out of high school, who then turn into higher draft picks. As a result, the numbers may not fairly reflect schools that truly develop their players from non-prospects into draft picks.

I have included a link to my GitHub repository containing the code for this article. The files are “Module 2 Assignment.ipynb” and “draft_data.R”.
