On Pursuing a Career as a Sports Analyst

Sean Carver
6 min read · Nov 11, 2023

Photo by pixabay from pexels

A few years ago, I had a dream. I wanted to become a sports analyst making a lucrative salary while helping my team win games.

With an impressive resume, I thought I stood a chance. Fifteen years earlier, I had graduated from Cornell University with a Ph.D. in Applied Mathematics, and I had spent my career working in academia for schools including Yale University and Johns Hopkins. My teaching experience included many sections of Statistics classes. While I was at American University, I had a research program in which I mentored students in sports analytics and published work on Medium and in peer-reviewed publications. I had recently completed a fellowship learning the ins and outs of data science at the prestigious Flatiron School in DC and had some impressive projects under my belt.

Part of my dream might have come true. My hometown baseball team had just won the World Series and they were hiring a new crop of sports analysts to help them continue to win games. I applied and got an interview.

With the first sentence out of the interviewer's mouth, I realized that no matter how impressive my resume was, the whole of my dream would not come true. The problem appeared to be that there were many talented people who wanted to work as sports analysts and were willing to take low salaries. With an over-supply of labor, teams can get away with offering salaries to prospective data scientists that would not be competitive in other industries.

The Flatiron School had given me a clear idea of what to expect in the job market. The highest number the team quoted fell below the bottom of the range that Flatiron had told us to expect for a junior data scientist in most industries. Moreover, with my academic experience as a research scientist and professor, I qualified for a much higher title. The salary in professional sports was too low for me. I was invited to a second interview, but I politely, sadly, and with great regret withdrew my application.

Still, with expectations appropriately managed, I must point out that the dream of being a sports analyst remains fantastic and attainable. The important thing to realize is that once you have the skills needed to succeed as a sports analyst, you will be extremely competitive in other industries that pay more. With this insight, you can plan to work in sports for as long as you want, then switch to a higher-paying role whenever you need to. A data science job outside of sports is a solid plan B that takes no extra effort.

For myself, I am happy with Healthcare as my primary job, but I have decided to take on sports analytics as a side hustle. Mentoring students and career hopefuls through sports analytics projects has been exquisitely fulfilling to me as an academic, and I have started a business doing just that as well as tutoring children and adults in Statistics, Mathematics, Calculus, Computer Science, Data Science, and Introductory Physics, all of which I know very well. For more information, see my homepage at http://doctordataprofessor.com/.

What do you need to know as a sports analyst? What skills will I teach? Data scientists primarily use three programming languages: Python, R, and SQL. If you know Python but not R, I would recommend learning the basics of R. In my opinion, Python is the better language, but R has more advanced statistical methods and software available for it. In my current field, Python is generally preferred for most tasks, but some of our more technical projects require R. It is better to learn both. Additionally, SQL is helpful when data sets are large.

Beyond programming, Statistics is, of course, very important in this work — specifically, Regression Theory and Machine Learning, but other aspects of Statistics are also important.

Data Visualization and Data Wrangling are also key, as are various publishing tools. For academic publishing, LaTeX, and sometimes knitr, are generally needed, but if you are just going to publish on Medium, its user interface for writing articles may be all you need, and it is much easier to learn than LaTeX.

Finally, Mathematics is helpful for understanding statistical methods.

One important component of your resume can be sports projects with data that show your creativity and interest in sports. I recommend that you find projects that are original and relatively easy to accomplish; many such projects exist in sports. You do not necessarily have to pick projects that help teams win games, but they should be fascinating to fans. Pick projects that can draw a lot of interest on Medium, and possibly get published in Medium publications such as Towards Data Science. From my experience, there is a lot of interest in sports data science articles on the platform. These projects will not only help you get a job, but may also help you get into school, as either an undergraduate or a graduate student, depending on your field of study. Moreover, sports projects can often be excellent conversation starters in interviews or in professional networking, whether you pursue a career in sports or in some other field.

Baseball is perhaps the best sport for doing projects, simply because there exists such a wealth of good data that is free to any enthusiast. For some other sports, data are available but the cost can be prohibitive and therefore it becomes hard for others to reproduce and interact with your work. Baseball also has a discrete nature to it which lends itself well to easy analysis. Besides, with the World Series now over for the past season, teams may be looking to hire a new crop of analysts soon.

To get into baseball research, I would recommend the book, “Analyzing Baseball Data With R,” by Max Marchi, Jim Albert, and Benjamin S. Baumer (recently available as a second edition). The book describes five baseball data sets and teaches the R statistical programming language to analyze them. I have not been able to find a comparable book for Python — writing such a book could be a project for an ambitious enthusiast.

The data sets in the book include the Lahman season-by-season data set, the Retrosheet game-by-game and play-by-play data sets, and two pitch-by-pitch data sets: MLBAM Gameday and PITCHf/x.

From these data sets, more can be derived. For example, one of my students and I used SQL to derive a half-inning-by-half-inning data set to get a handle on the question: in all of major-league history, what are the weirdest half-innings that have been played, or that have never been played but are possible according to baseball’s rules? This effort fascinated us because a single half-inning data point tells a whole story, which can be fleshed out with more data and details from other sources. By posing and answering a simple question like “What is the weirdest half-inning in all major-league history?” my student was able to capture many people’s attention in his professional circles, and he successfully launched a career in sports.
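To give a flavor of what deriving a half-inning data set can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table name `plays`, its columns, and the toy rows are all assumptions made up for illustration; they are not the Retrosheet schema, and this is not the exact query my student and I used. SQL handles the ordering, and Python groups the ordered plays into one event sequence per half-inning.

```python
import sqlite3
from collections import Counter
from itertools import groupby

# Hypothetical play-by-play table; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE plays (
        game_id   TEXT,
        inning    INTEGER,
        is_bottom INTEGER,  -- 0 = top half, 1 = bottom half
        play_seq  INTEGER,  -- order of the play within the game
        event     TEXT
    )
    """
)
toy_rows = [
    ("G1", 1, 0, 2, "groundout"),  # deliberately inserted out of order
    ("G1", 1, 0, 1, "strikeout"),
    ("G1", 1, 0, 3, "flyout"),
    ("G1", 1, 1, 4, "single"),
    ("G1", 1, 1, 5, "double play"),
    ("G1", 1, 1, 6, "strikeout"),
    ("G2", 1, 0, 1, "strikeout"),
    ("G2", 1, 0, 2, "groundout"),
    ("G2", 1, 0, 3, "flyout"),
]
conn.executemany("INSERT INTO plays VALUES (?, ?, ?, ?, ?)", toy_rows)

# Sort the plays so that each half-inning's plays are contiguous and in order.
rows = conn.execute(
    """
    SELECT game_id, inning, is_bottom, event
    FROM plays
    ORDER BY game_id, inning, is_bottom, play_seq
    """
).fetchall()

# One entry per half-inning: (game_id, inning, is_bottom) -> tuple of events.
half_innings = {
    key: tuple(event for *_, event in group)
    for key, group in groupby(rows, key=lambda r: r[:3])
}

# How often has each exact sequence of events occurred?
sequence_counts = Counter(half_innings.values())
```

With real data, the rarest entries in `sequence_counts` would be natural candidates for "weird" half-innings that have actually been played.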

So what is the weirdest half-inning in baseball’s major-league history? We interpreted “weirdest” as the most improbable (based on major-league history) of the shortest possible half-innings (only three plays for three outs). Our answer: a triple starts the half-inning. On the second play, the batter gets to second base, but the runner on third goes nowhere and is still in play. The third and last play is a triple play: three outs end the half-inning. Each of these plays has occurred in major-league history, but this sequence of three plays has not, and based on estimated probabilities it is likely to occur only once in more than a billion half-innings. It is the least likely of the 26 possible three-play half-innings described by the Markov model of baseball, and one of only two such three-play half-innings that have never occurred in baseball’s recorded history (1930–2018, from Retrosheet, not considering the post-season).
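For readers curious how a probability like this can be estimated, here is a minimal sketch of the idea behind a Markov model: treat each combination of outs and occupied bases as a state, estimate the probability of each state-to-state transition from historical counts, and multiply the probabilities along a path. The state labels and the counts below are made up for illustration; they are not Retrosheet figures, and the function names are my own hypothetical choices.

```python
from collections import Counter

# A state is (outs, bases); bases shows occupied bases, e.g. "-23" means
# runners on second and third. END stands for the third out.
START = (0, "---")
R3 = (0, "--3")    # after a leadoff triple
R23 = (0, "-23")   # batter reaches second, runner stays on third
END = "3 outs"

# Toy transition counts (assumed values, NOT real data).
toy_counts = {
    (START, R3): 1,
    (START, (1, "---")): 3,
    (R3, R23): 1,
    (R3, (1, "--3")): 1,
    (R23, END): 1,
    (R23, (1, "-23")): 3,
}

def transition_probabilities(counts):
    """Estimate P(next state | current state) from transition counts."""
    totals = Counter()
    for (current, _), n in counts.items():
        totals[current] += n
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}

def path_probability(probs, path):
    """Multiply transition probabilities along a sequence of states.

    Under the Markov assumption, each play depends only on the current state.
    Transitions never observed get probability 0.
    """
    p = 1.0
    for current, nxt in zip(path, path[1:]):
        p *= probs.get((current, nxt), 0.0)
    return p

probs = transition_probabilities(toy_counts)
weird_half_inning = [START, R3, R23, END]
p_weird = path_probability(probs, weird_half_inning)
```

With the toy counts above, the three transitions have estimated probabilities 1/4, 1/2, and 1/4, so `p_weird` is 1/32. With counts from decades of real play-by-play data, the same product is what yields a figure on the order of one in a billion.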

If you are interested in working with me on baseball analytics, or on other data projects, or know someone who needs tutoring in Statistics, Mathematics, Calculus, Computer Science, Data Science, or Introductory Physics, please see my homepage at http://doctordataprofessor.com/ or email me at doctordataprofessor@gmail.com to set up a free 15-minute consultation. With an academic break coming up, and college admission season around the corner, now might be a good time to start a data project.

Let’s work together to foster our dreams of helping our teams win more games with data.

Sean Carver

I have a passion for discovering and communicating penetrating insights from data and models.