Getting into Sports Analytics 2.0

Sam Gregory
Jan 18, 2020

A little over two years ago I wrote a piece called “Getting into Sports Analytics”. It is probably the most read and shared thing I’ve ever written. Since writing that piece I have started a new job (still in sports analytics, now at Sportlogiq), but I stand by just about everything I wrote. As someone whose early sports analytics work is littered around the internet on various blogs, publications and websites, I can tell you that being able to look back on something you wrote a few years ago and still stand by it is a rare occurrence.

When I wrote that piece I put every bit of advice I could think of into the article, but I still get messaged by people almost daily asking for advice on how to “make it” in sports analytics. As I mentioned in the previous piece, I understand this inclination; I did it all the time when I was trying to break through and get a job. But to be honest, when I look at the people who did help me along the way, it was never through a response to one of these DMs or messages. It was through a piece of feedback someone gave me on something I wrote, or through reading articles in which others outlined their methodology or approach to a problem. I got far more value out of those interactions than out of messaging people to ask “how do I get a job in sports analytics?”.

Not that there is anything wrong with asking this — as I said I did it all the time!

But the analytics world has changed since 2017: more public data sources, more job postings, and almost certainly more competition for those job postings. I don’t have any secret sauce I’m withholding that is the key to getting a job in sports; all the advice I had to give at the time was in that piece. Still, I thought there would be some value in “updating” what I’d written so that people who are asking these questions in 2020 have some insights that better reflect the current state of sports analytics.

Before I start, though, if you haven’t read Getting into Sports Analytics I would suggest reading that first. The core messages of that piece are probably more valuable than any of the specifics I will mention here.

University Degrees

Of all the things I discussed in the first piece, this is the one I still get the most questions about. These questions are often phrased as “I’m debating between these two degrees, which is better for my chances of getting a job in sports?” My answer is still the same: study what you are interested in. I can almost guarantee that studying something you are passionate about and doing interesting things in that field will better prepare you for a job in sports analytics than choosing something just because you think it will get you a job in sports.

University Clubs

University sports analytics clubs are popping up more and more, and they make me wish I’d had access to something like this when I was in university. I think the first one of these clubs that I was introduced to was the Harvard Sports Analysis Collective, which was run at the time by Brendan Kent and Andrew Puopolo, both smart guys who ended up getting jobs in sports immediately after graduating.

These clubs provide an opportunity to work with lots of smart and passionate people who are interested in the same field you are but may have different backgrounds or fields of study, which makes learning from them even more valuable.

Another cool opportunity that comes out of these clubs is the ability to actually work with sports teams while you learn. Many of these clubs are starting to work with their university’s varsity sports teams as an in-house analytics department. A current colleague of mine at Sportlogiq, Connor Jung, started working with the Queen’s hockey team while a student there. Fast forward a couple of years, and the Queen’s Sports Analytics Organization played a key role in the Queen’s Men’s Hockey Team winning their Conference Championship for the first time in almost 40 years.

So if you are a university student my advice is: study what you are interested in outside of sports, and try to get involved in other ways with varsity teams or sports analytics clubs. If your university doesn’t have one, think about starting one!

Public Data Sets and Code

When I was writing in 2017 I talked a lot about doing work and making it public. This was admittedly a harder task back then, but today there are fewer barriers. Notably, the number of public data sets available for analysis has grown significantly, which has also drastically increased the amount of publicly available code to learn and borrow from.

There are two public data sources I’ll mention specifically here but there are plenty more out there that are easy to find.

Statsbomb Public Data

Statsbomb are a relatively new soccer event data company providing a more comprehensive event data spec than many competitors in the field. They’ve also released quite a few public data sets which include full seasons or competitions. Currently available publicly are the 2018 Men’s World Cup, 2019 Women’s World Cup, every La Liga game Messi has played from 2004/05–2018/19, the 2018 NWSL season and the 2018/19 and 2019/20 WSL seasons. The data sets can be accessed in this public repository.

Statsbomb themselves have written packages in R and Python for parsing and playing around with the data. If you search “Statsbomb” on GitHub you will find over 50 additional repos using the publicly available Statsbomb data to provide further inspiration.
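To give a rough sense of what “playing around with” event data can look like, here is a minimal Python sketch. It does not use Statsbomb’s own packages or download anything; the three sample events and their field names (`type`, `player`, `location`) are assumptions made up for illustration, so check the actual data spec before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample: three events shaped loosely like event-data records.
# The field names and values are assumptions for illustration only.
SAMPLE_EVENTS = """
[
  {"type": {"name": "Pass"}, "player": {"name": "Player A"}, "location": [60.0, 40.0]},
  {"type": {"name": "Pass"}, "player": {"name": "Player B"}, "location": [75.5, 32.0]},
  {"type": {"name": "Shot"}, "player": {"name": "Player A"}, "location": [108.0, 38.5]}
]
"""


def count_events_by_type(events):
    """Count how many events of each type appear in the list."""
    return Counter(event["type"]["name"] for event in events)


def events_for_player(events, player_name):
    """Return only the events attributed to the given player."""
    return [
        event
        for event in events
        if event.get("player", {}).get("name") == player_name
    ]


events = json.loads(SAMPLE_EVENTS)
counts = count_events_by_type(events)
player_a_events = events_for_player(events, "Player A")

print(counts)
print(len(player_a_events))
```

With a real match file you would replace `SAMPLE_EVENTS` with the contents of that file; the counting and filtering logic stays the same as long as the field names match.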

This kind of breadth of publicly available, high quality event data is new in soccer and has already helped push the field of public analytics forward. If you are looking for a place to start making a name for yourself in sports analytics, doing impressive work with this data is one of the best I can think of. If you are looking for exposure, their website produces written content based on the data (plus they pay their writers!), so if you think you have something good, pitch it to their editor Mike Goodman.

Big Data Bowl

The Big Data Bowl is a (now annual) data competition organized by Michael Lopez, the Director of Data and Analytics at the NFL. The Big Data Bowl provides a level of publicly available tracking data that I didn’t think was possible in the industry.

You’ll hear it a million times, but tracking data really is the next frontier (or, some would argue, the current state of the art) in sports analytics. As opposed to event data, where you only get a list of on-ball actions during a game, tracking data gives you player positions at a high frame rate (a new frame every fraction of a second). The data is harder to work with and much larger in size, but it also gives a much more complete picture of what is happening at any one time.
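As a small illustration of why tracking data is richer than event data, here is a minimal Python sketch that derives a player’s speed from consecutive position frames, something an event list cannot give you. The `TrackingFrame` structure, the 10 frames per second rate and the coordinates are all assumptions for the example, not the format of any particular provider.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackingFrame:
    """One position sample for one player (hypothetical structure)."""
    frame_id: int
    player_id: str
    x: float  # position along the field, in metres (assumed unit)
    y: float  # position across the field, in metres (assumed unit)


FRAME_RATE_HZ = 10  # assumed frame rate; real data sets document their own


def speeds(frames, frame_rate_hz=FRAME_RATE_HZ):
    """Return the speed between each pair of consecutive frames."""
    ordered = sorted(frames, key=lambda frame: frame.frame_id)
    result = []
    for prev, curr in zip(ordered, ordered[1:]):
        elapsed = (curr.frame_id - prev.frame_id) / frame_rate_hz
        distance = math.hypot(curr.x - prev.x, curr.y - prev.y)
        result.append(distance / elapsed)
    return result


# Three made-up frames for a single player.
frames = [
    TrackingFrame(1, "p1", 0.0, 0.0),
    TrackingFrame(2, "p1", 0.3, 0.4),
    TrackingFrame(3, "p1", 0.9, 1.2),
]
player_speeds = speeds(frames)
print(player_speeds)
```

Multiply this by every player on the field and every frame of a game, and it becomes clear both why the files are so much larger and why they describe the game so much more completely.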

The Big Data Bowl is essentially a data dump of several weeks of NFL tracking data and a Kaggle competition where contestants try to answer a question of interest posed by the NFL. Again, a quick “Big Data Bowl” search on GitHub returns over 90 repos of publicly available code, and the notebook submissions to the Kaggle competition are available here.

I don’t think it can be overstated how much having publicly available tracking data will help push this field forward.

It’s also worth mentioning here that, for both the Statsbomb data and the Big Data Bowl data, even if your goal is to get a job in a sport outside of soccer or American football, working with these data sets and showing what you can do will help attract interest from people working in all sports. Statsbomb’s old data scientist Derrick Yam now works in the NFL for the Baltimore Ravens; Dani Chu, one of the winners of the student category of the 2019 Big Data Bowl, was recently hired by the new Seattle NHL team; and Namita Nandakumar, currently a Philadelphia Eagles data analyst, made her name in the public sphere working with hockey data. The point is that many of these skills are transferable across sports. If you are interested in working outside of soccer or American football, I encourage you to seek out public data sets available in your sport but also to play around with what’s available from Statsbomb and the Big Data Bowl.

Sports Analytics Conferences

Obviously your ability to attend sports analytics conferences depends on many factors: geography, financial resources etc. But if it is something you have the opportunity to do I would highly recommend attending a practitioner-facing sports analytics conference that doesn’t cost $800+ to register for.

These conferences tend to be relatively small, the atmosphere is very friendly and they are the best opportunity to meet and talk with the smartest minds in the field. I find the presentations and conversations I have at these events are some of the most valuable resources for my personal development in sports analytics.

A few of the relatively affordable ones that come to mind (and I’m sure I’m missing a lot) are: NESSIS (Boston, USA), CASSIS (Vancouver, Canada), Sounders FC Analytics Conference (Seattle, USA), CMSAC (Pittsburgh, USA), OTTHAC (Ottawa, Canada), OptaPro Forum (London, UK), Leicester Tactical Insights (Leicester, UK), CBJHAC (Columbus, USA), Great Lakes Analytics Conference (Stevens Point, USA), RITSAC (Rochester, USA).

It is worth saying, as someone who has attended quite a few of these conferences, that the conversations that stick with me are the ones about the content, the industry, or just sports in general, more so than the ones that start with someone handing me a resume. I’m not saying these aren’t opportunities to pass on your resume and look for a job, but engaging with the conference itself and taking advantage of the fact you are in a room with people who are passionate about many of the same things you are will mean a more rewarding conference experience and potentially even a better chance of landing a job down the line.

Accessibility and diversity are the elephant in the room at a lot of these conferences. The attendees list is almost always very white and very gender imbalanced. One of the biggest issues is financial, which makes initiatives like this one from HockeyGraphs, crowdfunding to help individuals from underrepresented groups attend conferences, incredibly valuable. So this is less a comment to those looking to get into sports analytics than one to those in positions of privilege: if you have the opportunity to help sports analytics tackle its diversity problem, do it.

Do Good

The Silicon Valley-led tech industry is going through an ethical crisis right now, with people like Mark Zuckerberg testifying before Congress. It’s the inevitable result of incredibly smart people making decisions in a bubble without thinking about the broader societal impacts and ethical concerns of the products they are working on. I think we are just now on the cusp of similar difficult conversations in sports analytics.

As I write this we are days removed from MLB handing down a series of severe punishments to the Houston Astros and their staff for a 2017 sign-stealing scandal. The Astros had become the poster child of the analytics movement and the search for marginal gains. In the process they won a World Series and came very close to a second this past season. In the process they also engineered a cheating scandal the likes of which baseball hasn’t seen since the steroid era, acquired a highly “undervalued asset” in the form of a domestic abuser, and were forced into firing an Ivy League graduate assistant GM after initially defending him for harassing a woman reporting on said domestic abuse.

The poster child of the analytics movement demonstrated what happens when you treat people like numbers on a spreadsheet and the sport as a problem to be solved regardless of the methods used to get there. All this to say that sports analytics isn’t some mythical, entirely objective field where we should blindly follow what the numbers or analysis tell us; it is a small part of sport, but one that can have a significant influence over people’s lives. These are all things that we should be aware of regardless of what our roles in sports analytics are. I mention it here because it’s something I didn’t think about at all when I first started getting into sports analytics but that I think about all the time now.

Sports analytics has so many good use cases as well: fighting racial biases in recruitment, finding undervalued players who have been discarded by a broken system etc. It’s just important to remember that your actions in this field have real world consequences and you should be thinking about them in your analysis.

Between what I wrote in 2017 and what I’ve written here, I think this is about as much advice as I can give for “how to get into sports analytics”, a question that has no easy answers. Maybe in a few years it will be time to write another update, but until then, good luck!


