Sports Is Your Perfect Introduction to Data Science

Colin Davy
Published in State of Analytics
5 min read, Mar 18, 2016

When I interviewed at Slalom, I talked about my work experience for maybe 10% of the time and my hobbies for the other 90%. This wasn’t because Slalom was that interested in my life outside of data science, but because my hobby is what got me started in data science in the first place: sports analytics. Like most people who get into sports analytics, I was drawn to the field because it was a way for the analytically minded to better understand the games they were already watching. Sports analytics is a way to explore the same questions fans and coaches alike are asking: who do I think will win, and why? Who are the best players in the world right now? What strategies are best, and why?

There are plenty of great reasons to ask these questions. Some are just a part of being a fan; others have a more… direct financial reason. Either way, there has never been a more exciting time to be a part of sports analytics. Data sets get better every passing year, the methods of analysis get more sophisticated, and the acceptance of analytics is solidly mainstream. Even if you have zero aspirations of making a career out of sports analytics specifically, it’s a great “gateway drug” to analytics in general if you’re new to the field, possibly better than any other subject matter. Having transitioned from sports analytics to predictive analytics in general, here are my (incredibly biased) opinions on why:

· The data sets are great. There is no shortage of descriptive statistics for just about every sport out there. Box scores alone hold a wealth of well-maintained information stretching back decades; baseball alone has a strong argument as the world’s richest data set. Compared with most other domains, the data is generally easier to obtain, there’s very little data cleaning and massaging required, and the rules of each game make causality a lot easier to determine.

· The problems are well-defined. All sports have at least one unambiguous result built into the rules: who wins and who loses. The majority of sports analytics is devoted to predicting this outcome. All the secondary questions (how does a specific player contribute to winning a game, what strategies should be used, etc.) are in service of answering the win/loss question, and those are great problems to solve too! Every tool in analytics is available, from linear/logistic regression to clustering algorithms to best practices in data visualization. It’s really choose-your-own-adventure for what you want to get better at.

· Domain-level knowledge is much easier to incorporate. This is a fancy way of saying that being a sports fan is a meaningful starting point. If I tried to get into predictive analytics by looking at something like crime data, for example, it would be a lot harder without some domain-level knowledge of the subject (an understanding of the different types of crime, knowledge of policing strategies, etc.). If you’re a sports fan, however, you already have a great reference point to help inform the analytics you use. Knowing the rules and having prior beliefs and intuition not only provide a good framework for asking the right questions that drive analytics, they also provide a great sanity check when your analytics start producing results.

· There are lots of out-of-sample events to tell you how well you’re doing. Once you figure out what you want to predict, sports will give you ample opportunities to see how good your models are. Analytics only has one truly meaningful measuring stick, and that’s the out-of-sample prediction: how well can you predict something before it actually happens? Not only is this great from a technical perspective, but it’s also great to have repeated exposure to being wrong a lot. Predictive analytics at its core is about being the least wrong over the long run. It’s one thing to understand that intellectually; it’s another to experience it. The classic example: when you say a team has a 75% chance of winning, it’s not a failure if they lose. It’s only a failure if the teams you give a 75% chance lose noticeably more or less than 25% of the time over the long run. That perspective is much easier to come by with practice.

· You’ll probably be far more engaged in doing all of the little things. How many of us have slogged through a Coursera sample problem or a Python tutorial, feeling like it’s a homework problem you just have to do? Plenty of people have a passing interest in getting better at analytics, but as soon as it feels like just another job, they don’t go any further, and analytics becomes another distant aspirational thing on the to-do list. If you find a burning question you really want answered, though, you’re much more willing to go through all of the mundane details. This isn’t a trivial part of learning analytics; personally, I think finding a question you’re passionate about answering is the most important part when starting out. It’s kind of like exercise: the most important part is sticking to it over the long run, and that’s a lot easier to do when it doesn’t feel like work.
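
To make the 75% example from the out-of-sample point concrete, here is a minimal Python sketch of a calibration check. Everything in it is an illustrative assumption rather than a method from any particular site or model: the function names, the Brier score as the summary metric, and especially the simulated games, which stand in for real forecasts and results you would collect yourself.

```python
import random
from collections import defaultdict


def brier_score(forecasts, outcomes):
    """Mean squared error between forecast win probabilities and 0/1 results.

    Lower is better; always guessing 50% scores 0.25.
    """
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts, outcomes, n_bins=10):
    """Bucket forecasts into equal-width probability bins and compare the
    average forecast in each bin with the win rate actually observed there.

    Returns a list of (mean_forecast, observed_win_rate, n_games) tuples.
    """
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # keep p == 1.0 in the top bin
        bins[idx].append((p, y))

    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        win_rate = sum(y for _, y in pairs) / len(pairs)
        table.append((mean_forecast, win_rate, len(pairs)))
    return table


def simulate_games(n_games, seed=0):
    """Hypothetical stand-in for real data: a forecaster whose stated
    probabilities happen to be exactly right, so favorites still lose sometimes."""
    rng = random.Random(seed)
    forecasts = [rng.choice([0.55, 0.65, 0.75, 0.85]) for _ in range(n_games)]
    outcomes = [1 if rng.random() < p else 0 for p in forecasts]
    return forecasts, outcomes


if __name__ == "__main__":
    forecasts, outcomes = simulate_games(10_000)
    print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
    for mean_forecast, win_rate, n in calibration_table(forecasts, outcomes):
        print(f"forecast {mean_forecast:.2f}  observed {win_rate:.2f}  games {n}")
```

With real predictions, you would replace `simulate_games` with your own logged forecasts and final results. A well-calibrated model shows observed win rates close to the forecast in every bin, even though roughly a quarter of its 75% picks lose.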

It’s one thing to jump into sports analytics just to get some practice. But if you actually produce something interesting and groundbreaking while you’re at it? The barrier to entry for actually contributing to the field has never been lower. The blogger-to-employee path happens with increasing frequency every passing year. There is no shortage of forums to present your findings. And speaking from firsthand experience, I’m hard-pressed to find a more intellectually curious and humble group of people than the sports analytics crowd: I probably enjoy those discussions more than any other in all of analytics.

So grab a data set and start asking questions. There’s always room for more ideas.

You can find the author’s current sports analytics work at http://www.sbnation.com/advanced-baseline and http://www.fantasylabs.com/daily-fantasy-golf/ .

Colin is a consulting data scientist in San Francisco, a two-time winner of the Sloan Sports Analytics Conference Hackathon, and a Jeopardy champion.