Assorted Sports Analytics Mailbag

Sam Gregory
9 min read · Jun 23, 2020


Last week was my last at Sportlogiq, a computer-vision-driven sports analytics company based in Montréal. Working in a start-up/tech company environment was a new challenge for me, and I learned a lot in my two years there, both in sports analytics and in applied machine learning and AI more generally.

For those interested in working in this field, the posting for my old job is still active as of writing.

When I left Opta a couple of years ago I did a mailbag reflecting on my time at the company, so I decided to do something similar as I leave Sportlogiq and return to student life as a grad student researching the intersection of sport science and sports analytics at Victoria University.

Thanks to everyone who sent in questions. I’ve answered a few below.

The industry has exploded over the past few years. When I left Opta (now Stats Perform), Statsbomb was just entering the market and there were a few other new analytics companies offering services to clubs (21st Club, SciSports etc.). Now that market is booming and there seem to be new technology companies entering the space all the time, offering all sorts of services and different types of data. I think having more companies in the space has helped push the industry along, since different companies can focus on meeting specific needs rather than there just being a handful of more general “football data companies”.

As for your second question, companies like Sportlogiq are already collecting comprehensive tracking data sets from broadcast feeds. In terms of automating event data from broadcast, I certainly think the industry is moving in that direction and that it will be one of the next big breakthroughs.

Currently, collecting quality event data takes at least 2 people. My guess is that the first breakthrough of computer vision in eventing will be a hybrid system that takes that down to 1 person and potentially to less than 90 minutes of time per match. For example, maybe an automated eventing system collects all passes and then a human manually annotates the body part used to make the pass (this isn’t based on anything I’ve personally seen, just a prediction). But we aren’t a million miles away from event data being collected entirely automatically.

I just wanted to ask about tips/steps for those looking to break into sports analytics. I’m building my foundation in analytics now by learning Python, then planning to pursue an analytics role before trying to jump into sports analytics. Any advice, specific tools/concepts to learn etc. would be a huge help!

I’ve written two pieces on this in the past, here and here, but I think this is a good opportunity to address something I mention in both articles: my suggestion to start doing public work and working on sports analytics in your own free time.

I was having a conversation about doing public work with a former colleague of mine, David Yu, the team lead for hockey analytics at Sportlogiq, which I’m sharing here with his permission. David is a first-generation immigrant to Canada and came into sports analytics from the world of academia, where he studied biology. He wasn’t involved in the public sports analytics community at all before joining Sportlogiq, but since then he has made numerous important contributions to the field of sports analytics.

He made the point to me that the ability to do work and make it public before you have a job is something that is only really possible from a position of privilege. Learning data science techniques and tools requires lots of time and commitment, and if you are doing public sports analytics work as you learn, you are almost certainly not being paid for that work, at least initially, and are often doing it just to satisfy your own curiosity. This is something I had the luxury to do because of my privilege of relative financial security, but that David didn’t. I’ll use his words directly here to elaborate on the point:

I get that doing public work will give you a leg up. But I think it just needs to be addressed that not everyone has the means to do so and this disproportionately affects POC candidates.

You can recommend it but I think you have to acknowledge that not everyone can do it and it contributes negatively to the diversity in sports tech

I have to admit this isn’t something I’d really considered that much — that when I said the best way to get a job in sports analytics is to do work publicly I may have been contributing to the lack of diversity in the field. Part of the problem is I think that the advice is still correct — the best way to get a job in sports analytics is to do work publicly — but it’s not something everyone interested in the field has the ability to do.

There are a few initiatives I’ve seen which help address this problem (mainly in the form of paid internships prioritizing non-white candidates etc.), but it’s still a real issue and I’m not sure what the best workaround is for the trade-off between showing what you can do in the public sphere and having the ability to actually do work in the public sphere in the first place. I would be interested in any ideas or thoughts people reading this have.

These two questions I think are quite similar and dovetail into what I was discussing before: what jobs should I do before I work in sports analytics? Firstly, I have to admit it’s a bit hard for me to answer this question because I’ve never had a full-time job outside of sports. What I can address is the types of resumes that stand out when I look through potential candidates. I would say any type of non-standard data science work is a bonus: something where the candidate worked with non-traditional data and was able to solve problems outside pure marketing/sales data analysis. Of course, experience in sports outside of analytics is helpful as well; if you worked as a data scientist but also have experience refereeing or coaching, that is a bonus.

Very broad question so I’ll focus on post-Ferguson Manchester United here.

The obvious answer for the biggest flop is Sanchez, but the one that surprised me more was Memphis. I genuinely think he could have been world class at United, and his performances at Lyon suggest the club just completely mismanaged him. I think we sometimes underestimate the jump from the Eredivisie to the Premier League, but he had a few flashes even at United where he looked brilliant. I think with a bit more time under a more stable coaching setup and a more attack-minded manager than Van Gaal he really could have thrived.

The biggest surprise from a positive perspective for me during this post-Ferguson period has to be Rashford. I was actually at his debut against FC Midtjylland and remember being disappointed with him in the first half. Turns out I was a bit wrong. Then there was also a period when seemingly the entire analytics community was calling him out as unsustainable and comparing him unfavourably to Iheanacho. Even then I was skeptical Rashford would turn into the forward he is today. Quick, clever and a great finisher, he is one of the most impressive homegrown Manchester United players of the last twenty years.

I think there is quite a big difference between the work I did with teams at Opta and Sportlogiq. Opta was a well-established player in the football analytics world selling a product that everyone knew when I joined whereas Sportlogiq was a start-up selling a product (broadcast tracking and metrics derived from it) that almost no one knew about. Because of the relative sizes of the companies I spent way more time talking to clubs and interacting with clients externally at Sportlogiq. A lot of this was demonstrating how broadcast tracking can be used and what the advantages are over traditional multi-cam tracking — namely breadth for scouting and recruitment.

There were obviously plenty of frustrations, and I assume it’s very different from working at a club, but in general I had very positive interactions with clubs during my time at Sportlogiq. Part of that, I think, comes from a bit of selection bias: the people at clubs who see the most potential in Sportlogiq’s products and want to talk to me about them tend to be the people who are also most inclined to think about the game and analytics in a similar way to me.

Working in a cross-sport analytics team is a great experience and I think more pro teams should do it, especially where ownership groups own multiple clubs. Sportlogiq had analysts working in ice hockey, soccer and American football; obviously ice hockey and soccer are where the most natural overlaps are. There were so many ideas we were able to go back and forth on, and if you look at the work that the Sportlogiq analytics team has produced over the past couple of years I’m sure you will see the influence of soccer in hockey and vice versa. Thinking within the context of another sport forces you to think in ways you don’t normally and pushes the boundaries within which you normally set out problems. One thing I’m looking forward to in academia is doing a bit of work in other sports as well as football.

I can’t compare directly between working for a club and working for a sports data or analytics company because I’ve never actually worked at a club, but I can certainly talk about some of what I’ve learned. First off, when you work at a company you end up “working” for lots of clubs in small and big ways, so I probably have a better idea of what clubs are doing and how they are applying data in a broad sense than most people actually at teams do. This extends to how data is used in the media and betting spaces as well. I probably end up with a better sense of the industry as a whole.

Another advantage is that you work with people who are interested in what you are doing. You aren’t the nerd working in a sports environment where there is inevitably some skepticism about the role of analytics at the team. You are working at a company where everyone believes that data should play a key role in sports, so you never feel like you have to justify the existence of your entire field.

Probably the biggest disadvantage is that when you send work or products off to a client, that is it. You don’t have a say over how it’s used or whether it is being completely misinterpreted, and if it is used well you don’t have the opportunity to celebrate the success of your work in the same way. I also think there is something very appealing about the idea of working in the competitive environment of a club (after all, the competition is part of why we all love sports), and it is something I definitely want to have the opportunity to do in my career.

I chose a very bad time to go back to short hair: right before a 4-month ban on haircuts, so I have to struggle through the awkward middle-length stage again. My hair routine is pretty standard though: shampoo and conditioner probably every third day and a light curly-hair product daily.

I’m not ready to give up on De Gea yet. I still think that before the 2018 World Cup he was probably the best goalkeeper in the world and I don’t think you just lose that, especially at his relatively young age for a keeper. That being said, Henderson is obviously the goalkeeper of the future at the club and I wouldn’t want to do anything to risk him leaving. I think if you recall him next season you run the risk either of him making a few mistakes early on and having to be dropped, or of him only playing cup games while De Gea takes the league games. Both of those outcomes would, I think, make him more likely to leave permanently than another season on loan at Sheffield United, so I would let him stay there at least one more year and see how De Gea gets on.

— — —

I appreciate all the questions and hope you got something out of this. I’m hoping to be able to write a bit more going forward. Thanks for reading!

Sam Gregory

⚽️📊📉📈 | Data Science + Sports Analytics | Grad Student @iHealthSportVU