A Journey Into the Heart of Sports: Data Viz with Daren Willman
Radar Wars and the Theory of Sports Data Visualization
One of the first basketball analytics projects I ever did, like all great inventions, was born of a desire to procrastinate on something ostensibly important (this was three years ago; I was in college, so you know how that goes). I was in the middle of my first semester of data science classes, and I figured at that point, the entire world was at my fingertips. So, I decided to use that newfound power to try and compare NBA players to other historical stars. I did some fancy machine learning (and spent a lot of time on Stack Overflow), got some outputs, and thought to myself about the best way to view the results.
I had played my fair share of Madden at that point, so my mind went straight to using radar plots. I thought it was a perfect fit, providing some geometric definition to a wide range of metrics, a template across which to compare multiple players at once.
I posted my article, tweeted it out, a bunch of folks clapped and pressed the “like” button, and I felt validated. That’s how my basketball analytics career started out.
One year later, Daryl Morey (Houston Rockets GM), Luke Bornn (VP of Strategy and Analytics with the Sacramento Kings), and Sam Ventura (Stanley Cup-winning Director of Hockey Research for the Pittsburgh Penguins) — three pillars of the sports analytics world — came together on Twitter to dish out some advice, and I quote exactly: “No analytics person worth his salt uses radar plots.” That’s how my basketball analytics career almost ended.
Thankfully, it did not. But the ensuing “radar wars,” as dubbed by Ted Knutson, CEO of StatsBomb, inspired a frenetic conversation around effective data visualization in sports. It was the first time I’d been really tuned into the theory of data visualization, so to speak. Now, two years later, I’d like to think (rather, hope) that I’m slightly more seasoned and have learned a few more lessons along the way. Sports is such a rich field for analysis and experimentation, and I’m glad that it’s given me something of a sandbox in which to develop my own skillset. All of which brings me to a month ago, when Jason Forrest and Elijah Meeks floated the possibility of having a dedicated vertical for sports visualization as part of Nightingale.
An entire section of the publication dedicated to “Radar Wars?” Maybe not exactly that, but perhaps something along those lines. It’s an exciting time for sports analytics, and data visualization plays a huge role in that. The more I thought about it, the more I believed that it would be a great opportunity to bring together a diverse and talented group of analysts. In doing so, we can advance some perspectives on the more academic and theoretical considerations behind the graphics they produce, and on the thought process of how to most effectively communicate complex ideas visually. Ultimately, it continues the cross-pollination across various corners of the sports analytics field.
So that’s the plan. Whether you’re interested in hockey or in basketball, radar plots or Tableau dashboards, or just somewhere in between, I’m confident that there will be something to pique your curiosity. We’ve got a broad range of articles in the pipeline, from technical tutorials and Radar Wars-style meta-commentary to deconstructions of original visualizations and interviews with leaders in the field. We’ll have content from folks who have not only traversed the public analytics and media landscape but have also plied their trade within professional front offices.
I wanted to use this opportunity to present the sports data visualization aspect of Nightingale in a way that showcased both the current state and the potential of the field. That means kicking this off with something more than just a dorky story and a pseudo-mission statement. Something that goes a step further than simply saying “this is how we got here” and explores what it means to even be “here.” To that end, I chatted with the force behind some of the coolest and most engaging sports visualizations out there today: Daren Willman, the Director of Baseball Research and Development for Major League Baseball, and creator of the popular Baseball Savant website. Let’s do this!
An Interview with Daren Willman, Director of Baseball Research & Development for Major League Baseball
The following interview has been lightly edited for clarity. “SN” refers to Senthil Natarajan, and “DW” refers to Daren Willman.
SN: Let’s get some quick context for our readers. Can you tell us a little bit about your background and experience?
DW: Sure. I’m Daren Willman, the Director of Baseball Research & Development for Major League Baseball. My background is in computer science — focused primarily in software development, but most recently, I’ve been extremely interested in data visualization and how it can help sports teams, players, coaches, and front offices better understand the massive amounts of data available to them.
I played baseball all through college and always loved the game. After I graduated, I worked in law enforcement for almost 10 years, developing criminal history software. In 2012, I stumbled across a massive baseball dataset called PITCHf/x. In 2008, Major League Baseball had set up cameras in all 30 stadiums to track the metrics (velocity, release point, movement, pitch type) of every pitch thrown during a game. As soon as I saw the data, there was definitely a watershed moment when I realized all the cool things that could be done with it. I immediately started scraping all the data and coming up with ideas on how I would want to use it if I were a player, so I started developing visuals that I might want to see. They started off fairly simple, like a spray chart of where balls in play were hit and the locations where pitches were thrown. Shortly after that, I started a website, Baseball Savant, and started adding the tools I was developing. I never really expected to work in sports, but as the site got more popular, opportunities started to come up with teams and the league. That’s when it started to become more of a reality that I could possibly end up making a move to sports full time.
My role with MLB is pretty much a dream job for me. I get to blend two of my biggest passions, technology and baseball. Most of my time is spent working with the new player tracking system, Statcast. Statcast tracks pretty much everything that happens on the baseball field: player positioning, pitch tracking, and hit tracking. Needless to say, this is a massive dataset, so I spend a lot of time analyzing it, developing applications to visualize the data, and trying to figure out what we can do with it that baseball fans might enjoy.
SN: So, you post a lot of those visuals on Twitter, and they’re really popular! What have been some of your favorite public visualizations that you’ve done?
DW: Here are some of my favorite data viz tweets of recent memory…
SN: How do these viz, the ones you post publicly, differ from the ones you create for your job? What is that process like — accounting for the types of stakeholders you have now and the different conditions of satisfaction you deal with when it’s something bigger than a hobby or personal project?
DW: The public visuals I do don’t differ all that much from the ones I create at MLB. I’m given a lot of freedom to decide and iterate on what I find interesting. I like to share the progress of things I’m working on to the baseball community as I work on them and tend to get pretty good feedback from social media. It can be toxic at times because dataviz is similar to art. Some people will really like the visuals, but some people will hate them. I think one of the more challenging aspects of data visualization right now is how hard it can be to develop them. There’s a stigma with data visualization that it’s easy. In order to be really good at data visualization you need to be pretty good at several different disciplines. Scraping or wrangling data, organizing it, designing the visual and actually writing the code to display it are all unique challenges.
SN: I really like a few of the concepts you just brought up there, so let’s drill down a little bit more on that. The first is the idea of leveraging a community to help move your work forward. What have you found is the best way to interact with a community or some best practice tips on putting your work out there to get feedback from that community? What’s the best way to make use of such a broad range of feedback? Was it ever kind of intimidating to just put your visuals out there in public when you first started? One of the topics we’ve discussed among the DVS members is how tough it can be for people to sometimes allow themselves to open up their work to a larger audience, for a broad range of reasons (but also the importance of doing so!). What are your thoughts on this?
DW: My typical process when working on new visuals goes something like this: after I’ve designed and collected the data for my idea, I’ll mock something up at blockbuilder.org using d3 (I do about 80% of my work with d3, the rest of my work is typically done in three.js). It’s a great tool that Ian Johnson (@enjalot) made to rapidly develop in d3. After I think it’s polished enough for a first iteration, I’ll tweet it out to a broader audience and see what the community thinks.
The feedback I receive can vary, but there are often things I don’t consider when creating visuals, typically color choices that don’t work well for color-blind people. People will also ask, “Why didn’t you use X chart instead?” which helps me think of the problem in a different light. It can definitely be intimidating putting yourself out there, especially on a platform such as Twitter that can be toxic, but the more you do it, the more you learn to ignore the people who aren’t giving you good feedback.
A perfect example of feedback using Twitter is here. I created a scatter pie chart using a concept I saw Elijah Meeks mentioning and, as always, the anti-pie chart crowd came out. I think this is an exceptional way to visualize every pitch a pitcher has thrown since we’re dealing with larger amounts of data. A typical scatter plot would be way too much for this but scatter pies are perfect because they allow the data to be binned to the location and show which types of pitches, how many, and the general location — but someone will always gripe since it’s a pie chart.
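[author’s note: for readers curious about the mechanics, here is a minimal Python sketch of the binning step that a scatter pie depends on. It is not Daren’s code (he works mostly in d3). The tuple layout, the `bin_pitches` and `pie_slices` names, the 0.5 cell size, and the sample pitches are all assumptions made up for illustration. Each grid cell collects its pitches; the cell’s total would drive the size of its pie, and the per-type fractions would become its slices.]

```python
from collections import Counter, defaultdict


def bin_pitches(pitches, cell_size=0.5):
    """Group pitches into square location bins and count pitch types per bin.

    `pitches` is an iterable of (x, z, pitch_type) tuples. This layout and
    the default cell size are assumptions for the sketch, not a real schema.
    """
    bins = defaultdict(Counter)
    for x, z, pitch_type in pitches:
        # Floor division maps each coordinate to the index of its grid cell.
        cell = (int(x // cell_size), int(z // cell_size))
        bins[cell][pitch_type] += 1
    return bins


def pie_slices(counter):
    """Convert one bin's counts into (pitch_type, fraction) pie slices."""
    total = sum(counter.values())
    return [(ptype, n / total) for ptype, n in sorted(counter.items())]


# Made-up sample data with illustrative pitch-type labels.
sample = [
    (0.1, 2.2, "FF"),
    (0.3, 2.4, "FF"),
    (0.2, 2.1, "SL"),
    (-0.7, 1.6, "CH"),
]
bins = bin_pitches(sample)
```

In this toy sample, the first three pitches land in the same cell, so that cell would be drawn as one pie of size 3 (two parts "FF", one part "SL") instead of three overlapping dots, which is the property that makes the form attractive for very large pitch counts.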
SN: I am one of the people who gripe about a pie chart! But I will agree with you that I can’t really think of a better way to bin the data appropriately for that specific purpose. This brings up an interesting point, which is designing for accuracy/precision versus impact. I recently wrote a little bit about how graphs with radial form factors really tend to bring out this tension. How do you straddle that line and balance those considerations? And have you found the arena of sports, which tends to be a very traditionalist space with a lot of inertia, to be more or less difficult to get buy-in on new or more niche/unique types of dataviz?
DW: I think it really just depends on the circumstances. You need to know that you’ll never make everyone happy. Going back to what we were discussing earlier about dataviz being similar to art, artists can’t expect everyone to love their work. I think as long as you make a conscious effort to ensure the numbers or data you are trying to convey are shown clearly and not in a deceptive way, then using something like a radial chart, pie chart, etc. is fine. Radar charts are another visual that seems to bring out this tension, but I think when those are done properly, they can in fact be very insightful [author’s note: score 1 for #TeamRadar].
I think sports is actually great when it comes to new data viz. From my experience, people seem very receptive to new ways to view data in sports. Sports, in general, has so many statistics and data now that it’s perfect for creating and developing new types of ways for fans to look at it. I think the younger generation of sports fans especially enjoys good ways of viewing the sports that they like. I’ve definitely found that the visuals that do exceptionally well on social media are ones which are quickly digestible and convey a clear message.
SN: As part of the dataviz process, you mentioned “designing” the visual. Can you elaborate a little bit on your design thinking process? How does an idea take shape? And how does that idea become reality?
DW: My design process isn’t very scientific. Typically, I’ll come across a really cool dataviz and think to myself, how can I do something similar with the data I have? I really like going to Andy Kirk’s website. He writes a “Best of” blog every month, and I’ll go there for some inspiration when I’m in the mood to create something. Also, with the recent addition of the DVS, there are so many awesome ideas flying around; it’s an awesome place to get inspired. It’s great practice to see something you like and re-create it from scratch using something you’re interested in.
SN: Let’s continue that train of thought on the dataviz process. Being from a data analytics background myself, I keep going back to your point that dataviz is a very complex, multi-faceted process. I always find in data science projects that I spend a very large proportion of the time on just finding and preparing data, so it’s encouraging to hear this from another perspective as well. Can you go a little bit into how people can attune themselves to that mentality of developing a well-rounded skillset in order to be good dataviz practitioners? How did you develop that multi-disciplinary, end-to-end approach? Does the data step get any easier now that you’ve got the full force of the MLB data infrastructure behind you?
DW: I was fortunate to come from a more data-centric background. I have been dealing with relatively large datasets since I graduated college, so wrangling data came pretty naturally to me when I began dealing with visualizations. I think keeping focus on what I’m trying to do has always helped me. If I have an idea for a visualization I want to create, it always helps me to think about exactly what data I need to create it and how I can get that data. Typically for my visualizations, all the data I need is in a SQL database, so I can query it and extract it pretty easily; however, sometimes I have to write a quick script to scrape a site or an internal JSON feed. There are so many toolsets out there now to scrape data; it’s gotten pretty easy to do. I actually find the data wrangling process to be almost therapeutic. It’s like trying to solve a puzzle.
Also, when just starting out with dataviz, it’s very easy to get discouraged. The process can be tedious and difficult, but there are so many skills you learn end-to-end that even if the project doesn’t turn out exactly the way you want, it’s great practice for the next time, and every time it gets a little bit easier. I think grabbing the data I need has gotten a bit easier since I’ve joined MLB. My colleagues and I have spent many hours developing an internal data warehouse specifically for grabbing data in an easy fashion. There are always tricky situations writing queries when dealing with rolling windows and condensing millions of data points down to a concise dataviz, but that’s part of the fun.
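[author’s note: the “rolling windows” mentioned above can sound more intimidating than they are. Below is a minimal Python sketch of a rolling average, one common way to condense a long series into a smoother line. It is only an illustration and not MLB’s internal approach. The `rolling_mean` name, the window of 3, and the `hits_per_game` numbers are invented for the example.]

```python
from collections import deque


def rolling_mean(values, window):
    """Yield the mean of the most recent `window` values at each step."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()
    running_total = 0.0
    for value in values:
        recent.append(value)
        running_total += value
        if len(recent) > window:
            # Drop the oldest value once the window is full.
            running_total -= recent.popleft()
        # The first few points average over fewer than `window` values.
        yield running_total / len(recent)


# Made-up per-game hit totals, not real data.
hits_per_game = [1, 0, 2, 3, 0, 1]
smoothed = list(rolling_mean(hits_per_game, window=3))
```

Keeping a running total means each new value costs a constant amount of work, which matters more as the series grows toward the millions of points described above. In a SQL warehouse the same idea would more likely be expressed with a window function, but that is a design choice rather than a requirement.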
DW: Yes, I noticed animations tend to do better on social media. I think by adding animations to certain visualizations, it helps catch the eye and exaggerate certain points I’m trying to make. Having a static scatter plot of a pitcher’s strikeouts can help paint a picture, but when you show the same scatter plot point by point sequentially in an animation, it really helps drive home the fact that the player has a whole bunch of strikeouts, base hits, or whatever metric I’m trying to convey. Also, I think transitions just look cool and are fun to play with; I use them a lot to test out new ideas I’ve been thinking about.
SN: Let’s talk about that for a second, the concept of how to really drive home an idea. You’ve obviously talked about animation here, but what are some other techniques or ways that you experiment with or utilize to optimize your data viz? Colors, reframed perspectives, etc.…
DW: That’s definitely a great question. Animation certainly helps draw attention, especially on a medium such as Twitter, but using color is a great way to highlight certain points on a visual. Recently, I’ve been experimenting a bit more with opacities to highlight certain points of animations and then slowly fading out the colors as the next bit of data is flashed on the visual. I still consider myself a novice when it comes to most visualization practices, and I’m always researching and looking for inspiration on this very topic. I’m extremely thankful for the dataviz community because that’s where so many of my ideas and experiments initially come from.
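[author’s note: as a rough illustration of the fading idea, here is a small Python sketch that computes an opacity for each point based on how long ago it appeared. This is a guess at one way to do it, not Daren’s implementation. The linear fade, the function names, and the defaults for `fade_frames`, `floor`, and `frames_per_point` are all assumptions.]

```python
def fade_opacity(age_in_frames, fade_frames=30, floor=0.15):
    """Fade linearly from 1.0 down to `floor` over `fade_frames` frames."""
    if fade_frames <= 0:
        raise ValueError("fade_frames must be positive")
    if age_in_frames <= 0:
        return 1.0
    progress = min(age_in_frames / fade_frames, 1.0)
    return 1.0 - progress * (1.0 - floor)


def frame_opacities(n_points, current_frame, frames_per_point=10):
    """Opacity of every point already revealed at `current_frame`.

    Point i is assumed to appear at frame i * frames_per_point.
    """
    opacities = []
    for i in range(n_points):
        appeared_at = i * frames_per_point
        if appeared_at > current_frame:
            break  # later points have not been drawn yet
        opacities.append(fade_opacity(current_frame - appeared_at))
    return opacities


# Three of five points are visible at frame 20; the newest is fully opaque.
visible = frame_opacities(n_points=5, current_frame=20)
```

Keeping a non-zero floor means older points never vanish completely, so the newest data stands out while the overall shape of the plot stays readable.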
SN: Speaking of community, who are some influential folks that you think you’ve been able to learn from, folks who have perhaps provided some inspiration for your work? And since part of the inspiration for this interview is in helping to launch a sports dataviz focused publication, what are some of your thoughts on how to build a strong dataviz community for sports? Sports are such a rich field, but it certainly feels like a more infant part of the dataviz world than, say, healthcare or enterprise business analytics.
DW: There are lots of people I take inspiration from, but here are some who come to mind, in no particular order: Elijah Meeks, John Burn-Murdoch, Nadieh Bremer, Shirley Wu, Andy Kirk, Edward Tufte, Peter Beshai, and RJ Andrews. As for sports still being in its infancy, I 100% agree with you. I think Tableau has really helped the learning curve for sports data visuals though. All you really need is a CSV of sports data to start playing around, and I’ve seen some exceptional visuals done in it recently.
Hopefully, dataviz applications like that can allow for more people to get interested in the field. I typically tell people who ask how to get a job in sports that they should learn data visualization. Teams are looking for ways to break down the enormous amounts of data they’re dealing with for the players and need people to create those visualizations. The experience you gain from wrangling the data, manipulating it, and creating the visuals will help you in any other aspect of your career, so it’s a win-win even if you don’t get a job in sports.
SN: Do you think there tends to be this misconception that artificially inflates the barrier to entry? Like people think “unless I’m a PhD-level full-stack developer with 10 terabytes of data, there’s no point” when in reality, like you said, sometimes it’s as simple as having an interesting CSV? And if so, how can we as a community work to change that perception or lower that barrier to entry?
DW: I think there certainly was a misconception, but I feel like that stigma has slightly gone away. I see quite a few people doing visualization with R and Tableau lately. A good approach to get started is to come up with a question you want to see answered. For instance, when I started, I wanted to see a particular player’s base hits overlaid on a field. That simple question basically started my passion for data visualization. I started researching where the data was, how I could get it, and once I had it, how I could overlay it onto a field. Breaking the various pieces down into smaller individual steps really helps me when I feel overwhelmed. I think today it’s easier than ever to at least get started with sports visualizations. There are so many packages to get started with the data, especially in R, like baseballr, nflscrapR, nbastatR, and nhlscrapr. None of those packages existed several years ago, but now it’s pretty easy to get data.
SN: Yep — having access to rich public data really seems to be an inflection point for analytics in any sport. Alright, so I’ll wrap this up with an obligatory quick hitter. How do you think something like the Data Viz Society can help to advance the field of sports viz as it grows? What kinds of things are you looking forward to seeing?
DW: I think the Data Viz Society can help advance sports visuals tremendously. There are tons of people who have gotten involved in the DVS community in a relatively short amount of time. It’s cool to have a channel dedicated to sports-related visuals where everyone can share interesting work they’re developing. I really enjoy the community so far because everyone seems receptive to feedback and also honest and helpful in their critiques. I hope to see the DVS continue to thrive!
Thanks so much again to Daren Willman for his time and input for this interview! Be sure to follow him @darenw on Twitter if you don’t already. And look out for more sports-related pieces, starting with some basketball and hockey articles, coming soon!