Foundational Learnings for Cricket Data & Analytics

What we can learn from other sports about building effective analytics processes, teams and culture

Amol Desai
Boundary Line
18 min read · Sep 8, 2021

In the time that I have been actively working with data in Cricket, I have had several conversations with folks entrenched in Cricket analytics, media and teams at the highest levels. Through these, I have built an understanding of the state of data & analytics in the sport and the opportunities that exist to do it well, especially in the long run. In the past couple of months, I sought out & met practitioners and leaders in American sports analytics to help me understand the state of data and analytics in their sports, the challenges they face that we can proactively mitigate in Cricket, the frontiers that they are excited about, and other tips and learnings that we can leverage to efficiently and effectively impact our sport.

Cricket may be a tad late to the sports analytics party, but it doesn’t have to follow the same path that others went through. It can leapfrog to be on par with other sports or even lead in certain aspects if we can successfully leverage the progress made by others. Being late brings with it the advantages of less accumulated tech debt and fewer headwinds from the inertia of large infrastructure & cultural investments. An analogous example that comes to mind is how Africa, with its lack of significant landline infrastructure, quickly ramped up on cellphone technology in the previous decade. In order for this to happen, we need to act early and take a holistic perspective on our design decisions and investments.

Here, I am sharing a blend of my own thoughts and my notes from the conversations that I have had recently (2021). To provide perspective on the coverage that I got from these conversations, here is a brief summary of the backgrounds of the folks I talked to:

  • R&D content producer for a third-party firm that works with MLB, NFL & NBA teams, providing data and models that evaluate player performance and in-game strategies
  • R&D researcher for a third-party analytics provider across multiple sports
  • NFL team research and strategy director
  • NHL team analyst
  • Directors of strategy and research at multiple NBA teams
  • Director of sports analytics at media company
  • Sports analytics and management professor at top universities
  • Senior research and leadership at the NBA
  • Baseball strategy and player valuation coordinators at MLB teams

Why do we even need to think about this stuff?

With the amount of buzz that has been created, you could be in the audience, consuming the game at different levels of depth, from different sources and different types of media. Or you could be on the stage, actively engaged in running, storytelling, analyzing and decision making around the sport or its various media and team entities. Either way, as long as you haven’t been living under a rock, you have some impression of, and possibly an opinion on, data & analytics in Cricket.

This is all great and shows that the game, its participants and its audience are ready for the advances and enhancements in competitive gameplay as well as audience engagement that data and analytics have brought about in other sports. There is a ton of work to be done around making all the depth that we can bring to the game simultaneously accessible to decision makers and audiences alike. And, there is a ton of risk around the current state that makes me wary of premature dismissal due to early failures, misinterpretations or sour experiences. There is also the risk of being over-eager and having a myopic, short term focus rather than a holistic, longer term one. This will lead to a meandering journey. You want to allow for a path with switchbacks, not plan for it.

Don’t get me wrong; I am all for enthusiasts of all skill levels and even backgrounds (this doesn’t have to be limited to code and stats; think visualization, communication, media creativity etc.) pitching in. In fact, I’ve started an initiative to enable this and have been an advocate of open sourcing data and coordinating efforts amongst the enthusiast community. What I am saying, though, is that we need to manage this well. We need to manage this well for the people who are contributing, for the businesses that are going to be funding the process and leveraging the outcomes, for the players, coaches and teams that are going to be served and assisted and for our game that is giving us an opportunity to make a dent.

The perception around the state of analytics in Cricket can be broadly categorized into three types:

  • Folks who believe that we are already using a lot of advanced data & stats in several aspects of the game and have hit a maturity level where we are leveraging most of what can be leveraged from it. These folks, unfortunately, tend to be closer to practitioners and decision makers in the game.
  • People who believe that stats are unreliable. You know what? These people are right on two counts: a) You can’t be data driven in sport to the extent that you can be in some other industries because there just isn’t enough data and often, there are quality issues as well. It is important to understand limitations and use data intelligently as an aid. b) The primary thing to understand about using data for decision making in sport or otherwise, is uncertainty in outcomes. However, the uncertainty isn’t introduced by using data, it is exposed by it. This is a key distinction.
How about - enhance your instincts with data and then use your instincts?
  • Finally, there are practitioners who don’t know what they don’t know and frankly end up misusing the data. This leads to a communication and credibility nightmare or bad decisions or both. My favorite example here is how often we hear about matchups between individual players when the two players have faced each other for ten-odd deliveries. These also tend to be folks who will tout “Moneyball” & “AI”, but won’t have followed the less sensationalized work that came after Moneyball.
If the BCCI Job description for the national Women’s team performance analyst from 2019 is anything to go by, we are in the dark ages (link)
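
To make the ten-odd-deliveries matchup point concrete, here is a minimal sketch of how wide the uncertainty is on a dismissal rate estimated from so few balls. It uses a Wilson score interval, which is my choice of illustration rather than anything a team or vendor told me they use, and the ball and dismissal counts are hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    ) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical matchup: 1 dismissal in 10 balls faced.
small_low, small_high = wilson_interval(1, 10)
# Same observed rate over a hypothetical 1,000 balls, for comparison.
large_low, large_high = wilson_interval(100, 1000)

print(f"10 balls:    {small_low:.3f} to {small_high:.3f}")
print(f"1,000 balls: {large_low:.3f} to {large_high:.3f}")
```

The observed rate is the same in both cases, but the ten-ball interval spans a range far too wide to base a bowling change on, which is exactly why the data exposes uncertainty rather than introducing it.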

Compared to other sports, the general perception is that cricket is not as mature on the analytics side, and my narrative here also leans towards that perception. I’ve talked to leaders in Cricket and they share this sentiment as well. I should say, though, that I was a bit surprised to learn that the gap may not be as vast as perceived on the technology stack side; this has more to do with other sports not being as advanced as perceived than with the perception of cricket itself. I’ve also learned from my own outreach and connections that IPL teams are just getting started looking around in the advanced analytics and equipment industry for collaborations and services, which is great to hear. However, again, how all the available help and tools get wrapped up in a coherent, coordinated and focused operation is critical to leveraging it for success.

Enough said (hopefully), let’s get to the learnings already!

OK. Let’s do it.

I’ve divided this up into 4 sections: 1) Maximizing impact 2) Third party analytics vs in-house analytics teams 3) Maturity & Frontiers 4) Building an Analytics team.

Maximizing impact

  • Most sports data sets are much smaller in scale and much sparser than data in other industries. As a result, one can’t expect to be “data driven” in the same sense and a recalibration on expectations is needed.
  • Across the NBA, NFL, MLB & NHL, it is harder than in other industries to get to deeper work and higher levels of maturity. The reason for this is that the analytics community within a sports team becomes an echo chamber and partnership with stakeholders is harder to build. As a result, vetting of and feedback on the work done are harder to come by. There is also a data literacy gap between the people consuming and the people producing analyses. I believe both of these problems can be addressed to some extent. But it takes carefully building culture and relationships.
  • Tip: See every interaction as an opportunity to learn and educate rather than an argument to be won. This is a small bullet, but worth every ounce of attention that you can muster.
  • The audience for an analytics team changes frequently. Coaches and players change and bring with them expectations about interactions and experiences based on what they used at their previous team. So in both the product and the methods, there is a balance to be struck. This involves some convincing and some catering. The immediate thought here is that standardization via third party services can help. We’ll look at the tradeoffs around that in the next section and some more on standardization in the Maturity section.
  • Three things are important for impact: visibility of solutions for the right stakeholders, building credibility & a positive perception of the work done and its influence, and ensuring that analytics work results in solutions and tools that become a part of the regular decision making workflow. This invariably results in more critical contributions. There are too many pieces of good work that are done in an ad hoc and uncoordinated manner and become disposable. It is very hard to build incremental value in these cases.
  • Even in 2021, actual impact derived on the ground varies greatly from team to team in the MLB & NFL. Some teams tend to plug & chug a fancy model or solution that they got from a third party analytics firm. Here, the people who decided to bring in the third party, typically the front office leadership, are not in tune with the in-house team making decisions around the game itself, typically coaches & players. This results in the tool/model either not being used, which is a wasted opportunity, or being used blindly and hence sub-optimally. For teams, there is an obvious lesson on improving communication and collaboration, but this is good insight for the third party as well. Given that this could be an issue with the team they are working with, they can aid credibility building by doing something as simple as adding a case study to the model delivery itself, something that is often overlooked.
  • It is important to set expectations on the outcomes that can be achieved and the role that data can play in obtaining strategic or performance advantage, but also on the work needed for preparing the right data sets with the necessary quality and having the right tooling in place to be able to achieve these agreed upon outcomes. Oftentimes, stakeholder demands are grandiose but their willingness to invest and prioritize dependencies from their end doesn’t match. I agree with the take on the data itself, but I actually don’t think that the mismatch in demands and investments is a problem unique to sports. This is why math, data and modeling skills alone aren’t sufficient in analytics orgs. There is a strong need for setting technical and directional vision, evaluating holistic considerations and aligning with decision makers.

Third party analytics vs in-house analytics teams

  • In-house teams can typically go deeper into specific problems simply because they know more about problems that are specific to them. It makes sense for them to be protective of some of their work and while in season, they have the need to stay nimble.
  • Third parties, on the other hand, can go deeper into more general research areas. Third parties can provide the most value by building common tools and building blocks. For them, solutions that scale are critical as a core and unique value-add as well as for efficient resource usage. They typically have a larger analytics workforce than teams and so they can invest in research. Once they create foundational tools, they can easily keep cranking out incremental improvements. This way they can get more market coverage (reach more teams) and complement team analysts.
  • For teams not having strong analytics teams, hiring a third party is a short term, quick way to up-level themselves. This is the level of maturity that a lot of cricket teams seem to be at right now. As they become more mature, this is a good mid-term aspect for both teams and third party providers to keep an eye on in terms of envisioning their own evolution. Ultimately, in-house teams are going to have to be the preferred way for teams to differentiate, as is the case in the MLB & NBA at this point. In this more mature world, even upcoming third party analytics firms see their value as a cross-check tool for established team departments, rather than being a primary driver.
  • The biggest challenge that teams face, by far, is the challenge to coordinate across vendors. I see that cricket teams have this challenge as they work with multiple vendors and their own in-house team, farming out different decisions to different parties without good coordination, and are generally unable to connect the dots across these decisions. This results in chaotic and potentially cannibalizing decision making & hurts the ability to communicate clearly to coaches and players among other things.
  • Team researchers want to understand what’s under the hood of third party models and tools. This doesn’t happen for every team as we saw earlier, but this is the reality of business with more data-mature teams. I don’t think we are at this level of maturity with cricket yet, but it is a good place to aspire to be at. This will help with the vetting and feedback problem that we saw earlier and instill accountability. This will also ensure that tools and data are used appropriately and thus, help maximize value.
  • The MLB & NBA own and open-source (at least to teams) a lot of data. However, even with this, third party data providers still have value to add. They can employ an army of people to annotate and enhance even existing data sets. Although nobody expects the ICC or national boards to become data providers and owners any time soon, we will get a lot of benefits of standardization if that happens. It is important to note that third party providers can still have a major role to play and so dismissing the idea may not be an obvious choice for them to make.
  • In the absence of league managed data, third party providers have a significant role to play in standardization and monopolies may be helpful. Third party model and tool providers help with standardization of tooling, user experiences, and jargon. This helps with changing personnel on the on-field as well as analytics side of the teams, and with enabling incremental work in the industry. For the NFL, Pro Football Focus has huge coverage and market share and is pretty much a monopoly. It has destroyed competition but this has helped with standardization. Hat tip to the Cricvizes and Kadambas on the cricket side of the world, both on ploughing ahead and on watching out.

Maturity & Frontiers

If there is one takeaway for leagues and boards here, it is that standardizing data collection and distribution and figuring out tiers of access, including open sourcing some of it, goes a long way in uplifting the state of analytics in the sport. This is needed in cricket to see a Ravi Jadeja-like transformation from “bits and pieces” to “legend”.

The takeaway for any entity handling data here is that it is critical to do it well. Tech debt slows things down to a snail’s pace and discourages practitioners.

Having said that, the below will also provide some perspective on my earlier comment about several American sports not being that far ahead of cricket.

  • MLB has hit a level of maturity where there are few new discoveries to be made with existing data. But, there continues to be plenty of opportunity in minor leagues, and at the college and high school level.
  • The frontier for MLB in terms of new data is biomechanical tracking and there is a lot of excitement there. Teams collect practice data through vendors offering sensors along with data and APIs, e.g. Catapult, PitchVision etc. Even Hawkeye is relatively new in the MLB. It is being used but is still not proven.
  • MLB, the league, owns data and shares it with teams. This includes some Hawkeye data. Hawkeye has some additional data that teams can pay to obtain. Teams can also agree to mutually share this data, e.g. tracking and biomechanics from their respective home stadiums. There is some strategic advantage to gain and business value to exploit by leaving these doors open.
  • Scouting reports in the MLB are, interestingly, in the form of depth charts, grades and long form write-ups. To be clear, Cricket is quite far from this. In the MLB, data is mature enough now that long form reports are starting to be given less weight. They are used in a Bayesian sense: the prior comes from R&D and models, and the scouting report is used to adjust that belief.
  • Even in the MLB, research is mostly self-driven. Selling research work to stakeholders is not trivial and this is an unsolved problem not just in terms of funding, but also in terms of value perception and prioritization. It is an ongoing conversation and a bit of a chicken and egg problem. The value of this kind of work isn’t visible until it is done and getting it done comes with a significant level of uncertainty and risk. For cricket, I think just being cognizant of this problem & setting appropriate expectations can go a long way. We have an opportunity to mitigate this or at least not be surprised by it.
  • NHL & NFL still have 2 person analytics teams. Teams in the MLB have 7–10 times that.
  • NFL has a long way to go on biomechanics. Not all teams have started to use it, but tracking is well and truly in place from a data collection perspective. Using player tracking data is hard and most teams are still learning how to do it well. This can be considered to still be in the research phase.
  • NFL handles data collection. It is centralized and standardized. NHL has recently started standardizing and providing data. So far, it has been third party vendors collecting and providing data.
  • In the NHL, data science is a fairly new introduction and it still has the last seat at the table. Decisions are made by old school methods. Former players are in leadership and decision making roles. People managing analytics teams are usually ex-video analysts and others in similar roles from around teams and can provide limited guidance on leveraging data for deeper insights. Instead, they provide valuable pointers from a domain expertise perspective.
  • The NBA has matured quickly and well. It is more mature than the NFL & NHL. It used to be challenging to get a seat at the table for data science and analytics. Now, the function has a seat and a say, but its voice may be drowned out because of more voluminous representation from other related functions, e.g. scouts.
  • NBA also handles data collection and distribution. I don’t have a relative perspective here, but I did note that there are a lot of challenges with NBA data and working with it can be a slow and painful process at times.
  • There are two types of third party vendors for data in most sports: those that have good data but bad coverage, and those that have bad data but good coverage. It is hard to find someone who does well on both. This is something for teams to consider as they make investment decisions. The cost of accumulated tech debt, and the people and time spent on extracting the desired level of value from the available quality of data, need to be considered in addition to the initial startup cost.
  • As we move towards more mature states, teams will truly start to invest in players. Some IPL teams are trying to do this. Something to consider here is defining appropriate success criteria and balancing between outcome based player evaluation vs skill based evaluation. We talk a lot about how certain bowlers bowled well without reward or batters being unlucky. But we don’t necessarily have a good objective framework to evaluate skills and outcomes independently while also factoring in uncertainty.
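
The Bayesian use of scouting reports mentioned in the MLB bullets above can be sketched with a simple normal-normal update. This is only an illustration under my own assumptions: the function name `update_belief`, the 0 to 100 rating scale and all the numbers are hypothetical, and I am not claiming this is the model any MLB team actually runs.

```python
import math


def update_belief(prior_mean, prior_sd, report_value, report_sd):
    """Combine a model-based prior with one scouting grade.

    Both are treated as normal distributions, so the posterior mean is a
    precision-weighted average: the less noisy source gets more weight.
    """
    prior_precision = 1 / prior_sd**2
    report_precision = 1 / report_sd**2
    posterior_var = 1 / (prior_precision + report_precision)
    posterior_mean = posterior_var * (
        prior_mean * prior_precision + report_value * report_precision
    )
    return posterior_mean, math.sqrt(posterior_var)


# Hypothetical player rating on a 0-100 scale.
# The R&D model says 50 and is fairly confident (sd 5);
# the scout's write-up translates to 60 but is noisier (sd 10).
mean, sd = update_belief(prior_mean=50, prior_sd=5, report_value=60, report_sd=10)
print(f"posterior rating: {mean:.1f} (sd {sd:.2f})")
```

The report nudges the belief towards the scout's view without overriding the model, and the combined estimate is less uncertain than either input on its own. The same structure applies to the skill versus outcome question in the last bullet: the weights depend on how noisy each source is assumed to be.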

Building an Analytics Team

  • There are in general 3 different types of functions to support: 1) Scouting 2) Coaching & Strategy 3) Research. These can be associated with different phases allowing us to start conservatively and have room to adapt: Scouting is slow burning and throughout the off-season, peaking during drafts, coaching involves firefighting and lots of ad-hoc work in-season and research happens in the off-season.
  • To avoid chaos in-season, it is important to create replicable processes. Research solves problems similar to those with coaching & strategy, but in a deeper way. So research in the off-season can feed coaching in the subsequent season.
  • More mature teams understand the value of having engineering (data engineering and front end tool development) expertise to aid data science.
  • The talent market for analytics is saturated with people mildly interested in the position, but who may not already possess the necessary skills. There is also a fairly large volume of easily accessible/visible work from these people who are not that skilled, but want to work in the field. I think we can relate to this in cricket. The challenge for us is to create the right platforms and training setup to encourage and enable these people to up-level themselves.
  • On the other hand, there are also a lot of hobbyists that are skilled, but not easily reachable or in some cases, discoverable. These are folks you want to get on board but are hard to get. For them, we need to create the right incentive structures or other avenues to understand their motivations and pain points and decide whether to and how to address them.
  • The need is to find people with a mix of a keen understanding of Cricket and data. Moreover, as we saw earlier, there is also a need to be able to coordinate efforts and communicate in an accessible manner. Generally, it is easier to find people who can build models and work with data, but for the same people to also be able to build and work with teams as well as understand the sport involves additional degrees of rarity.
  • Analytics teams that are fairly large allow more things to be done in parallel. This gives analysts niche roles and thus calls for a different set of skills and growth path. It is easier for the growth path to be deep and lateral moves are a bit more costly. Smaller teams allow people to wear multiple hats and a broader growth path. This also brings with it more serial execution and a need to prioritize. These are generalizations based on anecdotes from my discussions. There are balancing acts in both modes and room to mould how a particular team functions.
  • Apart from finding the right people, another challenge in sports analytics is compensation. There is competition from other industries and relatively speaking, people in sports get a 30%-50% haircut on compensation. In cricket, Dan Weston has recently written about this and the need to put analyst value in the right perspective. Top leadership often misses this issue because there are always new people who are willing to work at lower pay for a short period. However, the value of retention and experience is often overlooked. There is a significant cost in retraining, rebuilding a team etc.
  • Two cultural areas that I heard about that we must address from the get go in cricket are a) Diversity b) Growth opportunities. I heard anecdotes about women and people of color not wanting to join teams because they feel threatened that their voice will be even more marginalized. We must not let this happen in Cricket for any demographic. On growth, while the field is growing fast, analysts and data scientists in sport don’t see scope for career growth. This leads to people either getting too comfortable or bored. I think there is just so much to do in a sport just getting started with data, that we have an opportunity to change this. There is plenty of room to create opportunities, growth paths and build role models for the next generation.
  • One interesting thing that some MLB teams are doing for hiring is to set up an apprentice system. This isn’t unlike an internship, but is set up over a longer period. This allows candidates to test the waters, explore areas and build a portfolio of work. They also do a fair amount of recruiting through conferences like Sabermetrics and Sloan. The NFL tries to find people through hackathons and open competitions like the Big Data Bowl on Kaggle. All of these, in addition to academic collaborations and setting up platforms for what I like to call guerrilla analyst communities, are worth exploring.

Acknowledgements

I am extremely grateful to Brendan Kent & team at The Measurables Podcast for the opportunities that they created for someone like myself to meet with some amazing people in sports analytics. I’d never have been able to build the perspective that I was able to share here if it weren’t for the generosity and candidness of individuals like Arup Sen, Anthony Cacchione, Mark Simon, Josh Pohlkamp-Hartt and others that I spoke with who prefer confidentiality (but whom I am no less grateful to have met). Everyone was extremely gracious with their time when we frequently went over our slots as they patiently addressed my explorations. I feel fortunate to have made these connections and I hope to engage with many of them and even give value back to them as we go about our journeys.

I also want to offer my deepest gratitude to folks that I have connected with in Cricket over the years. Folks closely associated with top teams, media and data in Cricket, like Dan Weston, Rajesh Aravamudhan, Srinath Bhashyam, Jake Lush McCrum, Sankar Rajgopal, Joe Harris, Tom Body, Nathan Leamon, Freddie Wilde, Gaurav Sundararaman & many others that I am not mentioning for confidentiality or preference, theirs or mine, have played a tremendous role in helping me understand the landscape of Cricket analytics and have been open and honest with their feedback, opinions as well as unsolicited advice, which I wholly appreciate.

If you enjoyed this piece, check out more of my work at Boundary Line and follow along here & on Twitter @amol_desai

I can be reached on Twitter or via email or LinkedIn

Amol Desai
Boundary Line

Cricket Analytics Consultant, Cricket Platform @ZelusAnalytics (working with Rajasthan Royals), Freelance @CricViz linkedin.com/in/amoldesai-ds