Creating Accessible Research and Development Software for Pitching Development

Jun 15, 2024

The Need For a Baseball R&D System

Data drives success in nearly every field today, especially sports. Virtually all professional organizations and large college programs run an internal research and development (R&D) software system to support player development and scouting. This technology, and the access to data it provides, is becoming a necessity for a competitive club, because these systems supply the foundation for sophisticated analytics. However, they are often expensive to build and maintain, which puts them out of reach for many smaller college and high school programs. To level the playing field and democratize data-driven player development, we have created an affordable, efficient internal R&D system for pitching development.

Building a good R&D department requires at least one of two things: a full-time analyst who understands advanced analytics and modeling, or enough money to pay a technology service to do that work for you. Either cost can be out of reach for smaller programs. Beyond technology costs, teams also face limits on how much data they can access. A strong analytical foundation requires a large amount of data: MLB organizations accumulate 162 games of regular-season data each year, in addition to the playoffs and spring training. It would take years for smaller college and high school programs to collect even one MLB season's worth. Despite these constraints, we have designed a web application that combines in-depth statistical modeling with cloud infrastructure to work around the problems smaller programs face.

Outlining Our R&D System: The Pitching + Server

Our R&D system, dubbed “The Pitching + Server,” runs as an R Shiny application. We used the R programming language to code a series of modules that help build a data foundation. To store large amounts of data safely in the cloud while protecting user privacy, we built an SQL database hosted on Amazon Web Services (AWS). This database is crucial to the server's success. Because the server itself requires only minimal technology to run, it has to make up for that with good modeling and a large sample size. The shared SQL database provides both, allowing the server to refine our models and produce detailed data visualizations. Additionally, we use an AWS password system to keep each team's data secure and confidential.
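The "shared but private" arrangement can be sketched with Python's built-in sqlite3 module standing in for the AWS-hosted SQL database. The server itself is written in R. This is a minimal illustration, not the server's actual schema: the `pitches` table, its columns, and the `team_id` scoping are hypothetical. Every program's rows live in one table, so models can be trained on the pooled data. Each program's own queries, however, are filtered to its `team_id`.

```python
import sqlite3

# Hypothetical schema: one shared table for every program. The team_id column
# lets rows be pooled for modeling while each program queries only its own.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pitches (
    team_id    TEXT NOT NULL,
    pitcher    TEXT NOT NULL,
    pitch_type TEXT NOT NULL,
    velocity   REAL
)
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_pitch(conn, team_id, pitcher, pitch_type, velocity=None):
    # Parameterized query: values are never spliced into the SQL string.
    conn.execute(
        "INSERT INTO pitches (team_id, pitcher, pitch_type, velocity) VALUES (?, ?, ?, ?)",
        (team_id, pitcher, pitch_type, velocity),
    )
    conn.commit()


def team_pitches(conn, team_id):
    # A program only ever sees rows tagged with its own team_id.
    return conn.execute(
        "SELECT pitcher, pitch_type, velocity FROM pitches WHERE team_id = ? ORDER BY rowid",
        (team_id,),
    ).fetchall()


def league_avg_velocity(conn, pitch_type):
    # Model-building code can use every team's rows, but only in aggregate.
    row = conn.execute(
        "SELECT AVG(velocity) FROM pitches WHERE pitch_type = ? AND velocity IS NOT NULL",
        (pitch_type,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = connect()
    save_pitch(conn, "team_a", "Smith", "Fastball", 88.0)
    save_pitch(conn, "team_b", "Jones", "Fastball", 90.0)
    print(team_pitches(conn, "team_a"))            # [('Smith', 'Fastball', 88.0)]
    print(league_avg_velocity(conn, "Fastball"))   # 89.0
```

In a production deployment, authentication would decide which `team_id` a session is allowed to pass in. The sketch leaves that out.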

The first module in the Pitching + Server is our Stuff+ generator. The module has a series of input boxes where the user enters thrown-ball metrics from a Rapsodo, TrackMan, or similar device after a bullpen or live outing. These metrics are used to calculate a Stuff+ value for each pitch in the player's arsenal. Our Stuff+ models, which we outline in The ART of W.A.R Pitching + Model, use machine learning to quantify a pitcher's “stuff.” The module also uses the model to give players development advice by identifying which thrown-ball metrics most hurt the Stuff+ value of each pitch. The Stuff+ generator is an easy and efficient way to track a player's development in pitch quality.
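The advice step can be illustrated with a deliberately simplified stand-in. Our real Stuff+ models use machine learning. This sketch instead assumes a hypothetical linear model. Each thrown-ball metric is converted to a z-score against league reference values and then multiplied by a weight. All reference values and weights below are made up for illustration. Splitting the score into per-metric contributions makes the advice mechanical: the metric with the most negative contribution is the one hurting the pitch most.

```python
# Hypothetical league reference values for one pitch type (fastball): (mean, standard deviation).
LEAGUE_FASTBALL = {
    "velocity": (86.0, 3.0),            # mph
    "induced_vert_break": (15.0, 3.0),  # inches
    "spin_rate": (2150.0, 150.0),       # rpm
}

# Hypothetical weights: Stuff+ points gained per standard deviation above league average.
WEIGHTS = {"velocity": 6.0, "induced_vert_break": 3.0, "spin_rate": 1.0}


def contributions(metrics, league=LEAGUE_FASTBALL, weights=WEIGHTS):
    """Each metric's effect on the score, in Stuff+ points relative to league average."""
    return {
        name: weights[name] * (metrics[name] - mean) / sd
        for name, (mean, sd) in league.items()
    }


def stuff_plus(metrics):
    # 100 is league average by construction; contributions shift it up or down.
    return 100.0 + sum(contributions(metrics).values())


def biggest_weakness(metrics):
    # The most negative contribution is the first development target.
    contrib = contributions(metrics)
    return min(contrib, key=contrib.get)


if __name__ == "__main__":
    pitch = {"velocity": 89.0, "induced_vert_break": 12.0, "spin_rate": 2300.0}
    print(stuff_plus(pitch))        # 104.0
    print(biggest_weakness(pitch))  # induced_vert_break
```

With a nonlinear machine-learning model, the contributions cannot be read off this directly. A per-feature attribution method would play the same role there.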

The second module in the Pitching + Server is our charting database. To use this module, you simply input the pitch type (Fastball, Slider, …), the result (Swinging Strike, Ball, Groundball, …), and the velocity, if available. The server tracks those results along with the count, outs, and inning. After the game, it outputs a large data table with in-depth information about the outing. This table is extremely useful for a team's internal use: the breadth of statistics gives programs a detailed picture of how each pitcher performed, supplying a foundation for analysis and development.
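The count tracking and the postgame table can be sketched as follows. This is a hypothetical Python illustration, not the server's R code. The `Outing` class, the result labels, and the `out_on_play` flag are all assumptions. The flag is needed because a result like "Groundball" does not by itself say whether the batter was out. Each pitch is logged with the count, outs, and inning as they stood before it was thrown. The summary counts every non-ball as a strike, which is the usual convention for strike percentage.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical result labels; a real charting form may use different ones.
STRIKES = {"Called Strike", "Swinging Strike"}
IN_PLAY = {"Groundball", "Flyball", "Line Drive"}


@dataclass
class Outing:
    pitcher: str
    inning: int = 1
    outs: int = 0
    balls: int = 0
    strikes: int = 0
    log: list = field(default_factory=list)

    def _record_out(self):
        self.outs += 1
        if self.outs == 3:  # third out ends the inning
            self.inning, self.outs = self.inning + 1, 0

    def _reset_count(self):
        self.balls = self.strikes = 0

    def record(self, pitch_type, result, velocity=None, out_on_play=False):
        # Log the game state as it stood before this pitch was thrown.
        self.log.append({
            "inning": self.inning, "outs": self.outs,
            "count": f"{self.balls}-{self.strikes}",
            "pitch_type": pitch_type, "result": result, "velocity": velocity,
        })
        if result == "Ball":
            self.balls += 1
            if self.balls == 4:  # walk
                self._reset_count()
        elif result == "Foul":
            # A foul adds a strike, but a foul with two strikes leaves the count unchanged.
            self.strikes = min(self.strikes + 1, 2)
        elif result in STRIKES:
            self.strikes += 1
            if self.strikes == 3:  # strikeout
                self._reset_count()
                self._record_out()
        elif result in IN_PLAY:
            self._reset_count()
            if out_on_play:
                self._record_out()
        else:
            raise ValueError(f"unknown result: {result!r}")

    def summary(self):
        """Per-pitch-type table: pitches, strikes (any non-ball), whiffs, strike %."""
        table = defaultdict(lambda: {"pitches": 0, "strikes": 0, "whiffs": 0})
        for p in self.log:
            row = table[p["pitch_type"]]
            row["pitches"] += 1
            row["strikes"] += int(p["result"] != "Ball")
            row["whiffs"] += int(p["result"] == "Swinging Strike")
        return {
            pt: {**row, "strike_pct": round(100 * row["strikes"] / row["pitches"], 1)}
            for pt, row in table.items()
        }


if __name__ == "__main__":
    o = Outing("Smith")
    o.record("Fastball", "Called Strike", 88)
    o.record("Slider", "Swinging Strike", 79)
    o.record("Fastball", "Ball", 89)
    o.record("Fastball", "Groundball", 88, out_on_play=True)
    print(o.log[-1]["count"], o.outs)             # 1-2 1
    print(o.summary()["Fastball"]["strike_pct"])  # 66.7
```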

Our server updates season statistics each time an outing is saved. These tables give programs information not only on each of a pitcher's outings but also on cumulative season statistics. The cumulative statistics also draw on our Pitching+ model in the form of Location+, a statistic that quantifies a pitcher's ability to command the strike zone and an important evaluation and development tool. When both Stuff+ values and a Location+ value are available for a player, the module also calculates Pitching+, a metric that quantifies a pitcher's overall ability. In the models we have run, Pitching+ has been strongly correlated with future ERA.
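Accumulating on save can be sketched as below. The sketch assumes each outing has already been reduced to per-pitch-type counts of pitches, strikes, and swinging strikes. The dictionary layout and function names are hypothetical. The design choice worth copying is to store and sum raw counts, then recompute rates, rather than averaging each outing's percentages. Averaging would weight a 15-pitch relief appearance the same as a 90-pitch start.

```python
from collections import Counter

COUNT_KEYS = ("pitches", "strikes", "whiffs")


def add_outing(season, outing):
    """Fold one outing's per-pitch-type counts into the running season totals."""
    for pitch_type, row in outing.items():
        totals = season.setdefault(pitch_type, Counter())
        totals.update({k: row[k] for k in COUNT_KEYS})
    return season


def season_rates(season):
    # Rates come from summed counts, so longer outings carry proportionally more weight.
    return {
        pt: {
            "pitches": t["pitches"],
            "strike_pct": round(100 * t["strikes"] / t["pitches"], 1),
            "swstr_pct": round(100 * t["whiffs"] / t["pitches"], 1),  # swinging strikes per pitch
        }
        for pt, t in season.items()
    }


if __name__ == "__main__":
    season = {}
    add_outing(season, {"Fastball": {"pitches": 90, "strikes": 60, "whiffs": 9}})
    add_outing(season, {"Fastball": {"pitches": 10, "strikes": 3, "whiffs": 0}})
    print(season_rates(season)["Fastball"])
    # {'pitches': 100, 'strike_pct': 63.0, 'swstr_pct': 9.0}
```

Averaging the two outings' strike percentages (66.7% and 30.0%) would instead report about 48.3%. That figure misrepresents a pitcher who threw 63 strikes in 100 pitches.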

The final module in the Pitching + Server is our Strike% and Stuff+ calculator. The user enters Strike%, Stuff+, and usage rate for each pitch in the arsenal, and the module calculates the pitcher's overall strike percentage and overall Stuff+. Sliding bars adjust each pitch's usage rate, so coaches can see how shifting usage changes both numbers. This simple but effective module helps coaches refine pitch-calling strategies and optimize usage rates for their pitchers' arsenals, giving any pitching coach a potential edge in pitch calling.
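Both overall numbers can be computed as usage-weighted averages: overall = Σ(usage × metric) / Σ usage. The sketch below assumes this simple weighted average is what the calculator computes. The field names and the example arsenal are hypothetical. In the example, shifting usage toward the slider raises blended Stuff+ but lowers blended strike percentage. This is the kind of trade-off the sliding bars let a coach explore.

```python
def blended(arsenal, key):
    """Usage-weighted average of one per-pitch metric across an arsenal."""
    total_usage = sum(p["usage"] for p in arsenal.values())
    if total_usage <= 0:
        raise ValueError("usage rates must sum to a positive number")
    # Normalizing by total usage means slider settings need not sum to exactly 100.
    return sum(p["usage"] * p[key] for p in arsenal.values()) / total_usage


if __name__ == "__main__":
    # Hypothetical arsenal: usage in %, plus strike % and Stuff+ for each pitch.
    arsenal = {
        "Fastball": {"usage": 60, "strike_pct": 65, "stuff_plus": 105},
        "Slider": {"usage": 25, "strike_pct": 60, "stuff_plus": 110},
        "Changeup": {"usage": 15, "strike_pct": 55, "stuff_plus": 95},
    }
    print(blended(arsenal, "strike_pct"), blended(arsenal, "stuff_plus"))  # 62.25 104.75

    # Move 10 points of usage from the fastball to the slider.
    arsenal["Fastball"]["usage"], arsenal["Slider"]["usage"] = 50, 35
    print(blended(arsenal, "strike_pct"), blended(arsenal, "stuff_plus"))  # 61.75 105.25
```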

We also built a database of baseball development writing from across the internet. Most entries simply link to other excellent work in the field, which makes research much more efficient for players because they are directed to useful information instantly. Paired with our Stuff+ generator, this database helps players make development progress as efficiently as possible.

Moving Forward

Looking ahead, we plan to enhance the Pitching + Server with data visualizations, which make data more understandable, especially for coaches and players. Our first source of inspiration is the pitching charts on MLB's Baseball Savant, such as the percentile rankings chart. These plots clearly show each pitcher's statistical standing relative to the rest of the league. Our version would let you select any player on your roster. The chart would use every statistic in the entire database to compute averages and show where the selected player ranks against the rest of their level of competition on a handful of statistics.
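Each bar in a percentile chart comes from one calculation: where a player's value falls within the distribution of every pitcher at the same level in the database. Below is a minimal sketch using one common definition of percentile rank, in which values below count fully and ties count as half. Baseball Savant's exact method may differ, and the function name and sample numbers are hypothetical. Flipping the scale for lower-is-better statistics keeps the chart easy to read, so a high bar is always good.

```python
from bisect import bisect_left, bisect_right


def percentile(value, population, higher_is_better=True):
    """Percentile rank (0-100) of `value` within `population`, counting ties as half."""
    if not population:
        raise ValueError("population is empty")
    pop = sorted(population)
    below = bisect_left(pop, value)          # number of strictly lower values
    ties = bisect_right(pop, value) - below  # number of values equal to `value`
    pct = 100 * (below + 0.5 * ties) / len(pop)
    # For stats where lower is better (e.g. walk rate), flip so higher always reads as better.
    return round(pct if higher_is_better else 100 - pct)


if __name__ == "__main__":
    strike_pcts = [58.0, 61.5, 63.0, 64.2, 66.8]  # hypothetical league values
    walk_rates = [5.0, 8.0, 10.0, 12.0, 15.0]     # hypothetical, in %
    print(percentile(64.2, strike_pcts))                        # 70
    print(percentile(8.0, walk_rates, higher_is_better=False))  # 70
```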

We also plan to make greater use of machine learning models. Our current Stuff+ and Location+ models perform well, but they can only say so much about a pitcher and do not capture every aspect of pitching success. We will continue to build models to further help smaller programs. Models focused on a pitcher's mechanics, or on the aspects of a delivery that create deception, could be interesting next steps. We would also like to find a way to integrate a generative AI system into our server to serve as an AI pitching coach.

Our journey to democratize baseball data for programs across the country has only just begun. If you are interested in being part of it or would like to speak with us, please feel free to reach out.

Written by RG Pitching -- Art of W.A.R.