PGA Golfer Earnings: What aspects of golf set apart the best of the best from the average PGA player?

Elliott Bauer
INST414: Data Science Techniques
5 min read · Feb 11, 2024

For my first module assignment for INST414, I decided to focus on a topic I have grown more interested in over the past several years: golf. I started playing my sophomore year of college, and now that I am in my senior year, I consider it one of my favorite hobbies. Golf is one of the hardest sports there is: it demands precision and technique while also being an extremely mental game. For my data, I extracted a CSV of Professional Golfers’ Association (PGA) golfer statistics from the 2022–2023 tournament season, which I gathered from ESPN.com. I then used Jupyter Notebook and the Pandas library to look for the statistics that seem most closely tied to a golfer’s earnings across a season. I created a table, ‘pga_sats’, with one row per golfer from this past season. The columns are as follows:

  • RK: Rank
  • EARNINGS: Season-long earnings in US dollars
  • CUP: FedEx Cup Points
  • EVNTS: Number of events participated in
  • RNDS: Number of rounds played
  • CUTS: Number of cuts made (the ‘cut’ trims the field partway through a tournament: only the top performers from the earlier rounds advance to the remaining rounds)
  • TOP10: Top 10 finishes
  • SCORE: Average score per tournament
  • DDIS: Average drive distance off the tee, in yards
  • DACC: Driving accuracy, as a percentage
  • GIR: Greens in regulation, as a percentage
  • PUTTS: Putts per hole
  • SAND: Save percentage out of sand traps
  • BIRDS: Birdies per round (a birdie is a score of one under par on a hole)
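To make the table layout concrete, here is a minimal sketch of loading a CSV with these columns into Pandas. The two rows are made-up placeholders, not real ESPN figures, and the NAME column and the variable name `stats` are my assumptions about how the export is laid out.

```python
import io

import pandas as pd

# Placeholder rows for illustration only; these are NOT real ESPN figures.
# The NAME column is an assumption about how the export labels each golfer.
SAMPLE_CSV = """RK,NAME,EARNINGS,CUP,EVNTS,RNDS,CUTS,TOP10,SCORE,DDIS,DACC,GIR,PUTTS,SAND,BIRDS
1,Golfer A,1000000,2000,20,76,18,9,69.1,305.2,60.1,70.3,1.700,55.0,4.5
2,Golfer B,500000,900,22,70,14,3,70.4,298.7,62.5,67.8,1.760,50.2,3.9
"""

# With a real file this would be pd.read_csv("<path to the exported CSV>").
stats = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Checking dtypes early shows whether numeric columns were parsed as numbers.
print(stats.dtypes)
print(stats[["NAME", "EARNINGS", "PUTTS"]])
```

Printing the dtypes right after loading is a cheap way to catch columns that came in as text rather than numbers.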

The statistic I thought would be most interesting to analyze was putting. To do this, I created a new table called ‘putting’ from my initial table, containing only each golfer’s name, total earnings, and average number of putts per hole. I also dropped golfers with an average of 0.000, since that value is misleading: no golfer truly takes 0 putts per hole. From here, I sorted the putting column in ascending order, which lists golfers from the lowest average putts (the best putters) to the highest (the worst). Right off the bat, I was surprised to find that the best and worst putters on the tour are separated by only 0.167 putts per hole. Jim Herman had the high at 1.834 putts per hole, while Taylor Montgomery had the low at 1.667. This shows how elite the competition is on the PGA Tour, and it suggests that even slightly stronger putting can make a world of difference in the grand scheme of things. I then used the .head and .tail functions to pull out the top 20 and bottom 20 putters on the tour. Their average putts per hole came out to 1.712 and 1.802, respectively, a gap of just 0.090. Once again, this shows how narrow the gap in putting performance is between the best and worst putters on the tour.
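The steps above can be sketched roughly as follows. This is not my exact notebook code: the rows are made-up placeholders, the NAME column is an assumed label, and `n` is set to 2 so the tiny example works (the analysis itself used 20).

```python
import pandas as pd

# Made-up placeholder rows, not real tour data.
stats = pd.DataFrame({
    "NAME": ["Golfer A", "Golfer B", "Golfer C", "Golfer D", "Golfer E"],
    "EARNINGS": [900_000, 400_000, 0, 150_000, 650_000],
    "PUTTS": [1.70, 1.78, 0.0, 1.82, 1.72],
})

# Keep only the columns of interest and drop the impossible 0.000 averages.
putting = stats[["NAME", "EARNINGS", "PUTTS"]]
putting = putting[putting["PUTTS"] > 0]

# Ascending order puts the best putters (fewest putts per hole) first.
putting = putting.sort_values("PUTTS", ascending=True).reset_index(drop=True)

n = 2  # the write-up uses 20; 2 keeps this toy example small
best = putting.head(n)
worst = putting.tail(n)

# Spread between the single best and single worst putter.
spread = putting["PUTTS"].max() - putting["PUTTS"].min()

print(best["PUTTS"].mean(), worst["PUTTS"].mean(), spread)
```

Because the sort is ascending, `.head` returns the best putters and `.tail` the worst, which is easy to mix up.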

After seeing that, I thought it would be interesting to take a closer look at earnings based on putting. I subtracted the combined earnings of the 20 worst putters from the combined earnings of the 20 best putters. The results were shocking to me. The 20 best putters earned $135,804,242, while the 20 worst earned $14,999,296. That is a difference of $120,804,946 between the top 20 and bottom 20 putters from this past PGA season. It was striking to see such a large discrepancy in earnings when the putting averages were separated by such a small margin of strokes.

My data is very flexible and could answer many questions that golf fans have about the best golfers in the world. I think most of my stakeholders would be in the same boat: fans of the PGA Tour, or just avid golfers. My results would show people at all levels of golf how important putting truly is. It can be an overlooked part of one’s game, as many prefer to work on improving their mid-range and long-range game. In reality, performance on the green is a huge contributor to overall success.

I do not see many limitations in my analysis, as there seems to be enough data from the season to show how valuable putting is to a golfer’s game. However, numerous golfers had a 0 in the ‘PUTTS’ column, which is confusing, since it is impossible to never putt. I am curious about what caused this: is it a mistake in the data, did the golfer withdraw from competition, or something else? I ran into a few bugs in my data as well, but was able to deal with them effectively. The most noteworthy issue was that the values in my ‘EARNINGS’ column contained dollar signs and commas, which was breaking some of my calculations. I had to go back into my CSV file and use find-and-replace to remove these characters, because the values were being read as strings instead of floats or integers.
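As an alternative to editing the CSV by hand, the currency formatting could be stripped inside Pandas before comparing earnings. This is a sketch of that idea, not what I actually ran: the rows are made-up placeholders, the NAME column is assumed, and `n` is 2 instead of 20 to fit the toy data.

```python
import pandas as pd

# Made-up placeholder rows; EARNINGS is text, as in the raw export.
raw = pd.DataFrame({
    "NAME": ["Golfer A", "Golfer B", "Golfer C", "Golfer D"],
    "EARNINGS": ["$1,250,000", "$840,500", "$95,000", "$310,250"],
    "PUTTS": [1.70, 1.73, 1.81, 1.79],
})

# Remove dollar signs and commas, then convert the column to integers.
raw["EARNINGS"] = (
    raw["EARNINGS"]
    .str.replace(r"[$,]", "", regex=True)
    .astype("int64")
)

# Rank by putts per hole (ascending: best putters first).
ranked = raw.sort_values("PUTTS")

n = 2  # the write-up compares the top 20 and bottom 20
gap = ranked.head(n)["EARNINGS"].sum() - ranked.tail(n)["EARNINGS"].sum()
print(gap)

# Arithmetic check of the totals reported in the write-up.
reported_gap = 135_804_242 - 14_999_296
print(reported_gap)  # 120804946
```

Cleaning the column in code leaves the original file untouched and makes the step repeatable if the data is downloaded again.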
I do not think that there is any bias in my analysis, as it depends entirely on golfer statistics and no opinions are involved. All in all, I had a really good experience analyzing PGA golfer data. I feel that it helped me refresh crucial Python skills while also deepening my knowledge of a subject in which I have a growing interest. I am looking forward to either expanding my analysis of this topic or using these skills elsewhere to analyze another one of my interests.

I have attached a link to my GitHub repository below:

https://github.com/elliottbauer99/INST414/blob/main/README.md?plain=1
