Golf Tournament Simulations

Jack Overby
Published in The Startup · Jan 24, 2021

Last week, I gave an overview of the PGA Tour’s ShotLink system and how it’s used to measure the Strokes Gained metric. Now we’re going to use course and individual player data to estimate each player’s chance of winning a tournament, given their skill and current position.

Example

Artists use real-life happenings to inspire their works. I can’t think of a higher art form than a Medium article about simulating golf tournaments in JavaScript, so I’ll do the same: I’ll take the tournament currently underway, The American Express (wonder who’s sponsoring?), and calculate each player’s chance of winning. Let’s see how things look!

Pay no attention to the giant banner ad in the middle of the leaderboard… thanks CBS!

Here’s our game plan:

  1. Get course statistics, to create a probability distribution for each hole (e.g. 15% chance of birdie, 75% of par, 9% bogey, 1% double bogey).
  2. Get player statistics, namely Total Strokes Gained, to determine each player’s skill level, and adjust the probability distributions based on how far above or below average the player is (e.g. on the hole above, perhaps Player A has a 16% chance of birdie and an 8% chance of bogey).
  3. Write a function to generate a random outcome for each player on each hole.
  4. Make another function to simulate each player’s round, and thus to determine the winner for a given simulation (note: if the tournament ends in a tie, we’ll randomly choose a winner from the tied players, with all players weighted equally. Not quite realistic, but close enough!). Even though 71 players made the cut and thus “could” win, we’re going to limit ourselves to the top 12 for this exercise. Realistically, it’s highly likely one of these players will end up winning, so it’s not a ridiculous assumption.
  5. Create one final, overarching function, which runs however many simulations the user wishes (e.g. 1000) and outputs the number or percentage of wins for each player.

Now let’s get to it!

Course Stats

Fortunately, the course website provides nice little synopses for each hole! We’ll use this to create the baseline probability distribution. I should probably use JS DOM functions to programmatically loop through each hole and extract the numbers, but (1) there are only 18 holes, and (2) it’s a Sunday afternoon and I’m feeling lazy, okay? After 5 painstaking minutes, here’s what we get:

// Note: the Object keys are equal to the score under/over par:
// -2 = eagle, -1 = birdie, 0 = par, 1 = bogey, 2 = double bogey
let holeProbs = {
  1: { "-2": 0, "-1": 0.21, "0": 0.67, "1": 0.12, "2": 0 },
  2: { "-2": 0, "-1": 0.23, "0": 0.70, "1": 0.07, "2": 0 },
  3: { "-2": 0, "-1": 0.21, "0": 0.67, "1": 0.12, "2": 0 },
  4: { "-2": 0, "-1": 0.149, "0": 0.759, "1": 0.089, "2": 0.003 },
  5: { "-2": 0.01, "-1": 0.375, "0": 0.425, "1": 0.145, "2": 0.045 },
  6: { "-2": 0, "-1": 0.05, "0": 0.70, "1": 0.20, "2": 0.05 },
  7: { "-2": 0, "-1": 0.24, "0": 0.64, "1": 0.07, "2": 0.05 },
  8: { "-2": 0.0075, "-1": 0.4575, "0": 0.4875, "1": 0.0475, "2": 0 },
  9: { "-2": 0, "-1": 0.20, "0": 0.62, "1": 0.14, "2": 0.04 },
  10: { "-2": 0, "-1": 0.21, "0": 0.69, "1": 0.08, "2": 0.02 },
  11: { "-2": 0.01, "-1": 0.3275, "0": 0.5775, "1": 0.0675, "2": 0.0175 },
  12: { "-2": 0, "-1": 0.2875, "0": 0.6075, "1": 0.0975, "2": 0.0075 },
  13: { "-2": 0, "-1": 0.10, "0": 0.71, "1": 0.15, "2": 0.04 },
  14: { "-2": 0, "-1": 0.2575, "0": 0.6675, "1": 0.0575, "2": 0.0175 },
  15: { "-2": 0, "-1": 0.0925, "0": 0.8025, "1": 0.1025, "2": 0.0025 },
  16: { "-2": 0.01, "-1": 0.395, "0": 0.525, "1": 0.065, "2": 0.005 },
  17: { "-2": 0.00, "-1": 0.1125, "0": 0.7025, "1": 0.0825, "2": 0.1025 },
  18: { "-2": 0, "-1": 0.1425, "0": 0.6825, "1": 0.1325, "2": 0.0425 }
};
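Since these numbers were copied by hand, a quick sanity check is worth running: every hole’s probabilities should add up to 1. The sketch below (the helper names `checkHoleProbs` and `expectedToPar` are mine, not from any library) flags holes that don’t, and also computes a hole’s expected score relative to par under the five-outcome simplification. It expects `holeProbs` in its object-of-objects form, before any conversion to arrays.

```javascript
// Hypothetical helpers for sanity-checking hand-copied hole distributions.
// Each hole is an object keyed by score relative to par ("-2" ... "2").

// Returns the hole numbers whose probabilities don't sum to 1 (within a tolerance).
function checkHoleProbs(holes, tolerance = 1e-9) {
  return Object.entries(holes)
    .filter(([, hole]) => {
      const total = Object.values(hole).reduce((acc, p) => acc + p, 0);
      return Math.abs(total - 1) > tolerance;
    })
    .map(([holeNumber]) => Number(holeNumber));
}

// Expected strokes relative to par for one hole: sum of (outcome * probability).
function expectedToPar(hole) {
  return Object.entries(hole).reduce((acc, [key, p]) => acc + Number(key) * p, 0);
}
```

An empty array from `checkHoleProbs(holeProbs)` means every hole adds up; a negative `expectedToPar` marks a hole the field plays under par on average.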

Several notes:

  • I gave a 1% eagle probability for all the par 5s. This might not be super realistic, but neither is 0%!
  • I marked all “double bogeys or worse” as double bogeys. This also is not realistic, as triple bogeys or higher do indeed happen, even to the pros, but they’re rare enough that we can discard them for the purpose of this exercise.

Player Skills

Let’s find the Strokes Gained Total (SGT) for each of the leaders:

// Note: I used 2019 data for Molinari, since 2020 data was unavailable (he didn’t play enough PGA Tour tournaments)
const playerAdjustments = {
  "Max Homa": 0.279,
  "Si Woo Kim": 0.267,
  "Tony Finau": 1.243,
  "Richy Werenski": 0.484,
  "Russell Knox": 0.030,
  "Brian Harman": 0.767,
  "Emiliano Grillo": -0.182,
  "Cameron Davis": 0.560,
  "Rory Sabbatini": 0.071,
  "Chase Seiffert": 0.196,
  "Francesco Molinari": 0.038,
  "Doug Ghim": -0.520
};
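The simulation functions further down work on an array of player objects with `name`, `SGT` and `score` fields, so `playerAdjustments` needs reshaping first. Here’s a minimal sketch, assuming a hypothetical `toPlayers` helper. The starting scores (each player’s score relative to par entering the final round, read off the leaderboard) are left as an input, since they aren’t reproduced in the text.

```javascript
// Hypothetical helper: combine SGT values with each player's score entering the final round.
// startingScores maps player name -> score relative to par (lower is better).
function toPlayers(adjustments, startingScores) {
  return Object.entries(adjustments).map(([name, SGT]) => {
    if (!(name in startingScores)) {
      throw new Error(`No starting score for ${name}`);
    }
    return { name, SGT, score: startingScores[name] };
  });
}
```

Usage would be `const players = toPlayers(playerAdjustments, startingScores);`, with `startingScores` filled in from the leaderboard. Throwing on a missing name catches typos in either object early.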

Note that 10 of the 12 players have positive SGT, i.e. are above average. Unsurprisingly, better players tend to contend and win tournaments more often!

Next, we’ll make the following adjustments:

  • A player’s SGT, which is per round, will be divided by 18 and applied to each hole. For example, Max Homa’s SGT is 0.279, so we’ll assume he’s 0.279 / 18 = 0.0155 strokes better per hole.
  • The per-hole adjustment will be performed by increasing birdie % and decreasing bogey % by equal amounts. Ideally, we’d also change the eagle and double bogey probabilities, but again, Sunday afternoon, so I’m taking a few shortcuts here! Here’s an example:

Player A SGT = 0.01 per hole

// Hole 1
{
  "-2": 0,
  "-1": 0.21,
  "0": 0.67,
  "1": 0.12,
  "2": 0
}

So we’ll increase the -1 (birdie) probability by 0.01 / 2 = 0.005 and decrease the 1 (bogey) probability by 0.005. So Player A’s hole 1 distribution will now look like this:

// Player A, Hole 1
{
  "-2": 0,
  "-1": 0.215,
  "0": 0.67,
  "1": 0.115,
  "2": 0
}
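The same adjustment can be written as a small function. This is a sketch with a hypothetical name, `adjustHole`: it takes a hole object and a per-round SGT, spreads the SGT evenly over 18 holes, then moves half of each hole’s share onto birdie and takes half off bogey.

```javascript
// Hypothetical helper: shift a hole distribution by a player's per-round SGT.
// Half of the per-hole gain goes to birdie, half comes off bogey; eagle, par
// and double bogey are left untouched (the shortcut described above).
function adjustHole(hole, sgtPerRound) {
  const shift = sgtPerRound / 18 / 2;
  return {
    ...hole,
    "-1": hole["-1"] + shift,
    "1": hole["1"] - shift,
  };
}

const hole1 = { "-2": 0, "-1": 0.21, "0": 0.67, "1": 0.12, "2": 0 };
// Player A gains 0.01 strokes per hole, i.e. 0.18 per round.
const playerAHole1 = adjustHole(hole1, 0.18);
```

Because one outcome gains exactly what another loses, the adjusted probabilities still sum to 1.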

Simulating Holes

First, let’s take the hole Objects and transform them into arrays of cumulative probabilities, using an intricate bit of code:

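holeProbs = Object.values(holeProbs).map(hole => {
  // Order the outcomes from eagle (-2) to double bogey (+2)
  const sortedKeys = Object.keys(hole).sort((a, b) => Number(a) - Number(b));
  // Running totals: sums[i] = probability of scoring at or below outcome i
  let sum = 0;
  const sums = [];
  for (const key of sortedKeys) {
    sum += hole[key];
    sums.push(sum);
  }
  // Guard against floating-point rounding leaving the last total just below 1
  sums[sums.length - 1] = 1;
  return sums;
});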
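To see what this produces, here’s the same transformation for a single hole, pulled out into a standalone function (`toCumulative` is my name for it). Each entry is the probability of scoring at or below that result, so hole 1 becomes roughly `[0, 0.21, 0.88, 1, 1]`.

```javascript
// Standalone version of the conversion: hole object -> cumulative probabilities,
// ordered from eagle (-2) to double bogey (+2).
function toCumulative(hole) {
  const sortedKeys = Object.keys(hole).sort((a, b) => Number(a) - Number(b));
  let sum = 0;
  const sums = sortedKeys.map((key) => (sum += hole[key]));
  // Floating-point addition can leave the last entry at 0.9999999999999999;
  // pin it to 1 so a random draw in [0, 1) can never fall past the end.
  sums[sums.length - 1] = 1;
  return sums;
}
```

The explicit sort matters: JavaScript lists integer-like keys ("0", "1", "2") before "-2" and "-1", so `Object.keys` alone would return the outcomes out of order.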

Now, we’ll make a function that randomly generates a number from 0 to 1 and returns the appropriate hole score:

function randomHoleScore(holeDist) {
  const randNum = Math.random();
  // Index of the first cumulative total above randNum: 0 = eagle ... 4 = double bogey
  return holeDist.findIndex(cum => cum > randNum) - 2;
}

This will return a number from -2 to 2 (eagle to double bogey), based on the random number and the probability distribution.
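`Math.random()` can’t be seeded, which makes runs hard to reproduce or test. One option, sketched below, is to pass the random-number generator in as a parameter. `mulberry32` is a small, well-known 32-bit seeded PRNG (fine for simulations, not for cryptography), and `sampleHoleScore` and `tally` are hypothetical variants of `randomHoleScore` that accept it. Drawing many samples and tallying them is a quick way to confirm the sampler reproduces the hole’s distribution.

```javascript
// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let state = seed | 0;
  return function () {
    state = (state + 0x6d2b79f5) | 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same idea as randomHoleScore, but with an injectable RNG.
// holeDist is a cumulative array ordered from eagle (-2) to double bogey (+2).
function sampleHoleScore(holeDist, rng = Math.random) {
  const r = rng();
  return holeDist.findIndex((cum) => cum > r) - 2;
}

// Draw n hole scores and return the fraction of each outcome.
function tally(holeDist, n, rng) {
  const counts = { "-2": 0, "-1": 0, "0": 0, "1": 0, "2": 0 };
  for (let i = 0; i < n; i++) counts[sampleHoleScore(holeDist, rng)] += 1;
  for (const key of Object.keys(counts)) counts[key] /= n;
  return counts;
}

const freqs = tally([0, 0.21, 0.88, 1, 1], 100000, mulberry32(42));
```

With a fixed seed, the same call always produces the same frequencies, which is handy when comparing changes to the simulator.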

Player Round

Now we’ll make a function that takes a player object (with an SGT and a running score) and the course’s hole distributions, adjusts each hole’s distribution for the player’s skill, and adds each simulated hole result from randomHoleScore to the player’s score:

function playerRound(playerObj, courseObj) {
  Object.values(courseObj).forEach(holeDist => {
    // Copy the cumulative array so the shared course data isn't modified
    const holeDistCopy = holeDist.slice();
    // Skill adjustment: birdie up and bogey down by SGT / 36, par unchanged
    holeDistCopy[1] += playerObj.SGT / (18 * 2);
    holeDistCopy[2] += playerObj.SGT / (18 * 2);
    playerObj.score += randomHoleScore(holeDistCopy);
  });
}
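Why does `playerRound` add to entries 1 and 2 of the cumulative array rather than to birdie and bogey directly? Because the array holds running totals: raising entries 1 and 2 by the same amount `d` raises P(birdie) by `d`, leaves P(par) unchanged, and lowers P(bogey) by `d`. That matches the per-hole adjustment from the Player Skills section. The sketch below checks this numerically with two hypothetical helpers, `fromCumulative` and `shiftCumulative`.

```javascript
// Recover per-outcome probabilities from a cumulative array (eagle ... double bogey).
function fromCumulative(cum) {
  return cum.map((c, i) => c - (i === 0 ? 0 : cum[i - 1]));
}

// Apply a per-round SGT the way playerRound does: shift entries 1 and 2 by SGT / 36.
function shiftCumulative(cum, sgtPerRound) {
  const shifted = cum.slice();
  shifted[1] += sgtPerRound / (18 * 2);
  shifted[2] += sgtPerRound / (18 * 2);
  return shifted;
}

// Player A from the earlier example: 0.18 SGT per round, i.e. 0.005 per outcome per hole.
const before = fromCumulative([0, 0.21, 0.88, 1, 1]);
const after = fromCumulative(shiftCumulative([0, 0.21, 0.88, 1, 1], 0.18));
```

Comparing `after` with `before` shows birdie up by about 0.005, par unchanged and bogey down by about 0.005.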

Now let’s make a function that takes each player, with an SGT and initial score, and simulates a round for each one. At the end, the player with the best (i.e. lowest) score is the winner:

function simulateRound(playersArray, courseObj) {
  playersArray.forEach(player => playerRound(player, courseObj));
  // Lowest score wins in golf
  const bestScore = Math.min(...playersArray.map(player => player.score));
  // May contain more than one player if there's a tie
  const winners = playersArray.filter(player => player.score === bestScore);
  return winners;
}
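`simulateRound` can return several co-leaders. Step 4 of the plan calls for picking one of them at random, with all tied players weighted equally. Here’s a minimal sketch, with `pickWinner` as a hypothetical helper and an optional RNG parameter for reproducible runs:

```javascript
// Choose one winner uniformly at random from the tied leaders.
function pickWinner(tiedPlayers, rng = Math.random) {
  if (tiedPlayers.length === 0) throw new Error("No players to choose from");
  return tiedPlayers[Math.floor(rng() * tiedPlayers.length)];
}
```

Over many simulations this gives the same expected win share as crediting each of k co-winners with 1/k of a win, which is the approach `winProbSimulator` takes.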

Simulate N Tournaments

While going through this exercise, I realized there was a slightly better way to do this. Here’s my final function, which makes a fresh copy of the playersArray input for each of N simulated tournaments. Each player starts out with 0 wins. At the end of each tournament, the winner’s win count goes up by 1, and co-winners split that win equally (1/k each in a k-way tie). The function returns an object whose keys are the players’ names and whose values are their wins divided by N, i.e. each player’s estimated win probability:

function winProbSimulator(playersArray, courseObj, N = 100) {
  const winners = {};
  playersArray.forEach(player => winners[player.name] = 0);
  for (let i = 0; i < N; i++) {
    // Fresh copy so every tournament starts from the original scores
    const playersCopy = JSON.parse(JSON.stringify(playersArray));
    const bestScore = { score: Infinity, count: 0 };
    playersCopy.forEach(player => {
      playerRound(player, courseObj);
      if (player.score < bestScore.score) {
        bestScore.score = player.score;
        bestScore.count = 1;
      } else if (player.score === bestScore.score) {
        bestScore.count += 1;
      }
    });
    // Co-winners split the win equally
    playersCopy.forEach(player => {
      if (player.score === bestScore.score) {
        winners[player.name] += 1 / bestScore.count;
      }
    });
  }
  Object.keys(winners).forEach(key => winners[key] /= N);
  return winners;
}
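As written, every simulation deep-copies the players with `JSON.parse(JSON.stringify(...))` and rebuilds each player’s adjusted distribution on every hole, even though those adjusted distributions never change between simulations. A possible speed-up is to compute them once up front and keep per-player scores in plain arrays. This is a sketch under those assumptions: `fastWinProbs` is a hypothetical name, it expects the same cumulative hole arrays and `{name, SGT, score}` players as `winProbSimulator`, and I haven’t benchmarked it against the original.

```javascript
// Precompute each player's adjusted cumulative distributions, then run N simulations
// using plain numbers instead of deep-copied objects. Ties split the win equally.
function fastWinProbs(players, course, N = 100, rng = Math.random) {
  const adjusted = players.map((p) =>
    course.map((cum) => {
      const copy = cum.slice();
      copy[1] += p.SGT / 36; // birdie up by SGT / 36
      copy[2] += p.SGT / 36; // bogey down by SGT / 36, par unchanged
      return copy;
    })
  );
  const wins = new Array(players.length).fill(0);
  const finals = new Array(players.length);

  for (let sim = 0; sim < N; sim++) {
    let best = Infinity;
    let count = 0;
    for (let i = 0; i < players.length; i++) {
      let score = players[i].score;
      for (const cum of adjusted[i]) {
        const r = rng();
        score += cum.findIndex((c) => c > r) - 2;
      }
      finals[i] = score;
      if (score < best) {
        best = score;
        count = 1;
      } else if (score === best) {
        count += 1;
      }
    }
    for (let i = 0; i < players.length; i++) {
      if (finals[i] === best) wins[i] += 1 / count;
    }
  }

  const result = {};
  players.forEach((p, i) => (result[p.name] = wins[i] / N));
  return result;
}
```

The input `players` array is never mutated, so no per-simulation copy is needed.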

Here’s an example run, with N=10,000:

This took about 8 seconds to run; functions not optimized!

This output definitely passes the “sniff test”! The three players tied for the lead at the start of the day have the highest win percentages, and Tony Finau, the strongest player of the three, has by far the best chance (30% vs. ~18% for each of the other two). Looks like we’ve built a reasonable-looking simulator!
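How much of that gap could be simulation noise? Each win percentage is an average over N simulated tournaments, so its standard error is roughly sqrt(p(1 - p) / N) (roughly, because tied wins are split fractionally). At N = 10,000, a 30% estimate carries about ±0.46 percentage points of noise (one standard error), far smaller than the gap between Finau and the other two leaders. A small helper, with the hypothetical names `winPctStdError` and `winPctInterval`:

```javascript
// Approximate standard error of a Monte Carlo win-probability estimate.
// p: estimated win probability (0..1), n: number of simulated tournaments.
function winPctStdError(p, n) {
  return Math.sqrt((p * (1 - p)) / n);
}

// Rough 95% interval: estimate ± 1.96 standard errors, clipped to [0, 1].
function winPctInterval(p, n) {
  const half = 1.96 * winPctStdError(p, n);
  return [Math.max(0, p - half), Math.min(1, p + half)];
}
```

Quadrupling N halves the standard error, so going from 10,000 to 40,000 simulations would tighten each estimate by a factor of two.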

Conclusion

Thus concludes another exercise in golf statistics. Hope you enjoyed… just kidding, I know you didn’t, but I did!

P.S. The winner, by 1 shot, was… Si Woo Kim!
