The Runners of the Boston Marathon 2017
A Data Analysis of the 2017 Boston Marathon: Part 1
- What happened at the Boston Marathon this year? In this article I present an initial analysis based on the data from this year’s event.
- This will be the first in a series of posts looking at the people, performance, pacing, ‘bonking’ and other topics arising out of the 2017 Boston Marathon.
- For now we will focus on the participants of this year’s race — their gender, age, place of origin — comparing their participation numbers and finish-times.
- In the process we will look at how gender and age influence performance, and we will compare countries and states based on their participation levels and the finish-times of their runners.
With the Boston Marathon just completed, I thought I would write up a few posts on the data generated by the thousands of runners who finished the punishing course this year.
The dataset for the 2017 Boston Marathon includes 26,263 standard runner race-records; a small fraction of records have been excluded (<1%) because they contained missing timing data or other anomalies. Each remaining race-record includes information such as the gender and age of the runner, their country, state, and/or city of origin, their overall finish-time, and the split-times for 5km intervals along the course.
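The post doesn’t say exactly which anomalies were filtered out, so the following is only a minimal Python sketch of that kind of cleaning step. It assumes each race-record is a dict with hypothetical `finish` and `splits` fields (cumulative times in minutes, `None` where missing), and it treats a record as anomalous if any time is missing or if the times do not increase along the course. The real exclusion rules may well differ.

```python
# Hypothetical record layout: {"finish": minutes, "splits": [cumulative 5 km times]},
# with None marking a missing value.
Record = dict


def is_valid(record: Record) -> bool:
    """Return True if the record has complete, strictly increasing timing data."""
    finish = record.get("finish")
    splits = record.get("splits") or []
    if finish is None or not splits or any(s is None for s in splits):
        return False
    times = splits + [finish]
    # Cumulative times must strictly increase from one checkpoint to the next.
    return all(a < b for a, b in zip(times, times[1:]))


def clean(records: list[Record]) -> tuple[list[Record], float]:
    """Drop invalid records; return the kept records and the fraction excluded."""
    kept = [r for r in records if is_valid(r)]
    excluded = 1 - len(kept) / len(records) if records else 0.0
    return kept, excluded
```

Returning the excluded fraction alongside the kept records makes it easy to check that the filter removes only a small share of the data.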
This will be the first of a sequence of posts over the next few days. In it I will focus on the runners of the Boston Marathon, looking at factors such as gender, age, and place of origin and how they influence performance. Later posts will consider pacing, hitting the wall, and Boston qualification times, among other topics, which will hopefully be of interest to marathon runners and data-geeks alike.
Look who’s running!
Let’s start by looking at the runners themselves, their gender, age, and places of origin.
On Gender & Age
The bar chart below shows that overall about 45% of Boston runners are female and the line-graph shows that they have an average age of about 40, some 5 years younger than their male counterparts.
The nice thing about the Boston dataset is that it provides precise age information for each runner, as opposed to the age-ranges that are more usual in marathon datasets. Further detail on the distribution of ages is presented in the histograms in Figure 2, which show the relative proportion of runners of each gender at different ages. For example, we can see a greater proportion of women than men between the ages of 20 and 42. But, from 42 years on, there are proportionally more men than women.
It is interesting to contrast the sharp increase in the proportion of female runners from ages 20 to 28 with the more gradual increase for men between the ages of 20 and 45. In both cases, after the age of 45, the proportion of runners participating in Boston drops steadily.
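Because there are more men than women in the field, histograms like these compare proportions rather than raw counts. Here is a minimal sketch of that normalisation, assuming runners are given as hypothetical `(gender, age)` pairs: each age count is divided by the total number of runners of the same gender, so each gender’s proportions sum to 1.

```python
from collections import Counter


def age_proportions(runners: list[tuple[str, int]]) -> dict[str, dict[int, float]]:
    """Map gender -> {age: fraction of that gender's runners who are that age}."""
    counts: dict[str, Counter] = {}
    for gender, age in runners:
        counts.setdefault(gender, Counter())[age] += 1
    result = {}
    for gender, by_age in counts.items():
        total = sum(by_age.values())  # normalise within each gender separately
        result[gender] = {age: n / total for age, n in sorted(by_age.items())}
    return result
```

Normalising within each gender is what allows a statement like “proportionally more men than women from 42 on” even though the two groups differ in size.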
Places of Origin
Where do these runners come from? Unsurprisingly, the large majority (over 20,000) come from the USA, about a quarter of them from Boston’s home state of Massachusetts. Approximately 1,800 runners come from Canada, and after that there is a long tail of countries with different levels of participation. All in all, this year Boston attracted runners from 91 different countries around the world.
Let’s take a look at some of these. First, for ease of presentation, we will limit ourselves to countries with more than 50 runners participating; there are 23 such countries. To prevent the USA and Canada from drowning out the other countries, we will exclude both of them from the graphs that follow.
The first graph below shows the total number of participants per country for the remaining 21 countries. Great Britain comes out on top, with more than 400 runners, followed by Mexico, China, and Germany. My own country, Ireland, is there too, with more than 80 participants.
Just for fun, let’s look at participation rates as a fraction of the population of each country. Well would you look at that! Now little ol’ Ireland tops the table — we don’t beat the US or Canada, however — followed by Hong Kong, Costa Rica, and Switzerland.
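A participation rate like this can be computed by dividing each country’s runner count by its population. The sketch below uses a hypothetical function name, `per_million`, and placeholder country data (not the real figures behind the graph). It expresses the rate as runners per million inhabitants and ranks countries by it.

```python
def per_million(runners: dict[str, int], population: dict[str, int]) -> list[tuple[str, float]]:
    """Rank countries by runners per million inhabitants, highest rate first."""
    rates = {
        country: runners[country] / population[country] * 1_000_000
        for country in runners
        if country in population  # skip countries with no population figure
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A small country with a modest field can easily outrank a much larger country on this measure, which is exactly the kind of reordering described above.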
What about the locals?
Since we left out the USA from the above analysis, let’s now look at US participation by state and by city. Once again, for ease of presentation, we will limit ourselves to the top 20 states and cities, and, for variety, this time we will show the numbers of male and female runners.
In the first graph below — participation by state — we see a much higher participation level from Massachusetts along with a healthy showing from California, New York, and Texas. Similarly, Boston and Cambridge feature strongly in the next graph — participation by city — with the likes of New York and Chicago, two big marathon cities, also providing plenty of runners.
It may be worth noting that while most states and cities send more men than women to run Boston, Boston itself and Massachusetts provide more female runners than male.
Next, let’s look at performance in terms of the finish-times of runners. On average, men are faster than women: in this year’s event the average finish-time for men was just over 228 minutes, compared with 249 minutes for women. The graph below plots the proportion of runners with various finish-times. We can see a greater proportion of male finishers up to the 210-minute mark (3.5 hours); after this, female runners tend to dominate.
How does performance vary with age?
This gender gap is preserved when we compare finish-times by age; see Figure 8. For both men and women, the fastest runners tend to be in their 30s, slightly older for women than for men. After this, as we age, our finish-times suffer. Interestingly, the relative difference between men and women shrinks with age; the gender gap appears to close.
Closing the Gender Gap
Thus, as we age, the finish-times of men and women tend to become more and more similar. This is illustrated in the graph below in terms of the percentage difference in finish-times between the genders. For example, for 20-year-olds there is a 20% gender gap (on average, men are about 20% faster than women at this age), but by the time runners reach their 40s this gap has narrowed to about 10%, and it continues to narrow, heading towards 5% for the oldest runners in their 70s.
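The post doesn’t state the exact formula for the percentage gap. One plausible definition, assumed here, is the difference between the women’s and men’s average finish-times as a percentage of the men’s average. Applied to the overall averages above, that gives roughly (249 − 228) / 228 ≈ 9%. Below is a minimal Python sketch computing it for each age, taking hypothetical `(gender, age, finish_minutes)` tuples with gender coded as `"M"` or `"F"` and skipping ages where either gender has no runners.

```python
from collections import defaultdict
from statistics import mean


def gender_gap_by_age(runners):
    """Return {age: gap %}, where gap = (mean F time - mean M time) / mean M time * 100.

    runners: iterable of (gender, age, finish_minutes) tuples; gender "M" or "F".
    """
    times = defaultdict(lambda: defaultdict(list))  # age -> gender -> finish-times
    for gender, age, finish in runners:
        times[age][gender].append(finish)
    gaps = {}
    for age, by_gender in sorted(times.items()):
        men, women = by_gender.get("M"), by_gender.get("F")
        if men and women:  # need at least one runner of each gender at this age
            m, f = mean(men), mean(women)
            gaps[age] = (f - m) / m * 100
    return gaps
```

Dividing by the men’s average is a design choice; dividing by the women’s average instead would give slightly smaller percentages for the same times.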
We also graph the gender gap for the fastest runners within each age group; we compute the average finish-time of the top-10 fastest males and females at a given age. As before, the gap closes initially, as runners age, but then widens again between the ages of 35 and 60; the fastest men in their late 30s, 40s and 50s tend to do a lot better than the fastest women of the same age, relative to average runners. In other words, although the finish-times of men and women become more similar as they age, on average, the fastest men manage to maintain a significant finish-time gap compared to the fastest women during their late 30s, 40s and 50s.
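Here is a sketch of the top-10 variant, using the same assumed tuple layout and percentage-gap definition as before (both hypothetical, not taken from the original analysis). For each age and gender, `heapq.nsmallest` picks the `k` fastest finish-times, and the gap is computed from their means instead of from all runners.

```python
import heapq
from collections import defaultdict
from statistics import mean


def top_k_gap_by_age(runners, k=10):
    """Gender gap (%) per age using the mean of the k fastest finishers of each gender.

    runners: iterable of (gender, age, finish_minutes) tuples; gender "M" or "F".
    """
    times = defaultdict(lambda: defaultdict(list))  # age -> gender -> finish-times
    for gender, age, finish in runners:
        times[age][gender].append(finish)
    gaps = {}
    for age, by_gender in sorted(times.items()):
        men, women = by_gender.get("M"), by_gender.get("F")
        if men and women:
            m = mean(heapq.nsmallest(k, men))    # fastest = smallest finish-times
            f = mean(heapq.nsmallest(k, women))
            gaps[age] = (f - m) / m * 100
    return gaps
```

If an age group has fewer than `k` runners of a gender, `heapq.nsmallest` simply returns all of them, so small groups fall back to an all-runner average.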
Of course, it is likely that experience (the number of marathons completed) plays an important role here and, all other things being equal, we might expect older runners to have completed more marathons than younger runners. We do not consider this here; however, we might speculate, for example, that the increased gender gap observed for the fastest runners is due to different levels of experience between these older fast men and women. For now we will leave this question hanging and return to it in a later analysis.
The Fastest Countries and States
Finally, again just for fun, let’s return to the countries and states of origin of our Boston 2017 runners and try to determine which countries and states have the fastest runners; once again we will limit ourselves to the countries and states used above.
To keep things simple — but not too simple — we will consider two different ways to answer this question. The first way is to compute the average finish-time for all runners from a particular country or state. This is effectively the finish-time of the typical runner from that country or state, and may favour countries and states with fewer participants, as these participants may be more experienced marathoners.
The second way to evaluate how fast a country’s or state’s runners are is to focus on the fastest finishers: we will look at the top-10 finishers from each country and state and report their average finish-time. The results for countries and states are presented in the two graphs that follow; in both, countries and states are ordered by the average finish-time of their top-10 fastest runners.
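Both ranking methods can be sketched in a few lines of Python. This assumes a hypothetical mapping from each country or state to its runners’ finish-times in minutes, and a hypothetical helper name, `rank_groups`; `heapq.nsmallest` selects the `k` fastest times for the second ranking.

```python
import heapq
from statistics import mean


def rank_groups(finishes: dict[str, list[float]], k: int = 10) -> tuple[list[str], list[str]]:
    """Return two rankings (fastest first): by mean of all finishers and by mean of the top k."""
    all_mean = {g: mean(t) for g, t in finishes.items() if t}
    top_k_mean = {g: mean(heapq.nsmallest(k, t)) for g, t in finishes.items() if t}
    by_all = sorted(all_mean, key=all_mean.get)      # typical runner
    by_top = sorted(top_k_mean, key=top_k_mean.get)  # fastest runners
    return by_all, by_top
```

With made-up data such as `{"Big": [130.0, 240.0, 260.0], "Small": [200.0, 210.0]}` and `k=1`, the all-runner ranking puts "Small" first (a mean of 205 versus 210 minutes), while the top-k ranking puts "Big" first (130 versus 200 minutes). This mirrors the kind of reversal seen for Costa Rica and Colombia.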
When we focus on the finish-time of all runners, Costa Rica and Colombia come out on top of the country rankings, with average finish-times of 204 and 206 minutes, respectively; see Figure 10. But when we look at the average finish-time of the fastest (top-10) runners then the US, Canada, and Japan win out, with finish-times of 134, 155, and 158 minutes, respectively, pushing Costa Rica and Colombia mid-table, with times of 171 and 170 minutes, respectively.
Among US states, Colorado, Maryland, and Minnesota come out on top for the average finish-time of all runners, at 223 minutes for Colorado and 227 minutes for Maryland and Minnesota. However, the fastest US runners hail from Massachusetts, California, and Colorado, whose top-10 finishers averaged 144, 149, and 151 minutes, respectively.
That’s enough data for now — maybe too much? — or just enough to whet the appetite as marathon season takes hold. The aim of this post was to provide a summary of the participants of this year’s Boston Marathon, focusing on gender, age, place of origin, and comparing their participation numbers and finish-times.
That’s just the start of this analysis, however, and in the coming days I plan to look more deeply at the pacing of the Boston Marathon, hitting-the-wall, and the small matter of Boston’s Qualifying standards.
If you would like me to explore any other aspect of the data then please leave a comment. And if you like these posts then please let me know. You might even wish to share them … that would be nice!
You can find many more posts looking at marathon-running data in Running with Data, including the following posts focusing on Boston.