Making xG Trend Charts using ggplot2

Caleb Shreve
caleb_shreve
Jun 26, 2019

My favorite Statsbomb data viz is their xG Trend Charts. I also spend too much time on trains. Those two things together led me to try to recreate something similar using the xG by Game table from American Soccer Analysis and ggplot2. This post is a basic guide to recreating the charts. There are tons of customization possibilities and this is by no means exhaustive, but it gets close enough for a free blog post that I’m typing during breaks from being bossed around by an eight-year-old. Let’s get started.

Packages

We’re going to use “rvest” to scrape the data from American Soccer Analysis, and then “tidyverse” will give us “dplyr” and “ggplot2” for some data manipulation and then actually creating the chart.

# library() stops with an error if a package is missing; require() only warns
library(rvest)
library(tidyverse)

Data

The first step is pulling the xG by Game table from American Soccer Analysis using “rvest.”

# Read the page and pull every <table> on it into a list of data frames
webpage <- read_html("https://www.americansocceranalysis.com/by-game-2019/")
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)
# Stack the tables into one data frame
games <- do.call(rbind.data.frame, tbls_ls)
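One thing worth checking before the `do.call(rbind.data.frame, ...)` step is that every scraped table has the same columns, since `rbind` stops with an error when they differ. Here is a minimal sketch of that check on a made-up HTML snippet. The two tiny tables, the team names, and the numbers are placeholders I invented, not the real American Soccer Analysis page, and `toy_games` and `same_cols` are just names picked for the example.

```r
library(rvest)

# Toy HTML standing in for the real page; all values are invented.
html <- '
<table><tr><th>Home</th><th>Away</th><th>HxGt</th><th>AxGt</th></tr>
<tr><td>Team A</td><td>Team B</td><td>1.2</td><td>0.8</td></tr></table>
<table><tr><th>Home</th><th>Away</th><th>HxGt</th><th>AxGt</th></tr>
<tr><td>Team C</td><td>Team A</td><td>0.5</td><td>2.1</td></tr></table>'

# Same pipeline as in the post, run on the toy snippet
tbls_ls <- read_html(html) %>%
  html_nodes("table") %>%
  html_table()

# TRUE only if every table has the same column names as the first one
same_cols <- all(sapply(tbls_ls, function(t) identical(names(t), names(tbls_ls[[1]]))))
stopifnot(same_cols)

# Safe to stack now that the columns are known to line up
toy_games <- do.call(rbind.data.frame, tbls_ls)
```

If the check fails on the real page, looking at `names()` of each list element should show which table is the odd one out.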

Then I filtered for one team’s games. LAFC is on pace to be the best team in the history of MLS, so I decided to look at them first.

home <- games %>% filter(Home == "Los Angeles FC")
away <- games %>% filter(Away == "Los Angeles FC")
games <- rbind(home, away)
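If you prefer, the two filters plus `rbind` can probably be collapsed into a single `filter()` call using `|` (logical OR), which also keeps the games in their original row order. A small sketch on invented rows, where `toy`, `team`, and `lafc` are names chosen just for this example:

```r
library(dplyr)

# Invented fixtures; only the column names match the scraped table
toy <- data.frame(
  Home = c("Los Angeles FC", "Team B", "Team C"),
  Away = c("Team A", "Los Angeles FC", "Team D"),
  stringsAsFactors = FALSE
)

team <- "Los Angeles FC"

# Keep a row if the team appears on either side of the fixture
lafc <- toy %>% filter(Home == team | Away == team)
```

Putting the team name in a variable also makes it easier to rerun everything for a different club.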

Once I had all of LAFC’s games in a dataframe, I created Expected Goals For, Expected Goals Against, and Expected Goals Differential columns for each game.

# xGF is LAFC's own xG, xGA is the opponent's xG, whichever side LAFC played on
games <- games %>%
  mutate(xGF = ifelse(Home == "Los Angeles FC", HxGt, AxGt),
         xGA = ifelse(Home != "Los Angeles FC", HxGt, AxGt))
games <- games %>% mutate(xGD = xGF - xGA)
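To make the home/away logic concrete, here is the same calculation worked through on two invented games, one at home and one away. The xG numbers are made up; only the column names (`Home`, `Away`, `HxGt`, `AxGt`) come from the table used in this post.

```r
library(dplyr)

# Two invented games: LAFC at home, then LAFC away
toy <- data.frame(
  Home = c("Los Angeles FC", "Team B"),
  Away = c("Team A", "Los Angeles FC"),
  HxGt = c(2.4, 0.9),
  AxGt = c(0.7, 1.8),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(
    xGF = ifelse(Home == "Los Angeles FC", HxGt, AxGt),  # LAFC's own xG
    xGA = ifelse(Home == "Los Angeles FC", AxGt, HxGt),  # opponent's xG
    xGD = xGF - xGA                                      # differential
  )

# Game 1 (home): xGF = 2.4, xGA = 0.7, xGD = 1.7
# Game 2 (away): xGF = 1.8, xGA = 0.9, xGD = 0.9
```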

The final step before trying to start plotting was formatting the “Date” column so that R would recognize it as dates.

games$Date <- as.Date(games$Date, format = "%m/%d/%Y")
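A quick sanity check I would suggest after this step: `as.Date()` quietly returns `NA` for any string that does not match the format, so it is worth counting the `NA`s before plotting. The strings below are invented to show both cases; the `%m/%d/%Y` format is the one used above.

```r
# Two strings that match month/day/year and one that does not
raw_dates <- c("3/3/2019", "06/26/2019", "2019-06-26")

parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# The first two parse; the ISO-style string becomes NA without any warning
n_failed <- sum(is.na(parsed))
```

On the real data, a non-zero count would mean some rows use a different date format and would be dropped from the chart.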

Plotting the chart

The first step was plotting the lines with points for individual games (I used teamcolorcodes.com to find the hex color codes for LAFC).

ggplot(games, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_line(aes(y = xGA), color = "#c39e6d", size = 1) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  geom_line(aes(y = xGF), color = "#000000", size = 1) +
  scale_color_manual(values = c("#c39e6d", "#000000"))
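A note on how the legend colors get assigned: inside `aes()`, `color = "xGA"` and `color = "xGF"` are just labels, and `scale_color_manual()` is what ties each label to a hex code. An unnamed vector is matched to the labels in sorted order, which happens to work here because "xGA" sorts before "xGF". Naming the vector makes the pairing explicit. A sketch on invented numbers, where `toy` and `team_colors` are names made up for the example:

```r
library(ggplot2)

# Invented values, just enough to draw something
toy <- data.frame(
  Date = as.Date(c("2019-03-03", "2019-03-10", "2019-03-17")),
  xGF  = c(2.4, 1.8, 2.9),
  xGA  = c(0.7, 0.9, 1.1)
)

# Names must match the labels used inside aes(color = ...)
team_colors <- c(xGA = "#c39e6d", xGF = "#000000")

p <- ggplot(toy, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  scale_color_manual(values = team_colors)
```

With a named vector, the colors stay attached to the right series even if the labels are later renamed to something that sorts differently.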

Then I wanted to move the legend, remove its title, and change the axis titles.

ggplot(games, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_line(aes(y = xGA), color = "#c39e6d", size = 1) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  geom_line(aes(y = xGF), color = "#000000", size = 1) +
  ylab("Expected Goals") +
  xlab("") +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  scale_color_manual(values = c("#c39e6d", "#000000"))

Next, I added the trend lines to the chart.

ggplot(games, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_line(aes(y = xGA), color = "#c39e6d", size = 1) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  geom_line(aes(y = xGF), color = "#000000", size = 1) +
  ylab("Expected Goals") +
  xlab("") +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  scale_color_manual(values = c("#c39e6d", "#000000")) +
  stat_smooth(method = 'lm', aes(y = xGF), color = "black", linetype = "dashed", alpha = 0.5, se = FALSE) +
  stat_smooth(method = 'lm', aes(y = xGA), color = "#c39e6d", linetype = "dashed", alpha = 0.5, se = FALSE)
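For context, `stat_smooth(method = 'lm')` draws the straight line from a least-squares fit of the y value on the date, so the dashed lines show whether xG for and against are drifting up or down over the season. If you want a number for that trend, the same kind of fit can be run directly with `lm()`. A sketch on made-up values (the dates are converted to day counts, so the raw slope is xG per day; `toy`, `fit`, and `slope_per_week` are names chosen for the example):

```r
# Three invented games a week apart, with xGF rising by 0.5 each time
toy <- data.frame(
  Date = as.Date(c("2019-03-03", "2019-03-10", "2019-03-17")),
  xGF  = c(1.0, 1.5, 2.0)
)

# Linear fit of xGF on the date expressed as a number of days
fit <- lm(xGF ~ as.numeric(Date), data = toy)

# Second coefficient is the slope per day; multiply by 7 for a weekly figure
slope_per_week <- unname(coef(fit)[2]) * 7
```

Here the weekly slope comes out at 0.5, matching how the toy data was built. On real games the fit will be noisy, so treat the slope as a rough direction rather than a precise rate.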

My final steps were cleaning up the background, changing the scale of the y axis, and adding a title/subtitle. The final code for my plot looked like:

ggplot(games, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_line(aes(y = xGA), color = "#c39e6d", size = 1) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  geom_line(aes(y = xGF), color = "#000000", size = 1) +
  # paste() already separates its pieces with a space, so no trailing space after "from"
  ggtitle(paste("LAFC xG Trend from", min(games$Date), "to", max(games$Date)),
          subtitle = paste("@caleb_shreve,", "created on", Sys.Date(), sep = ' ')) +
  ylab("Expected Goals") +
  xlab("") +
  stat_smooth(method = 'lm', aes(y = xGF), color = "black", linetype = "dashed", alpha = 0.5, se = FALSE) +
  stat_smooth(method = 'lm', aes(y = xGA), color = "#c39e6d", linetype = "dashed", alpha = 0.5, se = FALSE) +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  theme(panel.background = element_rect(fill = "white", color = "black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_y_continuous(breaks = seq(0, 5, .5)) +
  scale_color_manual(values = c("#c39e6d", "#000000"))

The chart turned out fairly well. There are absolutely some aesthetic things that could be improved, and some bells and whistles you could add from a data perspective (vertical lines marking coaching and season changes, for example), but I’m fairly happy with the final product and excited to use this for the rest of the MLS season.
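As a rough idea of how the event-marker extra could look, `geom_vline()` can take its own small data frame of dates and draw one vertical line per row. This is only a sketch on invented data: the `events` data frame and its date are hypothetical, not a real coaching change.

```r
library(ggplot2)

# Invented games, just enough to draw a line
toy <- data.frame(
  Date = as.Date(c("2019-03-03", "2019-03-17", "2019-03-31", "2019-04-14")),
  xGF  = c(1.2, 2.1, 1.7, 2.6)
)

# Hypothetical event date, made up for illustration
events <- data.frame(Date = as.Date("2019-03-24"))

p <- ggplot(toy, aes(Date)) +
  geom_line(aes(y = xGF), color = "#000000") +
  geom_point(aes(y = xGF), size = 3) +
  # One dotted vertical line per row of the events data frame
  geom_vline(data = events, aes(xintercept = Date), linetype = "dotted")
```

Keeping the events in their own data frame means adding a second or third marker is just another row.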

Complete Code

library(rvest)
library(tidyverse)
# Scrape every table on the page and stack them into one data frame
webpage <- read_html("https://www.americansocceranalysis.com/by-game-2019/")
tbls_ls <- webpage %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)
games <- do.call(rbind.data.frame, tbls_ls)
view(games)  # optional: inspect the scraped table
# Keep only LAFC's home and away games
home <- games %>% filter(Home == "Los Angeles FC")
away <- games %>% filter(Away == "Los Angeles FC")
games <- rbind(home, away)
# xG for, xG against, and xG differential from LAFC's point of view
games <- games %>%
  mutate(xGF = ifelse(Home == "Los Angeles FC", HxGt, AxGt),
         xGA = ifelse(Home != "Los Angeles FC", HxGt, AxGt))
games <- games %>% mutate(xGD = xGF - xGA)
games$Date <- as.Date(games$Date, format = "%m/%d/%Y")
ggplot(games, aes(Date)) +
  geom_point(aes(y = xGA, color = "xGA"), size = 3) +
  geom_line(aes(y = xGA), color = "#c39e6d", size = 1) +
  geom_point(aes(y = xGF, color = "xGF"), size = 3) +
  geom_line(aes(y = xGF), color = "#000000", size = 1) +
  ggtitle(paste("LAFC xG Trend from", min(games$Date), "to", max(games$Date)),
          subtitle = paste("@caleb_shreve,", "created on", Sys.Date(), sep = ' ')) +
  ylab("Expected Goals") +
  xlab("") +
  stat_smooth(method = 'lm', aes(y = xGF), color = "black", linetype = "dashed", alpha = 0.5, se = FALSE) +
  stat_smooth(method = 'lm', aes(y = xGA), color = "#c39e6d", linetype = "dashed", alpha = 0.5, se = FALSE) +
  theme(legend.position = "bottom") +
  theme(legend.title = element_blank()) +
  theme(panel.background = element_rect(fill = "white", color = "black")) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
  scale_y_continuous(breaks = seq(0, 5, .5)) +
  scale_color_manual(values = c("#c39e6d", "#000000"))

Data from American Soccer Analysis, chart inspiration from Statsbomb
