Profiling keepers with data

Victor Renaud
Jul 26, 2021

A dive into different keepers’ styles as a first step in the scouting process

The goalkeeper position is probably the one that has changed the most in the last decade. The influence of generational players like Manuel Neuer has played a part in reshaping it. Football itself has also changed: a new era of high pressing and extreme intensity, and the emergence of Pep Guardiola as a coach who relied on his goalkeeper to start the play from the very beginning of his coaching career at Barcelona, have clearly shifted what we expect from a keeper.

Moreover, the new goal-kick rule encourages keepers to play short passes, shortening the distance of their kicks and forcing them to be more involved in building up the play under (sometimes very intense) pressure. This rule change has in turn helped make the structural shift last.

The purpose of this work was a first attempt at profiling goalkeepers by taking several metrics that are part of the keeper’s palette and grouping them to visualize different profiles and similarities.

The aim was also to take a first step into a scouting process, in a way that could help make the right recruitment decisions. The process could then be extended to outfield players and make this initial profiling work easier for the technical staff, so that they can target the right profile and know what to expect, and what not to expect, from the player.

There is not much open data available to assess goalkeepers. However, a few of the metrics in the available open data can help understand what a keeper’s profile is, and what he can or cannot add to the squad, before an extensive scouting analysis.

Even though the data used here can be useful to define a profile, it is obviously only a first step in the scouting process. Goalkeeping is a very specific position, and some facets of a keeper’s play can’t easily be evaluated or profiled through data. The eye test is key in any kind of player analysis, and it is especially important when evaluating a goalkeeper’s shot-stopping or passing technique, decision-making, or even influence on his teammates.

To visualize different keepers’ profiles, t-SNE (t-distributed Stochastic Neighbor Embedding) was used. t-SNE is a machine learning algorithm often used to embed many features in a low-dimensional space; here it places keepers on a plot according to their similarities. If you want to know more, you can find a good guide here.
Furthermore, a k-means algorithm was used to separate the different clusters and help visualize the different groups.
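
To give a feel for the clustering step, here is a minimal sketch in plain Python. It standardizes a few made-up feature rows and runs a bare-bones k-means on them. The feature values, the number of clusters and the `standardize` and `kmeans` helpers are assumptions made for illustration, not the code behind this article. The t-SNE embedding itself is not reproduced; in practice both steps would normally come from a library such as scikit-learn.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that no single metric dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical rows: [sweeping per 90, build-up involvement]
features = [
    [0.30, 4.0], [0.40, 5.0], [0.35, 4.5],     # keepers who stay home
    [1.40, 11.0], [1.50, 12.0], [1.60, 11.5],  # keepers who sweep and build
]
labels = kmeans(standardize(features), k=2)
print(labels)
```

Standardizing first matters because the metrics live on very different scales (percentages, counts per ninety, meters), and k-means relies on Euclidean distance.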

Data from the Big 5 European leagues from 2017/18 to 2020/21 was collected to perform this analysis. Each of the top 5 leagues has its own specificities, but adding European competitions (UCL & UEL) did not make much sense, as the intensity teams face on the European stage could have skewed the process. Plus, some of the keepers here have played more than 35 Champions League games since 2017/18, while others have only played in their domestic leagues.

Regarding the sample size, a keeper must have played at least 40 nineties during this period. The idea was to analyze keepers who have played more than one entire season, as going through different contexts (different teams, team principles, coaches, etc.) does impact a keeper’s performances and function. This gives a stronger idea of the keeper’s profile in the end, not only of his role. Please note that keepers who have since retired but played at least 40 nineties during the period are included.
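
As a small sketch of that sample filter, assuming minutes played are available per keeper and that one “ninety” is simply 90 minutes on the pitch (the keeper names and minute totals below are made up):

```python
def nineties(minutes_played):
    """Convert minutes played into the number of full 90-minute units."""
    return minutes_played / 90


def eligible_keepers(minutes_by_keeper, min_nineties=40):
    """Keep only keepers with at least `min_nineties` nineties over the period."""
    return [name for name, minutes in minutes_by_keeper.items()
            if nineties(minutes) >= min_nineties]


# Hypothetical minute totals over the four seasons
sample = {"Keeper A": 3600, "Keeper B": 3599, "Keeper C": 9000}
print(eligible_keepers(sample))
```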

Let’s have a look at the result, before breaking down the whole methodology and metrics used:

The results the t-SNE gave, helping to visualize the different keepers’ profiles.

The metrics used:

Penalty area dominance

Shot-stopping: GSAA% (Goals Saved Above Average %) is probably, as of today, one of the best metrics to assess keepers' ability to stop shots. The metric basically helps figure out if the keeper saves more or less than the model expects him to save, based on the difficulty of shots faced. You can find a detailed explanation of Post-Shot xG and GSAA% here.

Cross stopping: Percentage of crosses into the penalty area that were successfully stopped by the goalkeeper.
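
The two penalty-area metrics can be sketched as simple ratios. This assumes GSAA% is post-shot xG faced minus goals conceded, divided by shots on target faced; the exact definition used by the data provider may differ (for example in how own goals are handled), and the function names and numbers here are hypothetical.

```python
def gsaa_pct(psxg_faced, goals_conceded, shots_on_target_faced):
    """Goals saved above average, as a percentage of shots on target faced.

    Assumed formula: 100 * (PSxG faced - goals conceded) / shots on target faced.
    A positive value means the keeper conceded fewer goals than the model expected.
    """
    if shots_on_target_faced == 0:
        return 0.0
    return 100 * (psxg_faced - goals_conceded) / shots_on_target_faced


def cross_stop_pct(crosses_stopped, crosses_into_box):
    """Share of crosses into the penalty area that the keeper stopped."""
    if crosses_into_box == 0:
        return 0.0
    return 100 * crosses_stopped / crosses_into_box


# Hypothetical season: 40.0 PSxG faced, 36 goals conceded, 120 shots on target
print(round(gsaa_pct(40.0, 36, 120), 2))
print(cross_stop_pct(30, 400))
```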

Sweeper-keeping

Sweeping: Number of defensive actions performed outside the penalty area per ninety minutes. This helps to assess whether the keeper is very active outside the penalty area and sweeps often, or not.

Distance of intervention: Average distance from goal of all defensive actions. This time it covers all defensive actions, inside and outside the penalty area. It gives a clear idea of whether the keeper sticks to his line or does not hesitate to perform defensive actions such as claiming or punching the ball, or leaving his six-yard box to challenge a 1v1.
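
Both sweeper-keeping metrics can be derived from the same list of defensive actions. The sketch below assumes each action is stored as a hypothetical `(distance_from_goal_m, outside_box)` pair; real event data will have its own schema.

```python
def sweeper_metrics(actions, minutes_played):
    """Compute sweeping per 90 and average distance of intervention.

    actions: list of (distance_from_goal_m, outside_box) tuples, one per
             defensive action (hypothetical format).
    """
    nineties = minutes_played / 90
    # Sweeping only counts actions outside the penalty area.
    outside = sum(1 for _, outside_box in actions if outside_box)
    # Distance of intervention averages over ALL defensive actions.
    avg_distance = sum(d for d, _ in actions) / len(actions) if actions else 0.0
    return {
        "sweeping_per_90": outside / nineties if nineties else 0.0,
        "avg_distance_m": avg_distance,
    }


# Hypothetical actions over two full matches
actions = [(4.0, False), (9.0, False), (22.0, True), (25.0, True)]
print(sweeper_metrics(actions, minutes_played=180))
```

Keeping the two numbers side by side is useful: a keeper can have a high average distance from claims inside the box without ever leaving the penalty area.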

Ball-playing

Build-up involvement: Open-play keeper passes made per 100 team touches in the defensive third. This is a proxy for the extent to which the keeper is given responsibility in the build-up play by his teammates and the team’s principles.
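
A quick sketch of why the metric is normalized by team touches, using made-up totals and a hypothetical `build_up_involvement` helper: two keepers with the same number of passes can carry very different shares of their team’s deep possession.

```python
def build_up_involvement(keeper_open_play_passes, team_touches_def_third):
    """Open-play keeper passes per 100 team touches in the defensive third."""
    if team_touches_def_third == 0:
        return 0.0
    return 100 * keeper_open_play_passes / team_touches_def_third


# Hypothetical season totals: same pass volume, different team context
keeper_a = build_up_involvement(450, 6000)
keeper_b = build_up_involvement(450, 9000)
print(keeper_a, keeper_b)
```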

Build-up ability: An in-house completion % created from event data. It takes into account passes from open play and goal kicks that are meant to be part of a deep build-up, helping to assess the keeper’s ability to play such passes.

The “build-up zone” is defined as an area larger than the defensive third (passes must end no farther than 42.5 meters up the pitch).

The idea is to keep only a few types of passes that allow an assessment of the keeper’s distribution in the build-up phases: for example, passes that jump the opposition’s first or second line of pressure to reach the forwards directly, among other types of passes. Here are the conditions for a pass to be included in the build-up ability metric:

  • If a pass is made from a goal kick, it must end outside the penalty area. Short passes from the keeper to the center backs inside the box, as allowed by the new rule, are not included in this in-house metric. These very short passes inside the box are not a genuine indicator of the keeper’s ball-playing ability and usually flatter keepers who play in very dominant teams that are rarely pressed high and who simply give a short, unpressured pass to restart the play.
  • If a pass is made from open play, it must be longer than 5 meters, so that extremely short passes are removed. This might seem a low threshold, but the aim was to credit keepers who get heavily involved in shaping the play, for instance ter Stegen under Setién, who used to take up an advanced position and sometimes slot in between the center backs.

NB: there is no maximum passing distance, as we seek to evaluate the keeper’s ability to pass into the zone. Also, only goal kicks and open-play passes falling under certain conditions are taken into account.
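
The inclusion rules above can be sketched as a small filter. Several things here are assumptions rather than the exact in-house implementation: coordinates are in meters with x measured from the keeper’s own goal line, the pitch is 68 meters wide, the penalty area has its standard dimensions, and the `Pass` record is a hypothetical format.

```python
import math
from dataclasses import dataclass

BUILD_UP_LIMIT_M = 42.5        # from the article: where a pass may end at most
MIN_OPEN_PLAY_LENGTH_M = 5.0   # from the article: minimum open-play pass length
BOX_DEPTH_M = 16.5             # assumption: standard penalty-area depth
BOX_Y_MIN, BOX_Y_MAX = 13.84, 54.16  # assumption: box width on a 68 m wide pitch


@dataclass
class Pass:
    kind: str        # "goal_kick" or "open_play"
    start: tuple     # (x, y) in meters, x from the keeper's own goal line
    end: tuple
    completed: bool


def ends_in_own_box(p):
    x, y = p.end
    return x <= BOX_DEPTH_M and BOX_Y_MIN <= y <= BOX_Y_MAX


def counts_for_build_up(p):
    """Apply the zone limit, then the goal-kick or open-play condition."""
    if p.end[0] > BUILD_UP_LIMIT_M:
        return False
    if p.kind == "goal_kick":
        return not ends_in_own_box(p)
    if p.kind == "open_play":
        return math.dist(p.start, p.end) > MIN_OPEN_PLAY_LENGTH_M
    return False


def build_up_ability(passes):
    """Completion % over the passes that qualify; None if there are none."""
    kept = [p for p in passes if counts_for_build_up(p)]
    if not kept:
        return None
    return 100 * sum(p.completed for p in kept) / len(kept)


# Hypothetical passes
passes = [
    Pass("goal_kick", (5.5, 30), (12, 25), True),    # ends inside the box: dropped
    Pass("goal_kick", (5.5, 30), (25, 10), True),    # kept
    Pass("open_play", (10, 34), (12, 36), True),     # shorter than 5 m: dropped
    Pass("open_play", (10, 34), (30, 50), False),    # kept
    Pass("open_play", (10, 34), (60, 34), True),     # ends beyond 42.5 m: dropped
    Pass("open_play", (14, 34), (35, 20), True),     # kept
]
print(build_up_ability(passes))
```

Returning `None` rather than 0 for a keeper with no qualifying passes avoids confusing “no data” with “never completes a pass”.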

A quick visualization of the “build-up ability” metric and its aim

Here are the 2020/21 “build-up ability” passes from Ederson, who ranks as the best keeper over the last four seasons.

A look at Ederson’s “build-up ability” successful and unsuccessful passes in 2020/21

Far zone ability: This follows the same principle (a completion %) as the “build-up ability” metric, but this time only passes from goal kicks and open play that end beyond 42.5 meters are taken into account. To what extent is the keeper able to find the areas of the second (progression) and third (finishing) phases?

This metric aims to assess the goalkeeper’s ability to deliver long-range passes, passes in behind the defense (like Ederson can do), and passes into congested areas.

Medium-range passes launched from an intermediate distance are included too, as the “build-up ability” and “far zone ability” metrics mainly seek to analyze the ability to access a zone, whatever the pass distance. Obviously, a large share of the passes in this metric are long (greater than 40 meters), but there are also intermediate ones, which are riskier if possession is lost because of the keeper’s advanced position at the moment the pass is played.

NB: there is no maximum passing distance, as we seek to evaluate the keeper’s ability to pass into the second and third zones. Also, only goal kicks and open-play passes falling under certain conditions are taken into account.
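
Here is a minimal sketch of the far zone logic, which selects passes by where they end rather than by how long they are. It assumes x is measured in meters from the keeper’s own goal line and that each pass is a hypothetical `(start_xy, end_xy, completed)` tuple. The additional conditions mentioned above are not detailed in this article, so they are not modeled here.

```python
import math

FAR_ZONE_START_M = 42.5  # from the article: passes must end beyond this line
LONG_PASS_M = 40.0       # from the article: what counts as a "long" pass


def far_zone_summary(passes):
    """Completion % of far-zone passes, plus the share of them that are long.

    passes: list of (start_xy, end_xy, completed) tuples for goal kicks and
            open-play passes (hypothetical format).
    """
    far = [(s, e, ok) for s, e, ok in passes if e[0] > FAR_ZONE_START_M]
    if not far:
        return None
    completed = sum(1 for _, _, ok in far if ok)
    long_ones = sum(1 for s, e, _ in far if math.dist(s, e) > LONG_PASS_M)
    return {
        "far_zone_ability_pct": 100 * completed / len(far),
        "long_share_pct": 100 * long_ones / len(far),
    }


# Hypothetical passes
passes = [
    ((10, 34), (30, 40), True),    # ends in the build-up zone: ignored here
    ((8, 34), (60, 20), True),     # long pass into the far zone
    ((25, 34), (50, 45), False),   # intermediate pass from an advanced position
    ((12, 30), (70, 60), False),   # long pass into the far zone
    ((30, 34), (48, 34), True),    # intermediate pass from an advanced position
]
print(far_zone_summary(passes))
```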

A quick visualization of the “far zone ability” metric and its aim

Here are the seasonal “far zone ability” passes from ter Stegen, who ranks as one of the best keepers over the last four seasons. Manuel Neuer is, unsurprisingly, another example of a keeper who is very good at passing into the far zone.

A look at ter Stegen’s “far zone ability” successful and unsuccessful passes in 2020/21

These are the seven metrics used for the profile similarity work. It is fair to say that a good share of them relate to the keeper’s ability to play with his feet. However, as mentioned before, this is nowadays an integral part of a keeper’s play and is, most of the time, extremely important in keeper recruitment. Whatever the team’s principles of play, the ability to cleanly find teammates in the build-up zone and to reach players in advanced areas are key elements that must be assessed.

The next section is dedicated to discussing the results of each profile that can be drawn from the features put in the algorithm.

Results:

Line keepers:

Line keeping is probably the most old-school way of playing the goalkeeping role. The line keeper is, most of the time, an elite shot-stopper but very limited in the other aspects of the game.

As for the rest of his play, the line keeper tends to stay on his line, even when the play asks him to come out. Crosses, or even opposition through balls in behind his defensive line, do not push him to leave his six-yard box or to sweep.

Furthermore, line keepers barely get involved in the build-up play and their ball-playing ability is, most of the time, limited.

Some examples of line keepers: Oblak, Lloris, David de Gea

Classic keepers:

Dominant inside the penalty area, on their line and at intercepting crosses, classic keepers are usually not heavily involved in the build-up phase. Able to circulate the ball in a compressed area, they can also reach farther zones well, as their overall distribution is above average. However, keeping a high line is not their main quality: they only intervene outside their penalty area when the play is easy to read or absolutely requires it.

At Euro 2020, Mancini capitalized on Donnarumma’s strengths and his interpretation of the classic keeper role. Very reassuring and dominant inside his penalty area, “Gigio” was also involved in the first phases of the build-up, drawing the opponent as high as possible in order to bypass them with short and medium passes, but also with longer ones when the play required jumping one or two lines of pressure.

Some examples of classic keepers: Gianluigi Donnarumma, Edouard Mendy, Samir Handanovic

Anticipative keepers

Very dominant inside and outside the penalty area, the anticipative goalkeeper is above average at stopping opposing teams’ actions. Very mobile and aggressive, these keepers exploit every line and have an elite ability to read the play and ball trajectories. They are responsible for slowing down the rhythm when their team is suffering and for stopping opposing attacks outside their six-yard box, allowing their team to catch its breath during key moments. Plus, they can be an offensive weapon, launching quick transitions after claiming the ball, as Mike Maignan often did with Lille.

This kind of goalkeeper can be very useful to teams that are used to being dominated and conceding frequently, as anticipative keepers can control a particular zone and have an impact on key moments. However, most anticipative goalkeepers are quite limited in their distribution and in their contribution to shaping the team’s attacking play.

Some examples of anticipative keepers: Nick Pope, Alexandre Oukidja, Mike Maignan

Proactive keepers

Their main quality in possession is their ability to take part in the build-up play as an outfield player would. Their ball-playing ability is elite when it comes to accessing the second (progression) and third (finishing) phases. Some of them can play in a very advanced position, sometimes almost between the two center backs to form a back three, offering width and making the opponent’s press very difficult.

Regarding their out-of-possession strengths, they are above average at keeping a high line, covering their defense and interrupting the play, for example by collecting crosses. Shot-stopping ability varies inside this cluster, as some are excellent at it (e.g. ter Stegen) and some are not (e.g. Kepa).

Some examples of proactive keepers: Kevin Trapp, Marc-André ter Stegen, Rafal Gikiewicz

Complete keepers

The title of the cluster says a lot; “modern keeper” could also have been a suitable name for this profile. These keepers have all the facets of the modern game in their palette. They are above average at stopping shots, can read the play and cover their defensive line in an elite way, and have the ability to play like an outfield player, alternating short and long play under pressure with impressive accuracy.

Some of them have also set a trend that has inspired a generation of keepers. I am obviously referring to Manuel Neuer, who has taken the goalkeeping role into a new era.

Some examples of complete keepers: Alisson, Ederson, Koen Casteels

One of the best examples of complete keepers is Alisson of Liverpool

Libero keepers

This is clearly a cluster that Manuel Neuer could be part of, as he clearly is a libero keeper, though not only that. A libero keeper is able to be the player from whom most of the deep build-up starts, circulating the ball and targeting the right zone, even under pressure. Some of them, though, are more limited at reaching farther zones with their feet.

They also perform very well at keeping a high line by controlling the zone behind the defensive line. Most of them, though, tend not to be dominant in the penalty area, helping the team far more higher up the pitch, in and out of possession.

Some of the players in this cluster can be seen as pure sweepers, for instance Koubek but also Alban Lafont, as they don’t have particular ball-playing abilities.

Some examples of libero keepers: Roman Bürki, Pierluigi Gollini, Ron-Robert Zieler

Reactive keepers

Not the most sought-after or commonly recognized profile, reactive keepers are valuable nonetheless. Not often included in the deep passing circuits, they are not very comfortable shaping the play and do better at finding teammates in more advanced zones with long balls. Even though some of them are above-average shot-stoppers, most show limited sweeping skills and tend not to come off their line or leave the six-yard box often to interrupt opponents’ actions.

Some examples of reactive keepers: Salvatore Sirigu, Mattia Perin, Mathew Ryan

Conclusion and further discussion

As part of a goalkeeper scouting process, working with profile clustering can help the scouting team decide on suitable profiles and ease the recruitment process.

For different team projects or game models, you can find the goalkeeper that could suit your team perfectly. Game models oriented towards heavy possession, and especially long, deep build-up to create artificial transitions, can favor keepers who like to be a constant option for their backline and are comfortable with their feet. Here, some of the classic and proactive keepers can be the first choice. Depending on their willingness to play with a very high line, such teams can also favor a libero keeper.

Teams that generally defend using a low-block and rely on transitions to create attacking opportunities could favor anticipative keepers who are generally able to engage in key moments to disrupt opposition attacks and control the tempo of the game.

Reactive keepers can be a good fit if a team is looking for a keeper who can target congested areas with accuracy and who is constantly alert and skilled enough to control the six-yard box well. Such a team should not focus on deep ball circulation or expect impressive ball-playing skills from its keeper, especially under pressure.

These are a few examples of what a scouting department could produce as a first step, before eye-testing the findings. As said previously, goalkeeping is a very specific role and some aspects of a keeper’s play can’t easily be evaluated through data (as is the case for some outfield players).

However, some private data (or models that can be developed from it) can help to profile keepers, for instance the accuracy of their positioning when an opposition shot is taken or their ability to play under heavy pressure. Building an expected pass model from the two metrics developed here (build-up ability and far zone ability) could also give an even more precise idea of a keeper’s profile.

I consciously decided not to include some metrics in this analysis, for instance the keeper’s throws. This is a very valuable keeper skill, but I was not sure whether it is part of a profile and could help understand who the keeper actually is.

Finally, in addition to the types of keepers described above, another common type could be “ball-playing keepers”, who rank highly on the “build-up ability” and “far zone ability” metrics. However, this type of keeper was spread across the different clusters. For instance, Marwin Hitz, Ederson and Marc-André ter Stegen all have ball-playing predispositions but belong to different clusters and have different strengths and weaknesses.

Acknowledgment

You can follow me on Twitter: @victorrenaud5
