Our First Visuals

SportsViz
Mar 27, 2021

Diverge — Our First Ideas:

Since forming our team we have met at least weekly. We assembled the team on the basis that we were all sports fans, but we were not quite clear on what we wanted to visualize within that scope. In the beginning we looked at different ways to visualize football matches. We decided to start with the World Cup, but were unsure how to do that. We read several scientific papers on the matter, from which we learned that football data is largely privatized: the people paying for the sensors keep the data for themselves. From there we began to dig deeper and look for publicly available datasets.

This is where we found Pappalardo’s repository of datasets, which is thankfully very rich. Originally we had planned to use a dataset that dates all the way back to the 1850s and contains all historic World Cup information to date, such as wins, team names, and player IDs. However, after our first feedback session we realized that the dataset might be too limited in dimensionality: because it goes back so far, it captures only the few attributes that were recorded from the beginning. It only allowed us to do limited visualizations of matches won by country. In that same feedback session, we mentioned that Pappalardo had a few other datasets, and that we might be able to visualize the players if we stuck to a single World Cup. We decided to move forward with that idea.

In our new dataset you can find four tables: Events, Players, Matches and Teams, linked with foreign keys (Events to Matches, Events to Teams, Events to Players, Players to Teams). For the purpose and scope of our project, we preprocessed the dataset, dropped dimensions that were either correlated or not relevant for us, and kept only the matches from the 2018 World Cup.
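To make the table structure concrete, here is a minimal sketch of how the foreign keys could be resolved and the events restricted to one competition. All field names (`match_id`, `team_id`, `player_id`, `competition`) and the sample rows are hypothetical placeholders, not the actual column names or contents of the dataset.

```python
# Hypothetical, simplified rows; the real tables have many more columns
# and different field names.
teams = {1: {"name": "Team A"}, 2: {"name": "Team B"}}
players = {
    10: {"name": "Player A", "team_id": 1},
    20: {"name": "Player B", "team_id": 2},
}
matches = {
    100: {"competition": "World Cup 2018"},
    200: {"competition": "Other"},
}
events = [
    {"event_id": 1, "match_id": 100, "team_id": 1, "player_id": 10, "event_name": "Pass"},
    {"event_id": 2, "match_id": 100, "team_id": 2, "player_id": 20, "event_name": "Shot"},
    {"event_id": 3, "match_id": 200, "team_id": 1, "player_id": 10, "event_name": "Foul"},
]


def join_events(events, matches, teams, players, competition):
    """Resolve each event's foreign keys and keep a single competition."""
    joined = []
    for event in events:
        match = matches[event["match_id"]]
        if match["competition"] != competition:
            continue  # drop events from matches outside the chosen competition
        joined.append({
            "event_id": event["event_id"],
            "event_name": event["event_name"],
            "match_id": event["match_id"],
            "team": teams[event["team_id"]]["name"],
            "player": players[event["player_id"]]["name"],
        })
    return joined


world_cup_events = join_events(events, matches, teams, players, "World Cup 2018")
```

The result is one flat record per event, which is the shape we found easiest to feed into the visual ideas described below.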

Our meetings soon became design sessions, where we worked together to draw ideas we could use, mirroring the process we had followed in class. During the initial design process we came up with the following drawings and put them on our shared Miro board:

A preview of our Miro board
This was our first attempt at visualizing all the events we have for each game. From this we learned that showing all the events at once, even for a single game, is too much visually. After this we decided we needed to find a way either to meaningfully reduce the number of events or to cluster them in some way.
This Sankey diagram shows the distribution of events in a single World Cup match between the two teams and their players. The middle stacked bar shows the number of events in each category (passes, shots, fouls, etc.) for that match. The stacked bars on each side show the events of each player on the team. The height of each stack reflects the number of events per player (side bars) or per category (middle bar).
The scatterplot shows the scores of different games during the tournament. Clicking on a mark would lead to an on-pitch visualization of where specific events happened, coded by mark type and colour.
With this idea we wanted to visualize the players in some way, but we were not quite sure what would come of it. The main idea was to come up with some label for each player. For example, it would be very interesting to discover that two beloved players have completely different playing styles and ways in which they succeed.

These were our first main designs. We then moved on to the emerge process in our subsequent design sessions.

Emerge — How it Evolved:

We started clustering our ideas from the diverge phase together, based on what we thought would work well together, using the preliminary map of all events in one game we’d made in Tableau (see above).

We liked the idea of visualizing where on the pitch events occurred, but the Tableau visualization showed us that our goal of identifying key players and difference makers would get lost in the overwhelming amount of dense information. Hence, we decided to narrow our data points down to only the 2–3 minutes before each goal event, to see which players were involved in creating goal opportunities and where on the pitch those opportunities arise.
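The narrowing step described above can be sketched as a simple time-window filter. This is only an illustration under assumptions: the field names (`match_id`, `period`, `second`, `is_goal`) and the sample events are hypothetical, and the 180-second default is just one choice within the 2–3 minute range.

```python
def events_before_goals(events, window_seconds=180):
    """Keep events within `window_seconds` before a goal in the same match
    and period. The goal event itself is included."""
    # Collect the timestamps of all goals, grouped by match and period.
    goal_times = {}
    for event in events:
        if event["is_goal"]:
            key = (event["match_id"], event["period"])
            goal_times.setdefault(key, []).append(event["second"])

    selected = []
    for event in events:
        times = goal_times.get((event["match_id"], event["period"]), [])
        # An event qualifies if some goal happens 0..window_seconds after it.
        if any(0 <= t - event["second"] <= window_seconds for t in times):
            selected.append(event)
    return selected


# Hypothetical sample: one goal at second 650 of the first half.
sample = [
    {"match_id": 1, "period": "1H", "second": 100, "player": "A", "is_goal": False},
    {"match_id": 1, "period": "1H", "second": 500, "player": "B", "is_goal": False},
    {"match_id": 1, "period": "1H", "second": 600, "player": "C", "is_goal": False},
    {"match_id": 1, "period": "1H", "second": 650, "player": "C", "is_goal": True},
    {"match_id": 1, "period": "1H", "second": 700, "player": "A", "is_goal": False},
]
build_up = events_before_goals(sample)
```

Grouping by match and period keeps the window from reaching across half-time, where the clock in event data may restart.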

To explore this idea further, we tried to find specific ways to implement the pitch-based visualization at the player, match, and team levels.

These two visualizations follow the same pattern as above, but are sorted by team or by match respectively. We wanted to play around with different ways to reduce the overall data included in the visualization, while still keeping some sort of natural structure.
The idea here stemmed from the fact that in our first on-field visualization the events crowded everything, making it nearly impossible to tell which events came from where. We were trying to come up with a way to meaningfully and naturally select only some events, and that’s when we thought that it might make sense to select only the events closely leading up to goals. We also wanted to combine this with the idea, from the emerge phase and our first feedback session, of visualizing the players in some way. An explanation of the visualization can be found handwritten on the page itself.

What we want to keep & what we want to throw out:

We then decided to focus on visualizing game events on the football pitch as our primary visuals. A few variations came out of that (see above), as we could visualize on a player, match, or team basis. We want to focus our further work on developing, improving and implementing these ideas. In later stages (converge), we should decide which of these is the most relevant and insightful.

We’d like to include the Sankey diagram idea, ideally as a complement to the pitch-based visuals. We were thinking about using it to visualize the proportion of different events in each match and their relation to the players of each team, showing how each player contributed to the match in terms of event types. Following our feedback session, our next step is to narrow down the event types to only those that are actually important and describe game-changing moments of football matches.
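As a rough sketch of the data behind such a diagram, the events of one match can be aggregated into weighted links from players to event categories. The `player` and `event_name` fields and the sample events are hypothetical, and the source/target/value shape is simply one common way to describe Sankey links, not a requirement of any particular tool.

```python
from collections import Counter


def sankey_links(events):
    """Count events per (player, category) pair and return them as links."""
    counts = Counter((e["player"], e["event_name"]) for e in events)
    return [
        {"source": player, "target": category, "value": n}
        for (player, category), n in sorted(counts.items())
    ]


# Hypothetical events from a single match.
match_events = [
    {"player": "A", "event_name": "Pass"},
    {"player": "A", "event_name": "Pass"},
    {"player": "A", "event_name": "Shot"},
    {"player": "B", "event_name": "Pass"},
    {"player": "B", "event_name": "Foul"},
]
links = sankey_links(match_events)
```

Each link's value corresponds to the thickness of one band between a player's bar and a category in the middle bar, so narrowing the event types amounts to filtering `match_events` before counting.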

We like the simplicity of the scatterplot but are not yet sure how to use it in connection with the pitch-based approach, or whether we are going to keep it. We thought of starting the storytelling with a scatterplot of goals, where clicking on a score leads to the pitch-based visualizations and the Sankey diagram.

Why did we not choose certain visuals?

We had an idea of visualizing individual players and their events using a spider-web-like diagram. During our research, we found an existing visualization that compares different players’ performance across 50 years at the World Cup. It showed their passing, defensive, attacking and other statistics as a spider-web-meets-pie-chart diagram. Within each of these four quadrants, there were smaller divisions, such as goals per game. We found not only that this spider-web kind of visual had already been done, but also that you could not really see from it which player makes a difference in the game. It made it easy to see where a certain player’s strengths lay and to find players of similar style, but it did not answer our research question. From this we learned that even a good visualization of this kind would not be the end goal we had hoped for.

Converge — Where We are Now:

Today we had our feedback session, where we presented the ideas we came up with in the diverge and emerge phases. It helped us realize that we will perhaps move away from visualizing only the few on-field events before each goal, in favor of exploring a clustering method.

Our next step is to perform an exploratory factor analysis (EFA) and see whether any meaningful structure can be found in the data that way. We also realized that we would like to further explore our Sankey idea, which originally had not made it past the diverge phase.

So we are now moving in two directions: we will further explore the Sankey idea, and we also considered performing an exploratory factor analysis as a means of clustering the data. However, when we began to perform the EFA, we realized that it does not quite work, since our goal is not so much to reduce the dimensionality of our dataset. So we did some further research after the feedback session, and we learned that this problem of too many data points is quite well known in football data visualization. One suggested way of dealing with the issue is interactivity, which lets the viewer explore the data from multiple perspectives, one at a time. So we are planning to proceed from there. However, the paper suggests using Tableau, which might be an issue since we have been told Tableau is not one of the options for us. We are still playing around and figuring it out.

We plan to seek further feedback from the professors, in hopes of clarifying whether we are headed in the right direction. It looks like some things are converging, but we are still figuring out exactly how, and what. Stay tuned!
