FIFA World Cup Statistics: A Data Science Perspective — Part II

Record Evolution
7 min readJun 14, 2018

--

Image by Arif Wahid

Data-driven Historical Stats from the FIFA World Cup between 1930 and 2014 — Reporting and Analyzing with Repods

Author: Rafael Queiroz, Full-Stack Developer and IT-Infrastructure Management @Record Evolution GmbH <rafael.queiroz@record-evolution.de>

In the second part of this article, we are going to use our previously created FIFA World Cup Analysis Pod to create Reports and Workbooks. We will be going deeper into the FIFA dataset to glean facts about an event that unites people from across the globe and is followed by over a billion soccer fans.

Before we start, you might want to check out the first steps:

You can also check out the live version of our Data Pod here.

Table of Contents

4. Reporting

Repods has a powerful reporting tool that includes SQL Workbooks. The reporting tool allows us to navigate through the data and its relations in an intuitive tree structure.

Inside the Analysis (Alt+4) section, we can have a look at some of the pre-created reports as shown in Figure 4.1:

4.1. Reports Overview

All Reports and Workbooks are listed from the newest to the oldest.

The Reports section contains a tree-structure overview that enables us to navigate through relations and generate queries.

The Workbooks section allows us to add text (plain text, markdown, html), execute and visualize SQL queries.

To start, let's take the World Cup Winners as an example. As we can see in Figure 4.2, there are four main areas that will help us during the reporting process:

  • The top bar shows the Pod name, a search panel, as well as the current user and other users online.
  • Just below the top bar, you can set the time range of our analysis. Here we can select the beginning and the time scale (per century, decade, year, quarter, month, week, day, hour, minute or second).
  • In the center, there are all the available data entities in the Pod. We can click on each of them and they will show all the attributes they have. To add an attribute to a report, we can click on it and choose one of the offered report options.
  • At the bottom, we can also see the generated chart, the respective result, and the automatically created SQL query of the respective result.
4.2. Reports in Detail

For instance, we want to know all the countries that have won the World Cup and the number of times they have won the World Cup. Therefore, we can simply click on the World Cup entity, select the field Winner, and apply two of the available options: Detail (Group by) and Count.

Repods places all data into the time dimension. So we will always see data related to a time axis. When selecting an attribute, there are two different categories of options to choose from — State Related or Point in Time. Not all available options are shown as these depend on the data type and kind of attribute.

  • In the State Related option, we can see information about a whole period of time, including the changes throughout the period. For instance, if we want to count the stadiums per city within a year, we would still count a stadium that has been demolished mid-year — even though it would not exist anymore at the end of the year.
  • In the Point in Time option, we can analyze the data at an exact point in time, as a cut or snapshot of the data to the precision of milliseconds. Point in Time transformations are typically easier for the database but they are usually meaningless for event tables.

In the next step, let's have some fun finding out about the World Champions!

5. Incrementing with More Data

In our previous tutorial, we have used FIFA data only — but some important information is missing. For instance, we have the venues where the matches have taken place and the cities, but there is no dataset correlating the cities with the countries. So let's import the dataset World Major Cities (provided by Simple Maps here).

After importing this data, we need to adjust our model to receive the new data. So we can simply go to Core Data (Alt+3) select the City entity, and add the fields Province/State, Latitude and Longitude.

With the model ready, we can go back to Pipes (Alt+2) and create a new pipe to load the cities. In the Pipe Transformation, modify the auto-generated fields according to the Code Snippet 5.1.

SELECT
r1."city_ascii" as "Name",
'-infinity'::timestamptz as VALID_FROM,
'infinity'::timestamptz as VALID_TO,
r1."iso3" as "Country",
r1."province" as "Province/State",
r1."lat" as "Latitude",
r1."lng" as "Longitude"
FROM
S_WORLDCITIES_BASIC r1

Code Snippet 5.1. Updating the Cities Information with External Data

To speed up, let’s complete the other Core Tables with the data we have available in the Raw Tables. You can find all the Pipe Transformations below:

Code Snippet 5.2. Updating all Models https://gist.github.com/querafael/8f7609be1a4cc9bc7bc71bb632280876

Now that we are done with Reports, let’s play with Workbooks in the next section.

6. Playing with Workbooks

With all the data at hand (or a copy from an existing Pod), let's run a short overview to see all countries that have ever won the World Cup. We have the online version here, but we recommend that you create your own Pod to experiment.

Let's open Analysis (Alt+4) and click on Add Workbook. Here we can name the Workbook and start writing SQL queries immediately.

Figure 6.1. Empty Workbook View

Every new Workbook comes with a select ‘hello world’ statement and the option to add new queries or Markdown text. It looks pretty boring when empty but we can make it incredibly nice once we add some simple information.

To start, let’s add some information about what we want to see. Click on the Tt icon to add text and add the content of Code Snippet 6.1:

<img src=https://cdn1.iconfinder.com/data/icons/ensign-11/512/38_Ensign_Flag_Nation_brazil-512.png" width=”256px align=”right”></img><h2> Brazilian Victories — 5 Cups</h2>**FIFA RANK 2nd**20 APPEARANCES / 12 QUALIFYING CAMPAIGNS**FIRST FWC:** 1930 **LAST FWC:** 2014**BEST ACHIEVEMENT:** Champion 1958, 1962, 1970, 1994, 2002**Last victory:** 2002

Code Snippet 6.1. Brazilian Victories Info in Markdown and HTML

As we can see, this piece of code adds a picture and a title with HTML and a text using markdown syntax. Just below, we can add a small SQL query to retrieve the World Cup info about the winner countries:

SELECT 
“ID” as “Year”,
“Winner”, “Second”, “Third”, “Fourth”,
“Country”, “Name” as “Host Country”, “Matches Played”, “Goals Scored”,
“Attendance”
FROM T_WORLDCUP
WHERE “Winner” = ‘Brazil’

Code Snippet 6.2. SQL Query Example for World Cups where Brazil Won

As we can see in Code Snippet 6.2, the target tables must have a prefix "T_" that identifies that we have used the target table to search. You can learn more about this in our first article.

The result of that small piece of code can be seen on Figure 6.2. below:

6.2. Workbook with Metadata, Query, and Result Table

We can simply press the play button on the side to execute the query and see the desired information. It is also possible to hide/show the query and resize the panel.

Let’s create more panels to include some other World Champions: Germany, Italy, Argentina, Uruguay, England, France, and Spain.

6.3. Germany Workbook: observe that the country name has changed over time

Less than three decades ago, Germany, current World Champion and first in the FIFA Rank, was two countries — so it is a tricky query to be able to include the victories from the former Germany FR.

SELECT 
“ID” as “Year”,
“Winner”, “Second”, “Third”, “Fourth”,
“Country”, “Name” as “Host Country”, “Matches Played”, “Goals Scored”,
“Attendance”
FROM T_WORLDCUP
WHERE “Winner” = ‘Germany’
OR “Winner” = ‘Germany FR’
OR “Winner” = ‘Germany DR'

Code Snippet 6.3. SQL Query for German Victories

6.4. Italy Workbook

7. Adding Other Queries

We can proceed with the SQL queries for the other countries as shown in Code Snippet 7.1.

Code Snippet 7.1. Champions Selections https://gist.github.com/querafael/f038faeb3ed4f90493733abaae6c23f2

We can also take a look at the HTML/Markdown to include their flags and infos. It gives the Workbook a more refined look and helps to explain the information context. You can get all of them here.

That's it for now! In the next step, we will create infographics for a more interactive data analysis. You can learn more about this in our next article Data Visualization with Repods.

Repods is a data platform where you create and manage Data Pods. Data Pods are compact data warehouses with flexible Storage, vCores, and Memory plus all the tooling needed for complex data analyses.

Don’t forget to sign up for free at https://repods.io. You can find an online version of the Data Pod used for this tutorial here.

--

--