Engineering@ZenOfAI
Apr 8 · 7 min read

Let us analyse some important topics about Barcelona: population, accidents, transport, environment using QuickSight with a Kaggle dataset.

Note to readers:
This article largely focuses on visualisation and less on preparing data /cleaning raw data. So, we will be using well prepared/cleaned/refined public dataset from Kaggle.

Kaggle is an online community of data scientists and machine learners. Kaggle allows users to find and publish data sets, explore and build models and also enter competitions to solve data science challenges.

https://www.kaggle.com/datasets

Kaggle dataset list

Download dataset from kaggle:

Datasource: https://www.kaggle.com/xvivancos/barcelona-data-sets

Kaggle data set profile

Context:

Analyse some important topics about Barcelona: population, accidents, transport, environment.

About data source:

  • population.csv. Population by neighbourhood, by quinquennial ages and by genre of the city of Barcelona (2013–2017). Reading registers of inhabitants.
  • unemployment.csv. Registered unemployment by neighbourhood and genre in the city of Barcelona (2013–2017).
  • immigrants_by_nationality.csv. Immigrants by nationality and by neighbourhoods of the city of Barcelona (2015–2017).
  • immigrants_emigrants_by_age.csv. Immigrants and emigrants by quinquennial ages and by neighbourhood of the city of Barcelona (2015–2017).
  • accidents_2017.csv. List of accidents handled by the local police in the city of Barcelona. Incorporates the number of injuries by severity, the number of vehicles and the point of impact.
  • bus_stops.csv. Bus stops, day bus stops, night bus stops, airport bus stops of the city of Barcelona.
  • transports.csv. Public transports (underground, Renfe, FGC, funicular, cable car, tramcar, etc) of the city of Barcelona.

Metrics we shall visualise

  1. Population by Year (2013–2017)
  2. Population by gender (Ratio)
  3. Accidents by month (2017)
  4. Accidents by district (2017)
  5. Stations by district (Transport)
  6. Pie chart of stations (Transport)

Files in S3

Generally, AWS S3 is used as a centralised location for all data files be it raw or refined.

I have already created a bucket and uploaded Barcelona city data files into it. We have to connect to this S3 bucket from quicksight and import data.

S3 bucket
Root folder
Data files in root folder

Introduction to Amazon QuickSight

Amazon QuickSight is an Amazon Web Services utility that allows a company/individual to create and analyze visualizations of their customers’ data. It’s a reporting tool used to visualise and build beautiful & insightful dashboards.

AWS Quicksight in a nutshell

SPICE

QuickSight uses Super-fast, Parallel, In-memory Calculation Engine (SPICE) to quickly perform data calculations and create graphs.

SPICE

SPICE uses AutoGraph, It allows intuitive visualisations:

  1. Automatic detection of data types
  2. Optimal query generation
  3. Appropriate graph type selection
  4. Ability to customise the graph type
  5. Rapid response

Importing data into QuickSight

Note to readers:
Data can be imported into QuickSight in a number of ways, out of which we shall be importing directly from S3 using manifest file.

Different datasources

Read up For Newbies:
Datalake vs Datawarehouse ⇒ https://www.guru99.com/data-lake-vs-data-warehouse.html

Following image depicts another way of importing data from a Data Lake (structured, semi-structured & unstructured data files in S3) using AWS Glue crawler. Glue stores metadata in AWS Glue Data Catalog.

S3 to Quicksight

This is followed by queries in Amazon Athena on Glue Data Catalog data to transform it into a required structure. Or use Amazon Redshift to create a Data Warehouse (structured, filtered or a queryable datasource). Then Finally import data into QuickSight.

I chose to import data directly from S3 because Barcelona data is well prepared.

Set permissions for access to AWS resources
Go to Manage Quicksight > Account settings

Account settings

Account settings are only available in N. East Virginia region only, for now, so we change to that region set permissions and change back to the Oregon region.

modify permissions for AWS services

Here, we select S3 buckets we would like to connect.

Access permission for AWS services
Access permission to S3 buckets

Update and then go to Manage Data (displays all your datasets in quicksight)


Creating the datasets

Now, before you can visualize anything, it has to be based on some data. Amazon will give you some datasets and analysis, so you can see what’s possible. To do anything really useful though, you’ll want to use your own data sets to do some analysis.

New Dataset

Importing from S3:

I chose to import data directly from S3 because Barcelona data is well prepared.

Select the type of source

Upload manifest file
Copy object URLs

S3 object properties

Create Manifest file
Earlier we talked about using a manifest file to select files in S3 to import data. The manifest file consists of object-URLs of files we would like to import or URI prefixes of root folders with large files of identical structure/schema.

{  
"entries":[
{
"url":"https://s3-us-west-2.amazonaws.com/aa-barcelona-data/barcelona-data-sets/population.csv",
"mandatory":true
}
]
}

Note to readers:
Use the following manifest file for large files aggregation/import.
URIprefixes:
root folder locations in S3 (use it if you have to aggregate large files of identical structure/schema)
Format: file type (will only import data from those extensions)

{  
"fileLocations":[
{
"URIPrefixes":[
"https://s3-us-west-2.amazonaws.com/aa-barcelona-data/barcelona-data-sets/"
]
}
],
"globalUploadSettings":{
"format":"CSV"
}
}

Upload it, connect to S3 and it will import all the data available in files at those corresponding URL endpoints.

Connect to datasource S3
Preview after import
Modify datatype of a field

Some of the fields data type may be interpreted in the wrong way

Edit format of the date field

After updating, save the dataset.

I followed the above process and created multiple datasets(accidents, migration, transport, unemployment) now let’s visualise these datasets for insights.

All datasets in QuickSight

Visualising imported data

The title and description can be added from here. Calculate field and parameter are two other important aspects which shall be discussed later in this article.

add something

In the following image, we plot 2 visuals (population by year & gender). Plotting is easy in QuickSight, all you have to do is select the fields in Fields list and Select appropriate Visual types. We can also modify Field wells.

Discovering Barcelona Analysis Dashboard

In the above picture, Fields selected for 1st visual are highlighted in Fields list. If you look closely, fields with green icons are measures and fields with blue icons are dimensions. You could convert a dimension into the measure and vice-versa.

Fields selected for 2nd visual are gender and number(population count). QuickSight automatically does the group by gender and aggregates the number field.

Edit or add new datasets to the dashboard

edit data set
add another dataset
select a dataset

I have added all the datasets to the analysis, click on drop down to change the dataset for each visual.

Choose from multiple datasets

Let’s build visuals for other metrics.

Firstly accidents by month by part of the day. Click on add visual, then choose fields to plot. Click on the drop down and format each visual.

add visual

We could also format the visual like editing labels, legend and changing the color of each metric.

Barcelona accidents

Another interesting plot can be Accidents per district.

Barcelona accidents per district

Let’s analyse the transport sector in Barcelona.

As of now, Geospatial charts in Amazon QuickSight aren’t supported in some geographies, including India and China. Automatic geocoding works only for US locations. However, we can add latitude and longitude coordinates to our data to make geospatial charts.

So, we shall be using other visual types to analyse transport metrics.

Barcelona transport

I will cover these other important topics like the filter, control, parameter, calculated field in part2.

Thanks for the read!

This story is authored by Koushik Busim. Koushik is a software engineer and a keen data science and machine learning enthusiast.

ZenOf.AI

AI | Machine Learning | Big Data

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade