Visualizing Multiple Datasets in AWS QuickSight: Part1

Apr 8, 2019 · 7 min read

Let us analyse some important topics about Barcelona: population, accidents, transport, environment using QuickSight with a Kaggle dataset.

Note to readers:
This article largely focuses on visualisation and less on preparing data /cleaning raw data. So, we will be using well prepared/cleaned/refined public dataset from Kaggle.

Kaggle is an online community of data scientists and machine learners. Kaggle allows users to find and publish data sets, explore and build models and also enter competitions to solve data science challenges.

Image for post
Kaggle dataset list

Download dataset from kaggle:


Image for post
Kaggle data set profile


Analyse some important topics about Barcelona: population, accidents, transport, environment.

About data source:

  • population.csv. Population by neighbourhood, by quinquennial ages and by genre of the city of Barcelona (2013–2017). Reading registers of inhabitants.
  • unemployment.csv. Registered unemployment by neighbourhood and genre in the city of Barcelona (2013–2017).
  • immigrants_by_nationality.csv. Immigrants by nationality and by neighbourhoods of the city of Barcelona (2015–2017).
  • immigrants_emigrants_by_age.csv. Immigrants and emigrants by quinquennial ages and by neighbourhood of the city of Barcelona (2015–2017).
  • accidents_2017.csv. List of accidents handled by the local police in the city of Barcelona. Incorporates the number of injuries by severity, the number of vehicles and the point of impact.
  • bus_stops.csv. Bus stops, day bus stops, night bus stops, airport bus stops of the city of Barcelona.
  • transports.csv. Public transports (underground, Renfe, FGC, funicular, cable car, tramcar, etc) of the city of Barcelona.

Metrics we shall visualise

  1. Population by Year (2013–2017)
  2. Population by gender (Ratio)
  3. Accidents by month (2017)
  4. Accidents by district (2017)
  5. Stations by district (Transport)
  6. Pie chart of stations (Transport)

Files in S3

Generally, AWS S3 is used as a centralised location for all data files be it raw or refined.

I have already created a bucket and uploaded Barcelona city data files into it. We have to connect to this S3 bucket from quicksight and import data.

Image for post
S3 bucket
Image for post
Root folder
Image for post
Data files in root folder

Introduction to Amazon QuickSight

Amazon QuickSight is an Amazon Web Services utility that allows a company/individual to create and analyze visualizations of their customers’ data. It’s a reporting tool used to visualise and build beautiful & insightful dashboards.

Image for post
AWS Quicksight in a nutshell


QuickSight uses Super-fast, Parallel, In-memory Calculation Engine (SPICE) to quickly perform data calculations and create graphs.

Image for post

SPICE uses AutoGraph, It allows intuitive visualisations:

  1. Automatic detection of data types
  2. Optimal query generation
  3. Appropriate graph type selection
  4. Ability to customise the graph type
  5. Rapid response

Importing data into QuickSight

Note to readers:
Data can be imported into QuickSight in a number of ways, out of which we shall be importing directly from S3 using manifest file.

Image for post
Different datasources

Read up For Newbies:
Datalake vs Datawarehouse ⇒

Following image depicts another way of importing data from a Data Lake (structured, semi-structured & unstructured data files in S3) using AWS Glue crawler. Glue stores metadata in AWS Glue Data Catalog.

Image for post
S3 to Quicksight

This is followed by queries in Amazon Athena on Glue Data Catalog data to transform it into a required structure. Or use Amazon Redshift to create a Data Warehouse (structured, filtered or a queryable datasource). Then Finally import data into QuickSight.

I chose to import data directly from S3 because Barcelona data is well prepared.

Set permissions for access to AWS resources
Go to Manage Quicksight > Account settings

Image for post
Account settings

Account settings are only available in N. East Virginia region only, for now, so we change to that region set permissions and change back to the Oregon region.

Image for post
modify permissions for AWS services

Here, we select S3 buckets we would like to connect.

Image for post
Access permission for AWS services
Image for post
Access permission to S3 buckets

Update and then go to Manage Data (displays all your datasets in quicksight)

Creating the datasets

Now, before you can visualize anything, it has to be based on some data. Amazon will give you some datasets and analysis, so you can see what’s possible. To do anything really useful though, you’ll want to use your own data sets to do some analysis.

Image for post
New Dataset

Importing from S3:

I chose to import data directly from S3 because Barcelona data is well prepared.

Image for post
Select the type of source

Upload manifest file
Copy object URLs

Image for post
S3 object properties

Create Manifest file
Earlier we talked about using a manifest file to select files in S3 to import data. The manifest file consists of object-URLs of files we would like to import or URI prefixes of root folders with large files of identical structure/schema.


Note to readers:
Use the following manifest file for large files aggregation/import.
root folder locations in S3 (use it if you have to aggregate large files of identical structure/schema)
Format: file type (will only import data from those extensions)


Upload it, connect to S3 and it will import all the data available in files at those corresponding URL endpoints.

Image for post
Connect to datasource S3
Image for post
Preview after import
Image for post
Modify datatype of a field

Some of the fields data type may be interpreted in the wrong way

Image for post
Edit format of the date field

After updating, save the dataset.

I followed the above process and created multiple datasets(accidents, migration, transport, unemployment) now let’s visualise these datasets for insights.

Image for post
All datasets in QuickSight

Visualising imported data

The title and description can be added from here. Calculate field and parameter are two other important aspects which shall be discussed later in this article.

Image for post
add something

In the following image, we plot 2 visuals (population by year & gender). Plotting is easy in QuickSight, all you have to do is select the fields in Fields list and Select appropriate Visual types. We can also modify Field wells.

Image for post
Discovering Barcelona Analysis Dashboard

In the above picture, Fields selected for 1st visual are highlighted in Fields list. If you look closely, fields with green icons are measures and fields with blue icons are dimensions. You could convert a dimension into the measure and vice-versa.

Fields selected for 2nd visual are gender and number(population count). QuickSight automatically does the group by gender and aggregates the number field.

Edit or add new datasets to the dashboard

Image for post
edit data set
Image for post
add another dataset
Image for post
select a dataset

I have added all the datasets to the analysis, click on drop down to change the dataset for each visual.

Image for post
Choose from multiple datasets

Let’s build visuals for other metrics.

Firstly accidents by month by part of the day. Click on add visual, then choose fields to plot. Click on the drop down and format each visual.

Image for post
add visual

We could also format the visual like editing labels, legend and changing the color of each metric.

Image for post
Barcelona accidents

Another interesting plot can be Accidents per district.

Image for post
Barcelona accidents per district

Let’s analyse the transport sector in Barcelona.

As of now, Geospatial charts in Amazon QuickSight aren’t supported in some geographies, including India and China. Automatic geocoding works only for US locations. However, we can add latitude and longitude coordinates to our data to make geospatial charts.

So, we shall be using other visual types to analyse transport metrics.

Image for post
Barcelona transport

I will cover these other important topics like the filter, control, parameter, calculated field in part2.

Thanks for the read!

This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.


AI | Machine Learning | Big Data