Visualizing Multiple Datasets in AWS QuickSight: Part1
Let us analyse some important topics about Barcelona: population, accidents, transport, environment using QuickSight with a Kaggle dataset.
Note to readers:
This article largely focuses on visualisation and less on preparing data /cleaning raw data. So, we will be using well prepared/cleaned/refined public dataset from Kaggle.
Kaggle is an online community of data scientists and machine learners. Kaggle allows users to find and publish data sets, explore and build models and also enter competitions to solve data science challenges.
https://www.kaggle.com/datasets
Download dataset from kaggle:
Datasource: https://www.kaggle.com/xvivancos/barcelona-data-sets
Context:
Analyse some important topics about Barcelona: population, accidents, transport, environment.
About data source:
- population.csv. Population by neighbourhood, by quinquennial ages and by genre of the city of Barcelona (2013–2017). Reading registers of inhabitants.
- unemployment.csv. Registered unemployment by neighbourhood and genre in the city of Barcelona (2013–2017).
- immigrants_by_nationality.csv. Immigrants by nationality and by neighbourhoods of the city of Barcelona (2015–2017).
- immigrants_emigrants_by_age.csv. Immigrants and emigrants by quinquennial ages and by neighbourhood of the city of Barcelona (2015–2017).
- accidents_2017.csv. List of accidents handled by the local police in the city of Barcelona. Incorporates the number of injuries by severity, the number of vehicles and the point of impact.
- bus_stops.csv. Bus stops, day bus stops, night bus stops, airport bus stops of the city of Barcelona.
- transports.csv. Public transports (underground, Renfe, FGC, funicular, cable car, tramcar, etc) of the city of Barcelona.
Metrics we shall visualise
- Population by Year (2013–2017)
- Population by gender (Ratio)
- Accidents by month (2017)
- Accidents by district (2017)
- Stations by district (Transport)
- Pie chart of stations (Transport)
Files in S3
Generally, AWS S3 is used as a centralised location for all data files be it raw or refined.
I have already created a bucket and uploaded Barcelona city data files into it. We have to connect to this S3 bucket from quicksight and import data.
Introduction to Amazon QuickSight
Amazon QuickSight is an Amazon Web Services utility that allows a company/individual to create and analyze visualizations of their customers’ data. It’s a reporting tool used to visualise and build beautiful & insightful dashboards.
SPICE
QuickSight uses Super-fast, Parallel, In-memory Calculation Engine (SPICE) to quickly perform data calculations and create graphs.
SPICE uses AutoGraph, It allows intuitive visualisations:
- Automatic detection of data types
- Optimal query generation
- Appropriate graph type selection
- Ability to customise the graph type
- Rapid response
Importing data into QuickSight
Note to readers:
Data can be imported into QuickSight in a number of ways, out of which we shall be importing directly from S3 using manifest file.
Read up For Newbies:
Datalake vs Datawarehouse ⇒ https://www.guru99.com/data-lake-vs-data-warehouse.html
Following image depicts another way of importing data from a Data Lake (structured, semi-structured & unstructured data files in S3) using AWS Glue crawler. Glue stores metadata in AWS Glue Data Catalog.
This is followed by queries in Amazon Athena on Glue Data Catalog data to transform it into a required structure. Or use Amazon Redshift to create a Data Warehouse (structured, filtered or a queryable datasource). Then Finally import data into QuickSight.
I chose to import data directly from S3 because Barcelona data is well prepared.
Set permissions for access to AWS resources
Go to Manage Quicksight > Account settings
Account settings are only available in N. East Virginia region only, for now, so we change to that region set permissions and change back to the Oregon region.
Here, we select S3 buckets we would like to connect.
Update and then go to Manage Data (displays all your datasets in quicksight)
Creating the datasets
Now, before you can visualize anything, it has to be based on some data. Amazon will give you some datasets and analysis, so you can see what’s possible. To do anything really useful though, you’ll want to use your own data sets to do some analysis.
Importing from S3:
I chose to import data directly from S3 because Barcelona data is well prepared.
Upload manifest file
Copy object URLs
Create Manifest file
Earlier we talked about using a manifest file to select files in S3 to import data. The manifest file consists of object-URLs of files we would like to import or URI prefixes of root folders with large files of identical structure/schema.
{
"entries":[
{
"url":"https://s3-us-west-2.amazonaws.com/aa-barcelona-data/barcelona-data-sets/population.csv",
"mandatory":true
}
]
}
Note to readers:
Use the following manifest file for large files aggregation/import.
URIprefixes: root folder locations in S3 (use it if you have to aggregate large files of identical structure/schema)
Format: file type (will only import data from those extensions)
{
"fileLocations":[
{
"URIPrefixes":[
"https://s3-us-west-2.amazonaws.com/aa-barcelona-data/barcelona-data-sets/"
]
}
],
"globalUploadSettings":{
"format":"CSV"
}
}
Upload it, connect to S3 and it will import all the data available in files at those corresponding URL endpoints.
Some of the fields data type may be interpreted in the wrong way
After updating, save the dataset.
I followed the above process and created multiple datasets(accidents, migration, transport, unemployment) now let’s visualise these datasets for insights.
Visualising imported data
The title and description can be added from here. Calculate field and parameter are two other important aspects which shall be discussed later in this article.
In the following image, we plot 2 visuals (population by year & gender). Plotting is easy in QuickSight, all you have to do is select the fields in Fields list and Select appropriate Visual types. We can also modify Field wells.
In the above picture, Fields selected for 1st visual are highlighted in Fields list. If you look closely, fields with green icons are measures and fields with blue icons are dimensions. You could convert a dimension into the measure and vice-versa.
Fields selected for 2nd visual are gender and number(population count). QuickSight automatically does the group by gender and aggregates the number field.
Edit or add new datasets to the dashboard
I have added all the datasets to the analysis, click on drop down to change the dataset for each visual.
Let’s build visuals for other metrics.
Firstly accidents by month by part of the day. Click on add visual, then choose fields to plot. Click on the drop down and format each visual.
We could also format the visual like editing labels, legend and changing the color of each metric.
Another interesting plot can be Accidents per district.
Let’s analyse the transport sector in Barcelona.
As of now, Geospatial charts in Amazon QuickSight aren’t supported in some geographies, including India and China. Automatic geocoding works only for US locations. However, we can add latitude and longitude coordinates to our data to make geospatial charts.
So, we shall be using other visual types to analyse transport metrics.
I will cover these other important topics like the filter, control, parameter, calculated field in part2.
Thanks for the read!
This story is authored by Koushik. He is a software engineer and a keen data science and machine learning enthusiast.