SQL and Tableau: Data visualization

An end-to-end tutorial for beginners about how to prepare raw data for a visual analysis

Antonio Pacheco
5 min read · Mar 27, 2022

When I decided to move from Brazil to Ireland, I knew I’d have some trouble getting used to the weather. To get an overview of the country’s climate, I created this project, which I also hope will help people taking their first steps in analytics. Below you can see how I prepared a visualization of Irish weather data using SQL Server and Tableau.

To make it easier to follow, the process is organized in five sections covering the main steps to manipulate and visualize raw data:

  1. Data source
  2. Set up
  3. Raw data and prototypes
  4. Dataset handling
  5. Dashboard construction
SQL Server is used here to prepare the data, since it suits both simple and complex tasks involving large datasets. The final result was built with Tableau Public, an efficient tool for building effective and shareable visualizations.

1. Data source

Before we start, I would like to introduce you (in case you don’t know it yet) to Kaggle, a platform dedicated to the data analysis community, where you can find this dataset and plenty of other interesting data.

As for the weather data used in this project, you can find it on Kaggle or download your own set from The Irish Meteorological Service website.

2. Set up

First of all, once the .csv file is downloaded, you can see below how to open it in SQL Server Management Studio and create a new table from its contents. At the end of the import process, you will be able to view and work with the raw data.

You can upload a file and create a new table in a database through the “Import Flat File” option
A wizard will then walk you through the steps to upload the file
Once the table is available, you can use queries to view or edit it

3. Raw data and prototypes

It’s always good to remember that the best way to start working with any data is to know your main goals. Although your ideas may change along the way, deeply knowing your original dataset, and which columns the visualization will need, is the best first step.

That said, I also highly recommend building a dashboard mockup, which helps determine which information is useful and which operations have to be done to bring the right metrics into the visualization tool.

Sketching is quick, risk-free, and helps us think through our main goals

4. Dataset handling

For the analysis itself, my main interests were temperature, wind and rain, so the first queries were dedicated to bringing in those columns and filtering for 2019, in order to reduce the amount of data I’d be dealing with.

Location indicators like “county”, “station”, “latitude” and “longitude” were also kept, and time markers like “date” and “time” were created in the “hrly_data_2019” table.
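A query along these lines could produce that table. This is only a sketch: the source table name `hourly_irish_weather` and the metric columns `rain`, `temp` and `wdsp` are assumptions based on the Kaggle dataset, so adjust them to match your import.

```sql
-- Sketch: keep only 2019 rows and the columns needed for the visualization.
-- Source table and column names are assumed; adjust to your import.
SELECT
    county,
    station,
    latitude,
    longitude,
    CAST([date] AS date)                     AS [date],  -- day marker
    DATEPART(hour, CAST([date] AS datetime)) AS [time],  -- hour marker
    rain,   -- precipitation
    temp,   -- air temperature
    wdsp    -- wind speed
INTO hrly_data_2019
FROM hourly_irish_weather
WHERE YEAR(CAST([date] AS datetime)) = 2019;
```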

Besides the actual weather information, I also created a table with each county and its geolocation, taken as the mean latitude and longitude of its stations.
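That table can be built with a simple GROUP BY over the 2019 data. A sketch (the output table name `county_geolocation` is one I made up for illustration):

```sql
-- Sketch: one row per county, placed at the mean of its stations' coordinates.
SELECT
    county,
    AVG(latitude)  AS latitude,
    AVG(longitude) AS longitude
INTO county_geolocation
FROM hrly_data_2019
GROUP BY county;
```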

Now it’s time to organize the weather data, so I first aggregated rain, temperature and wind speed by county, averaging these metrics where a county has more than one station.
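A sketch of that aggregation (the output table name `county_hourly` and metric columns are assumptions):

```sql
-- Sketch: average the metrics across stations, keeping one row
-- per county per hour.
SELECT
    county,
    [date],
    [time],
    AVG(rain) AS rain,
    AVG(temp) AS temp,
    AVG(wdsp) AS wdsp
INTO county_hourly
FROM hrly_data_2019
GROUP BY county, [date], [time];
```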

After this, I aggregated the data again, condensing the hourly data into daily values, since I don’t need that much detail for an overview. Each metric is calculated differently: for rain you must sum up the total precipitation, while for wind speed, for example, I take the highest value.
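The daily roll-up could look like this, assuming the per-county hourly data from the previous step lives in a table named `county_hourly` (a name I made up for illustration):

```sql
-- Sketch: collapse hourly rows into one row per county per day.
-- Each metric gets the aggregation that makes sense for it.
SELECT
    county,
    [date],
    SUM(rain) AS total_rain,  -- total precipitation for the day
    AVG(temp) AS avg_temp,    -- mean temperature
    MAX(wdsp) AS max_wdsp     -- highest wind speed recorded
INTO county_daily
FROM county_hourly
GROUP BY county, [date];
```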

Finally, to shape the data for my visualization purposes, the last query joins the daily weather data to the counties’ geolocations using a JOIN statement.

In addition, a CASE WHEN statement was used here to create classification columns based on the values of existing columns.
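Putting the two together might look like the sketch below. The table names `county_daily` and `county_geolocation`, the column names, and the rain thresholds are all illustrative assumptions, not the article’s exact values:

```sql
-- Sketch: attach each county's coordinates and derive a simple
-- rain classification for the dashboard. Thresholds are illustrative.
SELECT
    d.county,
    d.[date],
    d.total_rain,
    d.avg_temp,
    d.max_wdsp,
    g.latitude,
    g.longitude,
    CASE
        WHEN d.total_rain = 0  THEN 'Dry'
        WHEN d.total_rain < 10 THEN 'Light rain'
        ELSE 'Heavy rain'
    END AS rain_level
FROM county_daily AS d
JOIN county_geolocation AS g
    ON g.county = d.county;
```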

5. Dashboard construction

Last but not least, this project’s final step is the creation of the dashboard in Tableau Public, a powerful tool for building insights and sharing them.

For this tutorial, I exported the data and imported it manually into Tableau, but if you have the budget for a license, you can connect the two tools directly, as shown here.

To read the data in Tableau, you first have to export the dataset from SQL Server in a compatible format to be used as a data source
To import the file into Tableau, select the corresponding file type to set it up as a new connection
Once the new data source is connected, you can drag the tables you want to use into the views

With the tables available, you can then build visualizations using Tableau’s tools, whose interface is organized into the following parts:

  • Side bar: contains a data pane with metrics (divided between dimensions and measures)
  • Sheets: used to create views using the metrics from the data source
  • Dashboards: used to gather and organize views from multiple sheets

Views can be created by clicking and dragging metrics, which Tableau can also use to suggest the best type of chart.

Any view can be created by dragging metrics onto the worksheet and selecting which information to show in the labels, for example

Once all the visualizations are created, dashboards are useful tools to organize them and build sets of views to communicate ideas.

Dashboards can be created from the existing views by dragging them in and adjusting the layout to organize the information

All done, the work can be saved to Tableau Public, where other people can view, download and build on your discoveries.

That’s all, folks! As I said, my intention here is to cover the basics and make the visualization process tangible for beginners. Feel free to reach out with any questions about the project or the tools. I’ll be happy to help!

Also, if you want to connect, this is my LinkedIn profile. Cheers!
