Learning Relational Databases & Data Visualization

Handriani Puspita
Sep 6, 2021

Interactive data visualization of Netflix using Google Data Studio (Picture by Author)

About two years ago I joined a short data analysis class using Python, and I was amazed by how many insights we could get from mining data. At that time, I was also trying to learn another language, SQL. However, due to time constraints and a lack of motivation, I decided to stop.

Fast forward to today, and most of my job involves dealing with large amounts of data. I’m used to working in Excel, yet with this much data, Excel takes a long time to finish, especially when applying various formulas. Hence, I think picking up data processing and visualization skills will pay off in the long run. Luckily, I have great colleagues who don’t mind sharing their knowledge and brainstorming about things we want to learn. With their help, I’m finally able to learn SQL and Google Data Studio (thank you, guys!).

What are SQL and Google Data Studio?

Quoting from this website, SQL stands for Structured Query Language, which is used for storing, retrieving, and manipulating data in databases. Google Data Studio, meanwhile, is a tool for making interactive reporting dashboards. With it, our data can be presented in an easier-to-read format for users.
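To make "storing, retrieving, and manipulating" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name shows and its rows are made up purely for illustration; they are not part of the project's data.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Storing: create a table and insert rows (hypothetical example data).
conn.execute("CREATE TABLE shows (title TEXT, type TEXT, release_year INTEGER)")
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?)",
    [
        ("Example Movie A", "Movie", 2015),
        ("Example Series B", "TV Show", 2019),
    ],
)

# Manipulating: update an existing row.
conn.execute("UPDATE shows SET release_year = 2016 WHERE title = 'Example Movie A'")

# Retrieving: query the rows back, oldest first.
rows = conn.execute(
    "SELECT title, release_year FROM shows ORDER BY release_year"
).fetchall()
print(rows)
conn.close()
```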

I learned the basics of SQL from W3Schools, as suggested by tech colleagues at my current company. It helped me a lot because all the exercises are very practical. Even so, we can absorb the knowledge better by applying it directly to a small project.

As for Google Data Studio, I didn’t follow a particular course; I just tried the tool directly and figured things out by searching for what I wanted to know. If you have access to BigQuery, you can also query your data directly from there, so you don’t need to upload CSV files as the source for your data visualization.

Choosing The Topic For Your Project

Try to choose something that sparks your interest, such as something related to your hobby. I chose to explore Netflix data, which is available on the internet, since I love binge-watching the movies and TV series there. I wanted to explore how many award-winning movies & TV series are available on Netflix and get an overview of the genre distribution of these movies & TV series by country.

Downloading Data Sets on Kaggle

I downloaded data sets of Netflix movies, nominees & winners of the Golden Globe Awards, and IMDb ratings on Kaggle, as well as countries’ data with latitude and longitude (needed if you want to present infographics on a map). I later combined those tables using SQLite.
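One possible way to get downloaded CSV files into SQLite is sketched below with Python's standard csv and sqlite3 modules. This is only an illustration, not necessarily how the import was done for this project: the helper load_csv, the sample rows, and the reduced column list are all assumptions, and an in-memory string stands in for a real file.

```python
import csv
import io
import sqlite3

# Stand-in for a downloaded CSV file (made-up rows). With a real file you
# would pass open(path, newline="", encoding="utf-8") instead.
csv_text = io.StringIO(
    "title,country,type\n"
    "Example Movie A,United States,Movie\n"
    "Example Series B,Mexico,TV Show\n"
)


def load_csv(conn, table, file_obj):
    """Create `table` from the CSV header and insert every data row as text."""
    reader = csv.reader(file_obj)
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_csv(conn, "Netflix", csv_text)

row_count = conn.execute("SELECT COUNT(*) FROM Netflix").fetchone()[0]
countries = [r[0] for r in conn.execute("SELECT country FROM Netflix ORDER BY country")]
print(row_count, countries)
```

Every column is loaded as TEXT here to keep the sketch short; numeric columns such as a release year could be given an INTEGER type instead.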

Processing Raw Data on SQLite

I found SQLite very beginner-friendly since it’s easy to install and use. As far as I know, some commands differ between SQLite and other relational database management systems, like MySQL. Hence, if you want to use a program other than SQLite, you may need to learn those differences and adjust.
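As one small example of such a difference, SQLite joins strings with the || operator, whereas MySQL in its default mode treats || as a logical OR and uses CONCAT() for this. The sketch below only runs the SQLite side, with a made-up title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite: || concatenates its operands (the number is converted to text).
# MySQL (default mode): the same expression would be evaluated as a logical OR,
# and CONCAT('Example Movie A', ' (', 2015, ')') would be used instead.
label = conn.execute(
    "SELECT 'Example Movie A' || ' (' || 2015 || ')'"
).fetchone()[0]
print(label)
conn.close()
```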

This is one example of a query that I made using SQLite. I wanted to combine the information from different tables and show the result only for Golden Globe-winning movies & TV series available on Netflix, including their ratings from IMDb.

SELECT DISTINCT
    Netflix.title,
    Netflix.country,
    Netflix.type,
    GoldenGlobes.category,
    Netflix.release_year AS Release_Year,
    -- the last four characters of date_added are the year
    SUBSTR(Netflix.date_added, -4) AS Netflix_year_added,
    RatingsIMDB.weighted_average_vote
FROM Netflix
INNER JOIN GoldenGlobes ON Netflix.title = GoldenGlobes.film
INNER JOIN MoviesIMDB ON MoviesIMDB.title = Netflix.title
INNER JOIN RatingsIMDB ON RatingsIMDB.imdb_title_id = MoviesIMDB.imdb_title_id
-- keep only the winners; single quotes mark a string literal
WHERE GoldenGlobes.Win = 'TRUE'
ORDER BY RatingsIMDB.weighted_average_vote DESC;

The result from the above query (Picture by Author)
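To try the query without downloading anything, the sketch below rebuilds the four tables in an in-memory SQLite database and runs the same joins. Only the table and column names come from the query; every row, ID, and category label is a made-up placeholder, and the real data sets have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Netflix (title TEXT, country TEXT, type TEXT,
                      release_year INTEGER, date_added TEXT);
CREATE TABLE GoldenGlobes (film TEXT, category TEXT, Win TEXT);
CREATE TABLE MoviesIMDB (imdb_title_id TEXT, title TEXT);
CREATE TABLE RatingsIMDB (imdb_title_id TEXT, weighted_average_vote REAL);

-- Placeholder rows, invented for this sketch.
INSERT INTO Netflix VALUES
  ('Example Movie A', 'United States', 'Movie', 2015, 'September 9, 2019'),
  ('Example Series B', 'Mexico', 'TV Show', 2018, 'January 1, 2020'),
  ('Example Movie C', 'India', 'Movie', 2017, 'March 3, 2021');
INSERT INTO GoldenGlobes VALUES
  ('Example Movie A', 'Category X', 'TRUE'),
  ('Example Series B', 'Category Y', 'FALSE'),
  ('Example Movie C', 'Category Z', 'TRUE');
INSERT INTO MoviesIMDB VALUES
  ('id1', 'Example Movie A'), ('id2', 'Example Series B'), ('id3', 'Example Movie C');
INSERT INTO RatingsIMDB VALUES ('id1', 7.5), ('id2', 8.0), ('id3', 8.2);
""")

query = """
SELECT DISTINCT
    Netflix.title, Netflix.country, Netflix.type, GoldenGlobes.category,
    Netflix.release_year AS Release_Year,
    SUBSTR(Netflix.date_added, -4) AS Netflix_year_added,
    RatingsIMDB.weighted_average_vote
FROM Netflix
INNER JOIN GoldenGlobes ON Netflix.title = GoldenGlobes.film
INNER JOIN MoviesIMDB ON MoviesIMDB.title = Netflix.title
INNER JOIN RatingsIMDB ON RatingsIMDB.imdb_title_id = MoviesIMDB.imdb_title_id
WHERE GoldenGlobes.Win = 'TRUE'
ORDER BY RatingsIMDB.weighted_average_vote DESC;
"""

# Winners only, highest IMDb rating first.
results = conn.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

Note that joining on title assumes the titles are spelled identically in every data set, so titles that differ even slightly between sources would be dropped by the inner joins.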

Creating Data Visualization on Google Data Studio

This is the most fun part for me! I get to explore the design of my report and make it easy for viewers to use. We can choose one of the provided themes or create a new one. For my project, I think a red-black theme is suitable since those are the colors of the Netflix brand logo. You can check out Visual Capitalist for data visualization inspiration!

Some of the insights I got from the visualization include differences in movie and TV show preferences across regions. In the USA, for example, most of the movies available on Netflix are categorized under documentaries and stand-up comedy. Interestingly, in neighboring Mexico, crime movies and TV shows are more popular. Preferences also differ in Asia: in India and Indonesia, most of the movies available are categorized under drama & romantic movies. I need to dig deeper to find the reason behind this, but it might be due to cultural background, as explained in Erin Meyer’s book “The Culture Map”.
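Google Data Studio does this kind of counting inside its charts, but the same genre-by-country breakdown can also be sketched as a GROUP BY query. The table NetflixGenres with one genre per row is a simplifying assumption for illustration, and the rows are invented, so the counts say nothing about the real catalog.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per (country, genre) pairing of a title.
conn.execute("CREATE TABLE NetflixGenres (country TEXT, genre TEXT)")
conn.executemany(
    "INSERT INTO NetflixGenres VALUES (?, ?)",
    [
        ("United States", "Documentaries"),
        ("United States", "Documentaries"),
        ("United States", "Stand-Up Comedy"),
        ("Mexico", "Crime"),
        ("Mexico", "Crime"),
        ("Mexico", "Dramas"),
    ],
)

# Count titles per country and genre, most common genre first in each country.
distribution = conn.execute(
    """
    SELECT country, genre, COUNT(*) AS title_count
    FROM NetflixGenres
    GROUP BY country, genre
    ORDER BY country, title_count DESC, genre
    """
).fetchall()
for row in distribution:
    print(row)
conn.close()
```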

In my view, data processing and visualization are important skills for people from any major. There are a lot of learning sources on the internet these days. I love to look for courses on Udemy or LinkedIn Learning because of the structured modules provided by the teachers. YouTube can also be a good learning source, though I think it is more useful for solving a specific issue that you already have in mind, which assumes you already have a basic understanding of data processing and visualization.

Anyway, whichever medium you use for learning, what matters is to have fun along the learning journey! :)

Handriani Puspita

Indonesian | Financial Analyst on Weekdays | Data Analytics Enthusiast | German Learner | ENFJ | Happy to share some book recommendations!