How I analyzed DVD rental data with SQL

Okoh Anita
We’ve moved to freeCodeCamp.org/news
7 min readDec 20, 2018

Introduction

I recently completed some training in Data Foundation facilitated by Bertelsmann’s School of Data Science (in partnership with Udacity). For a personal project, I decided to analyze the database for a DVD rental company we will call Rent A Film. Let’s take a look at a case study detailing my process and output.

Data-set

I began by taking a look at the database. The database DvdRental has 15 tables. Below are the different tables and a brief description of them.

  • actor — contains actors data including first name and last name.
  • film — contains films data such as title, release year, length, rating, etc.
  • film_actor — contains the relationships between films and actors.
  • category — contains film’s categories data.
  • film_category — containing the relationships between films and categories.
  • store — contains the store data including manager staff and address.
  • inventory — stores inventory data.
  • rental — stores rental data.
  • payment — stores customer’s payments.
  • staff — stores staff data.
  • customer — stores customer’s data.
  • address — stores address data for staff and customers
  • city — stores the city names.
  • country — stores the country names.

Note: I analyzed this database using PostgreSQL. You can get details to install PostgreSQL here and download the DVD rental database here.

Objective & Goals

In this project, I’ll aim to answer the following questions:

  1. What are the top and least rented (in-demand) genres and what are their total sales?
  2. Can we know how many distinct users have rented each genre?
  3. What is the average rental rate for each genre? (from the highest to the lowest)
  4. How many rented films were returned late, early, and on time?
  5. In which countries does Rent A Film have a presence and what is the customer base in each country? What are the total sales in each country? (from most to least)
  6. Who are the top 5 customers per total sales and can we get their details just in case Rent A Film wants to reward them?

Before getting started with analyses, I first tried understanding the ERM (Entity Relationship Model) of this database also known as Schema. Here is the Schema below:

DVD RENTAL SCHEMA

You can view my code on my GitHub profile here.

Analysis

To answer the first questionWhat are the top and least rented (in-demand) genres and what are what are their total sales?”, I first identified with tables I would need to Join, which are:

Category >film_Category >film>inventory>rental >customer >payment

Below is the query I used to extract to answer the question:

Insights

From the above table, we can draw 3 major insights:

  • Rent A Film has 16 available genres
  • The sports category seems to be the most rented genre in terms of the number of times being rented, and it also has the highest total sales in terms of money.
  • The music category is the least rented genre in terms of the number of times being rented and has the lowest total sales in terms of money

Question 2: Can we know how many distinct users have rented each genre? In short, yes we can.

The tables to join are as follows:

Category > film_Category > film > inventory > rental > customer

Below is my query for this question:

Insights

I wanted to know how many distinct customers rented each of the genres. One fascinating thing from the query is that although the music genre has the least total rented record, it does not have the least number of distinct customers who rented the genre. The travel genre holds that record.

By taking a step back and connecting the insights derived from question 1 and 2, we can say that the travel genre was re-rented more times than the music genre.

And of course, the sports genre has the highest number of distinct customers who rented the genre.

Question 3: What is the Average rental rate for each genre? (from the highest to the lowest)

The tables to join are as follows:

Category > film_Category > film

Below is my query for this question:

Insights

I went ahead to see if the number of times a category has been rented has anything to do the average rental rate of each genre. From the above table, we can easily conclude that average rental rate may not be a factor.

Although the game genre has the lowest average rental rate, it is one of the top five most rented genre. Surprisingly, the Music genre is not the most expensive — Action is, even though the action genre is one of the most rented genres.

We can safely say that most of the customers are lovers of sport-related movies and are least interested in musical movies.

Question 4: How many rented films were returned late, early and on time?

The tables to join are as follows:

film > inventory > rental

Insights

The return status of movies is arguably is one the most important aspects to monitor in a DVD rental business. From the above query, 48% of the movies are returned earlier than the due date while 41% of the movies are returned late and 11% arrive on time.

There could be a number of factors for why this could be happening, like the shipping distance of these movies from stores which could be totally out of the control of the customers and so on. We would need to dive deeper into the data to get the gist of the issue.

However, it is wise to note that a significant percentage of movies are returned late. Introducing a penalty fee for late arrival could be an extra source of income and in turn, discourage late returns.

But such a decision can only make sense if we know why the issue is occurring.

Question 5: In which countries do Rent A Film have a presence in and what is the customer base in each country? What are the total sales in each country? (From most to least)

The tables to join are as follows:

Country > City > Address > customer > payment

See query below:

Insights

Rent A Film has a presence in 108 countries with India having the highest customer base of 60 customers and the largest total sales in terms of money. Afghanistan has the smallest total sales in terms of money, even though it is not the only country with the smallest customer base of 1 customer

Question 6: Who are the top 5 customers per total sales and can we get their detail just in case Rent A Film wants to reward them?

The tables to join are as follows:

Country > City > Address > customer > payment

See query below:

Insights

Assuming we wish to reward or send physical gifts to the top customers, the above table shows their full names, addresses, email etc.

This information can be sent to the marketing team of the company in order to use their domain knowledge to decide how to reward them.

Conclusion

In this project, we analyzed data from a DVD rental company we decided to call ‘Rent A Film’ to find insights about the customers and their preference. We got 3 major conclusions:

  1. The company has sport-loving customers and they would be advisable to stock more sport-related films to increase total sales compared to music-related movies. It would be a good idea to increase the average rental rate of sport genre films since it is not a major factor in renting for the customers. This, in turn, increases total revenue. However further analysis needs to be done to conclude on this.
  2. There is potential to have an extra source of revenue through a fee on late film returns.
  3. Rent A Film has a presence in 108 countries with India been the largest market in people and revenue. Additionally, 20% of the countries they have presence in contribute to 80% of the total customer base.

P.S Like me, anyone can learn to be a data analyst and if you want to be notified on my next project or updates on my learning, feel free to sign up to my newsletter

--

--

Okoh Anita
We’ve moved to freeCodeCamp.org/news

Full-Stack Data Scientist | Building Data Product with Python, Analytics, and Machine Learning | New Technologies Obsessed| Self-care Conscious