Analyzing the MavenMovies Dataset

A SQL Project using MySQL

Sarah Rajani
13 min readFeb 16, 2023
Photo by Brett Jordan on Unsplash

I took the Maven Analytics “MySQL For Data Analysis” course to learn SQL fundamentals using MySQL. In the course, we are given a project to use our SQL skills to help answer the given business problems.

Premise

In this project, my role is co-owner of a DVD rental store, which I purchased with my rich Uncle Jimmy.

As a new owner, I need to learn everything about the business, including the product inventory, staff, and customer demographics. Since the current employees of the store are not able to provide much help, I need to access the Maven Movies SQL database and learn everything I can, on my own.

The project is divided into two parts:

  1. Answer questions from the insurance company.
  2. Answer questions from a new business prospect.

Importing the Dataset

The first step is to import the database called ‘create_mavenmovies.sql’ into MySQL, and label it ‘mavenmovies’. The full database can be found here.

While doing a cursory review of the data, we see that the Maven Movies database contains 16 related tables, consisting of information on the company’s customers, business, and inventory.

The diagram below shows the relationships between the 16 tables.

After a look through all of the tables to get a feel for the data, it was time to find out what questions needed to be answered.

Check out GitHub for the database and SQL statements carried out.

Project Part 1

The Scenario

The company’s insurance policy is up for renewal and the insurance company’s underwriters need some updated information from us before they will issue a new policy.

The Letter

Dear Maven Movies Management,

In our review of your policy renewal application, we have realized that your business information has not been updated in a number of years. In order to accurately assess the risk and approve your policy renewal, we will need you to provide all of the following information.

Sincerely,

Joe Scardycat,

Lead Underwriter

The Goal

Use MySQL to:

Leverage your SQL skills to extract and analyze data from various tables in the Maven Movies database to answer the underwriters’ questions. Each question can be answered by querying just one table. Part of your job as an Analyst is figuring out which table to use.

The Business Questions

  1. We will need a list of all staff members, including their first and last names, email addresses, and the store identification number where they work.
  2. We will need separate counts of inventory items held at each of your two stores.
  3. We will need a count of active customers for each of your stores. Separately, please.
  4. In order to assess the liability of a data breach, we will need you to provide a count of all customer email addresses stored in the database.
  5. We are interested in how diverse your film offering is as a means of understanding how likely you are to keep customers engaged in the future. Please provide a count of unique film titles you have in inventory at each store and then provide a count of the unique categories of films you provide.
  6. We would like to understand the replacement cost of your films. Please provide the replacement cost for the film that is least expensive to replace, the most expensive to replace, and the average of all films you carry.
  7. We are interested in having you put payment monitoring systems and maximum payment processing restrictions in place in order to minimize the future risk of fraud by your staff. Please provide the average payment you process, as well as the maximum payment you have processed.
  8. We would like to better understand what your customer base looks like. Please provide a list of all customer identification values, with a count of rentals they have made all time, with your highest volume customers at the top of the list.

The Analysis and Results

Full SQL code can be found here.

Here are the SQL statements and results:

  1. We will need a list of all staff members, including their first and last names, email addresses, and the store identification number where they work.
-- 1. List of all staff members including first and last names, email addresses, and store identification number where they work

SELECT
first_name,
last_name,
email,
store_id
FROM staff;

2. We will need separate counts of inventory items held at each of your two stores.

-- 2. Separate counts of inventory items held at each of the two stores

SELECT
store_id,
COUNT(inventory_id) AS inventory_items
FROM inventory
GROUP BY
store_id;

3. We will need a count of active customers for each of your stores. Separately, please.

-- 3. Count of active customers for each of the stores, separately

SELECT
store_id,
COUNT(customer_id) AS active_customers
FROM customer
WHERE active = 1
GROUP BY
store_id;

4. In order to assess the liability of a data breach, we will need you to provide a count of all customer email addresses stored in the database.

-- 4. Provide a count of all customer email addresses stored in the database

SELECT
COUNT(email) AS emails
FROM customer;

5. We are interested in how diverse your film offering is as a means of understanding how likely you are to keep customers engaged in the future. Please provide a count of unique film titles you have in inventory at each store and then provide a count of the unique categories of films you provide.

-- 5. Count of unique film titles you have in inventory at each store and then provide a count of the unique categories of films you provide

SELECT
store_id,
COUNT(DISTINCT film_id) AS unique_films
FROM inventory
GROUP BY
store_id;

SELECT
COUNT(DISTINCT name) AS unique_categories
FROM category;

6. We would like to understand the replacement cost of your films. Please provide the replacement cost for the film that is least expensive to replace, the most expensive to replace, and the average of all films you carry.

-- 6. Provide replacement cost for the film that is least expensive to replace, the most expensive to replace, and average of all films you carry

SELECT
MIN(replacement_cost) AS least_expensive,
MAX(replacement_cost) AS most_expensive,
AVG(replacement_cost) AS average_replacement_cost
FROM film;

7. We are interested in having you put payment monitoring systems and maximum payment processing restrictions in place in order to minimize the future risk of fraud by your staff. Please provide the average payment you process, as well as the maximum payment you have processed.

-- 7. Average payment you process, as well as the maximum payment you have processed

SELECT
AVG(amount) AS average_payment,
MAX(amount) AS max_payment
FROM payment;

8. We would like to better understand what your customer base looks like. Please provide a list of all customer identification values, with a count of rentals they have made all time, with your highest volume customers at the top of the list.

-- 8. List of all customer identification values, with a count of rentals they have made all-time, with your highest volume customers at top of list

SELECT
customer_id,
COUNT(rental_id) AS number_of_rentals
FROM rental
GROUP BY
customer_id
ORDER BY
COUNT(rental_id) DESC;

Project Part 2

The Scenario

You and your business partner were recently approached by another local business owner who is interested in purchasing Maven Movies. He primarily owns restaurants and bars, so he has lots of questions for you about your business and the rental business in general. His offer seems very generous, so you are going to entertain his questions.

The Letter

Dear Maven Movies Management,

I am excited about the potential acquisition and learning more about your rental business. Please bear with me as I am new to the industry, but I have a number of questions for you. Assuming you can answer them all, and that there are no major surprises, we should be able to move forward with the purchase.

Best,

Martin Moneybags

The Goal

Use MySQL to:

Leverage your SQL skills to extract and analyze data from various tables in the Maven Movies database to answer your potential Acquirer’s questions. Each question will require you to write a multi-table SQL query, joining at least two tables.

The Business Questions

  1. My partner and I want to come by each of the stores in person and meet the managers. Please send over the managers’ names at each store, with the full address of each property (street address, district, city, and country please).
  2. I would like to get a better understanding of all of the inventory that would come along with the business. Please pull together a list of each inventory item you have stocked, including the store_id number, the inventory_id , the name of the film, the film’s rating, its rental rate and replacement cost.
  3. From the same list of films you just pulled, please roll that data up and provide a summary level overview of your inventory. We would like to know how many inventory items you have with each rating at each store.
  4. Similarly, we want to understand how diversified the inventory is in terms of replacement cost. We want to see how big of a hit it would be if a certain category of film became unpopular at a certain store. We would like to see the number of films, as well as the average replacement cost, and total replacement cost, sliced by store and film category.
  5. We want to make sure you folks have a good handle on who your customers are. Please provide a list of all customer names, which store they go to, whether or not they are currently active, and their full addresses street address, city, and country.
  6. We would like to understand how much your customers are spending with you, and also to know who your most valuable customers are. Please pull together a list of customer names, their total lifetime rentals, and the sum of all payments you have collected from them. It would be great to see this ordered on total lifetime value, with the most valuable customers at the top of the list.
  7. My partner and I would like to get to know your board of advisors and any current investors. Could you please provide a list of advisor and investor names in one table? Could you please note whether they are an investor or an advisor, and for the investors, it would be good to include which company they work with.
  8. We’re interested in how well you have covered the most awarded actors. Of all the actors with three types of awards, for what % of them do we carry a film? And how about for actors with two types of awards? Same questions. Finally, how about actors with just one award?

The Analysis and Results

Full SQL code can be found here.

This part of the project involves more complex queries including cases and joins.

Here are the SQL statements and results:

  1. My partner and I want to come by each of the stores in person and meet the managers. Please send over the managers’ names at each store, with the full address of each property (street address, district, city, and country please).
-- 1. Managers’ names at each store, with the full address of each property (street address, district, city, and country).  

SELECT
staff.first_name AS manager_first_name,
staff.last_name AS manager_last_name,
address.address,
address.district,
city.city,
country.country
FROM store
LEFT JOIN staff ON store.manager_staff_id = staff.staff_id
LEFT JOIN address ON store.address_id = address.address_id
LEFT JOIN city ON address.city_id = city.city_id
LEFT JOIN country ON city.country_id = country.country_id
;

2. I would like to get a better understanding of all of the inventory that would come along with the business. Please pull together a list of each inventory item you have stocked, including the store_id number, the inventory_id , the name of the film, the film’s rating, its rental rate and replacement cost.

-- 2. List of each inventory item you have stocked, including the store_id number, the inventory_id, the name of the film, the film’s rating, its rental rate and replacement cost. 

SELECT
inventory.store_id,
inventory.inventory_id,
film.title,
film.rating,
film.rental_rate,
film.replacement_cost
FROM inventory
LEFT JOIN film
ON inventory.film_id = film.film_id
;

3. From the same list of films you just pulled, please roll that data up and provide a summary level overview of your inventory. We would like to know how many inventory items you have with each rating at each store.

-- 3. From the same list of films you just pulled, provide a summary level overview of inventory. How many inventory items you have with each rating at each store. 

SELECT
inventory.store_id,
film.rating,
COUNT(inventory_id) AS inventory_items
FROM inventory
LEFT JOIN film
ON inventory.film_id = film.film_id
GROUP BY
inventory.store_id,
film.rating
;

4. Similarly, we want to understand how diversified the inventory is in terms of replacement cost. We want to see how big of a hit it would be if a certain category of film became unpopular at a certain store. We would like to see the number of films, as well as the average replacement cost, and total replacement cost, sliced by store and film category.

-- 4. Number of films, as well as the average replacement cost, and total replacement cost, sliced by store and film category. 

SELECT
store_id,
category.name AS category,
COUNT(inventory.inventory_id) AS films,
AVG(film.replacement_cost) AS avg_replacement_cost,
SUM(film.replacement_cost) AS total_replacement_cost
FROM inventory
LEFT JOIN film
ON inventory.film_id = film.film_id
LEFT JOIN film_category
ON film.film_id = film_category.film_id
LEFT JOIN category
ON category.category_id = film_category.category_id
GROUP BY
store_id,
category.name
ORDER BY
SUM(film.replacement_cost) DESC
;

5. We want to make sure you folks have a good handle on who your customers are. Please provide a list of all customer names, which store they go to, whether or not they are currently active, and their full addresses street address, city, and country.

-- 5. List of all customer names, which store they go to, whether or not they are currently active, and their full addresses – street address, city, and country. 

SELECT
customer.first_name,
customer.last_name,
customer.store_id,
customer.active,
address.address,
city.city,
country.country
FROM customer
LEFT JOIN address ON customer.address_id = address.address_id
LEFT JOIN city ON address.city_id = city.city_id
LEFT JOIN country ON city.country_id = country.country_id
;

6. We would like to understand how much your customers are spending with you, and also to know who your most valuable customers are. Please pull together a list of customer names, their total lifetime rentals, and the sum of all payments you have collected from them. It would be great to see this ordered on total lifetime value, with the most valuable customers at the top of the list.

-- 6. List of customer names, their total lifetime rentals, and the sum of all payments you have collected from them. Ordered on total lifetime value, with the most valuable customers at the top of the list. 

SELECT
customer.first_name,
customer.last_name,
COUNT(rental.rental_id) AS total_rentals,
SUM(payment.amount) AS total_payment_amount
FROM customer
LEFT JOIN rental ON customer.customer_id = rental.customer_id
LEFT JOIN payment ON rental.rental_id = payment.rental_id
GROUP BY
customer.first_name,
customer.last_name
ORDER BY
SUM(payment.amount) DESC
;

7. My partner and I would like to get to know your board of advisors and any current investors. Could you please provide a list of advisor and investor names in one table? Could you please note whether they are an investor or an advisor, and for the investors, it would be good to include which company they work with.

-- 7. List of advisor and investor names in one table, note whether they are an investor or an advisor, and for the investors, it would be good to include which company they work with. 

SELECT
'investor' AS type,
first_name,
last_name,
company_name
FROM investor
UNION
SELECT
'advisor' AS type,
first_name,
last_name,
NULL
FROM advisor;

8. We’re interested in how well you have covered the most awarded actors. Of all the actors with three types of awards, for what % of them do we carry a film? And how about for actors with two types of awards? Same questions. Finally, how about actors with just one award?

-- 8. Of all the actors with three types of awards, for what % of them do we carry a film? And how about for actors with two types of awards? Same questions. Finally, how about actors with just one award? 

SELECT
CASE
WHEN actor_award.awards = 'Emmy, Oscar, Tony ' THEN '3 awards'
WHEN actor_award.awards IN ('Emmy, Oscar','Emmy, Tony', 'Oscar, Tony') THEN '2 awards'
ELSE '1 award'
END AS number_of_awards,
AVG(CASE WHEN actor_award.actor_id IS NULL THEN 0 ELSE 1 END) AS pct_w_one_film
FROM actor_award
GROUP BY
CASE
WHEN actor_award.awards = 'Emmy, Oscar, Tony ' THEN '3 awards'
WHEN actor_award.awards IN ('Emmy, Oscar','Emmy, Tony', 'Oscar, Tony') THEN '2 awards'
ELSE '1 award'
END

Conclusion

After analyzing the data in the tables and obtaining the requested information, we can see that:

  1. Store 2 has more inventory than Store 1, but Store 1 has more active customers.
  2. Both stores have the largest number of films with a PG-13 rating.
  3. Sports films comprise the highest number of films in Store 2, while Action films comprise the most films in Store 1.
  4. We also determined the various costs of the films if they needed to be replaced, and learned about the customers and their rental habits in each of the stores.

Hopefully, the answers we provided will help the insurance company confidently renew our policy, and the new business prospect will be interested in acquiring the business.

Thank you for reading my project!

Check out GitHub for the full SQL statements, and let me know if you have any comments.

Follow me on LinkedIn to see more data posts and projects, or view my data portfolio here.

--

--

Sarah Rajani

Data Analyst - Proficient in Excel, SQL, and Tableau. Turning data into actionable insights. LinkedIn: @sarahrajani1