Exploring the AdventureWorks Database

Let’s go on an adventure into the amazing world of using the AdventureWorks Database!

Zaynab Awofeso
Learning SQL
8 min readMar 6, 2023

--

image from unsplash

I recently discovered the AdventureWorks database thanks to a friend, and I have been absolutely loving it! This database is a fantastic resource for anyone who wants to practice their SQL skills, as it contains just about everything you could want to test or experiment with.

In this article, I’ll be showing you how to install the latest version of the AdventureWorks sample database — AdventureWorks2019. We’ll be taking a deep dive into the database together, analyzing it, and even answering some questions along the way.

If you’re new to AdventureWorks or just love practicing your SQL skills, you won’t want to miss this article. And if you’re the type of person who enjoys trying out new things, you’ll definitely want to stick around until the end!

Before we get started, let’s understand some terms that will appear frequently in this article.

This is important, trust me.

· Database — simply a place to store data.

· SQL — or Structured Query Language is a language to interact with databases.

· Microsoft SQL Server — relational database management (RDBMS) system developed by Microsoft.

· EDA — or Exploratory Data Analysis is the first step in the data analysis process. It is all about making sense of the data you have and figuring out what questions you want to ask and how to frame them, as well as how to best manipulate your available data to get the answers you need.

· AdventureWorks Database — Microsoft product sample for an online transaction processing database. It supports a fictitious, multinational manufacturing company that sells bicycles and cycling accessories.

NOTE: A prerequisite to follow this article is to have SQL Server installed. If you don’t, follow this guide to install it and don’t forget to return to this article.

Setting up the AdventureWorksDW 2019

We will be working with the AdventureWorksDW2019 database. There are 2 ways to install the sample database:

  1. Using Scripts
  2. Restoring the backup file (*.bak) file

I used the second approach because I find it to be more straightforward. You can download the backup file here.

Browse to the section named “Download backup files”, click on any file you want to download and save it in your target location. Below I downloaded AdventureWorksDW2019.bak and I’ll suggest you do the same so you can follow the article properly.

AdventureWorksDW2019 backup file

The next step is to restore this backup file to your SQL server instance. To proceed, you need to transfer the backup file with a ".bak" extension to your designated backup location in the SQL server instance. The exact location may differ based on factors such as the version of SQL Server, instance name, and installation location. For instance, for SQL Server 2019's default instance, the standard location is: C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\Backup.

Next, launch SQL Server Management Studio (SSMS) and connect to the SQL Server instance where you would want to install the sample database. Right-click on “Databases” and click “Restore Database…”

The window below will open, select the backup file by clicking the Device radio button, click , then Add, then select the backup file.

Now, click the “Files” tab to ensure the database file and path are correct. Once done, click OK to start the restore process.

By now you most likely want to get to the part where we write and execute queries, but… a drawback of AdventureWorks is that the data is not up to date.

You probably can tell from the name of the database but run the query below to check the last date in the database.

NB: Before you execute a new query, you have to activate the required database. In our case,

Activate AdventureWorksDW2019
View last date in the database
Last date in the database

From the result, you can see that the data in the database stops in 2019. I will be updating the database to the current year because I want to perform further analysis on the data and because it is nice to have the updated data.

Updating the database is easy. To update the database, download this script from Github. Once downloaded, open it in your SQL Server Management Studio and Execute it. It should update the data in your database to the latest year, in my case — 2023. To confirm that it worked, execute the query above again. Now the last date should be the last day of the current year.

Last date after updating database

Exploring the database

We are finally here!

Let’s assume I just successfully installed the database, the first thing I would want to do is to inspect the tables it contains, right?

To do this, I can expand the Tables folder, but there’s also a way to list all the tables using SQL commands. SQL Server provides a set of built-in views called INFORMATION_SCHEMA that contain metadata about the database objects, such as tables, columns, indexes, and constraints. By querying these views, you can retrieve information about the structure and properties of the database.

Execute the query below to list all the tables in the database,

You may notice that there are 4 columns returned when you execute the query above. You may also notice that there are two types of tables, the actual table (BASE TABLE) and the view (VIEW). But what we want is a list of the actual table names. Executing the query below should give us that.

List of tables in the database.

Next let’s check the total number of tables.

The following output gives the count of all the tables

Now that we have seen how many tables are in the database, let us dig deeper into the data it contains. We cannot explore each table in this article because there are 31 tables.

So, let’s take the Customer Table as a sample. In the data model of AdventureWorksDW the dimension table DimCustomer contains data about the customers.

Exploring the DimCustomer Table

Expand the DimCustomer table and then expand its columns folder to see the columns in the table. Each column has a name, a data type, an indication of whether it can contain null values, and in some cases an indication that the column is used as a primary key (PK) or foreign key (FK).

Let’s have a look at the customer data.

This query will retrieve all the data from the table.
This query returns all the rows and columns present in the table.

Next, let’s get the shape of the data.

This query gives us the number of columns in the dataset.
29 columns
The query gives us the number of rows in the dataset.
18484 rows

So there are 29 columns and 18484 rows in the DimCustomer table.

When performing EDA, it is important to inspect each column and check for missing values, outliers, or any spelling mistakes present in the data which can hinder the analysis. For example, let’s inspect the occupation column — EnglishOccupation.

The EnglishOccupation column is a categorical column. And the best way to explore categorical columns is to first find the unique values which will help us to narrow down whether there are any errors in spelling or not.

Let’s find out how many unique occupations are in the data.

This query gives us the unique occupations of the customers.

Let’s check for missing values

This query checks for missing values in the EnglishOccupation column.

There are no errors and missing values present in this column.

Other things you can try asides finding missing values and spelling errors are locating outliers and finding duplicates in your data.

There are many benefits of conducting exploratory data analysis. It helps you organize your dataset, it helps you make assumptions and predictions about your dataset and it can also help you make decisions before you model your data.

Feel free to explore the other tables in the database. The Sales data, Product data, Employee data and many more. Just have fun!

So, we have had the chance to explore the database a bit, it is time to dig deeper by answering some questions.

Q1. What is the total Sales Amount?

SUM () sums up the sales. So, the total sales amount is $29358677.2207

Q2. What are the total number of products?

Number of products are 504. ‘Distinct’ keyword helps to know unique values.

Q3. What are the number of products in each category? — i.e., frequency of products?

35 products are Accessories, 125 products are Bikes, 48 products are Clothing and so on. The above statement also provides the information on which product category has got more products i.e., highest number of products are Components.

Q4. Total order quantity?

Using SUM () function to fetch total order quantity.

Q5. Top 10 customers with the most sales?

The above statement joins the customer and the sales table on the customerkey column and fetches the top 10 customers with the highest sales. Turner Jordan has the most sales.

Q6. Total Profit?

The statement subtracts the total product cost from the total sales amount. The total profit is $12080883.645

Q7. Year with the highest sales?

The year with the highest sales is 2022.

Q8. Top 5 employees with most sales?

Similar to question 5 query but this statement gets the top 5 employees with the highest sales. Tsoflias Lynn has the most sales.

Q9. Sales Territory Region with the most sales?

Similar to question 3 also. Australia has the most sales.

Q10. Customer with the highest order?

The statement joins the sales and customer table on the customerkey column and fetches the customers with the highest order. Henderson Ashley has the highest order.

Q11. Frequency of customer order quantity?

The 2 items are ordered the most, followed by 3 and 4 and so on.

Q12. Rank Customers based on Total Sales?

The above statement joins the customer and the sales table on the customerkey column and ranks sales greater than 10000 as diamond, sales between 5000 and 10000 as gold, sales between 1000 and 5000 as silver and sales below 1000 as bronze.

Q13. Average sales per customer?

Adams Aaron’s average sale is $20.49, Adams Adam’s, $70.99 and so on.

Thank you for reading this article. There is so much you can do with the AdventureWorks database. From data analysis to data cleaning, there are endless opportunities to explore and expand your knowledge.

Your support means a lot to me and puts a smile on my face. Thank you once again!

Go from SELECT * to interview-worthy project. Get our free 5-page guide.

--

--

Zaynab Awofeso
Learning SQL

I like to simplify complex topics and foster growth through words.