Run the Parch & Posey DB Locally in SQLite

gauravinthevalley
Udacity Technology Scholarship
7 min read · Feb 16, 2020

One of the goals I’ve set for myself in the Bertelsmann Tech Scholarship Challenge Course — Data Track Nanodegree Program at Udacity is that, by the end of the course, I would like to build a basic dashboard using Flask and some basic JavaScript, hosted on Heroku. My plan for the dashboard is to display analyzed data from the Parch & Posey DB, the database used in the course to teach SQL. Since I’m new to building backends with Flask (I’m more comfortable with Node.js), I’ve been following this series from Corey Schafer on YouTube to teach myself Flask — he is pretty good. The front-end JavaScript part is something I can manage myself; I plan to use React to build it.

Corey Schafer starts you off with SQLite, with the intent to switch to Postgres later in the course. For this reason, I had to run the Parch & Posey DB in SQLite. Though I’m sure there are better and more efficient ways to do it, here’s what I did.

Install SQLite3

This page is your friend. If you’re on a Mac, I highly recommend using Homebrew to install SQLite3. If you’re on Linux, you can use the package manager bundled with your distribution to install SQLite3.

Once installed, I recommend you play with the CLI for a bit to get comfortable with SQLite3. The CLI is pretty easy to get a handle on.

Create the Parch & Posey DB

As you already saw here, to create a DB in SQLite3, all you have to do is run the following command on your terminal:

sqlite3 parchandposey.db

This command creates a literal parchandposey.db file inside the directory from which you ran it, and it also drops you into the SQLite3 CLI.

This bit may be tricky for beginners. For instance, it took me about 5 minutes to figure out why the .databases command inside the CLI was not displaying any databases for me: I was in a different directory than the one where parchandposey.db resided. So make sure you are in the directory you intend to be in when creating your database.
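One way to sidestep this gotcha entirely is to work with an absolute path. Here’s a minimal Python sketch (using the standard-library os module; the filename matches the one used in this guide) showing where a relative database name actually resolves to:

```python
import os

# A relative name like "parchandposey.db" is resolved against the
# current working directory -- the same gotcha the SQLite3 CLI has.
db_path = os.path.abspath("parchandposey.db")
print("Looking for:", db_path)
print("Exists:", os.path.exists(db_path))
```

If "Exists" prints False, you are in the wrong directory (or haven’t created the database yet).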

Inside the SQLite3 CLI

As I mentioned above, running .databases will display the literal parchandposey.db file we created, like so:

Alright, like we did for Postgres, the first step is to create your tables. Let’s get on with creating the schema and importing the CSVs one by one.

Create and Import region Table

I’m assuming you already have all the CSVs required. If you don’t, you can get them from here. Downloading the CSVs to your current directory will make things easier (you won’t have to type the entire CSV path when importing, for instance). Please ignore the .R files in that GitHub repo.

There’s actually another way to procure the CSVs: you can download the data from the course itself. From your workspace, if you SELECT * FROM region, you can download the CSV for the region table.

Download CSV from Workspace

No matter which method you go with to procure the CSVs, be sure to download and store them in your current directory.

When importing CSVs, there are generally two cases to consider: either you want to import the headers, or you don’t. In our case, because we will be creating schemas for all of our tables beforehand, we do not want to import the headers. SQLite is pretty specific here. The documentation clearly states that if, during import, the table does not exist, SQLite will treat the first row as headers and the rest as data. But if the table does exist (which will be our case), it will treat the first row as data too.

The point above also means that we will have to manually delete the first line of the CSV, the one with the headers. Why? Again, because we will create our tables’ schemas first and then import the CSVs.
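If deleting the header row by hand feels error-prone, a few lines of Python can do it too. This is just a sketch: the filenames match the ones used below, and the two sample rows are written out here only so the snippet is self-contained — your real region.csv comes from the course or the GitHub repo.

```python
import csv

# Write a tiny sample region.csv for illustration only.
with open("region.csv", "w", newline="") as f:
    f.write("id,name\n1,Northeast\n2,Midwest\n")

# Copy every row except the first (the header) into a new file.
with open("region.csv", newline="") as src, \
     open("region-no-header.csv", "w", newline="") as dst:
    rows = list(csv.reader(src))
    csv.writer(dst).writerows(rows[1:])
```

The resulting region-no-header.csv contains only the data rows, which is exactly what our pre-created table needs.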

Open your region.csv (bear in mind that you could have named this file anything; for clarity, I named mine region). Here’s how mine looks:

Region with Header — region.csv

Check out that first row! “id” and “name” will be the names of the columns in our soon-to-be-created region table. If we weren’t creating the region table manually and were importing this file directly into SQLite3, we wouldn’t have to delete this row. But in our case we are, so let’s remove the header:

Region with No Header — region-no-header.csv

Once the first row (the header) is deleted, I saved the file as region-no-header.csv. This is the file we will be importing; not before creating the region table first, though.

Run the following CREATE TABLE statement to create the table:

CREATE TABLE region(
  id integer NOT NULL,
  name text NOT NULL
);
SQLite3 — Create region table

As you can see in the screenshot above, you can verify that the region table was actually created using the friendly .tables command.

If you run the .schema region command, you can also check the schema of your table, like so:

SQLite3 — View Schema of Region table

Let’s actually get to importing the region CSV now, shall we?

First off, you will need to change the mode of SQLite. Run the .mode command to check your current mode.

SQLite3 — View your Mode

Change the mode from ‘list’ to ‘csv’ with .mode csv (note that the CLI’s mode names are lowercase):

SQLite3 — CSV Mode

Very noice! Since you already have the CSV downloaded, with the header removed, and saved in the current directory, all you have to do now is run the following command to import it:

.import region-no-header.csv region

SQLite3 — Import region CSV (no header)

Apart from importing the region CSV above, I have also verified that the data was actually imported with a simple SELECT statement. The region table doesn’t hold much data, so I decided to SELECT * FROM region; above. For other tables, we will use COUNT(*) to verify imports.
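If you’d rather script the whole create-and-import step — say, as groundwork for the Flask app — Python’s standard-library sqlite3 module can play the role of the CLI. A sketch, using an in-memory database and executemany in place of .import (the four rows here are samples; swap in ":memory:" for "parchandposey.db" and read the rows from your CSV for the real thing):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # use "parchandposey.db" for the real file
con.execute("CREATE TABLE region(id integer NOT NULL, name text NOT NULL)")

# executemany stands in for the CLI's .import here.
rows = [(1, "Northeast"), (2, "Midwest"), (3, "Southeast"), (4, "West")]
con.executemany("INSERT INTO region VALUES (?, ?)", rows)

# Verify the import the same way: a simple COUNT(*).
print(con.execute("SELECT COUNT(*) FROM region").fetchone()[0])  # prints 4
```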

Create and Import sales_reps Table

Ensure that the mode is still csv. This applies to the rest of this guide.

Also ensure that your CSV for sales_reps is downloaded and stored in your current directory. I have named mine sales_reps.csv. Like we did for region.csv, we will also remove the header manually for sales_reps. This applies to the rest of the CSVs in this guide.

Now, let’s create the sales_reps table and verify that it was actually created. Run the following CREATE TABLE statement to create the table:

CREATE TABLE sales_reps(
  id integer NOT NULL,
  name text NOT NULL,
  region_id integer NOT NULL,
  FOREIGN KEY(region_id) REFERENCES region(id)
);

Verify that the table was actually created:

SQLite3 — Create and Verify sales_reps Table

Okay, you know the drill! Let’s import the header-less CSV for sales_reps. Mine is named sales_reps-no-header.csv, so I will run the following command to import it into my newly created sales_reps table in SQLite3.

.import sales_reps-no-header.csv sales_reps

SQLite3 — Import sales_reps CSV (no header)

There you have it! Imported and verified.
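One caveat with those FOREIGN KEY clauses: unlike Postgres, SQLite does not enforce foreign keys unless you turn enforcement on per connection with PRAGMA foreign_keys = ON (so our imports succeed regardless). A small sketch of what enforcement looks like — note that I’ve declared region.id as PRIMARY KEY here, which SQLite requires of a referenced column before it will enforce the constraint; the rows are made up:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
con.execute("CREATE TABLE region(id integer PRIMARY KEY, name text)")
con.execute("""CREATE TABLE sales_reps(
    id integer NOT NULL,
    name text NOT NULL,
    region_id integer NOT NULL,
    FOREIGN KEY(region_id) REFERENCES region(id))""")

con.execute("INSERT INTO region VALUES (1, 'Northeast')")
con.execute("INSERT INTO sales_reps VALUES (10, 'Sample Rep', 1)")  # fine

try:
    # No region with id 99 exists, so this insert is rejected.
    con.execute("INSERT INTO sales_reps VALUES (11, 'Bad Rep', 99)")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)
```

With the pragma left off (the default), that last insert would have succeeded silently.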

Create and Import accounts Table

From here on out, I will just post the CREATE TABLE script. Here we go:

CREATE TABLE accounts(
  id integer NOT NULL,
  name text NOT NULL,
  website text NOT NULL,
  lat text,
  "long" text,
  primary_poc text,
  sales_rep_id integer NOT NULL,
  FOREIGN KEY(sales_rep_id) REFERENCES sales_reps(id)
);
SQLite3 — Import accounts (no header)

Imported and verified! Let’s move on.

Create and Import orders Table

Run the following CREATE TABLE script to create the orders table first:

CREATE TABLE orders(
  id integer NOT NULL,
  account_id integer NOT NULL,
  occurred_at text,
  standard_qty integer,
  gloss_qty integer,
  poster_qty integer,
  total integer,
  standard_amt_usd real,
  gloss_amt_usd real,
  poster_amt_usd real,
  total_amt_usd real,
  FOREIGN KEY (account_id) REFERENCES accounts(id)
);
SQLite3 — Import orders (no header)

Create and Import web_events Table

Let’s create the web_events table with:

CREATE TABLE web_events(
  id integer NOT NULL,
  account_id integer NOT NULL,
  occurred_at text,
  channel text,
  FOREIGN KEY (account_id) REFERENCES accounts(id)
);
SQLite3 — Import web_events (no header)

All done! We’ve created and imported all tables in the Parch & Posey DB. I haven’t explained the CREATE TABLE queries in this post. I’m no expert, but I’m assuming you are already familiar with schema design, etc. If you aren’t, you probably aren’t reading this. If you aren’t but did make it to this point, there are plenty of resources online to help you out — and they are probably much better than anything I will ever come up with. Good luck!

Before we part, let’s run a query from the course:

SELECT r.name region, a.name account, o.total_amt_usd/(o.total + 0.01) unit_price
FROM region r
JOIN sales_reps s
ON s.region_id = r.id
JOIN accounts a
ON a.sales_rep_id = s.id
JOIN orders o
ON o.account_id = a.id
WHERE o.standard_qty > 100 AND o.poster_qty > 50
ORDER BY unit_price;

This one is from Lesson 27, Section 20, Number 5. Go ahead and run it.
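Should you want to run that same query outside the CLI — say, from the Flask backend — Python’s sqlite3 module works just as well. A minimal, self-contained sketch against a throwaway in-memory copy of the schema (the single row chain is invented purely so the query returns something; against the real DB you would connect to parchandposey.db and skip the inserts):

```python
import sqlite3

con = sqlite3.connect(":memory:")  # point this at "parchandposey.db" instead
con.executescript("""
CREATE TABLE region(id integer, name text);
CREATE TABLE sales_reps(id integer, name text, region_id integer);
CREATE TABLE accounts(id integer, name text, sales_rep_id integer);
CREATE TABLE orders(id integer, account_id integer, standard_qty integer,
                    poster_qty integer, total integer, total_amt_usd real);
-- Invented sample rows so the query below has something to return.
INSERT INTO region VALUES (1, 'Northeast');
INSERT INTO sales_reps VALUES (10, 'Sample Rep', 1);
INSERT INTO accounts VALUES (100, 'Sample Account', 10);
INSERT INTO orders VALUES (1000, 100, 150, 60, 300, 2100.0);
""")

query = """
SELECT r.name region, a.name account, o.total_amt_usd/(o.total + 0.01) unit_price
FROM region r
JOIN sales_reps s ON s.region_id = r.id
JOIN accounts a ON a.sales_rep_id = s.id
JOIN orders o ON o.account_id = a.id
WHERE o.standard_qty > 100 AND o.poster_qty > 50
ORDER BY unit_price;
"""
for row in con.execute(query):
    print(row)
```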

Isn’t that something? 😄

Happy querying!
