Learning the Basics of SQL with Santa’s Naughty or Nice List
This is Hour 1 of 12 Hours of Kosmas, our small agency’s marathon of mini research spikes each lasting….an hour! As it’s the season of giving, we’ve decided to write-up about each with a step-by-step, simple project to show a proof of concept. Ho ho ho!
It’s been awhile since I’ve had to write simple queries so I took an hour to read through some SQL docs and created a simple scenario to prove I knew the basics: Due to declining forests in the North Pole, Santa has decided to finally digitize his list. He’s asked his new database admin elf to create a database with a few tables and to analyze some data within each.
And that’s where our story starts…
Creating Your First Database
Note: This tutorial is meant as an introduction to SQL. We took some shortcuts here that you wouldn’t typically do for production queries or database designs. The goal of this article is to get people playing around with the concepts. Not as a university-level course.
For this tutorial, we’re focusing on Mac users because OSX comes with SQLite 3 already installed so no wasted time in downloading and installing software. Hooray!
To start, let’s open our Terminal. If you can’t find yours, go to the top right corner of your window where you’ll find a magnifying glass icon (Spotlight). Click it and type Terminal. The Terminal is where we’ll write all our code.
Normally you wouldn’t use this method to create a local database, but it’s the easiest way for us to get started.
Let’s start by typing:
into your Terminal.
cd stands for Change Directory. In simple terms, it means we’re accessing our desktop folder from the Terminal.
Note: Please write out the commands. If you copy and paste, you might get an error in your Terminal which won’t let you execute the code. That’s because the Terminal won’t recognize the quotes that Medium uses. Plus, writing it out yourself helps you memorize faster.
Note Pt 2: If you forget the ; at the end of your code you may see ..> and will be stuck. Type out .help to see possible commands or .quit to exit SQLite3 and start again.
mkdir stands for Make Directory and this means we’ve just created a folder on our desktop called naughtyornicelist. Go check it out if you don’t believe me!
Now let’s access our new folder by typing:
Alright, now to build our database. It’s going to be very complicated so try to follow. Let’s start by typing:
And we’re done! You just created your first database. It’s that simple (I know, I know, I lied earlier…).
What did we do? The SQLite3 part of our statement told our computer that we’re looking for a database (db) called children within our folder. Because there isn’t one there, SQLite 3 is smart enough to create one for us.
So there you have it. But a database without data is like Rudolph without his red nose: useless. So let’s create our first table.
Creating Your First Table
You can imagine a database like an Excel file and tables are like Excel sheets on various tabs. You generally want one table for each category of something. In our case, let’s create a table to keep track of all the children in the world. Except we’re going to pretend there are only three… #AggressivePopulationControl
To create a table type:
CREATE TABLE children (id int, firstname string, lastname string, location string);
Ok what did we do here? The first part is really simple. We created a table called children.
Note: It’s typical in SQL queries that you uppercase all SQL commands so you know the difference between a command and data and names. We’ll follow that style here for best practice.
Everything inside the parenthesis are the columns inside of our table. Again, it’s a similar concept to using the top row of an Excel sheet to identify column names.
Each column needs a name and type to tell the database what type of data will be in that column. Our first pair is id (which is the name of our column) and int which stands for integer. An integer is a whole number: 1, 2, 5235235, 230581039503.
Our second, third, and fourth columns are called firstname, lastname, and location, and they hold string data. A string is simply a series of letters (and numbers, but you won’t be able to do math with string numbers).
And finally, you have to have a semicolon at the end of your statement so the Terminal knows you’ve executed a command. When you get into complex queries, sometimes it helps to have pieces on different lines. Hence the allowance of hitting ENTER without executing the code you’re writing. Don’t forget your semicolons!
To test that you did the above properly you can type:
A schema is the blueprint of your database and if you’ve executed the code properly up to this point, it should return the information in your database. In our case, that’s simply the column names and types.
But that’s about to change.
Inserting Data into Your Table
Let’s start inserting data into our database!
INSERT INTO children (id, firstname, lastname, location) VALUES (1, “josh”, “davis”, “usa”);
We used the prompt INSERT INTO children to tell SQLite 3 we’re inserting data into our database called children. Then the parenthesis tells our prompt which columns in the database we want to insert our data.
The VALUES keyword then tells SQLite 3 that we’re ready to insert the data. 1 will go under the id column, josh will go under firstname, davis under lastname, and usa under location. Simple, right?
Three things to note here:
- The values need to be in the same order as the columns headers.
- All strings need quotation marks around the text to identify them as strings.
- You need to put the correct value type in the correct column. A string in id will produce an error.
Now let’s write our first proper query.
Basic SQL Queries
SELECT * FROM children;
As you can see, your first row of data appears. SELECT in SQL means exactly what it means in English: you choose the data you want. The * means everything we have in the table and because we only have one row of data, our first row is the only thing that appears. Then FROM means where you want the data to come from. In our case we’ve chosen the table children.
Let’s add more records into our database so we can do more advanced queries.
INSERT INTO children (id, firstname, lastname, location) VALUES (2, “george”, “bartz”, “usa”), (3, “sabrina”, “yen”, “taiwan”);
And make sure you inserted the data properly by typing:
SELECT * FROM children;
Now we can get more specific about the data we want to pull by changing the * for a specific column.
SELECT firstname FROM children;
If you did that properly, you should only see the first names for each of our records: josh, george, and sabrina.
Now let’s add a bit of logic.
SELECT firstname FROM children WHERE location=”usa”;
That query will result in josh and george being pulled from the database because they are the only two located in usa.
Querying Across Two Tables
Now to create a new table so we can write more complex SQL queries.
Each table in a database is meant to focus on a subject. Our first table (children) houses all the information about our children (name and location). Now we want to create a new table to understand whether the children are on the nice or naughty list and what they want from Santa.
CREATE TABLE masterlist (id int, naughtyornicelist string, present string);
You should be familiar with what we just did but to review: we used CREATE TABLE to create a new table that we called masterlist. Then we told the database that it’s going to have three columns we’re calling id, naughtyornicelist, and present. And in order, the values they’ll hold are integer (int), string, and string.
Note: Why do we have id in both tables? This is a common pattern to ensure that you can query the same thing across multiple tables. In this case we may want to check what present josh wants. So all we need to do is query across both tables using josh’s id.
You may ask why we can’t just query josh’s name across both tables. That’s fine for our short example but we know that there are thousands of people with the name josh in the world so you’re bound to get back wrong data if you only relied on josh as an identifier. So an id number is the most effective way of keeping your database in order.
Now let’s insert some data into our new table.
INSERT INTO masterlist (id, naughtyornicelist, present) values (1, “naughty”, “train”), (2, “nice”, “horse”), (3, “naughty”, “horse”);
Again, nothing here should be new. We simply inserted three sets of data into our masterlist table.
SELECT * FROM masterlist;
to confirm you see all the records.
Now let’s say Santa wants to know of all the children in his list, which ones are naughty and which are nice. How would we do that?
Let’s walk through the process of doing this. We need to tell our database:
- What tables we’re going to query.
- That we’re going to combine data across two tables because one table has the children’s name and the other has whether they’re naughty or nice.
- Which columns from which tables we need information from.
- Which column connects a child across tables (id in our case).
To do this type:
SELECT children.firstname, masterlist.naughtyornicelist FROM children INNER JOIN masterlist ON children.id=masterlist.id;
Now let’s walk through what we just did.
First we selected which columns we needed and from which tables. For example: children.firstname says we want data from the children table and data from the column firstname inside of that table.
Then we wrote FROM children INNER JOIN masterlist which says we’re going to pull those columns from a join on the tables children and masterlist. A join is simply an instance of combining tables for a specific query so you can pull data across tables.
Note: There are different kinds of joins, but that’s beyond this tutorial.
Finally, ON children.id=masterlist.id said that we will connect the two tables by the commonality of id. In other words, we needed this because if we combine both tables, we need to know how to map the two together so the data aligns properly.
In our example, it’s easy because id is simple. But imagine if we had a database of all the joshs in the world and Santa wants to know how many joshs are on the naughty or nice list. The commonality there will be the firstname and we’d only pull a few specific rows, not everyone like in our example.
And to illustrate a bit more advanced logic, let’s say that Santa is now asking his database admin elf to pull a list of how many naughty kids he has in the usa.
To do that we’ll type:
SELECT COUNT(*) FROM children INNER JOIN masterlist ON children.id=masterlist.id WHERE children.location=”usa” AND masterlist.naughtyornicelist=”nice”;
Before reading the breakdown of what we just did, try to understand it yourself using some of the principles you know and just reading this query like an English sentence.
Tried that approach and still couldn’t figure it out? Let’s explain what happened.
SELECT COUNT(*) — SELECT is easy but what about COUNT()? COUNT is a function which means it does something. In our case it’ll literally count how many records our query produces (1, 2, 3, 4, etc).
FROM children INNER JOIN masterlist ON children.id=masterlist.id is easy: the query is saying we’re joining two tables together (children and masterlist) based on the common field id.
WHERE children.location=”usa” AND masterlist.naughtyornicelist=”nice” gets fancy, but not complicated. This part of the query is simply saying we only want to pull records where the column children.location is equal to “usa” (so in our example, not Taiwan) AND (so it has to work for both pieces of logic) the column naughtyornicelist is equal to nice.
So the only record that has both location equal to usa AND naughtyornicelist equal to nice is george. So that’s why our query resulted in 1.
If you’ve made it this far, first congratulations on reading that novel of an introduction. And second, now you have a few basic tools that’ll help you on your way to understanding SQL and database.
Let’s review what you learned:
- How to create a database and tables
- How to add data to tables
- Basic querying commands
If you want to get crazy, instead of just outputting the count of our final query, try to output the name of the person. So your result should be george.
If you have any questions, please feel free to post them below and we’ll happily answer them. Found any typos? Let us know those so we can correct them. Interested in digging deeper into this topic? We recommend Codecademy’s free course on SQL.
If you enjoyed this tutorial, check out 12 Hours of Kosmas for more tutorials on other topics including design and coding!