SQL Querying 101
What is SQL and why is it useful?
If you want to build full stack applications, it’s important to create apps that use data from databases. Databases store simple variables and data structures in memory. Databases can store varied amounts of information like the user, their followers, or their favorites. Learning how to query data from databases is not just for data scientists, it’s an important skill for developers too!
There are generally two types of databases to work with: SQL or NoSQL. I will be going over basic SQL querying statements. SQL stands for “Structured Query Language” and is the standard language used for communicating with relational databases like Postgres or Oracle. Think of them being like excel sheets where data is stored in the form of columns and rows. Columns hold the value and type, whereas rows hold the raw data. Relational databases are called so because they are used to define relationships in the form of join tables. Joining tables allow us to relate pieces of data to each other. The tables are linked through ids called primary and foreign keys.
Let’s say you’re trying to build an app for students to track their grades. We can think of students and grades as “models”. We want to establish a relationship between students and their grades. When we store data about one student, we want to give them a unique identifier or id — this is known as the primary key. Our student Alyssa can easily be looked up in the database by her id#194. Grades wouldn’t exist without students, so there is a relationship between the two models. Since grades “belong to” students, the grade table will have a column for the student it belongs to — this relationship is joined by the foreign key of the student. When we look the grades of 94%, we can see an id#194 is listed, so we know Alyssa has been studying hard. These primary and foreign keys can be established when creating a schema (table blueprint).
As you might guess, NoSQL is a non-relational database. They don’t use tables with primary/foreign keys to establish relationships between models. In fact, they don’t even use tables to store data. They are able to use many different stores like key-value pairs or even graph stores. The values can be any type of binary object. NoSQL databases tend to allow for easier scaling because they scale horizontally. This is why many companies that deal with manipulating real-time data prefer using NoSQL databases.
SQL Querying
On to what you came here for. Here is a cheatsheet for some common SQL queries you can use. SQL query statements start with specific keywords such as SELECT
, INSERT
, CREATE
, UPDATE
, and DELETE
. We can string these keywords with pieces of data from the model tables to abstract the values that we want in the form of new tables. Values can be stored in various datatypes. Explore the different types here: https://www.w3schools.com/sql/sql_datatypes.asp
Join Tables — “Select…from…join…on…”
We need unique values to relate rows in one table to another, à la primary keys. Say we have two tables of Animal(s) and their Diet(s) and we want to create a new join table of animals that eat fish.
We can query those values like so:
SELECT animals.names, animals.species, diet.food FROM animals JOIN diet ON animals.species = diet.species where food = 'fish';
SELECT
is followed by three column names (the names of animals and their species and the foods they eat)FROM
the Animal table. We are selecting the values from those three columns in the Animal table to join those values with the Diet table where the species are the same and also the species who eat fish.
“Select…where” Statements — get more specific data
We can use SELECT...WHERE
statements to abstract more specific data. This is called a restriction since we are restricting the values we want. Used along with comparison operators (=, <, >, <=, >=, !=, and, or, not), we can write the same query in many different ways.
Let’s select from our Animal table where the animal is named “Jim” and the species is “gorilla”. We can write this query like so:
SELECT name FROM animals WHERE name = 'Jim' and species = 'gorilla'
Looking at the Animal table above, we can see there is indeed an animal named “Jim” but we will get a null value because we queried for an animal that’s named “Jim” and is a “gorilla”. As we see in the above Animal(s) table, there are no species of gorillas listed. The correct query for Jim would be:
SELECT name FROM animals WHERE name = 'Jim' and species = 'human'
Since our Animal(s) table above only has one Jim in general and one species of human, we could’ve omitted one or the other.
Select Clauses
We saw that we’re able to write restrictions in our queries with the WHERE
keyword. But here are some other keyword modifiers that can be used in conjunction with SELECT
to manipulate the output.
LIMIT count OFFSET skip
—this clause allows us to output several pages of results. It takes in two numbers, wherecount
tells how many rows to return andskip
tells how far in the results to start. An example queryLIMIT 10 OFFSET 50
would return 10 rows starting with the 51st row. In this caseOFFSET
could be used optionally. Clauses give great flexibility.ORDER BY column1, ... DESC
— this clause allows us to sort the results. The column names you want to output should follow the clause and be separated by commas. You can sort in descending or ascending order by appendingDESC
orASC
after the column names. An example query would beORDER BY name DESC
.GROUP BY column
— this clause is used in conjunction with aggregations such ascount
orsum
to manipulate the return data. We append the column that we want to perform the aggregation upon. An example query would beSELECT species, min(birthdate) FROM animals GROUP BY species;
. This query selects the birthdays of all the animals and groups them from oldest to youngest by their species.
Adding Rows — create new rows into existing tables
We can also add new rows to existing tables by using the INSERT
keyword. To add a row, all we need to write is INSERT INTO table VALUES ("Sam", "rabbit", "2020-08-10")
where the values are added in the parentheses following the VALUE
keyword. In the case that the values are in the same order as the columns we can write a more specific query like so: INSERT INTO animals (species, name, birthdate) VALUES ("rabbit", "Sam", "2020-08-10")
.
“Having” clause — perform restrictions on the results
The HAVING
clause is similar to the WHERE
clause, in that it performs a restriction. WHERE
restricts the table it will draw data from, while HAVING
restricts the return values. It filters the group rows created by GROUP BY
and both HAVING
and GROUP BY
keywords must be used together. An example syntax would be:
SELECT columns FROM tables GROUP BY column HAVING condition;
There are tons of more clauses out there but we can do a lot of robust querying already with the statements I’ve talked about above! I’ve listed a PostgreSQL documentation about clauses below.