SQL Starter Pack

Published in

CodeX

6 min readJul 11, 2021

If you are looking to transition to data science and analytics, SQL will be one of the things that you should be learning first. Aside from the relatively easy learning curve compared to other toolkit, proficiency in SQL will also open doors for you when looking for a job. Based from experience, whether you are looking into a tech or a non-tech company, SQL will be a handy skill as it is part of their “basic requirement”. At least in the Philippines, most companies are still in the process of transforming digitally and SQL helps their analysts to extract and manage their databases.

SQL, meaning Structured Query Language, is commonly used to query and analyze data storage or collection of tables known as Relational Databases. This is most useful for data analysts but still useful for data scientists when managing data.

The most used statements in SQL are SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY. SELECT and FROM are mandatory statements in querying, whereas the others optional but should be used in particular order. To remember this easier, a mnemonic was created by my friend @janinecheong — “San Francisco, We’re Getting Hot Oreos!”

SELECT & FROM

SELECT: used to select data from table by specifying columns to be reflected
to select multiple columns from a table, use commas (,) to separate different columns
to select all columns from a table, use an asterisk (*)
to select only all the unique values and disregard duplicates, use DISTINCT keyword
FROM: specifies the table where your data will be pulled from

-- selecting NAME and ADDRESS columns from PEOPLE table
SELECT name, address
FROM people;-- selecting ALL columns from PEOPLE table
SELECT *
FROM people;-- selecting unique values from NAME column from PEOPLE table
SELECT DISTINCT name
FROM people

COUNT statement can used to complement SELECT to return the number of non-missing values/rows in a particular column. When doing exploratory analysis on your data, COUNT can be used with DISTINCT to identify how many unique values are there in the column.

-- selecting the count of values from TRANSACTION_ID column from ORDERS table 
SELECT COUNT(transaction_id)
FROM orders;-- selecting the count of UNIQUE values from EMPLOYEE_ID column from EMPLOYEE table
SELECT COUNT(DISTINCT employee_id)
FROM employee;

LIMIT keyword can be used to return only a specific number of rows in your result. This is useful if you only want to verify what type of values are to be expected in each column.

-- selecting the FIRST FIVE values of ALL columns from PEOPLE table
SELECT *
FROM people
LIMIT 5;

Aliasing with AS keyword can be used to rename the selected column.

-- renaming the derived count of values from TRANSACTION_ID column as num_of_transactions
SELECT COUNT(transaction_id) AS num_of_transactions
FROM orders;

WHERE

sets the parameters or logical condition to follow when data is to be filtered
below operators can be used to set the logic for filtering

+ — — — — + — — — — — — — — — — — — — — — -+
| =       | equals                         |
| <>      | does NOT equal                 |
| <       | greater than                   |
| >       | less than                      |
| <=      | greater than or equal to       |
| >=      | less than or equal to          |
| BETWEEN | range between two values       |
| LIKE    | match with similar pattern     |
| IN()    | equals one of specified values |
+ — — — — + — — — — — — — — — — — — — — — -+

example use of WHERE statement

-- selecting values of ALL columns from EMPLOYEE table, given that the employee is hired on the year of 2016 or later
SELECT *
FROM employee
WHERE hired_year >= 2016;-- return the COUNT of employees from the EMPLOYEE table, hired on December 1, 2020
-- IMPORTANT: remember to use ISO date format (YYYY-MM-DD)
SELECT COUNT(*)
FROM employee
WHERE hired_date = '2020-12-01';

Additional statements such as AND, OR, IN, and LIKE can be used with WHERE to modify filtering results

+ — — -+ — — — — — — — — — — — — — — — — — — — — — — — — — — — -+
| AND  | returns records meeting ALL of multiple conditions set |
| OR   | returns records meeting ANY of multiple conditions set |
| IN   | alternative to using multiple OR conditions            |
| LIKE | use of wildcard or pattern matching as conditions      |
+ — — -+ — — — — — — — — — — — — — — — — — — — — — — — — — — — -+

examples in using AND & OR statements.

-- selecting title of songs by ‘The Chainsmokers’ and ‘Custismith’ from PLAYLIST table
-- if filtering for two values under the same column name, the column name should be repeated when filtering
SELECT title
FROM playlist
WHERE artist = ‘The Chainsmokers'
AND artist = ‘Curtismith’;-- selecting title and release_year of all KPOP songs after 2020
SELECT title, release_year
FROM playlist
WHERE release_year > 2020
AND language = 'Korean'; -- selecting songs released in either 2010 or 2020
-- if filtering for two values under the same column name, the column name should be repeated when filtering
SELECT title
FROM playlist
WHERE release_year = 2010
OR release_year = 2020;-- when combining AND and OR, be sure to enclose the individual clauses in parenthesis
SELECT title
FROM playlist
WHERE (genre = 'pop' OR genre = 'country')
AND (artist= 'Taylor Swift' OR artist= 'Blake Shelton');

GROUP BY

sets the criteria on how to group the data/results
data can be segmented by multiple columns which will allow user to create a similar view of a PivotTable with sub-group

-- counting the number of days when overtime was rendered by employees, grouping by job grade
SELECT job_grade, COUNT(num_of_overtime_date)
FROM employee
GROUP BY job_grade-- multiple groupings
SELECT category, rating, COUNT(title) as count_of_movies
FROM film_list
GROUP BY category, rating

Aggregate functions like COUNT, COUNT DISTINCT, MIN, MAX, AVG, and SUM can be used to provide group-level statistics.

+ — — — — — — — -+ — — — — — — — — — — — — — — — — -+
| COUNT          | count of records                 |
| COUNT DISTINCT | count of distinct values         |
| MIN            | returns the lowest value         |
| MAX            | returns the highest value        |
| AVG            | average of all values            |
| SUM            | adds up the total numeric values |
+ — — — — — — — -+ — — — — — — — — — — — — — — — — -+

example of using AGGREGATE functions and GROUP BY.

-- create a table to compare rental price of title per film rating
SELECT
     rating,
     COUNT(title) AS num_of_films,
     MIN(price) AS cheapest_rental,
     MAX(price) AS most_expensive_rental
     AVG(price) AS average_rental
FROM film_list
GROUP BY rating

HAVING

filters based on the result of aggregate functions

-- returns the list of ratings with at least 20 films in its category
SELECT
    rating,
    COUNT(*) AS num_of_films
FROM film_list
GROUP BY rating
HAVING COUNT(*) >= 20

ORDER BY

used to sort or put the results in order, in ascending order by default
DESC can be used to sort values in descending order
can be used with multiple conditions but the first criteria will be prioritized, then sort by the next, soon.

-- select list of employees showing their employee_id, job_grade, and hired year in order of newest hire to oldest hire
SELECT employee_id, job_grade, hired_year
FROM employee
ORDER BY hired_year, hired_date DESC

Leveling up your SQL journey

Practicing above statements will be a good start to flex your basic SQL knowledge. Should you want to pursue, below are concepts that will be useful in solidifying your foundation.

What is Relational Database Management System (RDBS) and how does it work?
Using JOINS (Inner, Left, Right, Full Outer, Union)
Bridging tables without common fields

Useful reference below:

If you enjoyed reading this quick reference guide and would like to know more, feel free to reach out to me via Twitter or LinkedIn. I’ll be more than happy to hear about your learning journey!

SQL Starter Pack

Written by Maico Rebong