Intro to SQL

Vicky Szuflita
6 min readMay 20, 2020

--

For beginners, SQL can be extremely counterintuitive.

So I’ll leave you with three pieces of advice:

  1. Acknowledge that the order of operations doesn’t make sense, but trust me, you’ll get the hang of it!
  2. Don’t beat yourself up if you don’t connect with online tutorials (quite frankly… it’s hard to care about “how many bats the baseball store sold”).
  3. Don’t assume there’s only one path to an answer. You can almost always write different SQL instructions to get to the same result!

What is SQL?

While most coding languages are written instructions that tell computers what to do, SQL is instructions to a database about what information you want to retrieve from it.

Situation: You have a database with many many tables.

Goal: You want to do analytics (averages, counts etc.) on information in the tables, or pull a custom list of items from the table.

Problem: The tables are likely massive (too big to open and run calculations in Excel) and/or the information lives in multiple tables and you need to mash them together.

When would you use SQL?

It is not a language that will let you craft something out of nothing, rather, it lets you dig through “somethings” that are already there.

Your job or school will quite literally say “Here is the database, use SQL to query the information out of it”.

That’s why it is so hard to learn before you’re thrown into a situation where you can use it in action — most learning platforms have to rely on hypothetical situations with fake data.

Let’s ground ourselves in a (hopefully more meaningful) example…

Since I found it so hard to learn SQL using hypothetical data, let’s imagine we’re looking at a dataset that’s close to home — data about your family.

Do I hope you think up your own tables instead of trying to follow this narrative about my boring fake family? Yes! :)

Imagine you have a database that has tables and tables of information about every little thing about your family…

One table is called “Family_Members” and it’s a list of your family members — including their names, their ages, and their relation to you.

You have another table called “Places”.

And another table is called “Calls_Home,” that is a record of every time you called a family member, when it happened, how long it lasted, what phone number you called, and what phone number you used.

The list goes on! If this was a real world database, there would also be handfuls of other tables you don’t care about and would probably never use.

How do you write a SQL query?

Cool cool cool… what now?

Command for choosing the table you want to look at:

SELECT Statement to choose which columns you’re interested in seeing at the end.

FROM Statement you use to choose the main table you’re interested in looking at.

Command for Combining Tables:

JOIN ON Telling the database which tables to combine together, and on which identifier. There are “inner”, “outer”, “left” and “right” joins. An explanation of the difference can be found HERE.

Commands for Filtering:

WHERE Filtering down to items that meet desired conditions.

Commands for Aggregations:

GROUP BY Choosing a column to group by, so that an aggregation can be done.

I’m so confused! Can I seen an example?

EXAMPLE 1: I want to return a table with all my family members, their ages, and where they live.

Desired Result

SELECT *

FROM Family_Members

LEFT JOIN Place ON Family_Members.Name = Places.Name

What’s going on here? —

  1. SELECT * means we want to return, ALL the available columns from the tables we’re calling. (* = All).
  2. FROM Family_Members means this is the table we’re calling.
  3. LEFT JOIN Place means we want to smash the Place table and Family_Members table together. (Does it matter the order? Could we instead “FROM Places” and “JOIN Family_Members”? In this case, yep! Totally!)
  4. ON Family_Members.Name = Places.Name This is what we’re matching up between the tables.

EXAMPLE 2 (More complicated!): Now we want to see how many times I call each aunt (because Aunt Meredith is starting to get jealous of Aunt Shannon!)

Desired Result

SELECT Name, Count(*) AS Aunt_Calls

FROM Family_Members

LEFT JOIN Calls_Home ON Family_Members.Name = Calls_Home.Name

WHERE relation = ‘Aunt’

GROUP BY Name

What’s going on here? —

  1. SELECT Name, Count(*) means we want to see the name column, and a count of how many rows there are for each name (because we know each row represents one call made). In this case it’s just a count, so we can use the *, but we could do other aggregates like Avg(column_name), Max(column_name), Min(column_name), etc!
  2. AS Aunt_Calls The AS allows us to name the column something else, otherwise, it would just return as “Count”. But now that column is called “Aunt_Calls”.
  3. FROM Family_Members means we are interested in this as our base table.
  4. LEFT JOIN Calls_home means we want to combine the Calls_home table with the Family_Members table. (Does it matter the order? Could we flip the Family_Members and Calls_home table? In this case, yep! Totally!)
  5. ON Family_Members.Name = Calls_home.Name This is what we want to have match up. In this case, it’s on “name”. We know Aunt Shannon is our ‘Aunt’, and we want to match this table to all the records of when we called her.
  6. WHERE Relation = ‘Aunt’ Since we don’t care about the other types of family members, we are reducing it to only aunts.
  7. GROUP BY Name Note, every time you do an aggregation (Avg, count, sum etc) you need to GROUP BY the columns included in your select statement. Because quite frankly, when we tell it to “count” the computer is like “cool cool… count what?” By including GROUP BY Name, we’re telling it to count how many rows there are per Name. And since each row represents one phone call, it will tell us how many phone were made to each aunt.

Help! I still don’t get it. Is it just me, or does this order seem really confusing and counterintuitive?

It’s not just you! Here is a comparison of the order the language expects the instructions in, versus the order that the instructions make sense logically.

With this final example, we’ll try thinking of it with this new order in mind.

EXAMPLE 3: I want to know the average age of the family members in each city. (I want to move to a city with family, but I want that family to be young and cool!) But I know I don’t want to move to Boston (too cold).

Desired Result

SELECT Lives, Avg(Age) AS Avg_Age

FROM Family_Members

LEFT JOIN Place ON Family_Members.Name = Place.Name

WHERE Lives not in (‘Boston’) → Can also be written… WHERE Lives != ‘Boston’

GROUP BY Place

ORDER BY Avg_Age

The FROM and JOIN steps get us to a table we’ll ultimately be working with.
Now that we have our new parent table, we cut out the rows we know we don’t want.
Next, think about the matching rows being placed into groups based on the column name provided. In this case “Lives”. Then we identify which final columns we want to see, and do the average within the groupings.
Now it’s just a matter of ordering based on the lowest to highest average age!

ATLANTA IT IS!

--

--