First steps to SQL!

tugce satir
Analytics Vidhya
Published in
7 min readJun 2, 2020

Do you know that SQL has been around since 1970?

When we think about the amount of data we are storing and using, it looks like SQL is going to be around for a long time.

What is SQL?

Let’s assume that we have a database on a server. This server can be on a local network, in the cloud or in your computer. SQL is a way to talk to your database.

With SQL, we can create, read, update and delete data in a database. There are many SQL sublanguages like PostgreSQL, SQLite, Oracle, MySQL ext. But no need to worry about them! When we learn the basics of SQL it is easy to adapt and learn others.

Actually, one thing is certain about SQL that it is very easy to learn and use.

Let’s start !

  1. Installing PostgreSQL

You can download PostgreSQL from this link: postgreSQL. If you have any questions regarding the installation or need a post about it please email me, I am more than happy to help.

If you don’t want to download and install an application at this point we can use w3school’s SQL editor from here: w3school.

Just a small reminder: I will use two fictional data tables called ‘Sales’ and ‘Delivery’ to illustrate examples.

2. Getting Started with First Queries

The list below is the core statements of SQL. If you need to work with SQL, these are the main things you need to know.

Let’s deep dive into each statement and discover what they are doing with an example.

SELECT

Select helps us to return which columns we want to see. When we use ‘*’ operator, it returns every column in our table.

SELECT    * 
FROM Sales;

If you want to select columns you want to see at the end table you can specify column names in the select field.

SELECT   Product, Order
FROM Sales;

WHERE

Where statement helps us to filter data, give conditions. We can use ‘=’, ‘<’, ‘>’, ‘<=’, ‘>=’, ‘!=’ to define conditions. If the condition returns true, the value is kept in returned data.

For example, below statement returns products only from New York

SELECT  Product
FROM Sales
WHERE State = ‘NewYork’;

AND

We can also define multiple conditions. Here, ‘AND’ statement comes to our help!

Below statement returns products from not only New York but also Washington.

SELECT   Product
FROM Sales
WHERE State = ‘NewYork’
AND State = ‘Washington’;

OR

We can return results in which any one expression meets the condition. Below statement returns products either from New York or Washington.

SELECT   Product
FROM Sales
WHERE State = ‘NewYork’
OR State = ‘Washington’;

BETWEEN

With between statement we can return results that fall in a range that we defined.

Below statement returns states that has the number of sales between 1000 and 5000. Of course we can write same query with ‘<’ and ‘>’. But, using between is faster for this case.

SELECT   State
FROM Sales
WHERE Order BETWEEN 1000 AND 5000;

ORDER BY

ORDER BY orders selected columns. Default sorting is from low to high (ascending order). If we type ‘DESC’ in order by, it orders from high to low.

Below statement returns product which are ordered from high to low.

SELECT    Product
FROM Sales
ORDER BY Order DESC;

Counting, Grouping and Listing Distinct Records

COUNT

We already looked at how to select, sort and filter our data. Here, we will explore how to modify our results?

By using count, we can count number of rows that meets conditions we defined.

For example, below query returns the total number of records in our table.

SELECT  COUNT(*)
FROM Sales;

We can also count distinct values in a specific field like the below query. Since we have 3 distinct products, namely Breakfast Blend, Columbia and Dark Roast, below query returns 3.

SELECT   COUNT(DISTINCT PRODUCT)
FROM Sales;

Tip: COUNT doesn’t include NULLs in its calculations.

AS

We can rename a field in the return table using AS. The query below returns results with a new column name: ‘Distinct_Products’

SELECT COUNT(DISTINCT PRODUCT) AS Distinct_Products
FROM Sales

GROUP BY

If we are using aggregate and non-aggregate fields in SELECT, we need to group non-aggregate fields in GROUP BY.

For example below query returns number of orders for each state. 1 from Washington, 2 from Virginia and 2 from New York.

SELECT   COUNT(Order_Id), State
FROM Sales
GROUP BY State

AVG

We can take averages with AVG function.

The query below retuns the average orders per state.

SELECT   AVG(Order), State
FROM Sales
GROUP BY State

SUM

The query below retuns the total number of orders per state.

SELECT     SUM(Order), State
FROM Sales
GROUP BY State;

MIN

The query below retuns the minimum number of order in the Sales table.

SELECT MIN(Order)
FROM Sales;

MAX

The query below retuns the maximum number of order in the Sales table.

SELECT  MIN(Order)
FROM Sales;

CASE WHEN

I use CASE WHEN a lot! It is very helpful.

CASE WHEN can simply be summarized as if-then. We can create a new column with CASE WHEN statement. It divides data according to any condition we defined and returns a new field.

SELECT Order_Id,
CASE WHEN Order < 2000 THEN ‘Small’
WHEN Order < 5000 THEN ‘Middle’
ELSE ‘Big’
END AS Order_Type
FROM Sales;

HAVING

HAVING can be used as a filter for aggregations.

The query below returns ‘States’ which has Orders more than 1.

SELECT    COUNT(Order_Id), State
FROM Sales
GROUP BY State
HAVING COUNT(Order_Id) > 1;

LIMIT

LIMIT command limits the number of rows that can be returned from the query.

The below query returns 100 rows.

SELECT Order_Id, State
FROM Sales
LIMIT 100;

That’s end of the first part of “Everything you need to know about SQL!”

I tried to tell important fields in SQL. In the second part, we will continue with JOINs.

3. JOINs

Until now, we worked with one single table. But, in business life working with only one table is not very likely to happen. We generally need to use data from multiple tables. In this case, JOINs are here to help!

I heard from many people that JOINs can be difficult to get during the first try. So, I’ll try to explain everything very specifically.

Before we start joins we need to define two things.

1. We need to define how we want join these tables. Do we want to keep all values in the main table? Or do we need just the intersection ? (LEFT, RIGHT or INNER JOIN)

2. We are trying to join two table. We need a common column that used in both of the tables. (ON)

Let’s say that we want to learn whether each order in our Sales table is delivered or not. And information about delivery is stored in another table called Delivery.

In order to join two tables, we need to find a shared column. This column might have two different names in two tables but it has to carry same information. In our case, we have order_id’s common. Basically, I want to take everything in the Sales table and want to add it’s delivery info from Deliver table. But how?

The script is;

SELECT Define columns you want to see at the endFROM   First table name (Order matters here ! ) WRITE JOIN TYPE (INNER, LEFT, RIGHT) Second table nameON     First Table Name.Common column name from 1st table = Second table name.Common column name from 2nd table

Let’s discover JOIN types.

LEFT OUTER JOIN

In LEFT OUTER JOIN all records from table A are returned (even if there is no match with the second table) with the matched field from table B.

SELECT A.Order_Id, B.Delivery_f
FROM Sales A LEFT OUTER JOIN Delivery B
ON A.Order_Id = B.Order_Id;

INNER JOIN

Sometimes we need only the intersection of two tables. May be I want to look only states whose delivery status is 1. In this case, we need to use INNER JOIN.

Inner join returns all records that have a match in both table.

SELECT   A.Order_Id, B.Delivery_f
FROM Sales A INNER JOIN Delivery B
ON A.Order_Id = B.Order_Id;

FULL OUTER JOIN

It returns all records from both first and second table.

SELECT   A.Order_Id, B.Delivery_f
FROM Sales FULL OUTER JOIN Delivery
ON Sales.Order_Id = Delivery.Order_Id;

JOINs are not the only way to combine two tables, we can also use Union operator.

But, What is the difference between JOIN and UNION ?

With UNION, we don’t need a matching condition. We are basically combining two tables top and bottom.

UNION

UNION statement removes duplicate values and combine two tables.

SELECT  Order_Id, State FROM Sales_January
UNION
SELECT Order_Id, State FROM Sales_February

UNION ALL

UNION ALL statement keeps all values and combine two tables.

SELECT  Order_Id, State FROM Sales_January
UNION ALL
SELECT Order_Id, State FROM Sales_February

That’s it!

If you’re still reading you made it till the end! Good news, you covered all the basics about SQL.

There are more to learn, but as a beginning this is all you need. To strengthen your skills, I can suggest two websites that have great practice questions:

After that, examples from leetcode can be solved. First start with easy questions and continue with hard ones.

--

--

tugce satir
Analytics Vidhya

I have just started to write about data science! Check out my work: https://tsatir.github.io/