SQL- Structured Query Language

A language used for accessing data stored in a database.

yu yang
yu yang
Sep 9, 2018 · 5 min read

Journey with Udacity Data Science Challenge Scholarship 2018/2019 powered by Bertelsmann — — Blog 6

Question 1: What is the entity relationship diagram?

An entity relationship diagram (ERD) is a common way to view data in a database.

Source: Udacity

You can think of each of these tables as an individual spreadsheet. Then the columns in each spreadsheet are listed below the table name. For example, the region table has two columns: id and name. Alternatively the web_events table has four columns.

Question 2: What is the statements in SQL? What are the most used statements?

In SQL, you can think of a statement as:

  1. A piece of correctly written SQL code.
  2. A way to manipulate data stored in a database.
  3. A way to read data stored in a database.

SQL is not case sensitive.

SELECT: where you tell the query what columns you want back.(choose columns)

FROM: where you tell the query what table you are querying from. (choose tables)

LIMIT: when you want to see just the first few rows of a table.

The LIMIT command is always the very last part of a query.

ORDER BY: to order our table by any row. The ORDER BY statement is always after the SELECT and FROM statements, but it is before the LIMIT statement. Remember DESC can be added after the column in your ORDER BY statement to sort in descending order, as the default is to sort in ascending order.

we can ORDER BY more than one column at a time.

WHERE: subset out tables based on conditions that must be met. The WHERE statement can also be used with non-numerical data. we can use the LIKE, NOT, IN, AND & BETWEEN or OR operators.

Derived Columns: Creating a new column that is a combination of existing columns is known as a derived column.

Question 3: how to leverage SQL to link tables together?

JOIN: pull data from multiple tables. It introduces the second table from which you would like to pull data. JOIN is a INNER JOIN that only pulls data that exists in both tables.

ON: tells you how you would like to merge the tables. condition check.

PK: PK is associated with the first column in every table and stands for primary key. A primary key exists in every table, and it is a column that has a unique value for every row.

FK: A foreign key is when we see a primary key in another table.

AS: When we JOIN tables together, it is nice to give each table an alias. Frequently an alias is just the first letter of the table name. Frequently, you might also see statements without the AS statement.

Here are three types of JOINs. LEFT JOIN, RIGHT JOIN and FULL OUTER JOIN.

Source: Udacity

LEFT JOIN: a way to pull all of the rows from the table in the FROM even if they do not exist in the JOIN statement.

RIGHT JOIN: a way to pull all of the rows from the table in the JOIN even if they do not exist in the FROM statement.

NULL frequently occurs when performing a LEFT or RIGHT JOIN.

NULL: a datatype that specifies where no data exists in SQL. NULL is not a value. When identifying NULLs in a WHERE clause, we write IS NULL or IS NOT NULL. We can not use =.

Question 4: How to aggregate data?

COUNT: return a count of all the rows that contain some non-null data.

SUM: Unlike COUNT, you can only use SUM on numeric columns. SUM will ignore NULL values.

MIN: return the lowest number, earliest date, or non-numerical value as early in the alphabet as possible.

MAX: returns the highest number, the latest date, or the non-numerical value closest alphabetically to “Z.”

MIN and MAX can be used on non-numerical columns.

AVG: returns the mean of the data.

AVG, MIN and MAX are aggregators that again ignore NULL values. If you want to count NULLs as zero, you will need to use SUM and COUNT.

GROUP BY: can be used to aggregate data within subsets of the data. The GROUP BY always goes between WHERE and ORDER BY.

DISTINCT: is always used in SELECT statements, and it provides the unique rows for all columns written in the SELECT statement. you only use DISTINCT once in any particular SELECT statement.

HAVING: Essentially, any time you want to perform a WHERE on an element of your query that was created by an aggregate, you need to use HAVING instead.

WHERE appears after the FROM, JOIN, ON but before GROUP BY. HAVING appears after the GROUP BY but before the ORDER BY.

CASE: always goes in the SELECT clause. CASE must include the following components: WHEN, THEN, and END. ELSE is an optional component to catch cases that didn’t meet any of the other previous CASE conditions.

Question 5: What is subquery?

In the first subquery you wrote, you created a table that you could then query again in the FROM statement.

WITH: is often called a Common Table Expression or CTE.


The article below compares three of the most common types of SQL: SQLite, PostgreSQL, and MySQL.

If you are in charge of setting up a new database, it is important to have a thorough understanding of database normalization. There are essentially three ideas that are aimed at database normalization:

  1. Are the tables storing logical groupings of the data?
  2. Can I make changes in a single location, rather than in many tables for the same information?
  3. Can I access and manipulate data quickly and efficiently?

Some extra useful Links.


I am now 92% new to data science. Any advises to my blog are welcome! Feel free to contact me if you have any problem or doubts about the content. You can also find me hier LinkedIn.

Source: giphy.com
yu yang

Written by

yu yang

On my way to be optimistic and data Scientist