The Best BigQuery SQL Cheat Sheet for Beginners
A more fun approach to learning SQL-commands. By Jadwiga Wilkens.
Being new to coding with SQL-language remembering all important SQL-commands is not easy. Therefore I created a fun SQL-Command cheat sheet for commands used in BigQuery. For making the dry coding lines more fun, Josephine, the character drawn in the cheat sheet, is showing either the results of the displayed query or is acting based on it. A link to download a high-resolution version of the cheat sheet is at the end of the article.
The commands are explained with a basic table filled with data about fruits, namely apples, melons, and grapes. Basically, there are four columns; type, name, ripeness, and weight, covering the three most important data types: string for type and name, float (or integer) for weight, and date for ripeness.
For explaining the single commands and giving more details about how to use the code, single instances of the sheet are shown again in this post, going from up to down and from left to right.
Since there are only three types of fruits, the DISTINCT(type) command gives back three rows as melon, apple, and grape, depicted with only the outline of the fruits.
2. ORDER BY
First of all, to make code understandable and readable for other persons the
SELECT * (which means select all columns) statement should be avoided. But considering the lack of space writing out all column names was not a good option, still having in mind that good coding style looks different.
The ORDER BY statement is (almost) always put at the end of the code and is followed by the column(s) according to which the rows will be ordered. DESC meaning the biggest values or first letter being at the beginning of the alphabet first and ASC the smallest values or going from the back of the alphabet first. By giving two columns to order by, initially, the rows are sorted by the first given name, and then, if there are duplicates of the same value, these rows are then sorted by the second column name, which is depicted in the right picture.
Filters out rows that do not meet the requirements given in the WHERE statement. Here only the fruits are selected with ripeness date older than the current date, meaning they already went bad. So Josephine wants to throw them in the little trash bin drawn next to her.
Again, a good coding style requires to make people who did not write the code understand what the specific statements are doing. Using aliases for columns or table names that are not immediately clear or can be confusing helps a lot to comprehend statements. Keep in mind that the alias name cannot be used in the WHERE and JOIN statements, the original name has to be spelled out there. And for tables, once an alias is set up, only this alias can be used to refer to this table.
5. GROUP BY and HAVING
Running this query without the HAVING statement, only three rows will show up, like in the DISTINCT statement. But this time there is a second column, the COUNT(name) column which is named as amount where the number of items having the same type is counted. One could also write COUNT(ripeness) or COUNT(weight). When a condition on the amount column is set, a WHERE statement cannot be used. Instead, the HAVING statement is filtering out the rows where the condition is not fulfilled, here the alias amount could be used instead of COUNT(name). Now there is only one row with the amount being smaller than three, namely the grapes. Josephine has to reorder this type of fruit since there are only two left.
6. CASE WHEN and DECLARE
To make the CASE statement more readable, a DECLARE statement is used to define the variable today storing the current date. After declaring the name of the variable the type has to be given. A short list of other important types is written down in the Cheat Sheet. After the type definition, a default value can be given, otherwise, the variable has to be filled during the code which follows.
Now in the CASE statement, the ripeness entries are compared to the today variable. If ripeness is smaller than today, it means that the fruit already went bad, if ripeness is equal to today, the fruit should be eaten now and if ripeness is bigger than today, there is still time to eat the fruit.
Every CASE statement needs an END statement to declare the CASE block to be finished. After END an alias can be set which is highly recommended, here it is to_eat_or_not_to_eat, which explains itself when reading it and the entries together.
Here a new table is introduced, the calories_table. Some fruits and their calories per gram is stored, but not all fruits. As can be seen in the Venn-diagram drawing, apple, grape, and banana is included but not melon when it comes to the fruits table. Meaning that by left joining the fruits table with calories_table, the two items in the overlapping of the Venn-diagram are assigned to a cal_per_g value but the melons entries for cal_per_g are staying empty, a null value is displayed. And there is no banana row appearing in the result. Apparently, a friend of Josephine does not know that null means empty and thinks the melon she is holding in her hand has null calories, which is, of course, wrong.
Using a right join would be the other way around, the banana row would have null entries for name, ripeness, and weight, melon would not show up and apples and grapes would have all columns filled.
The full outer join would join every row of fruits and calories_table, but melon would have a null entry in the cal_per_g column and the banana row would look like in the right join.
Finally, the inner join would yield only apples and grapes with all columns filled.
By taking the cal_per_g column times the weight column in the SELECT statement, a new column is created with the calories per fruit. That is why Josephine can wonder about an apple only having 95 calories.
8. UNION and WITH … AS
If two tables have the exact same column structure, they can be appended to each other using a UNION statement. There are two types of UNIONs, one is UNION all where the whole two tables are put together. And there is UNION DISTINCT where the same rows appearing in both tables are not repeated in the result.
To store the resulting table, a WITH … AS statement can be used to define a new table, here fresh_stuff. It can be used like a stored table, only that the definition has to happen in the same SQL-document and before using it.
9. Easter eggs
Cross joining two tables can lead to a huge amount of rows, since a cross join puts every row from the first table together with every row from the second table, meaning that the number of rows from both tables is multiplied with each other.
And last but not least, a semicolon “;” denotes the end of a query, that is why Josephine has to stop and jumps over it while running away from the SQL-commands headline.
And here’s the link to the cheat sheet.