Before we start, if you haven’t read my previous article for SQL, it would be helpful if you read it first.
Explore table size
Let’s have a little recap about SQL:
Now we have a table fortune500
Let’s explore it step by step
- figure out how many rows are in
fortune500
2. Subtract the count of the non-NULL
ticker
values from the total number of rows; alias the difference as missing
.
Join tables
Part of exploring a database is figuring out how tables relate to each other. The company
and fortune500
tables don't have a formal relationship between them in the database, but this doesn't prevent you from joining them.
To join the tables, you need to find a column that they have in common where the values are consistent across the tables. Remember: just because two tables have a column with the same name, it doesn’t mean those columns necessarily contain compatible data. If you find more than one pair of columns with similar data, you may need to try joining with each in turn to see if you get the same number of results.
Join.
There’s two ways to use “Join”
- SELECT table_column1, table_column2…
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name; - SELECT table_column1, table_column2…
FROM table_name1
INNER JOIN table_name2
USING (column_name);
Different kind of Join
Read an entity relationship diagram
What is the most common stackoverflow
tag_type
? What companies have a tag
of that type
?
Coalesce
Evaluates the arguments in order and returns the current value of the first expression that did not initially evaluate to NULL.
For example:
SELECT COALESCE(NULL, NULL, ‘third_value’, ‘fourth_value’);
returns the third value because the third value is the first non-null value.
Case
In the fortune500
data, industry
contains some missing values. Use coalesce()
to use the value of sector
as the industry when industry
is NULL
. Then find the most common industry.
Use coalesce()
to select the first non-NULL
value from industry
, sector
, or 'Unknown'
as a fallback value.
Data type.
Two methods to assign data type.
SELECT value::new _ type;
SELECT CAST (value AS new _ type);
If you enjoy this article please click the thump up (Keep pressing can send multiple thump up!) and follow me, I’ll keep updating content about data science! Hope you like it.
Reference:
Christina Maimone: EXPLO RATO RY DATA ANALYSIS IN SQL, datacamp