SQL For Dummies (DataCamp Edition)

Peter Worcester
6 min readJun 14, 2019

--

Thanks to DataCamp’s SQL Fundamentals skill track, I was able to learn the fundamental skills necessary to interact with the majority of SQL dialects. The skills I have gained this week have empowered me to join tables, summarize data, and become a data science wizard. In this Medium article I’ll review what I learned from this SQL track.

Intro to SQL for Data Science

The job of a data scientists is to produce actionable insights from raw data. Much of this raw data — from medical records to baseball box scores — resides in collections of tables called relational databases. To be a successful data scientist, it’s necessary to know how to explore these data sets and extract your insights using a language called SQL. DataCamp’s “Intro to SQL” course taught me everything from selecting columns, filtering rows, aggregate functions, and sorting grouping and joining tables.

  1. Selecting columns

This chapter gave me a brief introduction to working with relational databases. I learned their structure, how to talk about them using database lingo, and how to begin exploratory analysis by using simple SQL commands to select and summarize columns from database tables. Below are a few commands I learned:

  • SELECT: used to select data from a database
  • DISTINCT: used to return only distinct values
  • COUNT(): returns the number of rows that matches a specified criteria

2. Filtering Rows

This chapter built on the first chapter by teaching me how to filter tables for rows satisfying a specified criteria. I learned how to use basic comparison operators, combine multiple criteria, and match patterns in text. Below are some of the commands I learned:

  • WHERE: used to filter records that fulfill a specified condition
  • BETWEEN: selects a value within a given range
  • WHERE IN: allows you to specify multiple values in a WHERE clause
  • LIKE: used in a WHERE clause to search for a specified pattern in a column
  • AS: used to rename a column or table with an alias

3. Aggregate Functions

This chapter taught me how to use aggrgate functions to summarize my data and gain useful insights. I learned about arithmetic in SQL and how to use aliases to make results more readable.

  • Aggregate functions: MIN() MAX() SUM() AVG() …

4. Sorting, grouping, and joins

This chapter provided a brief introduction to sorting and grouping results, and briefly introduced joins. Below are some commands I learned:

  • ORDER BY: used to sort the result-set in ascending or descending order
  • GROUP BY: groups rows that have the same values into summary rows
  • HAVING: similar to the WHERE clause but can be used with aggregate functions
  • JOIN: used to combine rows from two or more tables, based on a related column between them

Joining Data in SQL

After learning the basics of SQL I took this course to learn about using joins and relational set theory. I learned about the power of joining tables while exploring interesting case studies. I mastered inner and outer joins, as well a self-joins, semi-joins, anti-joins, and cross joins. Lastly, I was introduced to subqueries.

  1. Introduction to joins

In this chapter I was introduced to the concept of joining tables and explored how to enrich my queries with inner and self-joins. I also learned how to use the case statement to split up a field into different categories. Below are a few of the commands I learned:

  • INNER JOIN: selects records that have matching values in both tables
  • USING: shorthand when joining tables that have identical matching keys
  • CASE: used to create different output based on conditions

2. Outer joins and cross joins

In this chapter I learned different kinds of outer joins. I gained further insight into my data through the use of left, right, and full joins. Also, I learned cross joins.

  • LEFT JOIN: returns all records from the left table and the matched records from the right table
  • RIGHT JOIN: returns all records from the right table and the matched records from the left table
  • FULL JOIN: returns all records from both tables

3. Set Theory Clauses

In this chapter I learned about set theory with the help of Venn diagams and was introduced to UNION, UNION ALL, INTERSECT, and EXCEPT clauses. Below is a diagram that outlines these relationships.

  • UNION: used to combine the result-set of two or more SELECT statements
  • UNION ALL: combines the result set of two or more SELECT statements (allows duplicate values)
  • INTERSECT: returns the results of 2 or more SELECT statements but only returns the rows selected by all queries
  • EXCEPT: used to return all rows in the first SELECT statement that are not returned by the second SELECT statement

Below is a diagram that visually explains these relationships.

4. Subqueries

In the closing chapter I learned how to nest queries to add some finesse to my data insights.

Intermediate SQL

After learning the basics of SQL and how to join columns it’s now time to manipulate, transform, and make the most sense of this data. This course taught me several key functions necessary to wrangle, filter, and categorize information in a relational database. I learned more about CASE statements, subqueries, and was introduced to window functions.

  1. CASE

In this chapter I learned how to use the CASE WHEN statement to create categorical variables, aggregate data into a single column with multiple filtering conditions, and to calculate counts/percentages.

2. Short and Simple subqueries

In this chapter, I learned about sub queries in the SELECT, FROM, and WHERE clauses.

3. Correlated Queries, Nested Queries, and Common Table Expressions

In this chapter, I learned how to use nested and correlated subquereis to extract more complex data from a relational database. I learned about common table expressions (CTEs), and how to best construct queries using multiple CTEs.

4. Window Functions

In this final chapter, I learned about window functions and how to pass aggregate functions along a data set. I also learned about how to calculate running totals and partitioned averages.

SQL for Exploratory Data Analysis

Building on the skills gained from joining tables, using basic functions, grouping data, and using subqueries, I finally learned to to explore a database and the data in it. I used functions to aggregate, summarize, and analyze data without leaving the database. I learned problems to look for and strategies to clean up messy data.

  1. What’s in the database?

In this chapter I started by exploring a database by identifying the tables and foreign keys that link them. I looked for missing values, counted the number of observations, and joined tables to understand how they’re related.

2. Summarizing and aggregating numeric data

I built on functions like min and max to summarize numeric data in new ways. I added average, variance, correlation, and percentile functions to my toolkit and learned how to truncate and round numeric values too. I built on complex queries and saved these results by creating temporary tables.

3. Exploring categorical data and unstructured text.

In this chapter I learned how to deal with text in data. Namely, how to deal with inconsistencies in case, spacing, and delimiters. I created temporary tables to recode messy categorical data to standardized values I could count and aggregate.

4. Working with dates and timestamps

In this final chapter I learned how to aggregate date/time data by hour, day, month, or year and practiced this by constructing time series and finding gaps in them.

Conclusion

DataCamp does a terrific job of introducing key concepts and enforcing them through interesting case study problems. However, many of their practice questions were “fill-in-the-blank” and I didn’t find this particularly effective in reinforcing my SQL coding.

I would recommend taking this course if you have absolutely no technical experience with SQL and want to learn the basics of database management. I would not take this course if you already have experience with SQL before. There are more advanced courses out there that will better challenge your skills, especially if you are looking at earning a certification in SQL.

--

--