ON DATA ENGINEERING

The path to learning SQL and mastering it to become a Data Engineer

Julien Kervizic
Hacking Analytics

Photo by Campaign Creators on Unsplash

SQL is one of the key tools used by data engineers to model business logic, extract key performance metrics, and create reusable data structures. Data engineers, however, need to consider several types of SQL: Basic, Advanced Modelling, Efficient, Big Data, and Programmatic. The path to learning SQL involves progressively mastering each of these types.

Basic SQL

What is “Basic SQL”?

Learning “Basic SQL” is all about learning the key operations and concepts used to manipulate data in SQL, such as aggregations, joins, and the grain of a table.
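As a minimal sketch of these operations, the snippet below joins two tables and aggregates the result. It uses Python's built-in sqlite3 module only so that the SQL is runnable as-is; the customers and orders tables, their columns, and their values are hypothetical and invented for illustration.

```python
import sqlite3

# Hypothetical tables and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'FR'), (2, 'NL'), (3, 'FR');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 12.5), (13, 3, 7.5);
""")

# Join each order to its customer, then aggregate to one row per country.
revenue_by_country = conn.execute("""
    SELECT c.country, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(revenue_by_country)  # [('FR', 3, 32.5), ('NL', 1, 12.5)]
```

The input has one row per order, while the output has one row per country: the GROUP BY clause is what changes the grain of the result.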

Where to learn it

Basic SQL can be learned from reference websites such as W3Schools or, for a more practical approach, from interactive platforms such as Datacamp or DataQuest. These websites give a decent grasp of SQL's core concepts, such as the different operations, functions, subqueries, and joins. However, some core concepts in data engineering, such as working with the grain of a table or dataset (what a single row represents), are often not discussed as extensively as they deserve.
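To illustrate why grain deserves attention, here is a small sketch of a common pitfall: joining a table with one row per order to a table with one row per order line repeats the order-level values, so summing them double counts. The orders and order_lines tables and their values are hypothetical, and sqlite3 is used only to make the SQL runnable.

```python
import sqlite3

# Hypothetical tables: orders has one row per order,
# order_lines has one row per item within an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, shipping_fee REAL);
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, price REAL);
    INSERT INTO orders VALUES (1, 5.0), (2, 3.0);
    INSERT INTO order_lines VALUES (1, 'A', 10.0), (1, 'B', 20.0), (2, 'C', 15.0);
""")

# Naive join: the result is at order-line grain, so the shipping fee
# of order 1 appears twice and is counted twice in the sum.
naive_shipping = conn.execute("""
    SELECT SUM(o.shipping_fee)
    FROM orders AS o
    JOIN order_lines AS l ON l.order_id = o.order_id
""").fetchone()[0]

# Grain-aware version: bring order_lines to order grain first, then join.
correct_totals = conn.execute("""
    SELECT SUM(o.shipping_fee), SUM(l.items_total)
    FROM orders AS o
    JOIN (
        SELECT order_id, SUM(price) AS items_total
        FROM order_lines
        GROUP BY order_id
    ) AS l ON l.order_id = o.order_id
""").fetchone()

print(naive_shipping)   # 13.0, although the true shipping total is 8.0
print(correct_totals)   # (8.0, 45.0)
```

Asking "what does one row represent?" before and after each join is usually enough to catch this kind of error.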

Practice challenges

One of the main challenges of learning SQL is setting up a database and getting access to datasets. Installing a local database has become quite easy these days, but it still takes some time. After that, tables need to be created and datasets loaded into them before they become usable for practical learning.
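One low-friction way to get a practice database, sketched here as an assumption rather than a recommendation from the article, is SQLite through Python's standard library, which needs no server installation. The trips dataset below is hypothetical and embedded as a string so the example is self-contained; in practice the CSV would be read from a file.

```python
import csv
import io
import sqlite3

# Hypothetical dataset; in practice this would come from open("trips.csv").
csv_text = "trip_id,city,distance_km\n1,Paris,3.2\n2,Paris,7.9\n3,Amsterdam,1.4\n"

# ":memory:" keeps the database in RAM; pass a file path to persist it instead.
conn = sqlite3.connect(":memory:")

# Step 1: create the table.
conn.execute(
    "CREATE TABLE trips (trip_id INTEGER PRIMARY KEY, city TEXT, distance_km REAL)"
)

# Step 2: load the dataset, converting each CSV field to the column type.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(int(r["trip_id"]), r["city"], float(r["distance_km"])) for r in reader]
conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
conn.commit()

# Step 3: the table is now ready for practice queries.
row_count = conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
trips_per_city = conn.execute(
    "SELECT city, COUNT(*) FROM trips GROUP BY city ORDER BY city"
).fetchall()
print(row_count, trips_per_city)
```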

For interviews

This type of knowledge is generally tested in screening interviews with questions such as the histogram question, which show how well candidates have grasped concepts such as granularity and joins. This is also the level of SQL knowledge typically expected of fresh graduates embarking on a data engineering career.
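The article does not spell out the histogram question, so the sketch below assumes one common formulation: "how many customers placed 0, 1, 2, ... orders?". The tables and values are hypothetical, and sqlite3 is used only to make the SQL runnable. The question tests granularity because it needs two aggregation steps, and joins because customers without orders must not be dropped.

```python
import sqlite3

# Hypothetical data: customer 1 has two orders, customers 2 and 3 have one
# each, and customer 4 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3), (4);
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2), (13, 3);
""")

histogram = conn.execute("""
    WITH orders_per_customer AS (
        -- First aggregation: one row per customer.
        -- LEFT JOIN keeps customers with no orders; COUNT(o.order_id)
        -- ignores the NULLs they produce, giving them a count of 0.
        SELECT c.customer_id, COUNT(o.order_id) AS n_orders
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
    )
    -- Second aggregation: one row per distinct order count.
    SELECT n_orders, COUNT(*) AS n_customers
    FROM orders_per_customer
    GROUP BY n_orders
    ORDER BY n_orders
""").fetchall()

print(histogram)  # [(0, 1), (1, 2), (2, 1)]
```

Two typical mistakes are using an inner join, which loses the zero bucket, and writing COUNT(*) in the first step, which would report one order for customers who have none.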

Advanced Modelling

Data engineers need to be able to model complex transformations, and learning some advanced analytical SQL helps with this. Two main things support this kind of use case: 1) advanced queries and 2) data models.


Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com