The Gate to the World’s Data — SQL

Yash Gupta
Data Science Simplified
6 min readJul 15, 2024

Since I joined my workplace, there’s never been a day I haven’t seen magic happen with SQL. It’s unimaginable that you can pull so many data points on the fly and make aggregations that make data-driven decision-making seem so simple.

Thank you, SQL — for being our gatekeeper to almost every dataset because no better tool has such a great effort-result ratio.

Title image for Thumbnail

Here’s the previous article in the series of Analytical Tools reviews — on Excel if you missed it:

Since as an analyst, there are going to be a few instances that you will sit with SQL end-to-end on a project, I’ll let you in on how it feels to use SQL throughout these little bits. However, let me reassure you, that it does pack a punch.

In this article, we’ll cover the following things:

  1. We’ll touch upon what SQL is.
  2. Where does SQL come in, in an Analyst’s job every day?
  3. How difficult is SQL?
  4. What makes SQL so good?
  5. What should your focus be on, to improve in SQL?
  6. What’s next?

What is SQL?

‘Sequel’ as it is called or SQL for Structured Query Language is the language of the data warehouses! You use it to call a table sitting in a database in a data lake and just say “Psst! I want these data records” and you get it. It is simple like that!

Example syntax looks like this: SELECT * FROM table_name;

Since we’re only touching on the things SQL can do — I’ll let another old article of mine answer this question (while we move onto the real questions this article will try to answer)

Where does SQL come in, in an Analyst’s job every day?

To an analyst who works with Big Data, SQL is a boon, because basic computations that are not so complicated, barely take any time with SQL. However, it is a given, since we’re querying the data from the source itself and aggregating it in a tool like Excel which comes with certain limitations on the size of the data, it would be impossible.

You may be using SQL for the following tasks as an analyst (now or going ahead):

  1. Aggregating data to check consistency between data sources for reliability
  2. Instantaneously reporting numbers as per the latest available data in sources
  3. Prepare newer tables (analytical layers) to help ahead in analytics (Analytics Engineering)
  4. Work on preparing complex data schemas to prepare aggregate tables that serve as the single source of truth across data teams
  5. Preparing new attributes around your customers and automating it to get newer information around customer segments etc., if it doesn’t involve complicated calculations. Note: you can do complicated calculations in SQL, the only catch is that other tools may help you with the computing around it and enable you to go to the next level of in-depth calculations to assist your work as well, which might not be possible with SQL.
  6. Preparing smaller tables out of Big Data (aggregating at a dimensional level) to use as input in other data tools such as Python, Tableau, etc.

Using SQL will help you make a simple analysis easily and quicker than any other tool. The reason why I would do a simple analysis that needs a few numbers via SQL and not other tools is the trade-off between time and decision-making. To empower your management to make a data-driven decision — you must give them your analysis ASAP.

Additionally, reconciliation of the data doesn’t become a problem when you use data from the source itself.

Seems like a good deal of things SQL can do, but is it really difficult?

How difficult is SQL?

I took a 25-hour course from Udemy (link here) back in 2021 and it helped me become a pro at SQL today, 3 years down the line. The difficulty in SQL is just about ‘thinking’.

SQL is made up of simple syntaxes and a large community to help you through everything. Solid documentation (at least from Snowflake) helps too. There is no reason why SQL will be fundamentally difficult to get right, it may just need some practice.

Snowflake documentation sample: here

It is like Inception inside my head (great movie though!), you see a base table with certain rows and columns. You decide what are you trying to answer and slowly manipulate the tables you’re using to decide how you will arrive at the end output that will help you.

The ways tables change when you write a command and how aggregating and reaggregating them works, if you can visualize this, nothing better than this. That’s how I work with SQL and it works out pretty well for me.

The best is yet to come, and it adds to what you can understand about the difficulty levels here.

What makes SQL so good?

Let me give you an in of the difference between some commands from some tools. There are words such as dplyr in R Language, pandas and numpy in Python, etc. which are just names of packages that you use usually but you see — the names won’t make sense to you until you use them. SQL wins by miles here.

Its task is very simple — Take data from a table, and wrangle it to find something or make something else. So commands such as this can come through:

WITH BASE_TABLE_CTE AS 
(SELECT * FROM TABLE
WHERE condition = TRUE
GROUP BY dimension1, dimension2)
SELECT *, some_aggregation FROM BASE_TABLE_CTE
HAVING some_aggregation > 2;

Now to anyone who doesn’t know SQL, but knows why it is used, this will still make some amount of sense. That’s Simplicity and Explainability. Two things that make every analysis stand out and every data-driven decision easier to implement.

A bigger sample query

To an analyst, the ease of querying also comes with CTEs, when you don’t have to look through 2000 lines of a query to debug your output but check through the 20 to 40 lined CTE queries each, to find out where you may have made an error.

What should your focus be on, to improve in SQL?

SQL works best with reusable queries. So a couple of things that I would recommend out of experience (after you’ve got a hang of what you want to do with the data and the tables in the databases you’re working with):

  1. Know the basic tables and stick to them.
  2. As an analyst, you must learn how to use 2 things: Common Table Expressions & Window Functions, they’ll make your life easy.
  3. Try to re-use your queries as much as possible instead of writing things again and again.
  4. Learn new commands — HAVING, USING, PIVOT, UNPIVOT, ROW_NUMBER, etc.
  5. Automate using variables (if repeat querying is required over different periods or other dimensions etc.)
  6. Read the documentation and don’t make typos

What’s next?

In the next article, we’ll look into how to perform some artwork with data. I’ll give you my perspective on the visualization tool that’s making my life easier — Tableau!

The series can be found here:

Other articles that you may find interesting:

P.S. I don’t earn anything out of Medium since I write from India, but I’d be happier if you could add a few claps down here so it reaches a broader audience in case you find the series helpful. Happy learning!

Thank you for reading my article!

If you enjoyed it, please do leave a clap and I invite you to connect with me on LinkedIn for more insights and discussions!

LinkedIn:

https://www.linkedin.com/in/yash-gupta-dss/

You can also follow my blog for updates and more articles on the series. Looking forward to connecting with you!

--

--

Yash Gupta
Data Science Simplified

Lead Analyst at Lognormal Analytics and self-taught Data Scientist! Connect with me at - https://www.linkedin.com/in/yash-gupta-dss