Why do analysts like SQL

Dominic Imbuga
DevCNairobi
Published in
3 min readJul 8, 2021

Entity Relationship Diagram for Kiwanda database

On my transitioning journey from software engineering to machine learning engineering I met data. What is the first thing first in machine learning? Yes you guessed right, data. Companies today capture tons of data and expect employees in all departments to use it methodically in their decision making.

A lot of this data lives in databases and most of the worlds databases are accessed using Structured Query Language which is abbreviated as SQL. It has been around since the 1970s and it’s the most common method of accessing data and the database today.

SQL is a tool, it has a variety of functions that allows it's us to read, manipulate and change data. In this article we will primary focus on part of SQL used in data analysis, because SQL has so many uses in software engineering.

You don't need to be an analyst or have any prior experience with SQL knowledge in order to get value from this article, in infact this article is accessible to folks who work in marketing, finance, operations or anywhere in between .

There are lots of ways to do this, but ask any data scientist or analyst and they will tell you that bulk of their work happens in SQL.

SQL is popular for data analysis for a few reasons,

  • It's semantically easy to understand and learn
  • It can be used to access large amounts of data directly where it is stored, you don't have to copy data into other applications to view it and you don't have to worry about your spreadsheet application crushing because of data overload.
  • It's easy to audit and replicate , In a spreadsheet tool like Excell you have to click in each cell to know how they are calculated, with SQL you can just read the query from top to bottom, nothing is hidden.
  • SQL can run queries on multiple tables at once a cross large datasets. It's great for peroming the types of aggregation you might normally do in an excell table; Counts, Sums, minimums and Maximums ..etc but Excell maxises out at just over 1 million rows, with SQL you can querry across billions of rows at time

Maybe most importantly, SQL is incredible flexible , especially when compared to the out of the box dashboard products like Google analytics, In Google analytics you may be able to answer questions like which pages on my site receives alot of traffic and where is it coming from ? But you can use SQL filtering capabilities to answer much deeper more complex questions like how many viewers returned to my site between one and three weeks after their first visit? And what typically brings them back?

This features make SQL one of the primary data analysis tools for both beginners and experts.

For example , Using SQL, let's use data from Kiwanda a company that sells paper.There are 50 sales reps spread across Kenya in 4 regions .They sell three types of paper regular, poster and glossy. Their clients are primary large fortune 100 companies whom they attract by advertising on Facebook, Twitter and Google .

Kiwanda is not a real company, We've fabricated it and all the data for the sake of this article.The questions we will answer using Kiwanda are meant to simulate real world problems.

Using SQL we will be able help Kiwanda answer tricky questions like ;

  • Which of their product line is worst performing ?
  • Which of their marketing channel they should make a greater investment in?

Next article let’s get our hands dirty using this datasets and PostgreSQL to help Kiwanda use it’s data, as you know data is the new oil!

--

--