How to perform data analysis using SQL?

Mohamad Ashour
6 min readJun 10, 2023

--

SQL is short for Structured Query Language and as the name suggests, it is a standardized language that is used to query or retrieve data from a relational database. Furthermore, it can also be used to perform various operations on the data such as filtering rows and columns as well as simple data manipulation.

Basic syntax: SELECT and FROM

There are two required ingredients in any SQL query: SELECT and FROM—and they have to be in that order. SELECT indicates which columns you'd like to view, and FROM identifies the table that they live in.

Let’s start by looking at a couple columns from the laptop price data:

If you want to select every column in a table, you can use * instead of the column names:

SELECT * FROM laptop_data;

if you want to have spaces in column names, you need to always refer to those columns in double quotes.

If you’d like your results to look a bit more presentable, you can rename columns to include spaces. For example, if you want the ScreenResolution column to appear as Resolution in the results, you will have to type:

SELECT ScreenResolution as Resolution , 
OpSys as operating_stsyem
FROM laptop_data;

SQL LIMIT

As you might expect, the limit restricts how many rows the SQL query returns. The default value is 100; when this box is checked, it’s telling the database to only return the first 100 rows of the query.

SELECT * from laptop_data LIMIT 200;

it will give you 200 rows.

200 total rows using limit.

SQL WHERE

The SQL WHERE clause

Start by running a SELECT statement to re-familiarize yourself with the laptop price used in this tutorial.

SELECT * FROM laptop_price

Once you know how to view some data using SELECT and FROM, the next step is filtering the data using the WHERE clause. Here's what it looks like:

SELECT * from laptop_data WHERE Company = "Apple";

Note: the clauses always need to be in this order: SELECT, FROM, WHERE.

How does WHERE work?

The SQL WHERE clause works in a plain-English way: the above query does the same thing as SELECT * FROM laptop_price, except that the results will only include rows where the company column contains the value Apple.

SQL Comparison Operators

Comparison operators on numerical data

The most basic way to filter data is using comparison operators. The easiest way to understand them is to start by looking at a list of them:

Equal to=

Not equal to<> or !=

Greater than>

Less than<

Greater than or equal to>=

Less than or equal to<=

These comparison operators make the most sense when applied to numerical columns. For example, let’s use > to return only the rows where the price produced more than 30,000 (remember, the units in the data are already in thousands):

SELECT * 
from laptop_data
WHERE Price > 30000;

For example, let’s use != to return only the rows where the the Company not qual Apple :

SELECT * 
FROM laptop_data
WHERE Company != "Apple";

Arithmetic in SQL

You can perform arithmetic in SQL using the same operators you would in Excel: +, -, *, /. However, in SQL you can only perform arithmetic across columns on values in a given row. To clarify, you can only add values in multiple columns from the same row together using +—if you want to add values across multiple rows, you'll need to use aggregate functions, which are covered in the Intermediate SQL section of this tutorial.

SELECT Company , Price , Price *0.2 as price_after_tax 
FROM laptop_data;

SQL IS NULL

The IS NULL operator.

IS NULL is a logical operator in SQL that allows you to exclude rows with missing data from your results.

Some tables contain null values — cells with no data in them at all. This can be confusing for heavy Excel users, because the difference between a cell having no data and a cell containing a space isn’t meaningful in Excel.

SELECT * 
FROM laptop_data
WHERE Company is null;

WHERE artist = NULL will not work—you can't perform arithmetic on null values.

SQL ORDER BY

Sorting data with SQL ORDER BY

Once you’ve learned how to filter data, it’s time to learn how to sort data. The ORDER BY clause allows you to reorder your results based on the data in one or more columns. First, take a look at how the table is ordered by default:

SELECT * 
FROM laptop_data
ORDER by Price;
SELECT * 
FROM laptop_data
WHERE Company = "Apple"
ORDER by Price;

Ordering data by multiple columns

You can also order by mutiple columns. This is particularly useful if your data falls into categories and you’d like to organize rows by date, for example, but keep all of the results within a given category together.

SELECT * 
FROM laptop_data
WHERE Company = "Apple"
ORDER by Price DESC , Inches;

Finally, you can make your life a little easier by substituting numbers for column names in the ORDER BY clause. The numbers will correspond to the order in which you list columns in the SELECT clause.

SELECT * 
FROM laptop_data
WHERE Company = "Apple"
ORDER by 3;

3 is the number of Inches column.

Using comments

You can “comment out” pieces of code by adding combinations of characters. In other words, you can specify parts of your query that will not actually be treated like SQL code. It can be helpful to include comments that explain your thinking so that you can easily remember what you intended to do if you ever want to revisit your work. Commenting can also be useful if you want to test variations on your query while keeping all of your code intact.

You can use-- (two dashes) to comment out everything to the right of them on a given line:

SELECT * -- this is a comment 
FROM laptop_data
WHERE Company = "Apple"
ORDER by 3;

You can also leave comments across multiple lines using /* to begin the comment and */ to close it:

/* Here's a comment so long and descriptive that
it could only fit on multiple lines. Fortunately,
it, too, will not affect how this code runs. */
SELECT * -- this is a comment
FROM laptop_data
WHERE Company = "Apple"
ORDER by Price;

What’s next?

Congrats on completing the Basic SQL tutorial!

See you in the next article, we will discuss the intermediate SQL functions like aggregation functions and joins.

--

--

Mohamad Ashour

Mainly interested in the field of machine learning and data analysis with strong knowledge of many programming languages, data processing, and data mining