Data Stash
Published in

Data Stash

SQL for Data Analytics-1

Hello everyone,

Before starting this blog I want to give my quick introduction, I’m Nilay Chauhan currently pursuing a bachelor’s in computer engineering and I’m learning Data Science. I have always believed in learn, share, lead, and grow.

“If you want to be a good Data Scientist, Then first be the best Data Analyst”

Being a Data scientist you are supposed to have a great understanding of data, which will help you get high accuracy of your model.

According to me, the First step of becoming a Data Scientist is to learn Data Analytics using SQL. After that, you can go on and start learning data analytics using R, SAS, or Python. Because at some point in your life you will need the help of SQL for data analysis. Many big companies use SQL for Data Analytics.

Now, What is SQL?

SQL (Structured Query Language) is a programming language designed for managing data in a relational database. It’s been around since the 1970s and is the most common method of accessing data in databases today. SQL has a variety of functions that allow its users to read, manipulate, and change data.

There are five major advantages of using a traditional relational database which we interact with using SQL:

  1. SQL is easy to understand and learn.
  2. Traditional databases allow users to access data directly.
  3. Traditional databases allow users to audit and replicate the data.
  4. SQL is a great tool for analysing multiple tables at once.
  5. SQL is great for performing the types of aggregations that you might normally do in an Excel pivot table — sums, counts, minimums and maximums, etc. — but over much larger datasets and on multiple tables at the same time.

Why Businesses like Databases?

  1. Data Integrity is ensured — like the data you want the users to enter is entered and only certain people have permissions to enter data into the databases.
  2. Data can be accessed quickly — it allows you to obtain results quickly from the database.
  3. Data is easily shared — many people can access the data and use the data at the same time.

Some Important points about data stored in SQL (Traditional Relational) database:

  1. Data is stored in tables and you can think of those tables like excel spreadsheets.
  2. In any column, all the data must match in terms of the same datatype.
  3. Consistent column types are one of the main reasons for working with databases is fast.

Now, Let’s start with some of the basic SQL Commands:

CREATE TABLE

It is a statement which is used to create a new table in the database.

CREATE TABLE table_name (

column1 datatype,
column2 datatype,
column3 datatype,
….
);

DROP TABLE

It is a statement which is used to remove a table from the database.

DROP TABLE table_name;

SELECT/FROM

SELECT indicates which column you want to be given the data for.

FROM specifies from which table you want to select the columns.

SELECT * FROM table_name

* indicates all the columns

(SQL queries are not case sensitive)

LIMIT

The LIMIT statement is useful when you want to see just the first few rows of the table. This can be much faster than loading the entire database.

The LIMIT statement is always the very last part of the query.

An example of showing the first 5 rows showing all the columns:

SELECT *

FROM table_name

LIMIT 5;

ORDER BY

The ORDER BY statement allows us to sort our results using the data in any columns in ascending or descending order.

The ORDER BY keyword sorts the records in ascending order by default.

If you want to sort the records in descending order, use the DESC keyword.

SELECT *

FROM table_name

ORDER BY column_1 ASC|DESC;

WHERE

Using the WHERE statement we can display the result(Subset of tables) based on conditions that must be met.

You can think of this as filtering the data.

Common symbols used in WHERE are:

>, <, >=, <=, =, !=

You can also use WHERE statement with non-numeric data and while doing that you need to wrap the text-data with single-quotes.

When using it with non-numeric we use LIKE, NOT or IN operators.

SELECT *
FROM table_name
WHERE condition;

DERIVED COLUMNS

If you create a new column that is a combination of existing columns it is known as a derived column (or “calculated” or “computed” column). Usually, you want to give a name, or “alias,” to your new column using the AS keyword.

This derived column, and its alias, are generally only temporary, existing just for the duration of your query. The next time you run a query and access this table, the new column will not be there.

SELECT (column_1+column_2) AS total

FROM table_name;

LOGICAL OPERATORS

LIKE

This LIKE operator allows users to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.

SELECT *

FROM table_name

WHERE column_1 LIKE ‘C%’;

IN

This IN operator allows users to perform operations similar to using WHERE and =, but for more than one condition.

SELECT *

FROM table_name

WHERE column_1 IN (‘name1’, ’name2’);

NOT

This NOT operator is used with IN and LIKE to select all of the rows NOT LIKE or NOT IN a certain condition.

SELECT *

FROM table_name

WHERE column_1 NOT IN (‘name1’, ’name2’);

AND

AND operator shows a record if all the conditions separated by AND are TRUE.

SELECT *
FROM table_name
WHERE condition1 AND condition2 AND condition3;

OR

OR operator displays a record if any of the conditions separated by OR is TRUE.

SELECT *
FROM table_name
WHERE condition1 OR condition2 OR condition3;

BETWEEN

WE can make a cleaner statement using BETWEEN than we can be using AND. Particularly this is true when we are using the same column for different parts of our AND statement.

Instead of writing :

WHERE column >= 5 AND column <= 10

we can instead write:

WHERE column BETWEEN 5 AND 10

This was the first blog of my SQL FOR DATA ANALYTICS series, In the next part, I will be writing about SQL JOINS.

THANK YOU!!

BE SAFE!!

If you are also learning Data Science and if you want to write blogs in my publication you can DM me at LinkedIn/ Instagram or you can mail me.

You can connect with me at:

https://www.linkedin.com/in/nilay-chauhan-2756b4153/

--

--

--

A Partnership in Discovery

Recommended from Medium

Understanding AWS SQS — A Message Queue Service on cloud

Monolith vs Microservices Arch

Java Optional Is Not So Obvious

First solving in ‘leetcode’ site

Debt (Tech & Life) —Should you Worry?

5 Minute DevOps: Agile Rehab

Don’t tweet during intimate times.

Non Blocking I/O using libuv

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Nilay Chauhan

Nilay Chauhan

Data Scientist at Google — Kaggle team

More from Medium

Customer Segmentation using K-Means

DATA ANALYSIS WITH SQL Part 1.

Big Data Tools for Data Scientists

Data Transformation in Machine Learning