SQL for Data Analytics-1
Before starting this blog I want to give my quick introduction, I’m Nilay Chauhan currently pursuing a bachelor’s in computer engineering and I’m learning Data Science. I have always believed in learn, share, lead, and grow.
“If you want to be a good Data Scientist, Then first be the best Data Analyst”
Being a Data scientist you are supposed to have a great understanding of data, which will help you get high accuracy of your model.
According to me, the First step of becoming a Data Scientist is to learn Data Analytics using SQL. After that, you can go on and start learning data analytics using R, SAS, or Python. Because at some point in your life you will need the help of SQL for data analysis. Many big companies use SQL for Data Analytics.
Now, What is SQL?
SQL (Structured Query Language) is a programming language designed for managing data in a relational database. It’s been around since the 1970s and is the most common method of accessing data in databases today. SQL has a variety of functions that allow its users to read, manipulate, and change data.
There are five major advantages of using a traditional relational database which we interact with using SQL:
- SQL is easy to understand and learn.
- Traditional databases allow users to access data directly.
- Traditional databases allow users to audit and replicate the data.
- SQL is a great tool for analysing multiple tables at once.
- SQL is great for performing the types of aggregations that you might normally do in an Excel pivot table — sums, counts, minimums and maximums, etc. — but over much larger datasets and on multiple tables at the same time.
Why Businesses like Databases?
- Data Integrity is ensured — like the data you want the users to enter is entered and only certain people have permissions to enter data into the databases.
- Data can be accessed quickly — it allows you to obtain results quickly from the database.
- Data is easily shared — many people can access the data and use the data at the same time.
Some Important points about data stored in SQL (Traditional Relational) database:
- Data is stored in tables and you can think of those tables like excel spreadsheets.
- In any column, all the data must match in terms of the same datatype.
- Consistent column types are one of the main reasons for working with databases is fast.
Now, Let’s start with some of the basic SQL Commands:
It is a statement which is used to create a new table in the database.
CREATE TABLE table_name (
It is a statement which is used to remove a table from the database.
DROP TABLE table_name;
SELECT indicates which column you want to be given the data for.
FROM specifies from which table you want to select the columns.
SELECT * FROM table_name
* indicates all the columns
(SQL queries are not case sensitive)
The LIMIT statement is useful when you want to see just the first few rows of the table. This can be much faster than loading the entire database.
The LIMIT statement is always the very last part of the query.
An example of showing the first 5 rows showing all the columns:
The ORDER BY statement allows us to sort our results using the data in any columns in ascending or descending order.
The ORDER BY keyword sorts the records in ascending order by default.
If you want to sort the records in descending order, use the DESC keyword.
ORDER BY column_1 ASC|DESC;
Using the WHERE statement we can display the result(Subset of tables) based on conditions that must be met.
You can think of this as filtering the data.
Common symbols used in WHERE are:
>, <, >=, <=, =, !=
You can also use WHERE statement with non-numeric data and while doing that you need to wrap the text-data with single-quotes.
When using it with non-numeric we use LIKE, NOT or IN operators.
If you create a new column that is a combination of existing columns it is known as a derived column (or “calculated” or “computed” column). Usually, you want to give a name, or “alias,” to your new column using the AS keyword.
This derived column, and its alias, are generally only temporary, existing just for the duration of your query. The next time you run a query and access this table, the new column will not be there.
SELECT (column_1+column_2) AS total
This LIKE operator allows users to perform operations similar to using WHERE and =, but for cases when you might not know exactly what you are looking for.
WHERE column_1 LIKE ‘C%’;
This IN operator allows users to perform operations similar to using WHERE and =, but for more than one condition.
WHERE column_1 IN (‘name1’, ’name2’);
This NOT operator is used with IN and LIKE to select all of the rows NOT LIKE or NOT IN a certain condition.
WHERE column_1 NOT IN (‘name1’, ’name2’);
AND operator shows a record if all the conditions separated by AND are TRUE.
WHERE condition1 AND condition2 AND condition3;
OR operator displays a record if any of the conditions separated by OR is TRUE.
WHERE condition1 OR condition2 OR condition3;
WE can make a cleaner statement using BETWEEN than we can be using AND. Particularly this is true when we are using the same column for different parts of our AND statement.
Instead of writing :
WHERE column >= 5 AND column <= 10
we can instead write:
WHERE column BETWEEN 5 AND 10
This was the first blog of my SQL FOR DATA ANALYTICS series, In the next part, I will be writing about SQL JOINS.
If you are also learning Data Science and if you want to write blogs in my publication you can DM me at LinkedIn/ Instagram or you can mail me.
You can connect with me at: