The Startup
Published in

The Startup

SQL Approach to Perform Data Analysis and Data Science Part-1

What is SQL ?

SQL stands for structured query language A query language is a sort of programming language designed to facilitate retrieving specific information from databases .

  • SQL is a RDBMS (Relational Database Management Systems) which allows to store, retrieve or manipulate data, but in a more efficient way than DBMS.
  • Each column in a table is know as attribute and each row in table is know as record/tuple.

SQL can be devided 5 broad categories as follows;

Data Definition Language (DDL)

Data Manipulation Language (DML)

Data Query Language (DQL)

Data Control Language (DCL)

Transactional Control Language (TCL)

We will mainly be focusing more on Data Query Language (DQL) Query as it is widely used by Data scientists and data analyst to extract and analyse data:

Lets take an imaginary dataset ( employee_details ) for further explanation;

The dataset contains 21 rows and 9 columns

Dataset consist of following columns:

  • Employee_id: Unique id of each employee.
  • First_name: First name of the person.
  • Last_name: Last name of the person.
  • Email: Registered email address.
  • Phone_number: registered phone number.
  • Hire_date:date at which employee was hired.
  • Job_id: Department the employee is working in.
  • Manager_id: Manager_ID of person under which the employee is working.
  • Department_id: department_id at which employee is working.

SELECT CLAUSE

Select statement is used to fetch data from the table.

If you want to select all the fields available in the table, use the following syntax

SELECT * FROM TABLE_NAME;

where * is used to fetch all the columns from dataset and From Table_name Name of the table from where data set needs to be retrieved.

The following SQL statement selects the “column1”and “column2” from the table:

SELECT column1,column2,… FROM TABLE_NAME;

  • Display employee_id , first name , last name , salary of employees ?

WHERE CLAUSE

WHERE clause is used to filter records in table , it will extract only those rows where the condition satisfies.

SELECT * FROM table_name WHERE condition;

  • Display employee_id , first name , last name , salary of employees where manager_id = 100 ?

Operators in where clause

  • Comparison Operators ‘ =,<,>,>=,≤ ’
  • BETWEEN — between a certain range
  • LIKE — Pattern Matching
  • IN — Specify multiple possible values for a column

Compound Search Conditions

  • The compound conditions are made up of multiple simple conditions connected by AND or OR
  • There is no limit to the number of simple conditions that can be present in a single query

SELECT * FROM TABLE_NAME WHERE condition1 AND condition2 OR condition3

  • Display all records where job_id is ‘it_prog’ , salary is greater than 20,000 or department_id = 90 ?

Table of Contents

Steps to be taken for analysis of dataset in SQL

  1. Let’s First Understand the Dataset
  2. Aggregation Functions
  3. Slicing Data
  4. Limiting Data
  5. Sorting Data
  6. Filtering Patterns
  7. Groupings, Rolling up Data and Filtering in Groups

DISTINCT

The DISTINCT statement returns only distinct (unique) values.

  • Find out name of each department in the company ?

it also shows list of unique values in the column and no of unique value.

COUNT

The COUNT function returns the number of rows in the column or dataset.

  • Count number of employees in the company ?

Aggregation Functions

  • Aggregate Functions are used for calculations on multiple rows of a single column of a table and returning single value
  • The ISO standard defines 5 aggregate functions :

1. SUM

The SUM function gets total a set of values

SELECT SUM(column1) FROM table_name;

  • Find out total compensation paid by company to It_programmer ?

The resulted query displays the sum of salary of employees where job_id is IT_PROG

  • % — The percent sign represents zero, one, or multiple characters
  • _ — The underscore represents a single character

2. AVERAGE

The AVG function returns the average of a set of values

SELECT AVG(column1) FROM table_name;

  • Find out average salary of employees in the company ?

‘AS’ is used to rename a column or table name with an alias

3. MIN

Shows the lowest value of column using the MIN function.

SELECT MIN(column1) FROM table_name;

  • Find out minimum salary paid by the company to its employee ?

4. MAX

Shows the maximum value of column using the MAX function.

SELECT MAX(column1) FROM table_name;

  • Find out maximum salary paid by the company to its employee ?

5. COUNT

Count the total number of , you can use the count function as follows

SELECT COUNT(column1) FROM table_name;

  • Count number of employees in the company ?

Limiting Data

LIMIT

The LIMIT statement is used to retrieve records from one or more tables in a database and limit the number of records returned based on a limit value.

SELECT * FROM table_name LIMIT [no of rows];

  • Show top 5 record of employee_detail table ?

OFFSET

The offset keyword allows you to offset(skip) the no. of record returned by the LIMIT clause.

SELECT * FROM table_name LIMIT [no of rows] OFFSET [no of rows];

  • Show top 5 results of table after skiping first 3 rows?

Sorting Data

Order By

The ORDER BY keyword is used to sort the data in ascending(default) or descending use desc.

SELECT * FROM table_name order by column1;

  • Sort names of employees in descending order ?

Group By Clause

The GROUP BY statement groups rows that have the same values into summary , like “Find the number of employees managed by managers?”.

The GROUP BY is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns.

SELECT column1 FROM table_name GROUP BY column_name ;

  • Find the number of employees managed by managers along with there manager_id?
  • WHERE clause with conditions can be issued before the group-by clause in order to filter the records and then apply Group By feature

Having Clause

The HAVING clause is used because WHERE keyword could not be used with aggregate functions after using group by .

SELECT column1 FROM table_name GROUP BY column_name HAVING condition;

  • Find the department where the collective salary is more than 35000 each using aggregation with having clause as below ?

I am really glad we made it so far. In this article, I made use of MySQL workbench 8.0 to establish the examples.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store