SQL Approach to Perform Data Analysis and Data Science Part-1

Vervit Khandelwal
Jan 21 · 6 min read

What is SQL ?

SQL stands for structured query language A query language is a sort of programming language designed to facilitate retrieving specific information from databases .

  • SQL is a RDBMS (Relational Database Management Systems) which allows to store, retrieve or manipulate data, but in a more efficient way than DBMS.
  • Each column in a table is know as attribute and each row in table is know as record/tuple.

SQL can be devided 5 broad categories as follows;

Data Definition Language (DDL)

Data Manipulation Language (DML)

Data Query Language (DQL)

Data Control Language (DCL)

Transactional Control Language (TCL)

We will mainly be focusing more on Data Query Language (DQL) Query as it is widely used by Data scientists and data analyst to extract and analyse data:

The dataset contains 21 rows and 9 columns

Dataset consist of following columns:

  • Employee_id: Unique id of each employee.
  • First_name: First name of the person.
  • Last_name: Last name of the person.
  • Email: Registered email address.
  • Phone_number: registered phone number.
  • Hire_date:date at which employee was hired.
  • Job_id: Department the employee is working in.
  • Manager_id: Manager_ID of person under which the employee is working.
  • Department_id: department_id at which employee is working.

Select statement is used to fetch data from the table.

If you want to select all the fields available in the table, use the following syntax

SELECT * FROM TABLE_NAME;

where * is used to fetch all the columns from dataset and From Table_name Name of the table from where data set needs to be retrieved.

The following SQL statement selects the “column1”and “column2” from the table:

SELECT column1,column2,… FROM TABLE_NAME;

  • Display employee_id , first name , last name , salary of employees ?

WHERE clause is used to filter records in table , it will extract only those rows where the condition satisfies.

SELECT * FROM table_name WHERE condition;

  • Display employee_id , first name , last name , salary of employees where manager_id = 100 ?
  • Comparison Operators ‘ =,<,>,>=,≤ ’
  • BETWEEN — between a certain range
  • LIKE — Pattern Matching
  • IN — Specify multiple possible values for a column
  • The compound conditions are made up of multiple simple conditions connected by AND or OR
  • There is no limit to the number of simple conditions that can be present in a single query

SELECT * FROM TABLE_NAME WHERE condition1 AND condition2 OR condition3

  • Display all records where job_id is ‘it_prog’ , salary is greater than 20,000 or department_id = 90 ?

Table of Contents

  1. Let’s First Understand the Dataset
  2. Aggregation Functions
  3. Slicing Data
  4. Limiting Data
  5. Sorting Data
  6. Filtering Patterns
  7. Groupings, Rolling up Data and Filtering in Groups

The DISTINCT statement returns only distinct (unique) values.

  • Find out name of each department in the company ?

it also shows list of unique values in the column and no of unique value.

The COUNT function returns the number of rows in the column or dataset.

  • Count number of employees in the company ?

Aggregation Functions

  • Aggregate Functions are used for calculations on multiple rows of a single column of a table and returning single value
  • The ISO standard defines 5 aggregate functions :

The SUM function gets total a set of values

SELECT SUM(column1) FROM table_name;

  • Find out total compensation paid by company to It_programmer ?

The resulted query displays the sum of salary of employees where job_id is IT_PROG

  • % — The percent sign represents zero, one, or multiple characters
  • _ — The underscore represents a single character

The AVG function returns the average of a set of values

SELECT AVG(column1) FROM table_name;

  • Find out average salary of employees in the company ?

Shows the lowest value of column using the MIN function.

SELECT MIN(column1) FROM table_name;

  • Find out minimum salary paid by the company to its employee ?

Shows the maximum value of column using the MAX function.

SELECT MAX(column1) FROM table_name;

  • Find out maximum salary paid by the company to its employee ?

Count the total number of , you can use the count function as follows

SELECT COUNT(column1) FROM table_name;

  • Count number of employees in the company ?

Limiting Data

LIMIT

The LIMIT statement is used to retrieve records from one or more tables in a database and limit the number of records returned based on a limit value.

SELECT * FROM table_name LIMIT [no of rows];

  • Show top 5 record of employee_detail table ?

OFFSET

The offset keyword allows you to offset(skip) the no. of record returned by the LIMIT clause.

SELECT * FROM table_name LIMIT [no of rows] OFFSET [no of rows];

  • Show top 5 results of table after skiping first 3 rows?

Sorting Data

Order By

The ORDER BY keyword is used to sort the data in ascending(default) or descending use desc.

SELECT * FROM table_name order by column1;

  • Sort names of employees in descending order ?

Group By Clause

The GROUP BY statement groups rows that have the same values into summary , like “Find the number of employees managed by managers?”.

The GROUP BY is often used with aggregate functions (COUNT, MAX, MIN, SUM, AVG) to group the result-set by one or more columns.

SELECT column1 FROM table_name GROUP BY column_name ;

  • Find the number of employees managed by managers along with there manager_id?
  • WHERE clause with conditions can be issued before the group-by clause in order to filter the records and then apply Group By feature

Having Clause

The HAVING clause is used because WHERE keyword could not be used with aggregate functions after using group by .

SELECT column1 FROM table_name GROUP BY column_name HAVING condition;

  • Find the department where the collective salary is more than 35000 each using aggregation with having clause as below ?

I am really glad we made it so far. In this article, I made use of MySQL workbench 8.0 to establish the examples.

The Startup

Get smarter at building your thing. Join The Startup’s +792K followers.

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +792K followers.

Vervit Khandelwal

Written by

Aspiring Data Scientist Linkedin: https://www.linkedin.com/in/vervit-khandelwal/

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +792K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store