Everything you need to know about SQL for Data Science

akansha khandelwal
5 min readSep 15, 2021

--

Understanding and learning SQL can easily get complicated and overwhelming. In this blog, I will unpack the SQL concepts into understandable bits that will help you get up to speed on SQL in under 5 minutes.

SQL, as the name suggests, is a structured query language that allows us to talk to the database server. However, before we dive into SQL, let’s understand how SQL fits into data science and why spending time on it is worthwhile.

At the heart of any data science project is data and the capability to work with large datasets. This is where SQL helps. Most of the complex data in enterprises reside in relational databases. And while there are newer alternatives, the majority of enterprises today continue to use SQL. SQL is used as a front-end to many databases like MySQL, PostgreSQL, Oracle, Sybase.

With SQL, we create, maintain, and retrieve data from these databases. Further, we can integrate SQL with languages like Python and R to analyze the data and create desired models.

Now that we understand where SQL fits in the larger data science landscape, let’s dive in to understand the construct and the commands that we can use to build and work with databases.

SQL commands can be broadly categorized into the following categories. These help us create databases, draw insights from multiple sources and extract usable data.

  • Data Definition Language(DDL) — These commands help us create a SQL database and build logical relationships based on the schema we select. These are mostly one-time activities but it’s good to be familiar with these commands.
  • Data Manipulation Language(DML) — These help us is changing the data itself which resides in the database
  • Data Control Language (DCL) — We use this to grant and revoke access to the database and its contents

Now let’s look into each one of them in detail.

Data Definition Language (DDL)

They are a set of commands which are used to create and modify databases.

  1. CREATE command

Create Database:

For example, to create a company database, use the below command

create database company;

Create Table — to create tables in the database

Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,….]);

To create an employee table, we can use below query

CREATE TABLE `employees` (`employeeNumber` varchar(5) NOT NULL,`lastName` varchar(12) DEFAULT NULL,`firstName` varchar(12) DEFAULT NULL,`reportsTo` varchar(5) DEFAULT NULL,`jobTitle` varchar(25) DEFAULT NULL,PRIMARY KEY (`employeeNumber`));

As you might see in the above example, we have defined each column in the table with a data type.

Below are the data types that SQL supports

Numeric

INTEGER, SMALLINT, BIGINTNUMERIC(w,d), DECIMAL(w,d) — numbers with width w and d decimal placesREAL, DOUBLE PRECISION — machine and database dependentFLOAT(n) — floating-point number with n binary digits of precision

Character

CHARACTER(L) — a fixed-length character of length LCHARACTER VARYING(L) or VARCHAR(L) — supports the maximum length of L

Binary

BIT(L), BIT VARYING(L) — like corresponding charactersBINARY LARGE OBJECT(L) or BLOB(L)

Temporal

DATETIMETIMESTAMP

2. ALTER command

Alter commands are used to add/update/remove columns from the table.

Adding columns

Syntax :

ALTER TABLE table_name ADD column_name COLUMN-definition;

Eg: If we want to add an email address column to the table employee:

ALTER table employees add column email varchar(50);

Modifying columns

Syntax:

ALTER TABLE MODIFY(COLUMN DEFINITION….);

Eg: If we want to change the data type of the email column to varchar(255)

ALTER table employees add column email varchar(50);

Dropping columns

Syntax :

ALTER TABLE table_name DROP COLUMN column_name;

Eg: If we want to delete the email column

ALTER table employees drop column email

3. DROP command

Drop command is used to delete tables from the database

Syntax : DROP TABLE table_name;

Eg, to drop employees table

DROP TABLE employees

Data Manipulation Language (DML)

DML commands are responsible for performing all types of data modification in a database.

1.INSERT command

Insert statements are used to insert data into the tables.

Syntax:

INSERT INTO table_name (col1, col2, col3,…. col N) VALUES (value1, value2, value3, …. valueN);

alternatively,

INSERT INTO table_name VALUES (value1, value2, value3, …. valueN);

Eg, to insert data into employees table:

insert into employeesvalues(1102, ‘Alex’, ‘Gerard’, ‘1056’, ‘Sales Manager’),(1103, ‘Mary’, ‘Thomas’, ‘1102’, ‘Sales Executive’);

2. UPDATE command

Update statements are used to update or modify the value of a column in the table.

Syntax:

UPDATE table_name SET [column_name1= value1,…column_nameN = valueN] [WHERE CONDITION]

For example to update jobTitle of an employee we can use the below query:

UPDATE employeesSET jobTitle=’Senior Sales Manager’WHERE employeeNumber=1102;

3. DELETE command

Delete is used to remove one or more rows from a table.

Syntax:

DELETE FROM table_name [WHERE condition];

For example:

DELETE FROM employeesWHERE employeeNumber=1102;

4.SELECT Command

The select command is used to retrieve data from tables.

Syntax:

select (attributes)from (table)where (filter_condition)group by (attributes_to_be_grouped_upon)having (filter_condition_on_grouped_values)order by (values)limit (no_of_values_to_display);

Let’s look at few examples:

  • To display the entire table
select * from employees
  • To display only employee’s first name
select firstName from employees
  • To display details of employee whose employeeNumber is 1102
select * from employees where employeeNumber=1102;
  • We can also aggregate values, perform groupby and orderby operations in Select queries
select firstName from employees order by firstName;select country, count(customerNumber) from customers group by country;

Data Control Language (DCL)

DCL commands are used for granting rights and permissions to the database system. The DCL commands are:

1.Grant — The grant command grants a user the right to access the database.

Syntax:

GRANT <privileges> ON <database>.<object> TO ‘<user>’@’<host>’;

2.Revoke — The revoke command withdraws the access rights given by the grant command.

Syntax:

REVOKE <privileges> ON <database>.<object> FROM’<user>’@’<host>’;

Conclusion

While SQL dates back to the 1970s when Edgar Codd wrote about organizing data into databases, the language continues to remain relevant. It has passed the test of time and almost every organization utilizes SQL or a derivative of SQL. This blog intends to help you begin your journey to learn SQL. To prosper in the world of data, ensure you have SQL in your data science skill arsenal too.

--

--