Getting started with SQL! (Part 1)
As a data scientist or a data analyst, to analyze and predict is a secondary task, the most important skill is to manage, understand and process data. SQL is one way to make friends with data and databases. SQL helps us to quickly fetch and understand complex data. This blog will help you get started with the basics of SQL.
To begin with, let’s go through some must-know terminologies to understand database management systems better..
- Data is a collection of facts or set of values in the form of text, numbers, pictures, audio or videos. Data is a very important part of any organization and so needs to be secure and easily accessible.
- Database is called a repository of data. There are various types of databases, the one we are interested in is a relational database.
- Database management system (DBMS) is an application used to manage databases.
- Relational database stores data in the form of relations or tables. Tables consist of columns that represent properties or attributes of the entity and rows are the records or observations.
- Relational database management system (RDBMS) as the name suggests is an application used to manage relational databases and can be queried.
- Schema is a logical structure of a database that represents how data is stored and organized, and how the database entities are related. Schema does not physically store data but only gives us an overview of the shape, constraints and relationships of data.
- Keys are constraints or limitations applied to columns of a table. They are used to uniquely identify and fetch records from the table.
1. Candidate keys- column(s) that can uniquely identify records and can be considered as the primary key.
2. Primary key- key/column that has a unique value for each record. It cannot have null values. Each table can have only one primary key since it gives a unique identity to each record.
3. Alternate keys- candidate key(s) that were not taken as a primary key.
4. Foreign keys- column(s) one table that refers to the primary key of other.
Structured Query Language (SQL) is a standard language to interact with relational database management systems.
SQL commands can be classified into:
- Data Definition Language — DDL commands help define the structure of a database or its objects. They include operations like creating, modifying or deleting databases or tables. CREATE, ALTER, DROP, RENAME, TRUNCATE
- Data Manipulation Language — DML commands are used to modifications in a database. INSERT, DELETE, UPDATE
- Data Query Language — DQL statements are used to perform queries on data or simply to fetch data based on the required conditions. SELECT
- Data Control Language — deals with the permissions of a database system. DCL commands help grant or take back access from the database user. GRANT, REVOKE
- Transaction Control Language — TCL commands deal with transactions within a database and can be used only with the DML commands. COMMIT, ROLLBACK, SAVEPOINT
Let’s move forward with some basic implementation of SQL commands..
- Create a database
CREATE DATABASE database_name;
To check whether the database was created or to check which databases already exist, use the following-
SHOW DATABASES;
Once the database is created, run the following query to utilize a particular database in the succeeding queries-
USE database_name;
3. Create table(s)
One or more tables can be created in a database.
CREATE TABLE table_name(column_name_1 data_type1 constraint1,
column_name_2 data_type2 constraint2,
column_name_3 data_type3 constraint3,
...
);
Datatypes define the type of values that will be stored in the column. We can use datatypes like — int, float, decimal, double, char, varchar, datetime, year, etc.
Constraints are restrictions placed on column(s) to limit the type of values that can be stored in it. Commonly used constraints are — not null, unique, primary key, foreign key, default, index
To check if the table was created or to check the available tables in a database, use the following query-
SHOW TABLES;
2. Insert values
INSERT INTO table_name (column_name1, column_name2, column_name3..)
VALUES (value1, value2, value3..);
Above mentioned is one way to insert new records to the table.
Another way is given below, without using column names. While using this method, ensure that you use the same order as the columns in the table.
INSERT INTO table_name VALUES (value1, value2, value3..);
3. Select query
SELECT * FROM table_name;
Here, * is used to select or view all the columns in the table.
To view specific columns from the table use-
SELECT column_name1, column_name2.. FROM table_name;
4. Describe table
DESCRIBE table_name;
DESC table_name;
This command gives us an overview of the table structure — name of the column/field, type of data stored in the column, if the column can store null values, key type
5. Update records
To modify the existing records in the table we can use the update statement.
UPDATE table_name SET column_name1 = value1, column_name2 =value2,..
WHERE condition;
WHERE clause in the above statement specifies which records need to be updated. Don’t forget the WHERE clause if you need some specific changes, else it’ll update all records in the column.
6. Delete records
DELETE FROM table_name WHERE condition;
This will delete existing records from the table as per the specified condition.
Points to remember while writing SQL queries:
- Every query should end with a semicolon (;)
- SQL is NOT case-sensitive
Thank you for reading! Stay connected for the next blog in which we’ll take a look at retrieving and filtering data using SQL.